VARIABILITY OF GLNPO ZOOPLANKTON DATA

Variability of Crustacean Zooplankton Data Generated by the Great Lakes National Program Office's Annual Water Quality Survey

Richard P. Barbiero
DynCorp Science and Engineering Group
1359 West Elmdale Avenue Suite #2
Chicago, Illinois 60660

Prepared for:
United States Environmental Protection Agency
Great Lakes National Program Office
77 West Jackson Boulevard
Chicago, Illinois 60604
Louis Blume, Project Officer

September 2003

Acknowledgements

This report was prepared under the direction of Louis Blume, Project Officer, Great Lakes National Program Office (EPA Contract No. 68-C-01-091). Assistance with ANOVA calculations was provided by Ken Miller, DynCorp Science and Engineering Group.

Summary

A Data Quality Objective (DQO) has been developed by the Great Lakes National Program Office (GLNPO) to ensure that data collected from their Water Quality Surveys are of suitable quality to provide decision makers with sufficient certainty to make educated ecological management decisions. The current GLNPO DQO states that data quality should be sufficient for there to be an 80% chance of detecting a 20% change, at the 90% confidence level, between current and historical measurements of a variable made in a particular lake during a particular season. This report determines the extent to which zooplankton data comply with the GLNPO DQO, and assesses the relative contribution of different sources of variability to the overall uncertainty of zooplankton data. The most important findings are summarized below:

• Data quality of zooplankton data falls far short of the current DQO. In only 3 of 184 cases examined was the DQO criterion met.
• Minimum detectable differences for the major taxonomic groups and the most common species were largely between 40 and 190%.
• Estimates of cladoceran densities were most variable; estimates of calanoid copepod densities were least variable.
• It is unclear if the current data quality is sufficient to detect ecologically important trends. A recent study shows that in at least some cases it is (Barbiero and Tuchman, in press).
• Relatively little variability is due to analyst error in counting/identification.
• About 25% of variability is introduced during the field sampling and/or laboratory subsampling stages.
• The majority of uncertainty in zooplankton data is due to station-to-station (within basin) variability. Reducing this source of variability would entail increasing the number of sampling stations.
• The most practical way to reduce variability is to ensure proper functioning/reading of the flow meter.
• Since the variability introduced into the analysis by subsampling in the laboratory is unknown, a study quantifying this source of uncertainty could point to further means of reducing variability.
• An appropriate QC criterion for relative species composition of duplicate laboratory analyses, using the PSc index, is 0.92.
• An appropriate RPD QC criterion for total organism counts in duplicate laboratory analyses is 4%.

1 Introduction

1.1 GLNPO water quality survey

The Great Lakes National Program Office (GLNPO) of the U.S. EPA has been involved in regular surveillance monitoring of the open waters of the Laurentian Great Lakes since 1983. This surveillance monitoring is meant to satisfy the provisions of the Great Lakes Water Quality Agreement (International Joint Commission 1978), which calls for periodic monitoring of the lakes to evaluate the effectiveness of pollution control/reduction strategies in the Great Lakes, recognize emerging problems, and identify the need for new or revised strategies and further research.
According to GLNPO (2003), the water quality surveys have been specifically designed to:

• Focus on key physical, chemical, and biological indicators of lake health
• Evaluate the health of each lake under different conditions (stratified and unstratified)
• Allow for real-time detection of significant changes in water quality, as indicated by significant changes in one or more parameters
• Provide data that can be compared from year to year
• Provide data to support decisions regarding the need for further study or new pollution control strategies

In order to ensure that data collected from GLNPO's water quality surveys fulfill these requirements, a data quality objective (DQO) has been developed to be applied to all water quality survey data. Management of data quality is an important aspect of the larger mission of the water quality surveys, and requires an understanding both of the overall magnitude of variability, and of the relative contributions of individual components of sample collection and analysis to total variability. More fundamentally, it is also necessary that the DQO be sufficiently explicit to enable its unambiguous application to water quality survey data, and that it be appropriate to the type of data collected by the water quality survey.

Recognition of the importance of open water planktonic communities in the overall assessment of ecosystem health led to the inclusion of sampling for zooplankton communities at the inception of the monitoring program. However, data generated from the sampling of biological communities poses special challenges for the application of the DQO and for assessments of variability. DQOs are typically developed in relation to chemical variables, which are characteristically univariate, unlike biological community data, which are multivariate.
It is important, therefore, to assess both the extent to which the DQO is applicable to biological data, and whether or not that data satisfies the DQO.

1.2 Objectives of study

The overall purpose of the present study was to provide an assessment of the variability of data generated by GLNPO's zooplankton monitoring program. The specific goals of the study were several fold:

1. To determine the minimum detectable differences under the current sampling regime;
2. To determine if the current level of effort satisfies the GLNPO DQO;
3. To determine the relative contribution to overall variability of different stages of sample collection and analysis;
4. To determine appropriate analysis criteria for duplicate laboratory (QC) analyses.

In addition, the applicability of the current DQO to zooplankton data is discussed.

2 GLNPO's Data Quality Objectives

2.1 Current ambiguities

The assessment of lake health using data generated from the water quality survey requires that sufficient data quality be obtained to permit detection of 'significant' changes in the variables under consideration. For the purposes of the water quality surveys, GLNPO has defined a significant change as a 20% difference between current and 'historical' measurements, made for a particular variable in a particular lake during a particular season. The DQO for GLNPO's water quality survey is stated as the ability to "collect measurements that will yield an 80% chance of detecting a change of 20% or more within a particular lake and season, at the 90% confidence level" (p. 15; GLNPO 2003). This formulation of the DQO, however, contains several ambiguities, particularly as it relates to multivariate data such as that generated from zooplankton analyses. First, as currently stated the DQO does not indicate what the detection target of a 20% change is in relation to.
Elsewhere in the same document, both a comparison to 'historical' values (p. 15; GLNPO 2003) and comparisons between two years (p. 27; GLNPO 2003) are referred to. As pointed out elsewhere (Barbiero, 2003), the detection of a change between a given season's data and 'historical' values can be variously interpreted to mean a change in relation to the previous year's data, a change in relation to a pooling of all previous years' data, or a change in relation to any previous year's data. An additional possible interpretation of the DQO would be to permit the detection of a trend in historical data, although this would not seem to be completely consistent with its current formulation.

The DQO also appears to be at variance with the basic statistical design of the water quality surveys, in that the target change is defined in the DQO on a lake-wide basis, while the statistical design of the survey is based on replication at the level of two or three homogeneous basins within each lake (p. 27, GLNPO 2003). This can be accommodated by employing a stratified statistical design in assessing changes in variables, i.e., by first computing the values of each variable on a basin-wide basis, and then combining those estimates in proportion to how much of the lake each basin accounts for to arrive at a lake-wide estimate. Under this scenario, variance would also have to be calculated proportionately. Interpreting the DQO in this way, however, assumes that changes can only take place on a lake-wide basis. In a case where the timing and/or magnitude of change differed from basin to basin, as for instance might be expected in Lake Erie where differences in morphometry result in vast differences in the chemical and biological characteristics of the three basins, limiting the detection of changes to a lake-wide basis could obscure changes taking place only within a given basin.
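The stratified lake-wide estimate described above can be sketched as follows. This is only an illustration under stated assumptions: the basin weights (for example, each basin's share of lake area) are hypothetical inputs that the report does not specify, the function name `stratified_estimate` is invented here, and the variance uses the textbook stratified-sampling form without a finite population correction.

```python
from math import sqrt
from statistics import mean, variance


def stratified_estimate(basins):
    """Combine basin-level station values into a lake-wide estimate.

    basins: list of (weight, station_values) pairs. The weights are
    hypothetical (e.g. the fraction of the lake each basin accounts for)
    and are rescaled here so that they sum to 1.
    Returns (lake_wide_mean, standard_error).
    """
    total_weight = sum(weight for weight, _ in basins)
    lake_mean = 0.0
    lake_var = 0.0
    for weight, values in basins:
        w = weight / total_weight
        # Basin mean, weighted by the basin's share of the lake.
        lake_mean += w * mean(values)
        # Variance of the basin mean, weighted by the squared share.
        lake_var += w ** 2 * variance(values) / len(values)
    return lake_mean, sqrt(lake_var)


# Two equally weighted hypothetical basins with two stations each.
estimate, std_error = stratified_estimate([(0.5, [10, 14]), (0.5, [20, 24])])
```

Because the basin means are averaged before comparison, a change confined to one basin is diluted in the lake-wide figure, which is the concern raised in the text.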
While it is not within the scope of this report to clarify the ambiguities of the current GLNPO DQO, in order to apply it to the zooplankton data, some assumptions had to be made concerning its interpretation. For the purposes of this report, the DQO was assumed to denote the requirement of an 80% chance of detecting a 20% change between two years within a given basin for a particular season at the 90% confidence level.

2.2 Application to multivariate data

Resolving the ambiguities in the current formulation of the DQO is theoretically possible. More fundamental difficulties exist, however, in the application of the DQO to data generated by the zooplankton sampling program. As with all data generated by the biological monitoring program, zooplankton data are multivariate. Each sample, rather than producing a single value associated with a single variate, will produce values associated with a varying number of variates. Variates here correspond to the different species identified in each sample, and values associated with those variates correspond both to the densities of those species, and to their biomass. The variates produced by a sample will not necessarily be consistent within groups of replicates, nor will they even necessarily be the same between replicate analyses of the same sample. Theoretically, then, the DQO could apply to each individual variate (i.e., species) identified within a sample. A given sample could therefore be called upon to satisfy as many DQOs as there are species within that sample, which in the case of zooplankton could be expected to vary between several and several dozen. In addition, it might be of interest to assess changes in broader taxonomic categories of organisms, for example to assess changes at the taxonomic level of order or suborder (e.g., cladocerans, calanoid copepods, etc.), or to assess changes in various functional groups (e.
g., grazers, predators, etc.), or indeed to track changes in total zooplankton density or biomass.

One problem, therefore, arising from the multivariate nature of zooplankton data is deciding upon the variate(s) of interest. It is likely that changes in the populations of some species, or certain groupings of species, are of little inherent ecological interest, and therefore do not need to be subject to the DQO. Also, the statistical difficulties associated with estimating the abundances of species that typically occur in very small numbers might preclude their ability to conform to the DQO.

A more fundamental problem exists, however, if community-level attributes of the zooplankton data are of interest. Examination of overall community structure often reveals changes that are not apparent from examination of individual species (Yan et al., 1996), and could provide a more relevant measure of ecosystem health. In this instance, defining an appropriate metric, and quantifying the variability associated with that metric, becomes highly problematic. Changes in community structure are typically quantified using multivariate techniques, but metrics derived from such techniques are often not easily convertible into a single number, nor are there universally accepted methods of quantifying the variance of such metrics, and they thus would not be easily amenable to assessment in terms of the current DQO. There are currently no guidelines in place to enable the application of the GLNPO DQO to multivariate community level data.

3 Zooplankton Program

3.1 Overview

GLNPO's regular surveillance monitoring of the open waters of the Laurentian Great Lakes began in 1983. Initially, only the open waters of Lakes Michigan, Huron and Erie were included in GLNPO's monitoring program. In 1986, monitoring of Lake Ontario was added, and in 1992, Lake Superior was included. Sampling protocols have undergone some changes since the beginning of the program.
In 1983 and 1984, two vertical zooplankton tows were taken at each site with a 63-µm mesh net: one from 2 m above the bottom to the surface, and a second from 20 m to the surface (Makarewicz, 1987; Makarewicz, 1988). In 1985, the deeper tow was apparently discontinued (Makarewicz and Bertram, 1991), leaving just the 20-m tow. Concerns about the representativeness of samples collected from just the upper 20 m of the water column led to a further change in the zooplankton sampling protocol. Starting in the summer of 1997, a second tow was added to the sampling regime. This tow was taken from a depth of 100 m, or 2 m from the bottom, whichever was shallower. Unlike previous deep tows, the 100-m tows were taken using a net with a larger mesh size (153-µm) to prevent clogging and to reduce the pressure wave created by the net during sampling. Also, the time of day at which the tows were taken was recorded from 1996 on, something which had not been done earlier.

There are two main consequences of taking zooplankton tows from relatively shallow depths. In species that undergo diurnal vertical migration, 20-m tows taken during the day, when such species are typically below the epilimnion, can result in an underestimation of abundances. This would lead both to unrepresentative samples, and also to an increase in both inter- and intra-annual variability. If replicate sites are sampled at different times of day during a cruise, as is often the case, intra-annual variability would increase, while if sites are visited at different times of day from year to year, as is also likely, this would result in an increase in apparent inter-annual variability. Secondly, populations of deeper-living zooplankton that rarely migrate above 20 m would be consistently underestimated in 20-m tows, whether taken during the day or at night.
Because of the problems inherent in the interpretation of shallow, 63-µm mesh tows, emphasis in this report will be on the deeper, 153-µm mesh tows.

3.2 Field methods

Currently, two sampling tows are performed at each station. The first tow is taken from 20 meters below the water surface using a 63-µm mesh net. The second tow is a 'full' water column tow, to 2 meters above the bottom of the lake or 100 m, whichever is less, using a 153-µm mesh net. If the station depth is less than 20 m, both tows are taken from one meter above the bottom. Tows are taken with a 0.5-m diameter conical net (D:L=1:3) equipped with a flowmeter. Once on station, the biology technician resets the flowmeter dials to zero, and has the winch operator lower the net so the rim of the net is at the surface of the water. The net is then lowered to the appropriate depth as indicated by a winch meter on deck, and raised at a constant speed (at or close to 0.5 meter/second) until the rim of the net is approximately at eye level. Upon retrieval the flowmeter dials are read and the net is rinsed with a hose from the outside to wash all of the organisms off of the net cloth inside and into the sample bucket. The sample is concentrated into the sample bucket, which is then detached from the net and its contents rinsed and poured three times into a pre-labeled 500-mL sample bottle. The organisms are then narcotized with soda water and preserved with sucrose formalin solution. Triplicate tows of each depth are taken at the master stations.

3.3 Laboratory methods

Microcrustacea are examined in four stratified aliquots under a stereoscopic microscope. The sample is subsampled using a Folsom plankton splitter, with half of each split set aside, and the other half returned to the splitter to be split again. Successive splits are made until the last 2 subsamples contain between 200 and 400 microcrustaceans each (not including nauplii). In total, four subsamples are examined and enumerated.
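The successive halving described above can be illustrated with a small sketch. It assumes an idealized splitter that divides the sample into exact halves and a known total number of organisms, neither of which holds in practice; the function names `splits_needed` and `scale_up` are hypothetical and are not part of the GLNPO procedure.

```python
def splits_needed(estimated_total, upper=400):
    """Count the halvings needed before a subsample holds at most `upper` animals.

    Assumes every split is an exact half. When the starting total exceeds
    `upper`, the final subsample falls between upper/2 and upper, i.e.
    within the 200-400 target range for the default value.
    Returns (number_of_splits, expected_count_in_final_subsample).
    """
    splits = 0
    expected = float(estimated_total)
    while expected > upper:
        expected /= 2
        splits += 1
    return splits, expected


def scale_up(count, splits):
    """Estimate whole-sample abundance from a count made after `splits` halvings."""
    return count * 2 ** splits


# A hypothetical sample of about 5000 microcrustaceans.
n_splits, expected_count = splits_needed(5000)
whole_sample_estimate = scale_up(300, n_splits)
```

Because the count is multiplied by 2 raised to the number of splits, any unevenness in the splits is magnified in the whole-sample estimate, which is one reason the unmeasured subsampling error is of interest.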
Each is removed, in turn, with a condensing tube and placed in a circular counting chamber. All microcrustaceans within each subsample are identified and enumerated under a stereozoom microscope. The four subsamples are: the final two, most dilute subsamples which contain 200-400 organisms, in which all microcrustaceans are examined and enumerated; a third subsample equal in fraction to the sum of the first two subsamples, which is examined for subdominant taxa (taxa enumerated less than 40 times in the first two subsamples combined); and a fourth subsample equal in fraction to the sum of the first three from which rare taxa are enumerated. In general, ten percent of all samples analyzed are analyzed in duplicate by a second analyst. If a given lake/cruise has fewer than 10 samples, at least one sample from that data set is also analyzed in duplicate. Duplicate analyses are performed after subsamples are placed into the counting chamber, and thus quantify variation associated with enumeration and identification, but not with subsampling.

4 Sources of Variability

4.1 Levels of replication

The statistical design of the zooplankton program follows that of the broader water quality monitoring program, with each lake divided into statistically homogeneous basins (Fig. 1; Table 1). Within each basin stations function as replicates, and provide an indication of large-scale spatial heterogeneity. Each basin contains a master station, usually located at the deepest point in the basin, at which triplicate zooplankton tows are taken. These tows function as field replicates and are meant to quantify the variability within each station associated with sample collection, including variability associated with lowering and raising the net, the angle and actual (as opposed to nominal) depth of the tow, the functioning/reading of the flow meter, and the washing of the net bucket contents into the collection bottle.
These field replicates also capture the variability due to smaller scale zooplankton patchiness.

Table 1. Assignment of GLNPO water quality survey stations to homogeneous basins within the five Laurentian Great Lakes.

Lake      Basin          Stations
Michigan  southern lake  MI 11, MI 17, MI 18, MI 19, MI 23
          central lake   MI 27, MI 32, MI 34
          northern lake  MI 40, MI 41, MI 47
Huron     northern lake  HU 45, HU 48, HU 53, HU 54, HU 61
          central lake   HU 32, HU 37, HU 38
          southern lake  HU 06, HU 09, HU 12, HU 15, HU 27, HU 93
Erie      western lake   ER 58, ER 59, ER 60, ER 61, ER 91, ER 92
          central lake   ER 30, ER 31, ER 32, ER 36, ER 37, ER 38, ER 42, ER 43, ER 73, ER 78
          eastern lake   ER 09, ER 10, ER 15, ER 63
Ontario   western lake   ON 12, ON 25, ON 33, ON 41
          eastern lake   ON 49, ON 55, ON 60, ON 63
Superior  western lake   SU 15, SU 16, SU 17, SU 18, SU 19
          central lake   SU 06, SU 07, SU 08, SU 09, SU 10, SU 11, SU 12, SU 13, SU 14
          eastern lake   SU 01, SU 02, SU 03, SU 04, SU 05

Fig. 1. Locations of GLNPO's water quality survey (WQS) sampling stations within homogeneous basins, as defined by 2003 quality assurance program plan. Master stations indicated in red.

In the laboratory each sample is subsampled, and subsamples from successive dilutions are counted to ensure accurate estimation of rarer species. There is no replication at this stage, so there is no way to estimate the amount of error introduced into the analysis by subsampling. The entire contents of each of four subsamples are placed successively into the microscope chamber and identified and enumerated by the analyst. A second analyst provides duplicate counts and identifications of 10% of the samples. Duplicate analyses capture variability associated with species identifications and with the counting of animals within the chambers.
These duplicate analyses are conducted after the subsamples are placed into the counting chambers, so as noted, no estimate of subsampling variability is possible. A summary of the main sources of variation is given in Table 2, along with the measures currently in place to estimate their magnitude.

Table 2. Sources of variability in zooplankton analysis.

Source of Variability                         Current Measure
Within-Basin Spatial Heterogeneity            Replicate Stations Within Basin
Sample Collection, Small-Scale Patchiness     Replicate Field Tows at Master Stations
Sub-Sampling                                  None
Laboratory Analysis                           Duplicate QC Counts on 10% of Samples

4.2 Compliance with DQO

Assessing the degree to which the current sampling effort satisfies the DQO required that some assumptions be made in order to resolve the ambiguities in the DQO pointed out in Section 2.1. As stated earlier, it was assumed that the DQO required data of adequate quality to permit an 80% chance of detecting a 20% change in a given variable between two years within a given basin and season with 90% confidence. Basins were defined according to GLNPO (2003) as listed in Table 1. Assessment of such a change can be accomplished with a two-sample t-test. Therefore, determination of the minimum detectable difference currently permitted by the data can be computed using the following formula:

where:
$s_p^2$ = sample estimate of pooled population variance; and
$\delta$ = the minimum detectable difference.

It was also necessary to make some assumptions about which variates should be subject to the DQO. In this report, the following major taxonomic groupings were assessed: total cladocerans, total adult cyclopoids, total cyclopoid copepodites, total adult calanoids, total calanoid copepodites and total crustaceans excluding nauplii. In all cases, density rather than biomass was used.
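As a rough illustration of the minimum detectable difference calculation in Section 4.2, the sketch below uses a common textbook form for a two-sample comparison with equal group sizes, delta = sqrt(2 * s_p^2 / n) * (z_alpha + z_beta). This is an assumption rather than the report's own formula, and it substitutes normal quantiles for t quantiles so that it runs on the Python standard library alone; with the small station counts typical of a basin, t quantiles would give a somewhat larger value. The densities and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_variance(x, y):
    """Pooled sample variance of two groups."""
    nx, ny = len(x), len(y)
    return ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)


def min_detectable_difference(sp2, n, alpha=0.10, power=0.80):
    """Normal-approximation MDD for two groups of n samples each.

    alpha=0.10 (two-sided) and power=0.80 mirror the 90% confidence level
    and 80% chance of detection stated in the DQO.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # one-sided quantile for the power
    return sqrt(2 * sp2 / n) * (z_alpha + z_beta)


# Hypothetical densities at four stations in one basin in two years.
year1 = [100.0, 140.0, 90.0, 130.0]
year2 = [110.0, 150.0, 95.0, 125.0]
sp2 = pooled_variance(year1, year2)
mdd = min_detectable_difference(sp2, n=len(year1))
mdd_percent = 100 * mdd / mean(year1)
meets_dqo = mdd_percent <= 20  # the DQO asks for a detectable change of 20%
```

Expressing the result as a percentage of the first year's mean makes it directly comparable with the 20% target in the DQO.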
Groups constituting less than 20% of total density for any basin/season combination were excluded from the analysis. In addition, minimum detectable differences were calculated for several of the most common species. These included the cladocerans Daphnia galeata mendotae and Bosmina longirostris, the cyclopoid copepod Diacyclops thomasi, and the calanoid copepods Leptodiaptomus minutus and Leptodiaptomus ashlandi. Only data generated from the deeper, 153-µm mesh tows were assessed. Estimates of variance were calculated from 1998 data using only regular field samples.

4.3 Sources of variability

There are problems posed in trying to assess the variability of multivariate data. Conventional indices of dispersion, e.g., standard deviation, interquartile range, etc., are strictly speaking not applicable to multivariate data, and therefore if used must be applied either to broad summations of the data (e.g., total numbers of crustaceans, total numbers of cladocerans, etc.), or must be calculated separately for each individual variate (i.e., each taxonomic group). This results in a multitude of estimates of variability for each sample, the exact number of which depends upon the number of species encountered in that sample. The collective interpretation for a given sample of these estimates of variability is problematic.

Alternatively, recourse can be made to multivariate techniques. A number of different numerical techniques have been developed in ecology to quantify degrees of identity between pairs or groups of samples which treat this multivariate data as a whole. Among these techniques, measures of similarity seek to provide objective measures of the degree of identity in the structure of two communities. Typically these indices involve summing up the differences in the abundances or biovolumes of individual species between two samples/sites, which reduces these differences to a single number scaled between 0 and 1.
The inverse of these measures, i.e., dissimilarity, can also be computed to quantify the distance of two samples from each other. Where a number of samples are assumed to represent the same 'population' (used here in a statistical sense), then the calculation of a matrix of similarity values between these samples can be used to represent the degree of variability among those replicate samples. While this approach has the dual advantage of treating multivariate data in its entirety, and of reducing comparisons between samples to a single number, the drawbacks are that these techniques, when used as measures of variability, are not strictly comparable with more standard methods, and furthermore, the characteristics (e.g., expected distributions) of the numbers generated by these comparisons are not fully defined, as is the case with, for example, estimates of parametric variance. Also, when more than two samples are compared, the resulting similarity comparisons produce a matrix of values rather than a single value, and thus the necessity of reducing these to a single number remains. Unlike the multiplicity of variances produced by analyzing each variate separately by more conventional means, though, the values of a similarity matrix all estimate the same thing, namely the degree of dispersion amongst a set of replicates. In spite of these drawbacks, the benefits provided by a technique capable of fully comparing sets of multivariate data recommend its use in the present context.

Here, both approaches (i.e., calculation of parametric variance on individual variates and comparison of samples using similarity indices) were used to assess the variability of GLNPO's zooplankton data. While these two methods are complementary, their results are also largely incommensurate, quantitatively, and this should be borne in mind when interpreting the results presented here.
4.3.1 ANOVA analyses

To assess the relative contributions of the various stages of sample collection and analysis outlined in Table 2 to the overall variability of zooplankton data, analyses of variance were conducted. The sample analysis scheme of the zooplankton program can be thought of as being composed of a number of hierarchical stages. Within each lake, basins have been defined by GLNPO to be statistically homogeneous regions. Within basins, stations serve as replicates. Multiple tows, performed at master stations, in turn serve as subsamples within those stations. Duplicate laboratory analyses, finally, serve as 'subsamples' of sample analysis. The variance associated with each of these hierarchical levels can be estimated using a multi-factor nested analysis of variance (ANOVA). The theoretical factor structure of the GLNPO zooplankton data is illustrated in Fig. 2.

In fact, though, the zooplankton data present an extremely unbalanced statistical design. Field replicates are only nested within one station per basin (the master station), and duplicate laboratory analyses are conducted, on average, on only one sample per lake, and are rarely nested within field replicates. This both complicates the calculation of the ANOVA, and can also lead to anomalous results. Specifically, an unusually high degree of variability in a single pair of analyses at one level of replication (e.g., laboratory duplicate analysis) can mask the variability in the next higher level of subsampling (e.g., field replication).

ANOVA analyses were carried out on six variates: total adult calanoids, total calanoid copepodites, total adult cyclopoid copepods, total cyclopoid copepodites, total cladocerans, and total crustaceans, exclusive of nauplii. Only data generated from the deeper, 153-µm mesh nets were used. Data were natural log transformed prior to analysis; where zeros occurred in the data, 1 was added to all values prior to transformation.
Separate analyses were conducted for the two years examined (1998, 1999) and the two seasons (spring, summer). Cladocerans were not analyzed in spring samples due to low numbers. In all, 22 analyses were performed. Sources of variation included between basin variance, between station (within basin), between field replicate (within station) and between laboratory duplicate (within field replicate) variance. The structure of the analysis assumed that the amount of variance contributed by each factor was similar for all levels of that factor, so that, for instance, between station variability was similar within all basins. However, it was noted that the variability between stations in the western and central basins of Lake Erie was extremely high. In order that this not exert an undue influence, these two basins were removed from the analysis. The magnitude of the different variance components was computed as a percentage of the total variance minus between basin variance, i.e., variance components were calculated as a percentage of within basin variance.

Fig. 2. Illustration of factor structure for hierarchical analysis of variance of GLNPO zooplankton data for hypothetical two basin lake. FD indicates field replicate; cells for laboratory replicates are ...

This approach can provide information about the amount of variability involved in estimating densities of major taxonomic groups. However, it cannot address variability in estimates of species composition. This distinction should be borne in mind when interpreting the results.
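The idea of expressing variance components as a percentage of within-basin variance can be sketched for a much simpler case than the one analyzed in this report. The example below assumes a balanced design with only two levels (stations within one basin, and the same number of replicate tows at every station) and uses method-of-moments estimates; the actual GLNPO design is unbalanced and has more levels, so this illustrates the principle only. The function names are hypothetical.

```python
from math import log
from statistics import mean


def log_transform(tows_by_station):
    """Natural log transform; add 1 to all values if any zero occurs."""
    has_zero = any(v == 0 for tows in tows_by_station for v in tows)
    shift = 1.0 if has_zero else 0.0
    return [[log(v + shift) for v in tows] for tows in tows_by_station]


def variance_components(tows_by_station):
    """Percent of within-basin variance due to stations and to replicate tows.

    Assumes a balanced design: every station has the same number of tows.
    """
    a = len(tows_by_station)       # number of stations
    n = len(tows_by_station[0])    # tows per station
    if any(len(tows) != n for tows in tows_by_station):
        raise ValueError("balanced design required for this sketch")
    station_means = [mean(tows) for tows in tows_by_station]
    grand_mean = mean(station_means)
    ms_between = n * sum((m - grand_mean) ** 2 for m in station_means) / (a - 1)
    ms_within = sum(
        (v - m) ** 2
        for tows, m in zip(tows_by_station, station_means)
        for v in tows
    ) / (a * (n - 1))
    var_tow = ms_within
    # Negative estimates are truncated at zero, a common convention.
    var_station = max(0.0, (ms_between - ms_within) / n)
    total = var_station + var_tow
    if total == 0:
        return {"station": 0.0, "field_replicate": 0.0}
    return {
        "station": 100 * var_station / total,
        "field_replicate": 100 * var_tow / total,
    }


# Hypothetical values for three stations with two tows each.
percentages = variance_components([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])
```

In practice the densities would be passed through `log_transform` first, as described in Section 4.3.1.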
If the species composition of the zooplankton community within a basin is consistent from site to site, but the total numbers of organisms vary widely, an ANOVA will indicate high levels of variability. On the other hand, if the species composition of the community is vastly different from site to site, but densities of individuals are similar within each broad taxonomic category, then an ANOVA will indicate low variability.

4.3.2 Similarity analyses

As indicated earlier, special problems are posed in trying to quantify the variability of multivariate data. While the data can be summarized by broad taxonomic category into a smaller number of individual variates, and variance calculated using univariate methods as outlined above, this approach will not be able to detect compositional shifts at lower taxonomic levels, and thus cannot give a true picture of variability at the community level. It is desirable, instead, to use a measure of variability that can simultaneously compare all the variates within samples, and which can produce a single number to quantify the degree to which the samples diverge.

The approach adopted here involves measures of similarity/dissimilarity. These measures compare two multivariate samples and produce a single number indicating to what extent the two samples share the same species, and optionally to what extent those species are present in similar densities in the two samples. It is important to bear in mind that a similarity value is the result of a comparison between two samples. To compare a set of replicates, then, each replicate must be compared with each other replicate, and a matrix of similarity values obtained, from which some measure of central tendency (e.g., median, mean) can be computed. Thus for N samples, [N(N-1)]/2 comparisons would be performed.
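The all-pairs comparison of replicates can be sketched generically. The helper `pairwise_similarities` accepts any similarity function; the presence/absence measure `shared_taxa_fraction` used to exercise it is only an illustrative stand-in and is not one of the indices used in this report. Samples are represented here, as an assumption, by dictionaries mapping taxon name to count.

```python
from itertools import combinations
from statistics import median


def pairwise_similarities(samples, similarity):
    """Compare every replicate with every other: N(N-1)/2 values for N samples."""
    return [similarity(a, b) for a, b in combinations(samples, 2)]


def shared_taxa_fraction(sample_a, sample_b):
    """Illustrative presence/absence similarity: shared taxa / taxa in either sample."""
    present_a = {taxon for taxon, count in sample_a.items() if count > 0}
    present_b = {taxon for taxon, count in sample_b.items() if count > 0}
    either = present_a | present_b
    return len(present_a & present_b) / len(either) if either else 1.0


# Three hypothetical field replicates from one station.
replicates = [
    {"Daphnia": 5, "Bosmina": 2},
    {"Daphnia": 3},
    {"Daphnia": 1, "Bosmina": 4},
]
values = pairwise_similarities(replicates, shared_taxa_fraction)
central_value = median(values)
```

The median is used here as the measure of central tendency; the mean could be substituted without changing the structure.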
The primary differences between most similarity indices have to do with whether each species will be compared on the basis of presence/absence, relative abundance, or absolute abundance. Where relative abundances are compared, the similarity measure will be sensitive to differences in species composition, but not to variability associated with estimating overall densities. Where absolute abundances are used, variability in both species composition and densities will be quantified with the similarity measure. Using both types of similarity measures in tandem, therefore, provides a means of assessing whether the variability between two samples is due primarily to differences in species composition, or differences in densities.

Of the similarity measures based on comparisons of relative abundances, one of the most intuitive and most commonly used is the Percentage Similarity of Community (PSc) index of Whittaker (1952; Whittaker and Fairbanks, 1958). As suggested by its name, this index compares percent abundances of species in two samples. Therefore, if two samples have vastly differing total numbers of individuals, but the species within each sample contribute exactly the same proportion of individuals, then the PSc index will indicate that the two samples are identical. The index is calculated as:

PSc = 1 - 0.5 \sum_{i=1}^{K} |a_i - b_i|

where a_i and b_i are, for a given species i, the relative proportions of the total samples A and B, respectively, which that species represents. The absolute value of their difference is then summed over all K species. Two samples in which all species are present in identical proportions will result in a score of 1 (or 100%), while two samples sharing no species in common will produce a score of 0.

Another widely used index, but one which compares absolute abundances of species in two samples, is the so-called Bray-Curtis index.
Originally developed by Kulczynski (1927), and subsequently modified by Motyka et al. (1950), this index provides a number from 0 (no species in common) to 1.0 (identical samples) similar to that of Whittaker's PSc index. The index is calculated as:

C = \frac{2W}{a + b}

where a = the sum of all species abundances in one sample, b = the sum of all species abundances in the other sample, and W = the smaller of the two abundances for each species, summed over all species. In this report, this index will be referred to as C, in accordance with its presentation in Motyka et al. (1950). When these two indices are used together, they can provide both qualitative (i.e., relative) and quantitative information about the similarity of two samples. Specifically, when C values are substantially lower than PSc values, this indicates that differences between the two samples derive at least in part from differences in absolute numbers of individuals in the two samples. Where the two values are substantially the same, then differences between the two samples are due primarily to differences in species composition.

To quantify levels of variability associated with natural variation and different sample collection/analysis activities, similarity matrices were computed between samples taken within each basin (separated by season and mesh size), between sets of field replicates, and between duplicate laboratory analyses. Separate matrices were generated for spring and summer, and 63- and 153-µm mesh tows. Differences in similarity values generated from the two different measures, as well as differences in values from each measure due to season and mesh size, were assessed using a Mann Whitney rank sum test.
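The two indices defined above can be sketched in a few lines. This is an illustrative implementation under stated assumptions: samples are represented as dictionaries mapping a species name to a count, both samples contain at least one organism, and the function names and example counts are hypothetical.

```python
def psc(sample_a, sample_b):
    """Percentage Similarity of Community: 1 - 0.5 * sum(|a_i - b_i|) on proportions."""
    total_a = sum(sample_a.values())
    total_b = sum(sample_b.values())
    species = set(sample_a) | set(sample_b)
    difference = sum(
        abs(sample_a.get(s, 0) / total_a - sample_b.get(s, 0) / total_b)
        for s in species
    )
    return 1.0 - 0.5 * difference


def c_index(sample_a, sample_b):
    """C = 2W / (a + b), with W the sum over species of the smaller abundance."""
    a = sum(sample_a.values())
    b = sum(sample_b.values())
    species = set(sample_a) | set(sample_b)
    w = sum(min(sample_a.get(s, 0), sample_b.get(s, 0)) for s in species)
    return 2.0 * w / (a + b)


# Hypothetical samples: identical proportions, but one has twice the density.
dense = {"species x": 200, "species y": 100, "species z": 100}
sparse = {"species x": 100, "species y": 50, "species z": 50}

psc_value = psc(dense, sparse)    # composition is the same in both samples
c_value = c_index(dense, sparse)  # lowered by the difference in total numbers
```

In this example PSc stays at its maximum while C drops, which is the pattern the text associates with differences in absolute numbers rather than in species composition.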
While it would have been preferable to use a multifactor ANOVA to assess all factors simultaneously, no transformation was found that could stabilize variance and ameliorate the non-normality of the data, and formulations for a non-parametric, multifactor ANOVA type test could not be found.

To estimate the relative contributions of within basin spatial heterogeneity, sample collection, and laboratory analysis to the variability of the data, similarity values were converted to dissimilarity values by subtracting them from 1. To determine the relative magnitudes of each source of uncertainty, the mean dissimilarity associated with each stage was subtracted from that of the previous stage. For example, to determine the amount of dissimilarity contributed by sample collection, the dissimilarity estimates generated from QC analyses were subtracted from those generated from field replicates. Likewise, an estimate of the amount of dissimilarity contributed by site to site variability was obtained by subtracting the dissimilarity of field replicates from within-basin dissimilarity values.

5 Results

5.1 Minimum detectable differences

The percent minimum detectable differences for total crustaceans ranged between 31% (southern basin of Lake Michigan, spring) and 176% (western basin of Lake Erie, spring), with a median of 63% (Fig. 3). For this response variable, no basin/season met the DQO. The highest values were seen in Lake Erie, although all lakes had at least one value approaching or exceeding 100%. For these basin/seasons, therefore, the current sampling regime would have an 80% chance of detecting a change in total crustacean density with 90% confidence only if that change constituted at least a doubling in density.

Fig. 3. Percent minimum detectable differences for total crustaceans and total cladocerans. Variances calculated from 1998 data.

Percent minimum detectable differences for total cladocerans could only be assessed for summer samples, due to low numbers in spring samples. These were substantially higher than for total crustaceans, with basin-wide values ranging from 44% (eastern basin of Lake Ontario) to 262% (northern basin of Lake Michigan), and an overall median value of 143% (Fig. 3). Again, no basin met the DQO requirements.

Minimum detectable differences were lower for both adult and immature calanoid copepods (Fig. 4), and this was probably due at least in part to the great numbers of these individuals found at most sites. Median percent minimum detectable differences were 59% and 60%, respectively, for these groups. Percent minimum detectable differences for cyclopoids were intermediate between cladocerans and calanoids, again probably due in part to their relative abundances (Fig. 4). Median percent minimum detectable differences for adult and immature cyclopoids were 86% and 96%, respectively. Among the copepod groups, the DQO was met in only two cases: calanoid immatures in the central basin of Lake Michigan in the summer and cyclopoid immatures in the eastern basin of Lake Ontario in the spring. Overall, percent minimum detectable differences were highest for cladocerans, lowest for calanoids, and intermediate for cyclopoids (Fig. 5).

Fig. 4. Percent minimum detectable differences for adult calanoid copepods, immature (copepodite) calanoid copepods, cyclopoid copepods and immature (copepodite) cyclopoid copepods.

Fig. 5. Percent minimum detectable differences for major taxonomic groups. CLA = total cladocerans; CAL = total adult calanoids; CALIM = total calanoid copepodites; CYC = total adult cyclopoids; CYCIM = total cyclopoid copepodites; TOT = total crustaceans, exclusive of nauplii. Boxes indicate 25th and 75th percentiles; whiskers denote 10th and 90th percentiles; lines denote median; symbols denote outliers.

Of the six individual species examined, in only one case was the DQO requirement met (L. ashlandi, Lake Michigan, northern basin, summer). Percent minimum detectable differences ranged from 12% to 256%, with an overall median of 94% (Figs 6 and 7). This suggests that, on average, the density of a species would have to double from one year to another in order for the current sampling regime to be able to detect the change as statistically significant. Overall, the two cladocerans (D. galeata mendotae and B. longirostris) had higher percent minimum detectable differences than the copepods examined. As with the larger taxonomic groupings, there were no clear lake to lake differences in percent minimum detectable differences for the individual species.

Fig. 6. Percent minimum detectable differences for the most common adult calanoid copepod species. Panels: Leptodiaptomus ashlandi, Leptodiaptomus minutus, Limnocalanus macrurus, Diacyclops thomasi.

Fig. 7. Percent minimum detectable differences for the most common cladoceran species. Panels: D. galeata mendotae, Bosmina longirostris.
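To give a feel for the magnitudes reported in this section, the sketch below computes a percent minimum detectable difference under simplifying assumptions that are not taken from this report: a two-sample comparison of means with equal variances and equal sample sizes, a two-sided test, and a normal approximation. The function names, the coefficient of variation and the sample size are all hypothetical; only the 90% confidence level, 80% power and 20% target come from the DQO.

```python
from math import ceil, sqrt
from statistics import NormalDist


def percent_mdd(cv, n, alpha=0.10, power=0.80):
    """Approximate minimum detectable difference as a percentage of the mean.

    cv is the coefficient of variation (sd / mean) and n the number of samples
    per period. Normal-approximation sketch, not the report's formula.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test (assumed)
    z_power = NormalDist().inv_cdf(power)
    return 100.0 * (z_alpha + z_power) * cv * sqrt(2.0 / n)


def samples_needed(cv, target_percent, alpha=0.10, power=0.80):
    """Samples per period needed to reach a target percent MDD (same assumptions)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) * cv / (target_percent / 100.0)) ** 2)


# Hypothetical inputs: CV of 50% and 10 samples per period.
mdd = percent_mdd(cv=0.5, n=10)
n_for_dqo = samples_needed(cv=0.5, target_percent=20.0)
```

Under these assumed inputs the detectable difference is well above the 20% DQO target, and the sample size needed to reach 20% is several times larger than the assumed ten.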
When considered in aggregate on the basis of lake basin, minimum detectable differences were consistently higher in the western basin of Lake Erie than in the other basins (Fig. 8). The eastern basin of Lake Superior and the southern basin of Lake Huron exhibited consistently low minimum detectable differences. Aside from these instances, though, percent minimum detectable differences were highly variable, and clear basin-to-basin differences were not seen.

Fig. 8. Percent minimum detectable differences basins. Box plots as in Fig. 5.

5.2 Sources of variability of zooplankton data - ANOVA analyses

In almost all cases, the largest source of variance in the estimation of within-basin abundances of major taxonomic groups was associated with between-station variability (Table 3). This contributed from 23% (summer, 1999, total crustaceans) to nearly 95% (summer, 1998, adult cyclopoids) of the within-basin variance. On average, between-station variance made up about 70% of the total within basin variance. This suggests that large scale spatial heterogeneity in abundances is the main source of uncertainty in developing basin-wide estimates of crustacean abundances.

Variances associated with field replicates contributed on average 26% to total within-basin variance, and ranged from 2.4% (summer 1998, cyclopoids) to 76.7% (summer 1999, total crustaceans). It should be borne in mind that since replicates are not taken at the point of subsampling in the laboratory, variances calculated from field replicates would also incorporate that variance component. The least amount of variance was contributed by duplicate laboratory (QC) analyses, which on average contributed less than 4% of total within-basin variance. Relatively few duplicate laboratory analyses are carried out, so a single aberrant count can have a large impact on this analysis. This was the case in Summer, 1998, when one set of duplicate laboratory analyses from Lake Ontario yielded highly divergent estimates of immature copepod densities. This resulted in both unusually inflated variance estimates for laboratory duplicates for immature calanoids and immature cyclopoids, and anomalously low error estimates of field replicate variance for those two variates.

Table 3. Relative contributions of different sources of variance to the estimation of within-basin abundances of major taxonomic grouping, as determined by multi-stage hierarchical ANOVA.

Variance Comp        Cal      Cal Imm   Cla      Cyc      Cyc Imm   Total
Spring 1998
  Between Station    42.8%    80.7%              81.5%    76.2%     83.6%
  Field Reps         47.9%    17.8%              17.9%    15.6%     16.1%
  Lab Dups            9.3%     1.5%               0.6%     8.2%      0.4%
Summer 1998
  Between Station    48.7%    51.8%     80.7%    94.9%    87.1%     70.3%
  Field Reps         50.0%    19.8%     19.2%     2.4%     0.0%     25.3%
  Lab Dups            1.3%    28.4%      0.1%     2.7%    12.9%      4.3%
Spring 1999
  Between Station    69.3%    78.6%              79.6%    79.2%     78.5%
  Field Reps         28.0%    20.1%              19.7%    17.0%     21.3%
  Lab Dups            2.7%     1.3%               0.7%     3.9%      0.1%
Summer 1999
  Between Station    64.8%    41.6%     82.5%    72.8%    79.0%     23.2%
  Field Reps         34.9%    58.2%     17.5%    26.3%    20.1%     76.7%
  Lab Dups            0.4%     0.2%      0.0%     0.8%     0.9%      0.1%

Cal = total adult calanoids; Cal Imm = total calanoid copepodites; Cla = total cladocerans; Cyc = total adult cyclopoids; Cyc Imm = total cyclopoid copepodites; Total = total crustaceans, exclusive of nauplii.

In summary, then, it appears that the majority of uncertainty involved in the estimation of crustacean abundances, at least viewed at the level of order and suborder, results from large scale (i.e., station to station) spatial heterogeneity, while a relatively minor amount is due to inaccuracies in counting on the part of laboratory analysts. Somewhat less than one third comes from errors associated with sample collection and/or subsampling in the laboratory.
Given the broad taxonomic groupings used in this analysis, error due to taxonomic inaccuracies would not be included in these estimates of variance.

5.3 Sources of variability of zooplankton data - similarity analyses

5.3.1 Duplicate laboratory (QC) analyses

A total of 74 sets of duplicate laboratory (QC) analyses from both 63-µm and 153-µm mesh nets, and both spring and summer cruises, were assessed, using both PSc and C similarity indices. Data were from 1998 and 1999, the only two years for which full datasets of 153-µm mesh net tows are currently available. It will be recalled that these values quantify the similarity between tabulated species composition estimates generated by two different analysts counting the same sample, subsequent to sample splitting. Similarity values using both measures were uniformly high (Table 4); 95% of PSc values were above 0.91, while 95% of C values were above 0.88. Median values for both measures were 0.97. Statistically significant differences (α = 0.05) between lakes, mesh size or season were not found, which suggests that taxonomic difficulties are not more marked in any given lake or season, or for shallow or deeper tows. Differences between similarity values calculated using PSc and C also were not apparent. Such differences would arise from discrepancies in absolute counts of organisms, and the absence of differences between the two measures indicates that analysts have little trouble consistently counting all of the organisms in the counting chamber, a conclusion also supported by the ANOVA results. Subsamples are chosen specifically to ensure a relatively narrow range of individual organisms in the counting chambers - generally between 200 and 400 - so large discrepancies in counts of individuals would not be expected.

QC limits have as yet not been agreed upon for zooplankton analyses.
Based on the present analysis, if duplicate QC counts are compared using the PSc index, a value of at least 0.92 should be expected in 90% of cases. It is therefore suggested that this value be adopted as a QC limit. This limit should be applicable to both 63-µm and 153-µm mesh tows taken during both spring and summer. QC criteria based on PSc values would guard against taxonomic errors, but not enumeration errors.

Table 4. Percentiles of Whittaker PSc and C similarity values for comparisons between pairs of duplicate laboratory (QC) analyses. Data were from 1998 and 1999, and include data from both spring and summer cruises and both 63- and 153-µm mesh net tows.

Percentile   PSc Similarity   C Similarity
95th         0.99             0.99
75th         0.98             0.98
50th         0.97             0.97
25th         0.96             0.95
5th          0.91             0.88

Table 5. Percentiles of relative discrepancies in counts of total organisms between duplicate laboratory analyses. Relative discrepancies (Δ Count %) are calculated as [absolute(count#1 - count#2)/average(count#1, count#2)]*100. Data are from 1998 and 1999, and include both spring and summer samples, as well as 63-µm and 153-µm mesh tows.

Percentile   Δ Count (%)
95th         5.80%
75th         2.69%
50th         1.53%
25th         0.57%
5th          0.10%

When all QC analyses from 1998 and 1999 are examined, the majority of discrepancies between total counts of organisms resulting from duplicate laboratory analyses are less than 2% of the average of the two counts (Table 5). In 90% of cases, differences between duplicate counts amounted to no more than just over 4% of the average of the two counts. It is therefore recommended that a relative percent difference of 4% be adopted as a criterion for total organism counts of duplicate QC analyses, with those analyses exceeding this limit subject to recounts by both analysts.
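The two proposed QC criteria can be combined into a simple check, sketched below. The relative percent difference follows the formula given in the caption of Table 5, and the limits of 0.92 and 4% are the values suggested in this section; the function names and the example counts are hypothetical.

```python
def relative_percent_difference(count_1, count_2):
    """[absolute(count#1 - count#2) / average(count#1, count#2)] * 100."""
    average = (count_1 + count_2) / 2.0
    return abs(count_1 - count_2) / average * 100.0


def passes_qc(psc_value, count_1, count_2, psc_limit=0.92, rpd_limit=4.0):
    """True when a pair of duplicate analyses meets both suggested limits."""
    composition_ok = psc_value >= psc_limit  # guards against taxonomic errors
    enumeration_ok = relative_percent_difference(count_1, count_2) <= rpd_limit
    return composition_ok and enumeration_ok


# Hypothetical duplicate counts of total organisms by two analysts.
rpd = relative_percent_difference(310, 300)
accepted = passes_qc(0.95, 310, 300)
recount_needed = not passes_qc(0.95, 330, 300)
```

Keeping the two checks separate reflects the point made above that a PSc-based criterion alone would not catch enumeration errors.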
5.3.2 Field replicates

PSc similarity values between field replicates were on average quite high, with 90% of all values ranging between 0.84 and 0.97, and an overall median of 0.93 (Table 6, Fig. 9). This range is not dramatically lower than similarity values of QC samples, and indicates that relatively little variability is introduced during the sampling process as far as relative proportions of taxa are concerned. PSc similarity between field replicates taken during the summer cruises was slightly lower than similarity of spring field replicates, and this difference, though slight, was statistically significant (Table 7). No systematic differences were found between tows using different mesh sizes (i.e., deep and shallow tows) (Table 8).

Table 6. Percentiles of Whittaker PSc and C similarity values for comparisons between field replicate analyses. Data were from 1998 and 1999, and include both spring and summer cruises and both 63- and 153-µm mesh net tows.

Percentile   PSc Similarity   C Similarity
95th         0.97             0.95
75th         0.95             0.91
50th         0.93             0.86
25th         0.90             0.78
5th          0.84             0.63

Fig. 9. PSc similarity values between field replicates collected in 1998 and 1999. Bars indicate means; triangles indicate minimum and maximum values for each set of comparisons. Comparisons between 63-µm mesh tows are left (lighter) bars, comparisons between 153-µm tows are right (darker) bars.

Table 7. Results of Mann Whitney rank sum test comparing effects of season on values of PSc similarity comparisons between field replicates.

Group    Median   25%     75%
Spring   0.935    0.900   0.950
Summer   0.920    0.890   0.940
T = 26492.0   P = 0.009

Table 8. Results of Mann Whitney rank sum test comparing effects of mesh size on values of PSc similarity comparisons between field replicates.

Group    Median   25%     75%
153 µm   0.928    0.899   0.949
63 µm    0.928    0.894   0.943
T = 25041.0   P = 0.432

Similarity values between field replicates calculated using the C index were substantially lower than PSc values (Fig. 10, Table 6); this difference was highly statistically significant (Table 9). C values also exhibited a broader range than PSc values, and in particular contained more extremely low values. The difference in similarity values calculated by the two indices indicates that field replicates are more variable in their estimates of zooplankton densities, while being relatively consistent in their estimates of percent contributions of individual species.

Table 9. Results of Mann Whitney rank sum test between PSc and C similarity values.

Group   Median   25%     75%
C       0.860    0.780   0.910
PSc     0.930    0.900   0.950
T = 69913.0   P = <0.001

Fig. 10. C similarity values between field replicates collected in 1998 and 1999. Bars indicate means; triangles indicate minimum and maximum values for each set of comparisons. Comparisons between 63-µm mesh tows are left (lighter) bars, comparisons between 153-µm tows are right (darker) bars.

Variability in the estimation of densities between field replicates can result from a number of possible factors. Zooplankton patchiness on the spatial scale of the replicate tows - a scale dependent upon how much the vessel drifts between replicate tows - would introduce variability into density estimates. Variability could also result from inaccuracies in flowmeter readings, due either to malfunction or to misreading on the part of the technician, or it can be due to differences between replicates in the angles at which the net is towed.
To test these last two possibilities, regressions were run between the minimum C index values within each set of field replicate comparisons and the maximum angle of the net for those field replicates, the maximum difference in net angle among the field replicates, the maximum relative difference in flowmeter readings amongst the three field replicates, and the depth specific maximum relative difference in flowmeter readings amongst the three field replicates. These latter two independent variables were calculated as follows:

\frac{\text{max flowmeter} - \text{min flowmeter}}{\text{max flowmeter} + \text{min flowmeter}}

\frac{\text{max flowmeter}}{\text{depth}} - \frac{\text{min flowmeter}}{\text{depth}}

Prior to analysis, C values were transformed using an arcsin square root transformation to normalize the data. After transformation, data met assumptions of normality and homoscedasticity.

No relationship was found between C values and net angle. However, a highly significant relationship was found between C values and differences in flowmeter readings between field replicates (Table 10). This relationship explained slightly less than a third of the variance in C values. A similar relationship was found when depth specific flowmeter values were examined. Therefore, it appears that a portion of the variability involved in sample collection is due to inconsistencies in flowmeter readings amongst the field replicates. As noted, this could result from variability in the meter itself, or from inconsistencies in reading the meter.

The majority of variance in C values, however, was not accounted for by flowmeter readings. This points to patchiness of zooplankton populations, other aspects of sample handling, such as washing the net, decanting into bottles, etc., or variance associated with subsampling in the laboratory as potential major sources of variance for this stage of the analysis.

Table 10. Regression results of C values and maximum relative difference in flow meter readings between field replicates.

Arcsin sqrt(C) = 1.166 - (0.380 * Relative diff. in flowmeter readings)

                Coefficient   Std. Error   t       P
Constant        1.166         0.0186       62.7    <0.001
Rel diff flow   -0.380        0.0555       -6.8    <0.001

N = 104

Analysis of Variance:
             DF    SS      MS       F      P
Regression   1     0.837   0.837    46.8   <0.001
Residual     102   1.823   0.0179
Total        103   2.661   0.0258

Adj R2 = 0.308

As with PSc values, there was a significant, though somewhat slight, difference between C similarity values generated from spring and summer cruises, with the latter slightly lower on average than the former (Table 11). This was probably due at least in part to the greater species diversity seen in the summer. A significant difference was also found between mesh sizes, with the smaller mesh size (i.e., shallower tows) showing somewhat greater variability between field replicates, as measured by the C index (Table 12). This difference, though, was not entirely consistent across all basins.

Table 11. Results of Mann Whitney rank sum test comparing effects of season on values of C similarity comparisons between field replicates.

Group    Median   25%     75%
Spring   0.870    0.810   0.925
Summer   0.840    0.760   0.900
T = 27036.0   P = 0.001

Table 12. Results of Mann Whitney rank sum test comparing effects of mesh size on values of C similarity comparisons between field replicates.

Group    Median   25%     75%
63 µm    0.830    0.755   0.895
153 µm   0.880    0.820   0.920
T = 21390.0   P = <0.001

5.3.3 Between station

Within-basin similarity values were only calculated from samples collected with the deeper, 153-µm mesh tows. These similarity values should theoretically provide an estimate of the error contributed to the data from within-basin spatial heterogeneity (in addition to sample collection and analysis). However, the shallower 63-µm mesh tows would also include variation due to vertical migration, since it is likely that some stations within a basin would be visited at different times during the diurnal cycle of at least some species. In order not to confound these two potential sources of variation, therefore, only deeper tows were considered.

Table 13. Percentiles of similarity values for within-basin samples.

             Percentile   PSc Similarity   C Similarity
Total        95th         0.93             0.90
             75th         0.87             0.80
             50th         0.79             0.69
             25th         0.69             0.54
             5th          0.47             0.25
Spring       95th         0.94             0.92
             75th         0.90             0.83
             50th         0.85             0.74
             25th         0.76             0.60
             5th          0.58             0.22
Summer       95th         0.89             0.84
             75th         0.83             0.75
             50th         0.74             0.64
             25th         0.61             0.51
             5th          0.36             0.27

As with the field replicate samples, within basin PSc similarity values were higher than C values (Table 13); this difference was statistically significant (Table 14). The differences between these two measures were more pronounced for within basin comparisons than for the field replicates, suggesting that, as might be expected, differences in crustacean densities varied more from site to site than within a site. Half of PSc values were between 0.69 and 0.87, while half of C values ranged between 0.54 and 0.80.

Table 14. Results of Mann Whitney rank sum test comparing PSc and C similarity values.

Group   Median   25%     75%
C       0.690    0.540   0.800
PSc     0.790    0.690   0.872
T = 416276.0   P = <0.001

Table 15. Results of Mann Whitney rank sum test comparing effects of season on values of PSc similarity for within-basin comparisons.

Group    Median   25%     75%
Spring   0.850    0.761   0.900
Summer   0.740    0.610   0.828
T = 157938.0   P = <0.001

Table 16. Results of Mann Whitney rank sum test comparing effects of season on values of C (Bray-Curtis) similarity for within-basin comparisons.

Group    Median   25%     75%
Spring   0.740    0.600   0.835
Summer   0.645    0.510   0.750
T = 144516.0   P = <0.001

For both measures, similarity values of comparisons made during the spring were statistically significantly higher than those of summer comparisons (Tables 15 and 16). During the spring, over 75% of PSc values were above 0.75, a value often taken to indicate samples taken from the same community. Fully half were above 0.85. In contrast, less than half of summer PSc values met the 0.75 criterion. The high values in the spring are most likely reflective of the extremely limited species composition of spring samples. For example, average numbers of crustacean taxa per site ranged between 5 and 8 for the five lakes in spring, 1999. C values were lower than PSc values for both seasons (Table 13). Somewhat less than half of spring values were above 0.75, while only a quarter of summer values met or exceeded that value. The differences between the two indices were more pronounced in spring than in summer, which again indicates that within-basin species composition was more variable in summer.

Values in the central and western basins of Lake Erie were notably lower than those for other basins, and this was apparent for both PSc and C values, indicating that both species composition and densities varied greatly within these two basins (Figs 11 and 12). Consistent differences were not apparent in other basins.

Fig. 11. PSc similarity values for within-basin comparisons. Data from 1998 and 1999; 153-µm mesh tows. Boxes as in Fig. 5.

Fig. 12. C similarity values for within-basin comparisons. Data from 1998 and 1999; 153-µm mesh tows. Boxes as in Fig. 5.

5.3.4 Relative contribution of different sources of error

An idea of the relative contribution of the various sources of error can be gained by comparing PSc and C values from the various stages of the analysis. Since what is of interest here is variability, it is more convenient to express these values as dissimilarity values, rather than similarity values. This is accomplished by simply taking their complement (i.e., 1-PSc; 1-C).

As noted, the amount of uncertainty resulting from laboratory analyses is relatively slight. As measured by dissimilarity this averaged 0.03 (i.e., 1-0.97) for comparisons of spring samples made by both indices, and 0.04 for summer samples (Figs 13 and 14). Values were essentially the same whether relative species composition or actual densities are considered (i.e., when examining PSc or C values).

The amount of dissimilarity resulting from sample collection is only slightly higher than that from sample analysis when relative species composition is considered. In other words, estimates of relative species composition appear to be fairly robust for each particular site. Again, it is important to remember that variability due to sub-sampling is not captured by replicate QC analyses, and would therefore be incorporated into dissimilarity values from field replicates. When considered in terms of absolute abundances, however, the contribution of sampling variability increases notably. During spring, on average, it is over three times higher than that of sample analysis, while in summer it is three and a half times greater than that of sample analysis (Fig. 14).
This indicates that the greatest introduction of variability during sample collection is in estimation of absolute densities of organisms, while estimates of the relative proportions of constituent species are relatively robust.

In all cases the greatest amount of dissimilarity was a result of site to site variation (Table 17). Even when just relative abundances are compared, site to site variation contributes more dissimilarity than both laboratory analysis and sample collection combined in spring, while this contribution is close to double that of laboratory analysis plus sample collection in summer. When absolute densities (i.e., C values) are considered, the contribution of site to site variation to dissimilarity doubles in spring, but during summer is essentially the same as that of relative proportions of species, indicating that there are substantial differences in species composition from site to site within a basin during the summer, while during spring the majority of dissimilarity results from site to site differences in densities. In all cases, though, dissimilarity values were lower when measured using the PSc index. The relatively low site to site variability in species composition in spring is consistent with the highly restricted species richness of most spring communities. As was pointed out previously, site to site variability was particularly high in the western and central basins of Lake Erie, with regard to both species composition and species densities. While such variation was high at different times in other basins, such effects were not consistently noted.

Table 17. Relative contributions of different sources of variability to overall within-basin dissimilarity, as measured by both PSc and C indices.

Variance Component   PSc Spring   PSc Summer   C Spring   C Summer
Between Station      51%          64%          54%        45%
Field Reps           27%          19%          35%        42%
Lab Dups             23%          17%          10%        14%

Fig. 13. Comparison of laboratory, field, and basin replicate similarity values. Data for 153-µm mesh tows, 1998 and 1999. Boxes as in Fig. 5.

Fig. 14. Contribution to variability (as quantified by dissimilarity) of between-site heterogeneity, sample collection, and laboratory analysis.

The error resulting from laboratory analyses as a percentage of overall dissimilarity (Table 17) was much higher than the error component of laboratory analyses estimated by ANOVA (Table 3). In the latter case, this source of error rarely exceeded a few percent of total within-basin variability, while dissimilarity values of duplicate laboratory analyses were approximately 10 to 25% of total within-basin dissimilarity. This in all likelihood does not represent greater variability in the taxonomical aspect of laboratory analyses (as quantified by dissimilarity values), but rather indicates that there is less variability overall involved in taxonomic analyses, as compared to estimation of densities. A direct comparison of the estimates of sources of error from ANOVA and dissimilarity analyses is not possible, however, since these two types of analysis yield quantitatively incommensurate results.
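The stage-by-stage subtraction behind the percentages in Table 17 can be sketched as follows. The mean dissimilarity values used here are hypothetical and chosen only to show the arithmetic; they are not the values underlying Table 17, and the function name is an assumption.

```python
def stage_contributions(lab, field, basin):
    """Split within-basin dissimilarity into per-stage percentages.

    Each argument is a mean dissimilarity (1 - similarity) at one level of
    replication. The dissimilarity of each stage is subtracted from that of
    the next broader stage, and each increment is expressed as a percentage
    of the within-basin total.
    """
    increments = {
        "Lab Dups": lab,
        "Field Reps": field - lab,
        "Between Station": basin - field,
    }
    return {stage: 100.0 * value / basin for stage, value in increments.items()}


# Hypothetical mean dissimilarities for one index and season.
shares = stage_contributions(lab=0.03, field=0.07, basin=0.15)
```

With these assumed inputs the between-station increment is the largest share, which is the same qualitative pattern reported in Table 17.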
6 Discussion

6.1 DQO

Of the 184 cases examined in this study, minimum detectable differences satisfied the criterion set by the DQO in only 3 instances. While exhibiting a wide range of values, minimum detectable differences in general ranged between 40% and 190%. This means that in order for the current sampling regime to detect a change in the densities of major crustacean groups, these densities would in some cases have to nearly triple. Minimum detectable differences were highest for cladocerans, a group of particular ecological and management interest given their importance as fish food items. Of the regions examined, minimum detectable differences were particularly high in the western basin of Lake Erie, an area that is typically subject to high spatial heterogeneity.

While clearly not satisfying the current DQO, is the current level of sampling effort adequate to detect ecologically significant changes? Is normal interannual variability greater than the DQO criterion of 20%? Unfortunately, GLNPO does not currently possess the data necessary to address these questions. Only two years of data collected with 153-µm mesh nets are available at present, so statements about year-to-year variability cannot be made with any confidence. While over 15 years of data collected with the 63-µm mesh net are available, as pointed out above, interannual variability in these data is confounded with variability due to diurnal vertical migration. However, a recent study (Barbiero and Tuchman, in press) was able to detect significant changes in the densities of many cladoceran species, as estimated by 63-µm mesh tows, resulting from the invasion of an exotic zooplankton predator in the mid-1980s. These changes in many cases were quite drastic, though, and it is unclear if less substantial, but still ecologically significant, changes would be detectable under the current sampling regime.
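To give a sense of what the DQO criterion (an 80% chance of detecting a 20% change at the 90% confidence level) demands, the following Python sketch computes an approximate minimum detectable difference from a coefficient of variation and a number of stations. It is a simplification under stated assumptions and not the calculation used in this report: it assumes a two-sided, two-sample comparison with equal sample sizes and equal variability in both periods, untransformed data, and a normal approximation in place of the t distribution. The function names, the CV, and the station count are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_difference(cv, n, alpha=0.10, power=0.80):
    """Approximate minimum detectable difference as a fraction of the mean.

    cv    : coefficient of variation (SD / mean) within each period (assumed equal)
    n     : number of stations sampled in each period (assumed equal)
    alpha : two-sided significance level (0.10 for 90% confidence)
    power : desired probability of detecting the difference
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_beta = std_normal.inv_cdf(power)
    return (z_alpha + z_beta) * cv * sqrt(2.0 / n)


def max_cv_for_target(n, target=0.20, alpha=0.10, power=0.80):
    """Largest CV for which the approximate MDD does not exceed the target."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_beta = std_normal.inv_cdf(power)
    return target / ((z_alpha + z_beta) * sqrt(2.0 / n))


# Hypothetical basin: 10 stations per period, CV of 100% among stations.
mdd = min_detectable_difference(cv=1.0, n=10)
print(f"Approximate MDD: {mdd:.0%}")
print(f"CV needed for a 20% MDD: {max_cv_for_target(n=10):.0%}")
```

Under this approximation, a CV of 100% across 10 stations gives a minimum detectable difference slightly above 100%, and the CV would have to fall below roughly 20% to meet a 20% target, which is consistent in spirit with how rarely the criterion was met.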
A more fundamental shortcoming of the current DQO is that it does not afford a means of assessing community-level data quality. Data quality can only be assessed individually for each of the numerous variates that collectively make up each zooplankton sample.

In spite of falling far short of the DQO, the current sampling program is apparently successful at measuring community structure, though somewhat less successful at measuring community size. Overall, relative (i.e., PSc) similarity values for within-basin comparisons were high, with most comparisons exceeding Engelberg's (1987) criterion for identical communities of 0.60. C similarity values were always lower, though this difference narrowed in the spring, compared to summer. This indicates that community structure can be assessed with some confidence, somewhat more so in the spring than in summer due to the restricted species richness during the former season. Both the lowest similarity values, and the highest variability of similarity values, were observed in the western and central basins of Lake Erie. Because of the morphometry of these basins and the relatively high inflow (in comparison to volume) entering the western basin, these areas exhibit substantial spatial heterogeneity in many variables, so the high variability of zooplankton data is not unexpected.

6.2 Sources of variation

Both ANOVA and similarity measures indicate that relatively little uncertainty is contributed by the final stages of zooplankton analysis. The amount of variance contributed to estimation of numbers of broad taxonomic categories, as estimated by ANOVA, averaged less than 5%. The amount of dissimilarity contributed by this stage of the analysis was about 3-4%, although this represented on average 20% and 12% of the total measured within-basin dissimilarity for the PSc and C indices, respectively.
The low variance component of this part of the analysis is not completely surprising. Duplicate counts are performed after subsamples are taken and placed in counting chambers, so discrepancies in counts would arise strictly from miscounts, rather than differences in the numbers of organisms contained in different subsamples. The counting chamber contains a circular groove which allows the sample to be enumerated essentially along a continuous transect, with most of the width of the transect remaining within the field of vision of the microscope. While discrepancies in identifications might occur between analysts, the low PSc dissimilarity values suggest that this does not happen frequently, which again is not surprising given the limited species diversity of most zooplankton samples, and the tendency for samples to be dominated by a small number of the half dozen common species.

All three measures of variance suggest that about one quarter of the total within-basin uncertainty in the zooplankton data is apparent in field replicates. Variance between field replicates contributed about 25% of within-basin variability, according to the ANOVA analysis; PSc dissimilarity values between field replicates contributed, on average, 23% of total within-basin dissimilarity, while variance between field replicates contributed 39% of total within-basin C dissimilarity. Included in this component of variance is small-scale patchiness, uncertainty associated with sample collection, and also uncertainty resulting from subsampling in the laboratory.

Station-to-station variability within a basin contributed the most variance, as quantified by all three measures, which indicates zooplankton communities vary considerably within the nominally homogeneous basins. More of this variability appears to be a result of differences in densities, rather than differences in species composition from station to station.
Station-to-station variability contributed 70% of the total within-basin variability measured by ANOVA, which specifically quantifies variance in densities, while 40-60% of total within-basin dissimilarity, which takes into account differences in species composition, was contributed by station-to-station differences. A comparison of PSc and C values suggests that during the spring, most of this variability was a result of differences in abundances, since C values were substantially and consistently higher than PSc values in this season. However, during summer, the relatively high PSc dissimilarity values and the lack of a substantial difference between PSc and C values indicate that species composition also varied from station to station within basins.

6.3 Controlling variation

The major source of variation in GLNPO's zooplankton data appears to be basin-scale spatial heterogeneity. The most appropriate way of reducing this source of variability would be to increase the number of stations within each basin. It is recognized that this is probably not a feasible alternative.

The error associated with field replicates contributed substantially less variability, but this stage of data generation offers more realistic possibilities for reducing overall variance. As mentioned, this variance component includes uncertainty due to subsampling in the laboratory, in addition to the uncertainty involved in sample collection and small-scale spatial heterogeneity. Regression results suggest that a significant amount of uncertainty in this stage is associated with variations in flow meter readings between field replicates. This source of variability can be reduced by ensuring that all flow meters are in a good state of repair through a regular schedule of maintenance. Anomalous readings should be recognized by field personnel and should result in replacement of faulty meters.
Records of meter-specific calibrations should be kept on ship so that large divergences from past calibration factors can be recognized. It is also necessary that field personnel be properly trained to ensure that meters are read correctly and that potential problems with meters are recognized early and addressed appropriately.

Other actions can be taken in the field to reduce the level of uncertainty introduced at this stage of data generation. Field personnel should ensure that zooplankton nets are thoroughly rinsed before decanting contents into sample bottles. They should also exercise care in ensuring that both net speed and depth are kept as close to those specified in the standard operating procedure as possible. The most difficult element of field sampling to control is typically the angle of the net. Interestingly, no relationship was found between variability in net angle between field replicates and levels of dissimilarity, which suggests that the impact of net angle on uncertainty might be relatively slight.

Since replicate analyses are not conducted on subsamples taken in the laboratory, the amount of variability contributed by this stage of analysis is unknown. Instead, the variability contributed by subsampling is included in estimates of variance between field replicates. Since subsampling represents a source of uncertainty that is particularly amenable to investigator control, it would be helpful to know how substantial it is. This could be accomplished by analyzing duplicate splits of a single sample. Ways of reducing uncertainty due to subsampling include ensuring that the sample is completely homogenized prior to splitting in the Folsom splitter, and ensuring that all organisms are subsequently transferred to the counting chamber.

References

Barbiero, R.P. 2003. Application of the Great Lakes National Program Office's Data Quality Objective to Benthos Data Generated by the Annual Water Quality Survey. U.S. EPA, GLNPO, Chicago, IL.

Barbiero, R.P. and M.L. Tuchman. 2003. Changes in the crustacean communities of Lakes Michigan, Huron and Erie following the invasion of the predatory cladoceran Bythotrephes cederstroemi. Can. J. Fish. Aq. Sci. (in press).

Engelberg, K. 1987. Die Diatomeen-Zönose in einem Mittelgebirgsbach und die Abgrenzung jahreszeitlicher Aspekte mit Hilfe der Dominanz-Identität. Arch. Hydrobiol. 110:217-236.

GLNPO 2003. Sampling and Analytical Procedures for GLNPO's Open Lake Water Quality Survey of the Great Lakes. EPA 905-R-03-002, U.S. EPA, GLNPO, Chicago, IL.

Kulczynski, S. 1927. Die Pflanzenassoziation der Pieninen. Internat. Acad. Polon. Sci., Lettr. Bull., Classe Sci. Math. et Nat., ser. B. Sci. Nat. Suppl. 2:1927:57-203.

Makarewicz, J.C. 1987. Phytoplankton and zooplankton composition, abundance and distribution: Lake Erie, Lake Huron and Lake Michigan - 1983. Volume 1 and 2. U.S. Environmental Protection Agency. EPA-905/2-87-002. 183 p.

Makarewicz, J.C. 1988. Phytoplankton and zooplankton composition, abundance and distribution: Lakes Erie, Huron and Michigan - 1984. U.S. Environmental Protection Agency. EPA-905/3-88-001.

Makarewicz, J.C. and P. Bertram. 1991. Phytoplankton and zooplankton composition, abundance and distribution: Lakes Erie, Huron and Michigan - 1985. U.S. Environmental Protection Agency. EPA-905/3-85-003.

Motyka, J., B. Dobrzanski and S. Zawadzki. 1950. Preliminary studies on meadows in the southeast of the province Lublin. Univ. Mariae Curie-Sklodowska Ann. Sect. E. 5(13):367-447.

Whittaker, R.H. 1952. A study of summer foliage insect communities in the Great Smoky Mountains. Ecol. Monogr. 22:6-31.

Whittaker, R.H. and C.W. Fairbanks. 1958. A study of copepod communities in the Columbia Basins, Southeastern Washington. Ecology 39:46-63.

Yan, N.D., W. Keller, N.M. Scully, D.R.S. Lean and P.J. Dillon. 1996. Increased UV-B penetration in a lake owing to drought-induced acidification. Nature 381:141-143.