EPA/600/A-96/098

CHAPTER 17: WORKGROUP SUMMARY REPORT ON METHODOLOGICAL UNCERTAINTY IN CONDUCTING SEDIMENT ECOLOGICAL RISK ASSESSMENTS WITH CONTAMINATED SEDIMENTS

Keith R. Solomon, Gerald T. Ankley, Renato Baudo, G. Allen Burton, Christopher G. Ingersoll, Wilbert Lick, Samuel N. Luoma, Donald D. MacDonald, Trefor B. Reynoldson, Richard C. Swartz, and William Warren-Hicks

17.1 INTRODUCTION

In the following chapter, a range of issues related to the uncertainty associated with sediment ecological risk assessments (SERAs) is described, including an evaluation of: (1) uncertainty associated with the overall SERA framework, (2) the effects of false positive and false negative errors associated with sediment toxicity tests, (3) spatial and temporal distributions of sediment contamination, (4) sampling errors, and (5) uncertainties associated with transport and fate models. Chapter 18 describes the uncertainty associated with specific measurement endpoints commonly used in SERAs and discusses approaches for addressing these sources of uncertainty.

The goal of any uncertainty analysis is to describe and interpret knowledge limitations that may be present in the measurement endpoints used to conduct a SERA, for the purpose of incorporating estimates of uncertainty into management decisions. A number of viewpoints for defining uncertainty were discussed at the Workshop, two of which are described below. In the first viewpoint, uncertainty is considered to be composed of two components: (1) measures of bias (i.e., consistent deviation of measured values from the true value) and (2) measures of precision (i.e., measures of agreement among replicate analyses of a sample). Accuracy is the combination of bias and precision for a procedure and reflects the closeness of a measured value to the true value. Figure 17.1 presents a visual interpretation of these components of uncertainty (Jessen 1978). Note that bias and precision are independent. For example, a method could have low bias and low precision (Figure 17.1b), or high bias and low precision (Figure 17.1c). Either combination leads to a decline in the overall confidence in the measurement.

Strictly defined estimates of accuracy are limited to formal experiments such as inter-laboratory testing of a blind (but known) chemical concentration. In contrast, many sediment surveys are conducted without the benefit of knowing the "true" value (e.g., the accuracy of sediment toxicity tests with field-collected sediments). In these cases, estimates of field precision are limited and a weight-of-evidence approach is used as a surrogate for estimating bias. Strictly defined, precision is the observed variance of repeated measurements conducted under the same conditions (e.g., the variance associated with repeated ponar grabs at the same location). In practice, however, biological and chemical properties are very dynamic, making rigorous estimates of bias and precision difficult to obtain (see Section 17.2.4.3).

In the second viewpoint, uncertainty is evaluated in the context of expert judgement and opinion. While determination of the accuracy and precision of management tools provides direct information for evaluating uncertainty, many methods are not amenable to this type of assessment. For this reason, less quantitative methods are often used to evaluate uncertainty, such as expert judgement (see Chapter 18).
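One formal route for combining expert judgement with measured data is Bayesian updating, taken up again below and in Chapter 13. The following is a minimal sketch with invented numbers: the Beta prior standing in for an elicited expert opinion and the survey counts are hypothetical, not data from this chapter.

```python
# Expert elicitation suggests roughly 30% of sites in an area are toxic,
# held with modest confidence, encoded here as a Beta(3, 7) prior.
a_prior, b_prior = 3.0, 7.0

# A hypothetical survey then finds 6 toxic samples out of 10.
toxic, n = 6, 10

# Beta-binomial conjugate update: add successes and failures to the prior.
a_post = a_prior + toxic
b_post = b_prior + (n - toxic)

print(f"prior mean     = {a_prior / (a_prior + b_prior):.2f}")  # 0.30
print(f"posterior mean = {a_post / (a_post + b_post):.2f}")     # 0.45
```

The posterior blends the subjective prior with the quantitative survey result; as more samples accumulate, the data increasingly dominate the expert opinion.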
Although we may not have definitive numerical measurements of the ecological relevance of a specific measurement endpoint, a well-designed expert opinion survey can be used to generate knowledge relevant to the issue. As with numerical analysis, the larger the opinion survey, the more information is available to assess uncertainty. In practice, expert opinion may be more readily available than well-conducted numerical analyses of uncertainty and can be a useful source of information. A large statistical literature is available on methods for incorporating expert opinion into a formal analysis of uncertainty (e.g., Bayes theory; Chapter 13). Bayes theory can be used to combine both subjective and quantitative sources of information in a decision-making process.

In the following sections, we use both of the viewpoints described above to discuss the sources of uncertainty and the implications of uncertainty in SERAs. We encourage scientists and policy makers to consider uncertainty in risk-based decisions. We hope that by addressing these uncertainty issues, decision makers will have valuable information available for weighing the various options available for risk reduction.

Figure 17.1. Visual interpretation of bias and precision as components of uncertainty (Jessen 1978).

17.2 UNCERTAINTY IN THE RISK ASSESSMENT FRAMEWORK

Guidelines have been developed for ecological risk assessments to promote consistency in analysis (Chapter 1). These guidelines also allow for the establishment of quality standards and consistent terminology for assessments. Consistent use of guidelines can help inform all stakeholders as to the relative degree of confidence and scientific knowledge under which a decision was made (Russell and Gruber 1987). Several guidelines are in use with varying degrees of consistency. Many of these methods are based on similar procedures and principles; therefore, the USEPA Framework for Ecological Risk Assessment (USEPA 1992a) was used in this chapter as a guideline (Chapter 1; Figure 17.2). The ecological risk assessment framework as used in this chapter (Figure 17.2) has five major areas: (1) problem formulation, (2) exposure characterization, (3) effects characterization, (4) risk characterization, and (5) risk management. Each of these areas is discussed in more detail below.

17.2.1 Problem Formulation

Problem formulation is the planning or experimental design stage of the overall risk assessment process. In this sense it is similar to posing a question such as: "Is the mean number of species in the benthic community at the exposed area different from that at the reference area?" Uncertainty in the formal statistical sense is of lesser importance at this stage of the process; however, there is qualitative uncertainty in the appropriate choice of assessment endpoints (objectives or purposes of the risk assessment) and measurement endpoints (indicators or tools used to evaluate risk or effects). The best way to reduce these initial uncertainties in the problem formulation is to involve all interested parties through stakeholder input. This involves asking all interested parties (including the risk managers, the scientific community, and the public) to define the problem in the form of a concise narrative. Once the problem has been identified, appropriate assessment and measurement endpoints can be selected (Figure 17.2). Discussions at the Workshop focused primarily on uncertainty in relation to evaluating effects of chemical stressors.
However, non-chemical stressors may be a dominant process influencing the system (e.g., habitat disturbance), or other chemical or non-chemical stressors may interact to produce perturbations (e.g., ammonia and dissolved oxygen in the lower Fox River, Ankley et al. 1992; temperature and metals in the Clark Fork River, Kemble et al. 1994). In the case of retrospective risk assessments (in some instances termed impact assessments), identification of the stressor(s) is a potential source of uncertainty. Identification of stressors should be part of the problem formulation stage and typically consists of: (1) a survey of the natural and anthropogenic stressors which may be associated with the test area, (2) an assessment of the available data relative to quality control and quality assurance, and (3) hypothesizing potential stressors. In some cases, it may be necessary to make use of physical and chemical separation techniques (e.g., toxicity-based fractionation methods) to identify specific classes of contaminant stressors. For example, extraction of pore water from sediment may allow partial identification of potential chemical stressors through toxicity identification evaluation (TIE) methods, which use physico-chemical manipulations to affect the toxicity of specific contaminants of concern (Chapter 18).

17.2.2 Exposure Characterization

The analysis phase of the risk assessment framework consists of two activities: characterization of exposure and characterization of effects (Figure 17.2). The purpose of characterization of exposure is to predict or measure the spatial and temporal distribution of a stressor and its co-occurrence or contact with the ecological components of concern (USEPA 1992a). Uncertainties may influence the planning, execution, or interpretation stages of the exposure characterization. Primary sources of uncertainty in characterizing exposure include: (1) laboratory imprecision, (2) matrix interference errors, (3) sample location biases, (4) sample collection and handling errors, (5) phase distribution of the stressor, (6) contamination of the sample with other stressors, (7) spatial and temporal heterogeneity of the stressor, (8) references for comparison of stressor levels, (9) substrate type and interactions with the stressor, (10) life history of the organism, (11) non-equilibrium of the chemical stressor between sediment and water, (12) response model prediction error, (13) data transformation and normalization errors, (14) exposure pathway analysis errors, and (15) fate analysis errors.

17.2.3 Effects Characterization

The purpose of characterization of effects is to identify and quantify the adverse effects resulting from exposure to a stressor (USEPA 1989). The 15 areas of uncertainty listed above for exposure characterization may also influence the planning, execution, or interpretation stage of the effects characterization. Additional sources of uncertainty in effects characterization may include: (1) effects of non-contaminant stressors in toxicity tests, (2) reference comparisons to toxicity or receptor distributions, (3) laboratory-to-field extrapolations, (4) interpretation and definition of natural variability, (5) differences in receptor species sensitivity, (6) differences in physical alterations of the sediment, and (7) differences in stressor-response relationships.
17.2.4 Risk Characterization

Risk characterization may either be prospective (e.g., product hazard assessment as described in Chapters 3 and 4) or retrospective (e.g., impact hazard assessment as described in Chapters 6 and 7). Risk characterization at the organismal level has traditionally been done by comparing the concentration of the stressor(s) found in the environment to the responses reported for that stressor(s) in the laboratory, in the field, or in the literature. This risk characterization can be performed as described in the following sections.

17.2.4.1 Use of Quotients for Risk Assessment

Risk quotients are simple ratios of exposure and effects. For example:

    Risk quotient = exposure concentration / effect concentration

Traditionally, the quotient method has been used to compare the effect concentrations for the most sensitive species of concern to the average, median, mean, or highest exposure concentration. In addition, these exposure concentrations may be compared to an effect concentration derived from toxicity tests. This assessment can be made more conservative by the use of safety (application) factors, such as division of the effect level by a number such as 20 (CWQG 1987). Use of safety factors allows for unquantified uncertainty in the effect and exposure estimations or measurements. Because this uncertainty is unknown and unquantifiable, substantial errors are possible in either underestimating or overestimating the risk. In the absence of sufficient information from toxicity tests, these risk assessments may be underprotective. Conversely, where a wide range of toxicity data is available, the variation in receptor response may be well defined, and further use of safety factors may be overprotective. Use of the quotient approach is acceptable for early tiers of the risk assessment, but the approach fails to consider the range of variation which may exist in terms of exposures and susceptibility (see Chapter 5 on dredging assessments). Recently, a method was proposed for using quotients in Tier I risk assessments which incorporates uncertainty in both the numerator and denominator of the quotient equation (Parkhurst et al. 1995).

17.2.4.2 Probabilistic Risk Assessment

A second approach for evaluating risk is to express the results of a refined risk characterization analysis as a distribution of toxicity values rather than a single point estimate (see Chapters 3 and 4 on product assessment). For example, this approach has been proposed or is now being used by the Dutch government (Health Council of the Netherlands 1993); Cardwell et al. (1993); Graney et al. (1994); Solomon et al. (1996); and Klaine et al. (1996). A major advantage of the probabilistic approach is that it uses all relevant single-species toxicity data and, when combined with exposure distributions, allows for quantitative estimation of the risks to receptors. However, the approach is only valid if the endpoints used in the assessment are similar. For example, survival data would not be expected to be protective of reproductive effects. The degree of overlap of the exposure curve (drawn as a log-Pearson Type III distribution; McBean and Rovers 1992) with the effects curve can be used to estimate the probability that a certain percentage of receptors may be adversely affected on a certain percentage of occasions (Figure 17.3). A similar approach has been used in the derivation of USEPA Water Quality Criteria (Stephan et al. 1985).
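To make the overlap calculation concrete, the sketch below fits a species sensitivity distribution (SSD) to a set of invented LC50 values and estimates the probability that exposure at a site exceeds the SSD's 10th percentile (the concentration intended to protect 90% of species). For simplicity both curves are treated as log-normal; the chapter draws the exposure curve as a log-Pearson Type III distribution, so this is an approximation, and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical LC50s (ug/L) for benthic taxa.
lc50s = [12.0, 25.0, 40.0, 55.0, 90.0, 150.0, 210.0, 400.0]

# Fit a log-normal SSD and take its 10th percentile (HC10).
logs = [math.log10(x) for x in lc50s]
mu = sum(logs) / len(logs)
sd = math.sqrt(sum((v - mu) ** 2 for v in logs) / (len(logs) - 1))
hc10 = 10 ** NormalDist(mu, sd).inv_cdf(0.10)

# Hypothetical exposure distribution: log-normal, median 8 ug/L.
exposure = NormalDist(math.log10(8.0), 0.5)
p_exceed = 1.0 - exposure.cdf(math.log10(hc10))

print(f"HC10 (10th percentile of SSD): {hc10:.1f} ug/L")
print(f"P(exposure > HC10):            {p_exceed:.1%}")
```

The degree of overlap reported by such a calculation is only as good as the distributional assumptions and the number of toxicity values used to fit the SSD, a point taken up below.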
With the use of overlapping distributions, there is an implicit assumption that protecting a certain percentage of species on a certain proportion of occasions will also preserve ecosystem structure and function. Although this approach to risk characterization takes into account much of the variability with regard to the range of susceptibility in receptor species, it still embodies several uncertainties and limitations. For example, the choice of protection level (e.g., 90% of species) may not be socially acceptable. Some may view 90% as being overprotective, whereas others may find this level of risk unacceptable, especially if the 10% of potentially affected species includes endangered species or other organisms of ecological, commercial, or recreational importance (see Chapter 11). Additionally, risks of persistent, bioaccumulative chemicals to species at the top of the food chain may not be sufficiently addressed by this approach (Graney et al. 1994; Section 18.5). Where there is a desire to protect more sensitive receptors, these species could be identified and appropriate mitigation measures taken. A further issue requiring consideration in probabilistic risk assessments is the number of data points required to define the distribution of receptor species for either acute or chronic effects. Additional test species and endpoints beyond those now applied for SERAs may be needed (Burton and Ingersoll 1994). In addition, there is a need for methods, such as those proposed by Parkhurst et al. (1995), for calculating the degree of risk associated with exposures to multiple chemicals.

17.2.4.3 Retrospective Risk Assessment

Risk assessments based on measurement of current conditions are considered retrospective and typically do not forecast the expected change in risk due to remedial or mitigatory options or to future changes in the ecosystem. Retrospective risk assessments rely on a number of techniques discussed in more detail in Chapter 5 (dredging assessment) and Chapters 6 and 7 (site clean-up assessment). These assessments may include measurement endpoints such as sediment toxicity tests, assessments of structural or functional changes in the benthic communities, cellular and molecular effects in the receptor species, or the presence of tissue residues of contaminants of concern (Chapter 18). The use of multiple lines of evidence (weight-of-evidence) is particularly important in retrospective risk assessment (USEPA 1992b) and may also be useful in prospective analysis. For example, the conclusion that an effect on benthic community structure is the result of exposure to a chemical stressor is made more certain if the concentration of the stressor in the area is high enough to have caused the observed effect and also produces overt toxicity in laboratory toxicity tests.

Figure 17.3 Graphical representation of the use of probabilistic risk assessment with sediments. Cumulative frequency distributions of concentrations of stressors in sediments are compared with distributions of sensitive benthic organisms. Arrows show probabilities of not exceeding 10th percentile sensitivity concentrations for acute and chronic endpoints at three sites (adapted from Solomon et al. 1996).

17.2.5 Risk Management

The outcome of all risk management actions should be either the acceptance or the reduction of the risk. Risk reduction involves many potential actions which range from the technical through the socio-economic to the political.
In undertaking risk management, it is necessary to:

• Decide which risks must be managed and in what priority. This requires that some method for measuring and comparing risks be available (e.g., Chapter 11 on ecological relevance).
• Maximize the reduction of risk for the available resources. This implies that a system must be in place for assessing the degree of risk reduction and for measuring its cost.

17.2.5.1 Uncertainty in Prioritizing the Risks

In general, the first step in ranking risks for management involves evaluation of the harmful effects of the action associated with the production or release of the stressor. In the case of human health, this response may be expressed as a numerical risk. Even though the risk assessment process may have limitations, estimates of relative risk may be comparable if similar processes are used to derive the risks. An additional difficulty is presented by unquantifiable risks. This applies particularly to environmental risks which may have measurement endpoints of an aesthetic nature, such as reduced days of recreational fishing or a degraded view. Endpoints of this type cannot be quantified in the same terms as, for example, fish mortality.

Harwell et al. (1992) proposed a method for evaluating and prioritizing risk to human health and the environment. The system is based on recognition of the issues raised earlier in this document, including:

• Acknowledging that ecosystems are diverse;
• Knowing that ecosystems respond to stress differently and that this response is governed by the type of ecosystem and the type of stressor;
• Recognizing that a wide range of temporal, organizational, and spatial scales are involved;
• Knowing that the measurement endpoints are relevant to the selected assessment endpoints;
• Knowing the normal baseline behavior of the ecosystem;
• Having good techniques for extrapolating from laboratory and field measurement endpoints to the selected assessment endpoints; and
• Considering uncertainty in all of these issues.

The risks to be prioritized are then separated into a series of components which are ranked as follows:

• The potential magnitude of the risk. Magnitude is ranked on a 5-level ordinal scale: Low < Medium < High < Very High < Extremely High.
• The geographic extent of the risk. Extent is ranked on a 3-level ordinal scale: Local < Regional < Biosphere.
• The recovery time. Recovery time is ranked on a 3-level ordinal scale: Short (years) < Medium (decades) < Long (centuries).

These scores can then be combined and used for ranking purposes. However, as these ranks are based on expert assessment, they are subject to uncertainty and bias. As suggested above for problem formulation, uncertainty in qualitative assessments may be reduced by involving expert opinion polls and the stakeholders in the process.

17.2.5.2 Uncertainty in Assessing Risk Reduction Strategies

Many options for risk reduction may be available to the risk manager; however, there are generally two types of tools: technological and regulatory. Technological tools for risk mitigation include a wide range of procedures, many of which are specific to the situation. In the case of sediments which are contaminated by effluent discharges, further treatment of the effluent before release is commonly applied in industrial settings.
In the case of in situ contamination of sediments, many cleanup and disposal options are available (Francinques et al. 1985; IJC 1988) once sediment has been identified as containing chemicals at concentrations posing a problem. The sediment can be removed, stabilized, capped, or treated in situ, or "no action" may be taken (Lynam 1987; Grigalunas and Opaluch 1989). The remediation procedure or combination of procedures chosen is specific to the study area and depends on ecological, chemical, physical, engineering, economic, human health, and political considerations. Furthermore, source control and continued monitoring must be included with any remediation effort to avoid creation of new problem areas.

The regulatory tools which may be used for risk mitigation, in increasing order of effectiveness, are as follows:

• Provide better information and communication to prevent misuse of stressors that may contaminate sediments.
• Control discharges and releases of stressors to levels which are judged to present a tolerable risk to the benthic community.
• Restrict the use and application of the stressor.
• Impose a total manufacturing ban on the stressor.

Uncertainties affect the selection of the technological or regulatory tools that should be used to mitigate risks. Uncertainty of knowledge (i.e., which technological options are available) is best addressed through expert opinion surveys and stakeholder consultations. Uncertainties in the degree of risk reduction are best assessed by reiterating the risk assessment procedure for all the appropriate exposure reduction strategies and then ranking these in terms of both the reductions in risk and the uncertainty in achieving these reductions. This matrix will allow informed choices to be made and the "trading off" of costs against risk reductions and the uncertainties of achieving these reductions.

17.2.5.3 Uncertainty in Assessing Societal Values

If ecosystems are viewed as providing services to society, these services can be assessed to have an economic value. All components of the ecosystem can be assigned an economic value; however, this view has been criticized, particularly in the assigning of value to concepts such as species richness and diversity. In assigning economic value to ecological services, the implication is that these services, like physical capital (equipment and technology) or human capital (knowledge and skills), are interchangeable and can be traded in the same way as these commodities, for example, writing off the loss of a species for an increase in corn production. In addition, assignment of economic value is often restricted to only a few components of the system at risk and may ignore the temporal and spatial interconnectedness of organisms, populations, and ecosystems. Economic uncertainties which should be considered are as follows (Harwell et al. 1992):

• Sustainability: Irreversible resource damage will undermine the sustainability of ecosystems (and, by extension, human society). Thus, irreversible damage to an ecosystem should not be economically discounted over a period of years (as in the amortization of equipment and capital resources), as this devalues the importance of long-term environmental problems. Discounting may lead regulatory agencies or politicians to relegate a problem to a lower level of importance because the effect will only be felt at some time in the future (e.g., global warming).
• Willingness to pay: This assumes that market prices can be used to assess the tastes and preferences of society. The problem with this approach is that individuals and society may enjoy the services provided by ecosystems (e.g., clean air, water, weather control, food chain maintenance, genetic diversity) without understanding them or even having knowledge of their existence. Thus, their willingness to pay for these services, or the values they assign, may be incorrect and inappropriate relative to the real ecological value.

• Multipliers: Economic analysis of benefits always includes multipliers (e.g., developing a subdivision results in jobs for construction workers and a demand for building materials). Multipliers should also be used on the risk side of the risk:benefit equation. For example, the loss of a benthic community may result in losses to fisheries, transportation, fishing equipment manufacturers, the accommodation industry, and marina operations.

17.3 SPECIAL ISSUES OF UNCERTAINTY IN SEDIMENT ECOLOGICAL RISK ASSESSMENT

17.3.1 Decision Making With Sediment Toxicity Measurement Endpoints: Exploring the Effects of False Positives and False Negatives

Sediment risk assessments can be used in a variety of applications, including the assessment of relative risk between an impacted site and a reference site or of the reduction in risk associated with a remediation action. A variety of chemical and biological measurement endpoints can be used in the assessment, including laboratory sediment toxicity tests and sediment quality guidelines (see Sections 18.2, 18.3, and 18.7). For example, sediment toxicity tests are now used to evaluate the relative difference in organism survival or growth between sediments from reference areas and dredged material (Chapter 5 on dredging assessment; USEPA-USCOE 1991, 1994). Test endpoints such as mortality or growth of organisms exposed in the laboratory to field-collected sediments are assumed to reflect the response of organisms in the field exposed to dredged material.

A key issue in the use of sediment endpoints within a regulatory or programmatic environment is the level of confidence in the results of this assessment. Using the above example, in dredging there is uncertainty in determining: (1) the probability of stating that the reference area and the dredged material are different with respect to toxicity when in fact they are the same (false positive), and (2) the probability of stating that the reference area and the dredged material are the same when in fact they are different (false negative). The power of the test is the probability of stating that the reference area and dredged material are different when they truly are different. In most regulatory applications we are interested only in a single-directional test: whether the toxicity of the dredged material is greater than the reference area toxicity (i.e., we ignore any information showing that the reference area toxicity is greater than that of the dredged material). This type of statistical approach to environmental decision-making achieves the goal of environmental protection. However, some problems do arise. For example, if enough samples are taken, the reference and dredged areas can always be shown to be statistically separable. The degree of separation in toxicity can be very small, but statistically evident. We could frame an alternative null hypothesis to detect a difference of an ecologically significant magnitude.
While this has scientific appeal, the degree of difference representing an ecologically significant result could be long debated.

In classical statistical terms, the chance of a false positive decision is termed a Type I error (α), the chance of a false negative decision is termed a Type II error (β), and power is 1 − β. Investigations typically focus on controlling Type I errors. For example, risk managers are often willing to accept a 5% chance that a Type I error occurs. However, Type II errors are often ignored, or no definitive Type II decision criteria are established. In classical hypothesis testing, balancing the potential occurrence of false positive and false negative results is a function of the number of samples collected and the variance of the sample mean. Type I and Type II errors are mathematically linked for a fixed sample size and variance, so establishing α in the decision criteria determines β (see Steel and Torrie 1980 for a discussion of this topic).

The interrelationship of Type I and Type II errors requires consideration of the relative importance of the risks associated with making false positive and false negative decisions. Risk assessment usually focuses on reducing the risk of false positive results associated with statistical analysis (by establishing a low α level). However, from an environmental protection perspective, emphasis should be placed on reducing the risk of making false negative decisions (i.e., falsely concluding that an area is not contaminated when it actually is). For example, an environmentally conservative approach would emphasize identifying small differences between the reference and test areas. Therefore, it would be desirable to have a high chance of classifying a site as contaminated when it is contaminated (high power), and a small chance of falsely concluding that the reference and test sites are the same when they are different (small β). In this conservative approach, one would rather err by judging a clean site as contaminated than misclassify a contaminated site as clean. In contrast, an alternative approach would be to classify the test site as different from the reference site only when the data provide a large degree of confidence in the decision. In that case, one would want to reduce the error of stating that the reference site and test site are different when they are not. This is accomplished by establishing a small α, with a higher probability of false negative results.

17.3.1.1 Case Study: Inter-laboratory Variability

The chance of false positive and false negative results is a function of the number of samples and the variance of the test endpoint. As an example, we will examine variability in sediment toxicity tests. While many sources of error are associated with these tests (see Sections 18.2 and 18.3), the following example focuses only on uncertainty associated with inter-laboratory variability. Inter-laboratory variance (i.e., round-robin or ring testing) has been extensively studied in whole-effluent toxicity testing (Warren-Hicks and Parkhurst 1992; Parkhurst et al. 1992), and we draw on this earlier work in this analysis. Data for the analysis are obtained from a round-robin study of whole-sediment toxicity tests (USEPA 1994a; ASTM 1995a-e; Burton et al. 1996). A key issue in the use of any method is the number of tests required to meet specified decision criteria. In this example, Burton et al. (1996) reported mean survival of Chironomus tentans in 10-day whole-sediment toxicity tests.
Data were generated from eight laboratories, each of which tested split samples of field-collected sediment using the toxicity test method described in USEPA (1994a). For one of the sediments evaluated in the study, the mean survival among the eight laboratories was 76% with a standard deviation of 27% (resulting in a coefficient of variation of 37%, which is considered acceptable inter-laboratory precision; Burton et al. 1996). The data consisted of survival measurements reported by each of eight laboratories. From this information, the number of laboratories needed to meet specified decision criteria can be estimated based on pre-specified probabilities of either false positive or false negative results. For any one laboratory, the reported survival response was the mean of eight replicate tests, each replicate test consisting of 10 organisms. If the replicate data were available, we could have calculated the number of replicates required by each of the eight participating laboratories for pre-specified decision criteria (the data were not available at the time of this analysis). For discussion purposes only, we use the laboratory mean data and present an analysis of inter-laboratory variability. The method for estimating the number of intra-laboratory replicates is consistent with the following discussion.

Suppose that an investigator is faced with determining the sediment toxicity of a potentially impacted site. Also, assume that 90% survival has been established as the acceptable control response. Given that the investigator has no prior knowledge of the site toxicity, assume that the sediment is about as toxic as that in the Burton et al. (1996) study referenced above. Alternatively, the investigator could conduct a pilot study of the site instead of making this assumption. Given these data, the investigator wishes to determine the number of laboratories necessary to meet specified decision criteria based on the chance of false positive and false negative results. [Note: A somewhat related concept is the minimum detectable difference (MDD; USEPA-USCOE 1991, 1994). The MDD is generally used to establish the minimum difference detectable between a control response and a treatment response for a single toxicity test, given a fixed sample size and variance. This concept may be adaptable to our example by evaluating the MDD between a reference site and a dredge site for a fixed number of laboratories with known inter-laboratory variance.]

The following equation provides a means of determining the desired number of laboratories while balancing the chance of false negative and false positive test results:

    N = [(Z(1−α) + Z(1−β)) σ / (Cs − μ1)]²

where:
    N = the number of laboratories required to meet specified levels of Type I and Type II errors,
    Z(1−β) = the critical value of 1 − β for the normal distribution (e.g., 1.64 for a 1-sided test with a β = 0.05 error probability),
    Z(1−α) = the critical value of 1 − α for the normal distribution (e.g., 1.64 for a 1-sided test with an α = 0.05 error probability),
    σ = the inter-laboratory standard deviation of percent survival,
    Cs = the specified standard (i.e., 90% survival), and
    μ1 = the average percent survival across laboratories.

Figure 17.4 presents a plot of the power of a 1-sided test of the null hypothesis H0: Cs = μ1 against the alternative hypothesis H1: Cs > μ1. [Note: because we are concerned only with whether survival at the site is less than the standard, a 1-sided test of the hypothesis is appropriate.] Notice that, for a fixed number of laboratories, fixing either the power of the test (1 − β) or the Type I error rate (α) establishes the other.
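A minimal sketch of this calculation follows. It assumes normal-theory critical values, and the inter-laboratory standard deviation is an input supplied by the investigator; the required N is very sensitive to that value, and with the 27% standard deviation quoted above the formula yields somewhat larger counts than the worked values reported in the next paragraph, which imply a slightly smaller standard deviation in the original analysis.

```python
import math
from statistics import NormalDist

def labs_required(alpha, beta, sd, standard, mean_survival):
    """N = [(Z(1-alpha) + Z(1-beta)) * sd / (Cs - mu1)]^2, rounded up."""
    z_a = NormalDist().inv_cdf(1.0 - alpha)  # 1-sided critical value
    z_b = NormalDist().inv_cdf(1.0 - beta)
    return math.ceil(((z_a + z_b) * sd / (standard - mean_survival)) ** 2)

# Cs = 90% survival standard, mu1 = 76% mean inter-laboratory survival.
for a in (0.05, 0.10):
    n = labs_required(alpha=a, beta=a, sd=27.0,
                      standard=90.0, mean_survival=76.0)
    print(f"alpha = beta = {a:.2f}: {n} laboratories")
```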
For example, with false positive and false negative error probabilities of 5%, 33 laboratories are required for testing. With false positive and false negative error rates of 10%, 20 laboratories are required.

17.3.1.2 Summary

The above example demonstrates a method for estimating the number of laboratories required for given Type I and Type II error rates. The choice of how much error is acceptable is up to the investigator. The investigator should carefully consider the relative merits and interpretations of Type I and Type II errors when evaluating the results from any sediment measurements used to establish the possibility of contamination.

Figure 17.4. Power of the 1-sided test as a function of the number of laboratories.

17.3.2 Uncertainty in Estimating Spatial and Temporal Distributions of Contaminants in Sediment

Sediments may be highly variable on both a spatial and a temporal basis. Therefore, replicate samples need to be collected at each site to determine variance in sediment characteristics. Sediment should be collected with as little disruption as possible; however, subsampling, compositing, or homogenization of sediment samples may be required for some experimental designs (e.g., USEPA 1994a,b; ASTM 1995a-e; Environment Canada 1996a,b). Sampling locations might be distributed along a known pollution gradient or in relation to the boundary of a disposal area, or locations may be identified as contaminated in a reconnaissance survey. These comparisons can be made in both space and time. In pre-dredging studies, a sampling design can be developed to assess the contamination of samples representative of the project area to be dredged (Chapter 5 on dredging assessment). Such a design should include subsampling of cores taken to the project depth.

When dealing with a given sampling area (e.g., a river, lake, or estuary), the appropriate sampling design is important since the goal might be to describe existing conditions for the entire area by collecting a discrete number of samples. The choice of the sampling scheme also matters for the intended data manipulation. Fewer samples are needed if the objective of the study is simply to describe the average conditions over the entire area. On the other hand, if the sampling is to be used to draw a map of distribution (i.e., to highlight point sources, trends of distribution, or the location and area of contamination), the choice of the sampling net (regular, random, or fixed grid) may dictate which type of mapping system has to be used (Baudo 1990). In addition, if temporal variability is expected, sampling should be repeated as many times as possible to reduce this source of uncertainty. Sampling frequency is particularly critical since the timing of the successive samples determines the time scale on which change can be evaluated.

17.3.2.1 Estimation and Measurement of Magnitude of Uncertainty

The overall uncertainty of sampling depends on several factors, including sample: (1) type, (2) volume, (3) equipment, (4) handling, (5) number, and (6) replicates.

Sample type: Sediment is a complex mixture of solid, aqueous, and gaseous phases, in addition to biotic compartments. Hence, study objectives must clearly indicate which type of medium is to be sampled. For example, different methods may be needed to sub-sample pore water in sediment or to sample benthic organisms in sediment.
On the other hand, the objective of the study may be to sample the whole "active" layer (e.g., to calculate a diversity index) or to sample the vertical micro-structure of some sediment characteristic (e.g., the vertical profiles of redox potential or oxygen). Once the type of sample required has been identified, the choice of the sampling gear must be made accordingly. It is often difficult to determine the appropriate depth of sediment to sample (e.g., "How deep must a core be? How deep is the bioturbation? Where is the boundary between the oxic and the anoxic layers?"). A common mistake is to sample to the maximum depth in the sediment, which may provide an unrealistic estimate of exposure (i.e., sampling below the biologically active zone). Finally, the performance of the selected sampling gear is not constant; it depends on the kind of substrate and the operating conditions, including the skill of the operators. In most cases, when sampling is done without actually seeing what is being collected, the uncertainty can be assessed only after the sample is recovered (e.g., via visual observation of core length, texture, or color) or processed in the laboratory. As a consequence, the degree of uncertainty differs depending on whether sampling is done blind or is done visually by checking the performance of the sampling gear (Tables 17.1 and 17.2).

Sample volume: Some variables may exhibit a pseudo-continuous distribution in space (e.g., grain size, where particles are sorted according to the hydraulics of the system), whereas other variables often have a pronounced patchiness (e.g., benthic invertebrate distributions). Hence, the sample volume should account for the known or estimated local micro-spatial heterogeneity (both horizontal and vertical) at the scale of the sampling tool. Larger samplers provide an "averaged" sample, with an increased uncertainty in measuring smaller-scale heterogeneity. If variables with substantially different distributions have to be measured, repeated sampling with different samplers should be considered. Sub-sampling of sediments is often required; therefore, samples should be thoroughly homogenized before splits are made (USEPA 1994a; ASTM 1995a). The amount of sediment required for analyses can range from a few micrograms (e.g., CHN analysis) to several kilograms (e.g., radioactive isotopes, laboratory toxicity tests). Hence, the minimum amount of sample needed to perform all analyses must be estimated before choosing the sampling equipment and may limit the number of measurements performed on a sample. For example, if a coring tube is used to sample sediment, the diameter of the tube and the thickness of the sections need to provide enough material for all of the planned analyses. The same sample is typically used for more than one analysis. Hence, a compromise would be to use the original sections for measurements requiring small aliquots and to use pooled sections for measurements requiring larger aliquots. It should be noted that this procedure potentially limits subsequent statistical analysis of the data (e.g., correlations, principal component analysis, cluster analysis), which can be done only with paired data. A special case where the sample volume is particularly critical is evaluating the pore-water composition of sediment. In this case, the water content of the sediment must be estimated in advance to be sure enough water can be extracted from each sample.
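As a simple illustration of this planning arithmetic, the sketch below estimates how much wet sediment must be collected to yield a target pore-water volume. The water content and recovery fraction are hypothetical and method dependent.

```python
def wet_sediment_needed(porewater_ml, water_fraction, recovery):
    """Wet sediment mass (g) needed to yield a target pore-water volume
    (mL), assuming a pore-water density of ~1 g/mL.

    water_fraction: water content as a fraction of wet sediment mass.
    recovery: fraction of pore water actually recoverable by the chosen
              extraction method (e.g., centrifugation or squeezing).
    """
    return porewater_ml / (water_fraction * recovery)

# Hypothetical planning numbers: 25 mL of pore water needed for analyses,
# sediment of 40% water content, 50% extraction recovery.
print(f"{wet_sediment_needed(25.0, 0.40, 0.50):.0f} g wet sediment")  # 125 g
```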
In summary, to increase confidence and lower uncertainty, the sample volume must be large enough to provide a representative sample for each measurement endpoint of interest and enough material to perform all analyses plus any further measurements that may be needed (e.g., toxicity identification evaluation; Chapter 16). Often these two requirements are difficult to satisfy because of lack of knowledge and the need to minimize the sampling effort. However, it should be kept in mind that later collection of additional material will result in a different sample.

Sampling equipment (dredges, grabs, corers): All types of sampling equipment vary in performance, and each device has specific advantages and disadvantages (e.g., size, weight, triggering mechanism; ASTM 1995b). Few comparisons of sampling efficiency between types of equipment have been reported (Baudo 1990); however, different types of dredges, grabs, and corers provide unique types of samples. Thus, the choice of equipment for both whole-sediment and pore-water sampling depends on the study objectives, the measurement endpoint of interest, the characteristics of the study area, the sediment type and compactness, and the presence of interfering flora or fauna (e.g., roots, shells) (Baudo 1990; Mudroch and MacKnight 1991; Adams 1991; ASTM 1995b). Moreover, allowance should be made for the different performance of the same sampler depending on the environment in which it is used (e.g., soft bottom or sand). The equipment can alter or contaminate the sample (e.g., metal or plastic in contact with the sample; cleaning of the sampler), or the equipment may produce artifacts due to structural limitations (e.g., washing out of finer material, gas or temperature changes). Mudroch and MacKnight (1991) and ASTM (1995b) provide additional details regarding sampling equipment. In addition to dredges, grabs, and corers, a number of non-conventional sampling tools have been applied for specific purposes, including "peepers" (dialysis chambers for collection of pore water), sedimentation traps, and artificial substrates (for benthic colonization studies). Although these and other non-conventional tools may provide useful information, lack of standardization may lead to high uncertainty in their use. To summarize, any sampling equipment may introduce marked uncertainty since it is largely unknown whether a representative sample has been collected.

Sample handling: Techniques of sample conservation and manipulation should be carefully examined, and specific equipment used, to prevent not only contamination of the samples but also possible alterations (Mudroch and Bourbonniere 1991; ASTM 1995b). Alteration of the sediment usually remains undetected unless specific study designs are used (e.g., repeated measures at different times or after each sampling step). Hence, identification of new handling artifacts may make the results of previous studies questionable. In order to minimize uncertainty, a consensus method should be established and followed; however, periodic revisions may be required.

Number of samples: Switzer (1979) pointed out a common statistical problem: the required number of sampling points can be determined statistically only after the data have been gathered, or if estimates of crucial sediment characteristics are obtained in preliminary studies. A number of approaches can be used to estimate the minimum number of samples needed to estimate an average value for the measurement of interest (Baudo 1990).
Information needed to determine the number of samples includes the heterogeneity in physical and chemical data (Kratochvil and Taylor 1981; Sokal and Rohlf 1981; Hakanson and Jansson 1983). The required number of samples also depends on the distribution of the data (e.g., normal, Poisson, or negative binomial distributions). A detailed description of the statistical properties of these distributions can be found elsewhere (Bliss and Fisher 1953; Sokal and Rohlf 1981).

For the definitive sampling of an area, additional considerations, including directionality and point sources, may dictate the sampling plan. The sampling strategies in these cases can be classified into three main types (Hakanson and Jansson 1983; Baudo 1990): (1) a deterministic system, with a sampling design based on previous information and varying density; (2) a stochastic system, in which the sampling stations are randomly selected; and (3) a regular grid system, with the sampling stations randomly or deterministically selected. Advantages and disadvantages of each method are discussed in Baudo (1990).

Repetitive sampling: More often than not, only one sample per station is collected. Repetitive sampling allows for an estimate of the local spatial or temporal heterogeneity (USEPA 1994a,b; ASTM 1995a-e). An accurate estimate of the sampling variability (within the station and among stations) is needed to avoid false positive or false negative decisions, and it assumes even greater importance when assessing spatial or temporal variability. The obvious disadvantages of repeated sampling are the increased cost and time, although the added cost of collecting two or more samples from the same station can be quite low, especially if a multi-sampler can be used. It will, however, be much more expensive to perform all of the analyses on each sample. Alternatively, a representative measure could be made on all sample pairs to evaluate local heterogeneity (e.g., CHN analysis, which may be related to distributions of metals and organic contaminants). Extrapolation to biological variables may be more difficult. Samples for biological measures are often pooled to provide an average of the local variability. For example, if the purpose of the study is to conduct a reconnaissance survey to identify contaminated areas for further investigation, the experimental design might include collection of just one composited sample from each area to allow a larger area to be sampled. The lack of replication at an area usually precludes statistical comparisons (e.g., ANOVA), but these surveys can be used to identify contaminated areas for further study, or the data can be evaluated using statistical regressions (ASTM 1995a; USEPA 1994a,b). In other instances, the purpose of the study might be to conduct a quantitative sediment survey to determine statistically significant differences between control (or reference) sediments and test sediments from one or more areas. The number of replicates per site should be based on the need for sensitivity or power. In a quantitative survey, field replicates (separate samples from different grabs collected at the same area) would need to be taken at each site. Separate subsamples from the field replicates might be used to determine within-sample variability or for comparisons of test procedures (e.g., comparative sensitivity among test organisms), but these subsamples cannot be considered true field replicates for statistical comparisons among areas.
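Returning to the question of how many samples are needed, one common approach, sketched below under a normal approximation, sizes the survey so that the estimated mean falls within a stated margin of the true value. The pilot-study standard deviation and the margin are hypothetical inputs, and patchy (e.g., negative binomial) variables will require more samples than this suggests.

```python
import math
from statistics import NormalDist

def min_samples(sd, margin, confidence=0.95):
    """Minimum n so the sample mean lies within +/- margin of the true
    mean at the given confidence, via n = (Z * sd / margin)^2.

    sd: standard deviation among samples, e.g., from a pilot study.
    margin: acceptable error in the estimated mean (same units as sd).
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil((z * sd / margin) ** 2)

# Hypothetical pilot data: total PCB sd of 40 ug/kg among grabs; we want
# the site mean within +/- 20 ug/kg with 95% confidence.
print(min_samples(sd=40.0, margin=20.0))  # 16 samples
```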
17.3.2.2 Accounting for and Reducing Uncertainty

MacKnight (1991) identified the following factors as the most important in identifying sampling options: (1) purpose of sampling, (2) study objectives, (3) historical data and other available information, (4) bottom dynamics at the sampling area, (5) size of the sampling area, and (6) available funds vs. the estimated (real) cost of the project. Factors 1 and 2 are obviously critical and must be agreed upon in advance between the managers and scientists involved in the project. This could easily be the most important source of uncertainty, since "there is no one formula for design of a sediment sampling pattern which would be applicable to all sediment sampling programs" (MacKnight 1991). Since inadequate strategies and unclear goals of sediment sampling are among the most important sources of uncertainty, the early involvement of managers in the sampling plan should be sought. In addition, to reduce sampling bias, the proposed project should be peer reviewed by scientists with expertise in each of the fields of study covered by the project. This peer review should evaluate the adequacy of the sample media, sample volume, sampling equipment, sample handling, number of samples, and repetitions.

Information related to Factors 3 (historical data) and 4 (dynamics of bottom sediment) listed above is often not available to assist in the selection of sampling areas (number and location) with the required degree of confidence. This information is needed to assure an unbiased assessment of Factor 5 (size of the sampling area) and Factor 6 (costs). In any case, the choice of sampling plan should support a statistical evaluation of the sampling variability (both spatial and temporal). Hence, a pilot study is usually needed to establish local spatial and temporal heterogeneity before the definitive sampling is performed. Whenever feasible, in situ tools should be used in the pilot study to estimate distributions of relevant physical, chemical, and biological measurements (e.g., sediment compactness, echo-sounding, pH, oxygen, redox, biotic communities). The extrapolation to field conditions from data gathered on samples transferred to the laboratory is always subject to uncertainty. For this reason, there is a need to develop in situ techniques for measuring chemistry, conducting toxicity tests, and assessing the benthic community. Alternatively, uncertainty associated with sampling could be substantially lowered by visually checking the sampling sites (e.g., by using divers, submersible cameras, or manned vehicles; Tables 17.1 and 17.2).

17.3.2.3 Interpretation of Uncertainty In Relation to Decision Making

The number of samples determines the cost of the study (assuming all samples are used for analysis). Hence, there is a need to limit the number as much as possible while retaining confidence that the final results will be both sound and defensible. Too few samples will result in large variability and may create the need for additional sampling, whereas too many samples will result in a waste of resources. On the other hand, the uncertainty associated with the measurement endpoints outlined in Chapter 18 should be weighed relative to the cost-benefit analysis for the potential remedial options. An overestimate of the actual contamination, in terms of either concentration or distribution in the study area, may lead to an inflated cost for remediation. An underestimate could result in a wrong choice (no remedial action) or in limited intervention.
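One way to weigh these asymmetric errors, not prescribed by this chapter but consistent with its reasoning, is a simple expected-cost comparison; the probabilities and costs below are entirely hypothetical.

```python
def expected_cost(p_contaminated, cost_remediate, cost_miss):
    """Expected cost of two decisions given the probability that the site
    is truly contaminated (taken from the risk characterization).

    cost_remediate: cost of remediating, incurred whether or not the
                    site is truly contaminated (covers false positives).
    cost_miss: ecological/economic cost of leaving a truly contaminated
               site untreated (the false negative outcome).
    """
    act = cost_remediate
    no_action = p_contaminated * cost_miss
    return act, no_action

# Hypothetical numbers: 30% chance the site is contaminated, $2M to
# remediate, $10M in damages if contamination is real and untreated.
act, wait = expected_cost(0.30, 2e6, 10e6)
print(f"remediate: ${act/1e6:.1f}M expected; no action: ${wait/1e6:.1f}M expected")
```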
17.3.2.4 Summary

Uncertainty in sampling is typically unaccounted for in most risk assessments. This uncertainty can result either from poor knowledge of the real performance of the samplers or from an inadequately designed sampling plan. In addition, both systematic and random errors can occur and usually remain unknown or undetected. Furthermore, the uncertainty associated with sampling is much higher when the actual sampling is done under "blind" conditions (e.g., sampling from a boat; Table 17.1) than when the operators can see the sampling medium (shallow areas sampled by hand or with visual aids; Table 17.2). The uncertainty in describing temporal variability is usually greater still, since it compounds the uncertainty in spatial heterogeneity with the uncertainty in repeated sampling. Tables 17.1 and 17.2 provide an indication of overall uncertainty in sampling and assume that standardized procedures are followed, that the reliability of the selected sampling method is considered, and that allowances are made for systematic or random errors. Sampling uncertainty has components which can be reduced by: (1) planning the sampling according to the existing information, (2) using appropriate collection and handling methods, (3) conducting pilot studies, and (4) measuring the different sources of variability in the definitive study.

Table 17.1 Degree of uncertainty in "blind" sampling

                          Knowledge   Systematic errors   Random errors
  Sample type                 L               L                 L
  Sample volume               L               L                 M
  Sample equipment            L               H                 H
    (non-conventional)        H               H                 H
  Sample handling             L               L                 M
  Sample number               H               H                 H
  Sample repetitions          H               L                 H

  L = low, M = medium, H = high

Table 17.2 Degree of uncertainty in visual sampling

                          Knowledge   Systematic errors   Random errors
  Sample type                 L               L                 L
  Sample volume               L               L                 L
  Sample equipment            L               L                 L
    (non-conventional)        H               H                 H
  Sample handling             L               L                 M
  Sample number               H               L                 H
  Sample repetitions          H               L                 L

  L = low, M = medium, H = high

17.3.3 Error and Uncertainty in Models Applied in Sediment Ecological Risk Assessments

Exposure models are used to predict concentrations of contaminants in the physical environment (sediments, interstitial water, and overlying water) and concentrations in organisms as a function of space and time (Chapter 15). Two types of system-level models are typically applied to evaluate contaminated sediments: (1) contaminant transport and fate models and (2) food chain models. Sections 8.5 and 8.6 discuss uncertainty associated with specific models used to evaluate the bioavailability of contaminants associated with sediments. System-level models should be able to predict contaminant concentrations as affected by natural or anthropogenic events. The goal of modeling should be to develop a predictive model (i.e., a model based on parameters which can be measured accurately in the laboratory or by means of simple field tests). Computations should then be based on these parameters with, ideally, no calibration or fine-tuning. In this way, one can have confidence in the predictions and can also use the model to evaluate different environmental conditions and different systems, again with little or no additional calibration.

17.3.3.1 Model Calibration

When a model is calibrated, the calibration is only valid for the data used in the calibration. To illustrate this point, consider predicting contaminant concentrations in a lake over 25 years. During this time, a few large storms can occur, and these storms may be responsible for most of the sediment and contaminant transport.
If the model is calibrated to data taken in an "average" year (i.e., when large storms did not occur), then the parameters and results will be incorrect because they do not include extreme events. If, on the other hand, the model is calibrated to data taken in a very stormy period, the parameters and extrapolated results will also be incorrect. A predictive model needs to be based on the concept that the future is statistically the same as the past. What is known is only that events of a certain magnitude have a certain probability of occurring. Predictive models should then be able to predict the most probable result and the probabilities of the results of different sequences of events.

As an example of model variability, consider the predictions of PCB half-life in Lake Ontario made by three independent groups (Limnotech, Manhattan College, and the University of Toronto). All of these models are based on the concept of a well-mixed sediment layer (Table 17.3; Ziegler and Connolly 1995). The effect of this layer on sediment fluxes depends on the thickness of the layer (typically a poorly defined characteristic). Because of this, estimates of half-life differ by almost one order of magnitude, from 3 years to 25 years. Therefore... [INFORMATION PENDING FROM LICK]

Table 17.3 Results of three different predictions of the half-life of PCBs in Lake Ontario

  Investigators            Assumed thickness of layer   PCB half-life
  Limnotech                         15 cm                   25 yrs
  Manhattan College                  8 cm                   15 yrs
  University of Toronto            0.5 cm                    3 yrs

17.3.3.2 Errors and Uncertainties in Contaminant Transport and Fate Process Models

In order to quantitatively understand and predict the environmental effects of contaminated sediments, especially as influenced by large episodic natural events or by remedial actions, a knowledge of the transport and fate of the sediments and their associated contaminants is necessary (Chapter 15). Some of the more significant processes that need to be understood and quantified include: (1) resuspension, erosion, and sediment bed dynamics; (2) sorption of contaminants to particles and colloids; (3) flocculation, settling speeds, and deposition rates of particles and flocs; (4) hydrodynamics, including currents and wave action; (5) air-water exchange of contaminants; (6) biochemical reactions and degradation of contaminants; and (7) the inputs of contaminants from the surrounding land, the atmosphere, and point discharges. The processes most relevant to estimates of contaminant fluxes at the sediment-water interface are processes 1 to 4 above. These sediment-water exchange processes are important because they control phase distributions, bioavailability, and contaminant concentrations in the sediments and overlying water (Chapter 15). The flux of contaminants to surface waters from the surrounding land resulting from non-point discharges is also not well quantified. Point discharges are generally better known and controlled.

17.3.3.3 Relevance to Decision Making

In some form or another, models are always used in organizing and interpreting data and, therefore, in decision making. These models may be simple conceptual models, or they may be complex models involving many physical, chemical, and biological processes described by large numbers of differential equations. The solutions to these models may be simple estimates or large arrays of numbers. Models should help in making decisions (e.g., selection of remedial options and understanding the effects of these remedial actions).
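The sensitivity shown in Table 17.3 can be illustrated with a toy one-box model in which contaminant is lost from a well-mixed surface layer of thickness h at an effective loss velocity v, so the half-life scales with h. The loss velocity below is assumed, and this sketch does not reproduce the three models, which differ in more than the assumed thickness.

```python
import math

def half_life_years(layer_cm, loss_velocity_cm_per_yr):
    """Half-life in a well-mixed surface sediment layer, one-box model:
    dC/dt = -(v/h) * C  =>  t1/2 = h * ln(2) / v.
    """
    return layer_cm * math.log(2) / loss_velocity_cm_per_yr

# Assumed effective loss velocity of 0.4 cm/yr; thicknesses from Table 17.3.
for h in (15.0, 8.0, 0.5):
    print(f"h = {h:4.1f} cm -> t1/2 = {half_life_years(h, 0.4):5.1f} yr")
```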
17.4 CONCLUSIONS AND RECOMMENDATIONS

• Guidelines have been developed for conducting ecological risk assessments to promote consistency in the design, analysis, and interpretation of data. These guidelines also allow for the establishment of quality standards and consistent terminology for assessments. Consistency in the use of guidelines can help inform all stakeholders as to the relative degree of confidence and scientific knowledge under which a decision was made. Additionally, the best way to reduce initial uncertainties in the risk assessment is to involve all interested parties through stakeholder input and expert opinion surveys. This involves asking all interested parties (including the risk managers, the scientific community, and the public) to define the problem in the form of a concise narrative. Once the problem has been identified, appropriate assessment and measurement endpoints can be selected and applied.

• Sources of uncertainty unique to characterizing exposure or effects in SERAs include: (1) sample location, collection, and handling errors; (2) spatial and temporal heterogeneity of the stressor; (3) references for comparison of stressor levels; (4) substrate type and its interactions with the stressor; (5) non-equilibrium of the chemical stressor between sediment and water; (6) effects of non-contaminant stressors in toxicity tests; and (7) laboratory-to-field extrapolations.

• Risk characterization requires consideration of the relative importance of the errors associated with making both false positive and false negative determinations. Risk assessment usually focuses on reducing the risk of false positive results associated with statistical analysis (by establishing a low α level). However, from an environmental protection perspective, emphasis should also be placed on reducing the risk of making false negative decisions (i.e., falsely concluding that an area is not contaminated when it actually is); the sketch following this list illustrates the trade-off.

• Sediment sampling has uncertainty components which can be reduced by planning the sampling program according to the existing information, using appropriate collection and handling methods, conducting pilot studies, and measuring the different sources of variability in the definitive study. Additionally, the uncertainty associated with sediment sampling can be substantially lowered by visually inspecting the sampling sites.

• Exposure models are used to predict concentrations of contaminants in the physical environment and concentrations in organisms as a function of space and time. These models should be able to predict contaminant concentrations as affected by natural or anthropogenic events. The goal of modeling should be to develop a predictive model (i.e., a model based on parameters which can be measured accurately in the laboratory or by means of simple field tests). Computations should then be based on these parameters with, ideally, no calibration or fine-tuning. In this way, one can have confidence in the predictions and can also use the model to evaluate different environmental conditions and different systems, again with little or no additional calibration.

• The outcome of all risk management actions should be either the acceptance or the reduction of the risk. Risk reduction involves many potential actions, ranging from the technical through the socio-economic to the political. In undertaking risk management, it is necessary to decide which risks must be managed and in what priority.

• Strategic actions for addressing uncertainty in SERAs are listed in Section 18.8.
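The false positive/false negative trade-off noted above can be made concrete with a small, hypothetical power calculation (a normal approximation with invented inputs, offered as a sketch rather than a prescribed procedure). With the false positive rate fixed at α = 0.05, the false negative rate β for detecting a true difference of one standard deviation between an exposed and a reference area depends strongly on replication:

```python
from statistics import NormalDist

def false_negative_rate(effect_sd, n, alpha=0.05):
    """Approximate false negative rate (beta) for a one-sided,
    two-sample comparison using a normal approximation.
    effect_sd: true difference in units of the common SD (assumed).
    n: replicate samples per area (exposed and reference)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha)        # critical value set by alpha
    shift = effect_sd * (n / 2.0) ** 0.5   # standardized true effect
    return 1.0 - z.cdf(shift - z_crit)     # P(missing a real effect)

for n in (3, 5, 10, 20):
    print(f"n = {n:2d} per area -> "
          f"false negative rate ~ {false_negative_rate(1.0, n):.2f}")
```

The point of the sketch is only that fixing α says nothing by itself about β: with the few replicates typical of sediment surveys, a truly contaminated area can have a better-than-even chance of being passed as clean.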
17.5 REFERENCES

Adams, D.D. 1991. Sampling sediment pore water. In A. Mudroch and S.C. MacKnight, eds., Handbook of Techniques for Aquatic Sediments Sampling. CRC Press, Ann Arbor, MI, pp. 171-202.

American Society for Testing and Materials. 1995a. Standard test methods for measuring the toxicity of sediment-associated contaminants with freshwater invertebrates. E1706-95b. In Annual Book of ASTM Standards, Vol. 11.05. Philadelphia, PA, pp. 1204-1285.

American Society for Testing and Materials. 1995b. Standard guide for collection, storage, characterization, and manipulation of sediments for toxicological testing. E1391-94. In Annual Book of ASTM Standards, Vol. 11.05. Philadelphia, PA, pp. 835-855.

American Society for Testing and Materials. 1995c. Standard guide for conducting 10-day static sediment toxicity tests with marine and estuarine amphipods. E1367-92. In Annual Book of ASTM Standards, Vol. 11.05. Philadelphia, PA, pp. 767-792.

American Society for Testing and Materials. 1995d. Standard guide for designing biological tests with sediments. E1525-94a. In Annual Book of ASTM Standards, Vol. 11.05. Philadelphia, PA, pp. 972-989.

American Society for Testing and Materials. 1995e. Standard guide for determination of bioaccumulation of sediment-associated contaminants by benthic invertebrates. E1688-95. In Annual Book of ASTM Standards, Vol. 11.05. Philadelphia, PA, pp. 1140-1189.

Ankley, G.T., K. Lodge, D.J. Call, M.D. Balcer, L.T. Brooke, P.M. Cook, R.G. Kreis, A.R. Carlson, R.D. Johnson, G.J. Niemi, R.A. Hoke, C.W. West, J.P. Giesy, P.D. Jones, and Z.C. Fuying. 1992. Integrated assessment of contaminated sediments in the lower Fox River and Green Bay, Wisconsin. Ecotoxicol. Environ. Safety 23:46-63.

Baudo, R. 1990. Sediment sampling, mapping, and data analysis. In R. Baudo, J.P. Giesy, and H. Muntau, eds., Sediments: Chemistry and Toxicity of In-Place Pollutants. Lewis Publishers, Chelsea, MI, pp. 15-60.

Bliss, C.I. and R.A. Fisher. 1953. Fitting the negative binomial distribution to biological data. Biometrics 9:176-200.

Burton, G.A., Jr. and C.G. Ingersoll. 1994. Evaluating the toxicity of sediments. The ARCS assessment guidance document. EPA/905-B94/002, Chicago, IL.

Burton, G.A., T.J. Norberg-King, C.G. Ingersoll, G.T. Ankley, P.V. Winger, J. Kubitz, J.M. Lazorchak, M.E. Smith, I.E. Greer, F.J. Dwyer, D.J. Call, K.E. Day, P. Kennedy, and M. Stinson. 1996. Interlaboratory study of precision: Hyalella azteca and Chironomus tentans freshwater sediment toxicity assays. Environ. Toxicol. Chem.: In press.

Canadian Water Quality Guidelines. 1987.
Task Force on Water Quality Guidelines of the Canadian Council of Resource and Environment Ministers, Ottawa, ON.

Cardwell, R.D., B.R. Parkhurst, W. Warren-Hicks, and J.S. Volosin. 1993. Aquatic ecological risk. Water Environ. Technol. 5:47-51.

Environment Canada. 1996a. Biological test method: Test for growth and survival in sediment using the freshwater amphipod Hyalella azteca. Environment Canada, Ottawa, ON. Technical report number pending: In press.

Environment Canada. 1996b. Biological test method: Test for growth and survival in sediment using larvae of freshwater midges (Chironomus tentans or Chironomus riparius). Environment Canada, Ottawa, ON. Technical report number pending: In press.

Francinques, N.R., Jr., M.R. Palermo, C.R. Lee, and R.K. Peddicord. 1985. Management strategy for disposal of dredged material: Contaminant testing and controls. Miscellaneous Paper D-85-1, U.S. Army Engineer Waterways Experiment Station, Vicksburg, MS.

Graney, R.L., A. Maciorowski, K.R. Solomon, H. Nelson, D. Laskowski, and J.L. Baker. 1994. Report of the aquatic risk assessment and mitigation dialogue group. SETAC Foundation for Education, Pensacola, FL.

Grigalunas, T.A. and J.J. Opaluch. 1989. Economic considerations of managing contaminated marine sediments. In Contaminated Marine Sediments - Assessment and Remediation. National Research Council, National Academy Press, Washington, DC, pp. 291-310.

Hakanson, L. and M. Jansson. 1983. Principles of Lake Sedimentology. Springer-Verlag, Berlin, 316 p.

Harwell, M.A., W. Cooper, and R. Flaak. 1992. Prioritizing ecological and human welfare risks from environmental stresses. Environmental Management 16:451-464.

Health Council of the Netherlands. 1993. Ecotoxicological risk assessment and policy-making in the Netherlands - dealing with uncertainties. Network 6(3)/7(1):8-11.

International Joint Commission. 1988. Options for the remediation of contaminated sediments in the Great Lakes: Report of the Sediment Subcommittee and its Remedial Options Work Group to the Great Lakes Water Quality Board. International Joint Commission, Windsor, ON.

Jessen, R.J. 1978. Statistical Survey Techniques. Wiley, New York, NY.

Kemble, N.E., W.G. Brumbaugh, E.L. Brunson, F.J. Dwyer, C.G. Ingersoll, D.P. Monda, and D.F. Woodward. 1994. Toxicity of metal-contaminated sediments from the upper Clark Fork River, MT to aquatic invertebrates in laboratory exposures. Environ. Toxicol. Chem. 13:1985-1997.

Klaine, S.J., G.P. Cobb, R.L. Dickerson, K.R. Dixon, R.J. Kendall, E.E. Smith, and K.R. Solomon. 1996. An ecological risk assessment for the use of the biocide, dibromonitrilopropionamide (DBNPA), in industrial cooling systems. Environ. Toxicol. Chem. 15:21-30.

Kratochvil, B. and J.K. Taylor. 1981. Sampling for chemical analysis. Anal. Chem. 53:924A-938A.

Lyman, W.J., A.E. Glazer, J.H. Ong, and S.F. Coons. 1987. An overview of sediment quality in the United States. EPA-905/9-88-002, Washington, DC.

MacKnight, S.D. 1991. Selection of bottom sediment sampling stations. In A. Mudroch and S.C. MacKnight, eds., Handbook of Techniques for Aquatic Sediments Sampling. CRC Press, Boca Raton, FL, pp. 17-28.

McBean, E.A. and F.A. Rovers. 1992. Estimation of the probability of exceedance of a contaminant concentration. Ground Water Monitoring Review 12:115-119.

Mudroch, A. and S.D. MacKnight. 1991. Bottom sediment sampling. In A. Mudroch and S.C. MacKnight, eds., Handbook of Techniques for Aquatic Sediments Sampling. CRC Press, Boca Raton, FL, pp. 29-95.
Mudroch, A. and R.A. Bourbonniere. 1991. Sediment sample handling and processing. In A. Mudroch and S.C. MacKnight, eds., Handbook of Techniques for Aquatic Sediments Sampling. CRC Press, Boca Raton, FL, pp. 131-169.

Parkhurst, B.R., W. Warren-Hicks, and L.E. Noel. 1992. Performance characteristics of effluent toxicity tests: Summarization and evaluation of data. Environ. Toxicol. Chem. 11:771-791.

Parkhurst, B.R., W. Warren-Hicks, T. Etchison, J.B. Butcher, R.D. Cardwell, and J. Voloson. 1995. Methodology for aquatic ecological risk assessment. Report RP91-AER-1, prepared for the Water Environment Research Foundation, Alexandria, VA.

Russell, M. and M. Gruber. 1987. Risk assessment in environmental policy-making. Science 236:286-290.

Sokal, R.R. and F.J. Rohlf. 1981. Biometry. Freeman and Co., New York, NY, 859 p.

Solomon, K.R., D.B. Baker, P. Richards, K.R. Dixon, S.J. Klaine, T.W. La Point, R.J. Kendall, J.M. Giddings, J.P. Giesy, L.W. Hall, Jr., C.P. Weisskopf, and M. Williams. 1996. Ecological risk assessment of atrazine in North American surface waters. Environ. Toxicol. Chem. 15:31-76.

Steel, R.G. and J.H. Torrie. 1980. Principles and Procedures of Statistics. McGraw-Hill, New York, NY.

Stephan, C.E., D.I. Mount, D.J. Hansen, J.H. Gentile, G.A. Chapman, and W.A. Brungs. 1985. Guidelines for deriving numerical national water quality criteria for the protection of aquatic organisms and their uses. PB85-227049, National Technical Information Service, Springfield, VA.

Switzer, P. 1975. Statistical considerations in network design. Water Resour. Res. 15:1512-1516.

U.S. Environmental Protection Agency. 1989. Assessing human health risks from chemically contaminated fish and shellfish: A guidance manual. EPA-503/8-89-002, Washington, DC.

U.S. Environmental Protection Agency. 1992a. Framework for ecological risk assessment. EPA/630/R-92/001, Washington, DC.

U.S. Environmental Protection Agency. 1992b. Sediment classification methods compendium. Sediment Oversight Technical Committee. EPA-813-R-92-006, Washington, DC.

U.S. Environmental Protection Agency. 1994a. Methods for measuring the toxicity and bioaccumulation of sediment-associated contaminants with freshwater invertebrates. EPA/600/R-94/024, Duluth, MN.

U.S. Environmental Protection Agency. 1994b. Methods for measuring the toxicity of sediment-associated contaminants with estuarine and marine amphipods. EPA/600/R-94/025, Duluth, MN.

U.S. Environmental Protection Agency and U.S. Army Corps of Engineers. 1991. Evaluation of dredged material proposed for ocean disposal. EPA-503/8-91/001, Washington, DC.

U.S. Environmental Protection Agency and U.S. Army Corps of Engineers. 1994. Evaluation of dredged material proposed for discharge in inland and near coastal waters (draft). EPA-823-B-94-002, Washington, DC.

Warren-Hicks, W. and B.R. Parkhurst. 1992. Performance characteristics of effluent toxicity tests: Variability and its implications for regulatory policy. Environ. Toxicol. Chem. 11:793-804.

Ziegler and Connolly. 1995. (Citation pending from Lick)