EPA/600/R-06/047
ERASC-013
May 2006

ESTIMATION OF BIOTA SEDIMENT ACCUMULATION FACTOR (BSAF) FROM PAIRED OBSERVATIONS OF CHEMICAL CONCENTRATIONS IN BIOTA AND SEDIMENT

by

Lawrence Burkhard
U.S. Environmental Protection Agency
Office of Research and Development
National Health and Environmental Effects Research Laboratory
Mid-Continent Ecology Division
Duluth, Minnesota

Ecological Risk Assessment Support Center
Office of Research and Development
U.S. Environmental Protection Agency
Cincinnati, OH

NOTICE

This report is an external draft for review purposes only and does not constitute Agency policy. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

DRAFT: Do not cite or quote

TABLE OF CONTENTS

AUTHORS, CONTRIBUTORS AND REVIEWERS
ACKNOWLEDGMENTS
INTRODUCTION
RECOMMENDATIONS
DEFINITION OF BSAF
MEASURING USEFUL Csoc-Cℓ PAIRS FOR CALCULATION OF BSAFs
CALCULATION OF BSAFs
BASIS FOR BSAF REGRESSION APPROACH
THE REGRESSION APPROACH
RESPONSES TO QUESTIONS RAISED IN EXPECTED OUTCOMES
REFERENCES
APPENDIX: ECOLOGICAL RISK ASSESSMENT SUPPORT CENTER REQUEST FORM

AUTHORS, CONTRIBUTORS AND REVIEWERS

AUTHOR

Lawrence Burkhard
U.S. Environmental Protection Agency
Office of Research and Development
National Health and Environmental Effects Research Laboratory
Mid-Continent Ecology Division
Duluth, MN 55804

REVIEWERS

Keith Sappington
U.S. Environmental Protection Agency
Office of Research and Development
National Center for Environmental Assessment
Washington, DC 20460

Steven Ferraro
U.S. Environmental Protection Agency
Office of Research and Development
National Health and Environmental Effects Research Laboratory
Western Ecology Division
Newport, OR 97365

ACKNOWLEDGMENTS

Programmatic review of the document was conducted by Dale Hoff of EPA Region 8, a Trichair of EPA's Ecological Risk Assessment Forum.
Jeff Swartout of EPA's National Center for Environmental Assessment, ORD, kindly provided review of BSAF statistical approaches.

INTRODUCTION

In March 2004, the Ecological Risk Assessment Forum (ERAF) submitted a request to ORD's Ecological Risk Assessment Support Center (ERASC) relating to the estimation of Biota-Sediment Accumulation Factors (BSAFs) (Appendix). BSAF is a parameter describing bioaccumulation of sediment-associated organic compounds or metals into tissues of ecological receptors. The Problem Statement in the request was "What is the most appropriate method to estimate the BSAF from paired observations of concentrations in biota and sediment?" The Expected Outcome asked for answers to specific questions regarding the use of regression analysis for estimating BSAFs for nonionic organic compounds. The specific questions are addressed in the latter portion of this document. A statement on the most appropriate method to estimate the BSAF is provided below. This document is focused solely on the determination of BSAFs for nonionic organic chemicals and is primarily applicable to fish and high-level shellfish, e.g., crabs. The determination of BSAFs for metals is not discussed.

RECOMMENDATIONS

There are two methods for determining the BSAF from paired observations: 1) a regression approach, whereby the BSAF is estimated by determining the slope of the Csoc-Cℓ line [Csoc is the concentration of chemical in the sediment on an organic carbon basis (μg/kg organic carbon) and Cℓ is the concentration of chemical in the organism on a lipid basis (μg/kg lipid)], and 2) an averaging approach, whereby the BSAF is estimated by averaging the BSAFs from the paired observations across the site. Both approaches use the same data.
The second approach, however, is generally the more appropriate method for estimating the BSAF because regression analysis has these four limitations:

1) Regression analysis, whether model I (simple linear regression) or model II (geometric mean regression, major axis regression, Bartlett's three-group method, or Kendall's robust line-fit method (Sokal and Rohlf, 1995)), requires meeting parametric assumptions about the relationship between the X and Y variables.

2) Regression analysis, in order to be useful, requires a range of values in the X and Y variables.

3) When large ranges exist in the Csoc-Cℓ values (e.g., Csoc spans two orders of magnitude), weighting of the data in the regression analysis and/or transformation of the data might be required for proper analysis.

4) Although regression analysis can be done on data sets with limited numbers of Csoc-Cℓ pairs, determining the slope of the line fitting limited numbers of pairs can lead to highly uncertain slopes.

In contrast, the averaging approach (estimating the BSAF by averaging the BSAFs from each Csoc-Cℓ pair) requires none of these conditions or assumptions. Further, unlike the regression approach, the averaging approach can be performed with limited data.

Both the regression and averaging approaches require similar conditions (e.g., food web structure, sediment/water column concentration quotients, chemical bioavailability, and diets of the organisms) for each Csoc-Cℓ pair. (This can be problematic for Superfund and other sites that have highly heterogeneous conditions.) Additionally, for both approaches, accuracy and precision of the calculated BSAFs are a function of the sample size, i.e., the number of the Csoc-Cℓ pairs.

With the regression and averaging approaches, each Csoc-Cℓ pair is location specific and each pair incorporates all of the conditions existing at the location.
In order to use either approach, the conditions must be similar across all locations. Mixing of Csoc-Cℓ paired observations with different underlying conditions is not recommended and will, in all likelihood, result in BSAFs with poor predictive accuracy.

With the averaging approach, the distribution of the individual BSAFs (determined from each Csoc-Cℓ pair) can be evaluated very easily; this evaluation is commonly done in statistical analysis of data. Knowing the underlying distribution of the BSAFs allows the selection of the most appropriate (unbiased) averaging technique. Further, with the individual BSAFs (Csoc-Cℓ pairs), the homoscedasticity (equality) of the variances across the individual BSAFs can be assessed. In cases where the variances are heteroscedastic (unequal), an appropriate weighted averaging technique would be used, and in general, the weights would be the reciprocal of the variances for the individual BSAFs. The averaging approach can also be easily implemented with other weighting considerations such as portions of the site represented by individual BSAFs, e.g., some BSAFs might be reflective of three quarters of the site while the remaining BSAFs are reflective of the other quarter of the site. The averaging approach also provides the information on the final BSAF (grand mean) distribution and variance, which are required for one- and two-stage Monte Carlo uncertainty analyses.

There is great value in plotting Cℓ against Csoc; BSAFs against Csoc; and Cℓ, Csoc, and BSAFs against geographical information. These plots should be done and evaluated for trends in the data! They may provide key insights and understanding of the complexities existing at the site of interest. The importance of resolving discrepancies within the data cannot be overstated (e.g., Why are some BSAFs so different?
Are there trends or dependencies upon concentrations of chemicals in sediment or with geographical location within the site? Why don't the Csoc-Cℓ pairs form a linear relationship?) Spending time and resources resolving these discrepancies will be well worth the effort since the uncertainties associated with remediation decisions will be smaller. Additionally, any discrepancies in the data at this level will propagate into higher-level and more complex analyses since these analyses use this information.

The following sections provide a description of the BSAF along with its underlying assumptions, a discussion on how to measure a useful BSAF, a discussion on the basis of the regression approach, and answers to specific questions related to regression analysis.

DEFINITION OF BSAF

The BSAF is defined (Ankley et al., 1992) as

$$ \mathrm{BSAF} = \frac{C_o / f_\ell}{C_s / f_{soc}} \tag{1} $$

where Co is the chemical concentration in the organism (μg/kg wet weight), fℓ is the lipid fraction of the organism (g lipid/g wet weight), Cs is the chemical concentration in surficial sediment (μg/kg dry weight), and fsoc is the fraction of the sediments as organic carbon (g organic carbon/g dry weight). In general, BSAFs should be determined from spatially and temporally coordinated fish and surficial sediment samples under conditions in which recent loadings of the chemicals to the ecosystem are relatively unchanged (Burkhard et al., 2003). The BSAF definition does not invoke or include the assumption of equilibrium conditions for the chemical between the organism and sediment (Ankley et al., 1992; Thomann et al., 1992). As shown by Thomann et al. (1992), BSAFs are appropriate for describing bioaccumulation of sediment contaminants in aquatic food webs with non-equilibrium conditions between both the sediment and fish, and sediment and its overlying water.
Equilibrium is regarded as a reference condition for describing degrees of disequilibrium, and thus, is not a requirement for measurement, prediction, or application of BSAFs.

With specific reference to benthic invertebrates, numerous investigators (Lake et al., 1984; McElroy and Means, 1988; Bierman, 1990; Lake et al., 1990; Ferraro et al., 1990) have invoked two assumptions regarding BSAFs: 1) equilibrium conditions and 2) no metabolism of the chemical. These assumptions, when combined with EqP (equilibrium partitioning) theory (DiToro et al., 1991), lead to the conclusion that the BSAF, for these specific conditions, is equal to the partitioning relationship of the chemical between organic carbon in the sediment and lipids of the organism. Depending upon the affinities of the nonpolar organic chemical for lipid and sediment organic carbon, the BSAF, under these specific conditions, should be in the range of 1 to 2 (McFarland and Clarke, 1986). For aquatic organisms tightly connected to the sediments, like oligochaetes and other benthic invertebrates, experimental measurements (Lake et al., 1990; Tracy and Hansen, 1996) are generally consistent with the theoretical value, i.e., in the range of 1 to 2.

There are solid mechanistic reasons why fish should not be in equilibrium with their sediments (Thomann et al., 1992). For fish, BSAFs incorporate wide ranges of influences including biomagnification due to the trophic level of the fish; sediment-water column chemical disequilibrium; the diet of the fish and its underlying food web; the fish's home range; and chemical metabolism within the fish and its food web (Burkhard et al., 2003). Suggestions that BSAFs for fish should be in the range of 1 to 2 by combining the definition of the BSAF with the assumptions of equilibrium conditions and no metabolism are incorrect (Wong et al., 2001).
As explained above, measured BSAFs above or below 1 to 2 are entirely reasonable for fish (Burkhard et al., 2003). BSAFs outside this range for fish do not violate the general definition of BSAFs nor invalidate the usefulness of BSAFs in predicting chemical residues in fish for sediment contaminants (Burkhard et al., 2004).

MEASURING USEFUL Csoc-Cℓ PAIRS FOR CALCULATION OF BSAFs

Probably the most important factor in measuring a BSAF with predictive power is the requirement that the sediment samples analyzed be reflective of the immediate home range of the fish. Depending upon the site, the degree of difficulty in defining the immediate home range of the organism can vary widely. In situations where the movement of the organisms is confined by the geography of the site, e.g., dams or falls, the home range of the organisms can probably be defined fairly easily. When required, home ranges can be determined by tagging/recapture, radio-telemetry, and/or ultrasonic telemetry studies at the site of interest. Estimates of home ranges for freshwater fishes can be determined using the allometric relationship (Minns, 1995):

$$ \ln H = -2.91 + 3.14\,\mathrm{HAB} + 1.65 \ln L \quad \text{or} \quad \ln H = 3.33 + 2.98\,\mathrm{HAB} + 0.58 \ln W $$

where H is the home range size (m²), HAB is 0 for rivers and 1 for lakes, W is body weight (g), and L is body length (mm). For freshwater invertebrates (crabs) and for marine and estuarine ecosystems, allometric relationships for home range have not been reported.

Having a good understanding of the immediate home range of the species is important. Organisms with smaller home ranges will, in all likelihood, be more representative of the study site than those with large home ranges that extend well beyond the study site.
Just because a fish (or other aquatic organism) is caught at a sampling location, one cannot infer that the chemical residue in the fish is due to the chemicals residing at the study site. Knowledge of the fish's home range is the only way that one can establish the connection of the fish to the sampling location. It is strongly recommended that local fisheries experts be consulted during the sampling design phase of the field study to help in determining the immediate home range and trophic level of the organisms at the site; local knowledge will be extremely helpful. Although the above allometric relationship is available for estimating home ranges, one shouldn't necessarily assume that the "calculated" and "actual" immediate home ranges for the organisms are the same; one will still need to do the leg work of establishing as best as one can the immediate home ranges for the organisms at the site.

Once the home range of the species of interest is established, sediment samples reflective of the species' home range need to be collected. It is important that the sediment samples collected be representative of the sediments to which the organisms are exposed and not a homogenized sediment core representing the entire bed of contaminated sediment. For most organisms, the surficial sediments are most reflective of the organism's immediate exposure history, and generally, smaller depths of the surficial layer, e.g., 0 to 2 cm, are preferred over larger depths, e.g., 0 to 30 cm. For deeper burrowing organisms such as some clams and polychaetes, slightly larger surficial depths, e.g., 0 to 5 cm, might be more representative of their recent exposure history.
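As a rough illustration of the allometric relationship of Minns (1995) quoted earlier, the following minimal Python sketch evaluates both forms of the equation. The coefficients are copied from the text as given; the function names and the example fish (300 mm, 400 g, river) are hypothetical and chosen only for illustration. As cautioned above, a calculated home range is a starting point and not a substitute for site-specific information.

```python
import math


def home_range_from_length(length_mm, in_lake):
    """Home range H (m^2) from body length L (mm):
    ln H = -2.91 + 3.14*HAB + 1.65*ln L, with HAB = 0 (river) or 1 (lake)."""
    hab = 1 if in_lake else 0
    return math.exp(-2.91 + 3.14 * hab + 1.65 * math.log(length_mm))


def home_range_from_weight(weight_g, in_lake):
    """Home range H (m^2) from body weight W (g):
    ln H = 3.33 + 2.98*HAB + 0.58*ln W, with HAB = 0 (river) or 1 (lake)."""
    hab = 1 if in_lake else 0
    return math.exp(3.33 + 2.98 * hab + 0.58 * math.log(weight_g))


# Hypothetical river fish, for illustration only.
h_from_length = home_range_from_length(300.0, in_lake=False)
h_from_weight = home_range_from_weight(400.0, in_lake=False)
print(f"H from length: {h_from_length:.0f} m^2")
print(f"H from weight: {h_from_weight:.0f} m^2")
```

The two forms will generally not give identical answers for the same fish, which is one more reason to treat the result as an order-of-magnitude guide.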
Beyond establishing the home range of the organism and the appropriate sediment samples, the collection and analysis of adequate numbers of organisms and sediment samples is required for deriving unbiased estimates of the mean concentrations of chemicals with known variances. This document will not address the subject of sample collection, compositing, and analysis. With unbiased estimates of the mean concentrations, the BSAF for the specific site can be calculated using Equation 1.

In any study design, it is important that biota samples be collected and composited in size or age classes. For fish, dietary composition changes substantially with size and age, and these changes will result in differences in BSAFs among size and age classes. For forage fish, common classes are young-of-the-year, juveniles, and adults, and for piscivorous fishes, common classes are year classes, e.g., 2, 3, 6, and 10 years old. Mixing of fishes of different size/age classes is not recommended because of the increased variance for the average chemical residue in the organisms.

Biota samples for chemical analysis should never be composited by mixing different fish species. Different fish species have different life histories and diets. BSAFs derived from composite samples composed of different species will be highly biased by the individual species. Further, resolving what the potential biases are for an individual species would require the collection and analysis of that species.

When a Csoc-Cℓ pair (or BSAF) is measured for a specific chemical, the measured value incorporates all conditions and parameters existing at the location of interest.
The major conditions and parameters incorporated into the Csoc-Cℓ pair (or BSAF) are 1) the distribution of the chemical between the sediment and water column, 2) the relationship of the food web to water and sediment, and 3) the length of the food web (or trophic level of the organism).

CALCULATION OF BSAFs

The BSAF is calculated from four measured variables (see Equation 1, repeated below): the concentration of the chemical in the organism on a wet weight basis (Co), the lipid content of the wet tissue (fℓ), the concentration of the chemical in the sediment on a dry weight basis (Cs), and the organic carbon content of the dry sediment (fsoc).

$$ \mathrm{BSAF} = \frac{C_o / f_\ell}{C_s / f_{soc}} \tag{1} $$

A Csoc-Cℓ pair will, in many cases, be composed of multiple composite tissue samples and multiple sediment samples (spanning the immediate home range of the organisms) for a sampling location. In order to determine the BSAF for the Csoc-Cℓ pair, average concentrations in the tissue and sediment need to be determined, that is, the numerator and denominator of Equation 1. The lipid normalized concentration of the chemical in each tissue sample should be determined, and then these values should be averaged to determine the average chemical concentration for the organisms. If the tissue samples have different numbers of organisms in each composite, e.g., three fishes in one sample and five fishes in the second sample, a weighted average concentration should be determined. For normally distributed residues and the two-sample fish example, the weighted average concentration equals:

$$ C_{\ell\text{-}avg} = \frac{\sum (w_i \times C_{\ell\text{-}i})}{\sum w_i} = \frac{3 \times C_{\ell\text{-}one} + 5 \times C_{\ell\text{-}two}}{3 + 5} \tag{2} $$

where w_i is the number of organisms in composite i, Cℓ-i is the lipid normalized concentration of the chemical in composite i, and Cℓ-avg is the weighted average lipid normalized concentration in the tissues.
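The weighted average in Equation 2 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names are hypothetical, and the wet-weight concentrations and lipid fractions for the two composites (three fish and five fish) are made-up values.

```python
def lipid_normalized(c_wet, lipid_fraction):
    """Convert a wet-weight concentration (ug/kg wet weight) to a
    lipid basis (ug/kg lipid) by dividing by the lipid fraction."""
    return c_wet / lipid_fraction


def weighted_average(values, weights):
    """Weighted mean: sum(w_i * v_i) / sum(w_i), as in Equation 2."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical composites: (number of fish, Co in ug/kg wet weight, lipid fraction).
composites = [(3, 50.0, 0.05), (5, 72.0, 0.06)]

# Lipid-normalize each composite first, then average with the
# number of organisms in each composite as the weight.
c_lipid = [lipid_normalized(c_wet, f_lipid) for _, c_wet, f_lipid in composites]
weights = [n_fish for n_fish, _, _ in composites]
c_lipid_avg = weighted_average(c_lipid, weights)
print(f"Weighted average lipid-normalized concentration: {c_lipid_avg:.1f} ug/kg lipid")
```

Note that each composite is lipid normalized before averaging, following the order of operations described above, rather than dividing an average wet-weight concentration by an average lipid fraction.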
The standard deviation of a weighted average (s_avg) equals

$$ s_{avg} = \sqrt{\frac{\sum w_i \times (C_{\ell\text{-}i} - C_{\ell\text{-}avg})^2}{\sum w_i}} \tag{3} $$

For log-normally distributed residues in the fish, the weighting would be done on the log transformed data. Sediment samples would be treated similarly: normalizing for organic carbon and then calculating the average concentration of the chemical in the sediments.

The BSAF for the Csoc-Cℓ pair would then be determined by dividing Cℓ-avg by Csoc-avg. The variance for the BSAF can be estimated using the equation (Mood et al., 1974):

$$ s_{BSAF}^2 = \frac{(s_{C_{\ell\text{-}avg}})^2 + \mathrm{BSAF}^2\,(s_{C_{soc\text{-}avg}})^2 - 2\,r\,s_{C_{\ell\text{-}avg}}\,s_{C_{soc\text{-}avg}}\,\mathrm{BSAF}}{(C_{soc\text{-}avg})^2} \tag{4} $$

where s_BSAF, s_Csoc-avg, and s_Cℓ-avg are the standard deviations for the BSAF, Csoc-avg, and Cℓ-avg, respectively; and r is the correlation coefficient between Csoc-avg and Cℓ-avg.

For each Csoc-Cℓ pair, a BSAF is determined. As discussed previously, the average BSAF would subsequently be determined from the individual BSAFs using the most appropriate (unbiased) averaging technique based upon the underlying distribution of the BSAFs.

BASIS FOR BSAF REGRESSION APPROACH

Equation 1 can be rearranged:

$$ C_o / f_\ell = \mathrm{BSAF} \times C_s / f_{soc} \tag{5} $$

By substitution, Equation 5 can be expressed as:

$$ C_\ell = \mathrm{BSAF} \times C_{soc} \tag{6} $$

where Csoc is the concentration of chemical in the sediment on an organic carbon basis (μg/kg organic carbon) and Cℓ is the concentration of chemical in the organism on a lipid basis (μg/kg lipid).

Plotting Cℓ against Csoc results in the following illustrative plot (Graph A), where the slope of the line is the BSAF. However, the slope of Co plotted against Cs (Graph B) is not the BSAF because these two measures of chemical concentrations are not organic carbon and lipid normalized. Use of the regression approach to derive the BSAF incorporates an implicit assumption above and beyond those required for measuring a BSAF at a specific location.
The implicit assumption of the regression approach is that all Csoc-Cℓ pairs must have or incorporate the same underlying ecological conditions and parameters.

[Graphs A and B. Graph A: x-axis Csoc (μg/kg organic carbon); slope = Δy/Δx = BSAF. Graph B: x-axis Cs (μg/kg dry weight); slope = Δy/Δx ≠ BSAF.]

For a Superfund site, it is common to collect samples across the site with a number of different sampling locations. For example, consider a New England stream with a series of three dams, and assume that two-year-old carp and sediment are collected and analyzed in each reservoir. Further assume that enough fish and sediment were collected so that representative and unbiased mean concentrations were determined for each reservoir. Thus, three sets of paired carp-sediment observations would be determined, one for each of the three reservoirs.

These paired observations of Csoc and Cℓ can be plotted (Graphs C and D). In Graph C, the pairs form a nearly linear relationship, suggesting that the underlying conditions for the Csoc-Cℓ pairs are consistent across the samples and thus allow estimation of the BSAF using the regression approach. In Graph D, the pairs form no easily defined linear relationship, and in this case, there is too little variability in the Csoc-Cℓ pairs for the regression approach to be useful in estimating the BSAF. In Graph E, a situation where four sets of paired carp-sediment data were determined, three of the pairs form a nearly linear relationship, but one pair is different from the other pairs. Depending upon how one draws the line, either the triangle or square data in Graph E could be the different (or outlier) Csoc-Cℓ pair. In this case, one or more of the Csoc-Cℓ pairs have different underlying conditions, and thus, it would be inappropriate to estimate the BSAF using the regression approach.
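The contrast between a consistent set of pairs and a set containing one divergent pair can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: all concentrations are hypothetical, the function names are invented for this sketch, and the coefficient of variation is used here merely as a convenient summary of spread, not as a decision rule from this document.

```python
from statistics import mean, stdev


def pair_bsafs(c_soc, c_lipid):
    """BSAF for each Csoc-Cl pair: Cl / Csoc (Equation 6 rearranged)."""
    return [cl / cs for cs, cl in zip(c_soc, c_lipid)]


def relative_spread(values):
    """Coefficient of variation (sample standard deviation / mean)."""
    return stdev(values) / mean(values)


# Hypothetical reservoir data (ug/kg organic carbon, ug/kg lipid).
consistent_soc = [100.0, 200.0, 400.0]
consistent_lipid = [210.0, 390.0, 820.0]

# Same three pairs plus one hypothetical pair with different underlying conditions.
mixed_soc = consistent_soc + [300.0]
mixed_lipid = consistent_lipid + [1500.0]

bsafs_consistent = pair_bsafs(consistent_soc, consistent_lipid)
bsafs_mixed = pair_bsafs(mixed_soc, mixed_lipid)

print("Consistent pairs:", [round(b, 2) for b in bsafs_consistent],
      "CV =", round(relative_spread(bsafs_consistent), 3))
print("With divergent pair:", [round(b, 2) for b in bsafs_mixed],
      "CV =", round(relative_spread(bsafs_mixed), 3))
```

A large jump in spread when one pair is added is a prompt to ask why that location differs, not a licence to drop the pair automatically.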
As discussed above, each carp-sediment pair is location specific and each pair incorporates all of the major conditions and parameters existing at the location. In order to use the regression approach with pairs of Csoc-Cℓ observations, the major conditions and parameters must be the same for all locations. This requirement is the implicit assumption incorporated into the regression approach. Mixing of Csoc-Cℓ paired observations with different conditions and parameters will result in Csoc-Cℓ plots where the pairs form a non-linear relationship (e.g., possibly Graph E), and in all likelihood, a BSAF with poor predictive power. [1]

[1] The mixing of Csoc-Cℓ paired observations with different conditions and parameters is likewise not recommended for the averaging approach. BSAFs with poor predictive power (i.e., accuracy) will, in all likelihood, result when different conditions and parameters exist across the individual Csoc-Cℓ pairs used in the analysis.

For the above examples, if the BSAF for each pair of Csoc-Cℓ observations is plotted against Csoc, the following graphs are obtained (Graphs CC, DD, and EE). The relationships among the Csoc-Cℓ pairs in the above graphs remain in the graphs based upon the BSAFs; compare Graphs C to CC, D to DD, and E to EE. In essence, by calculating the BSAF, one has mathematically removed the concentration dependence shown in Graphs C, D, and E. For further comparison purposes, the BSAF for each pair of Csoc-Cℓ observations is also plotted against Cℓ (Graphs CCC, DDD, and EEE).

The graphs, i.e., C, D, E, CC, DD, EE, CCC, DDD, and EEE, are some of the plots recommended for evaluating trends and underlying conditions associated with the Csoc-Cℓ pairs. We recommend that these plots be completed prior to performing the final calculations for determining the site-specific BSAF.
These plots will help in identifying sources of variation and error in the individual Csoc-Cℓ pairs and BSAF values.

[Graphs C, D, and E; CC, DD, and EE; CCC, DDD, and EEE. Recoverable axis label for Graphs CCC, DDD, and EEE: Cℓ (μg/kg lipid).]

THE REGRESSION APPROACH

A key consideration in using the regression approach is to realize that both Csoc and Cℓ are measured with error. With the simple linear regression least-squares technique, one variable (the Y's) is measured with error while the other variable (the X's) is fixed and has no error. Simple linear regression is referred to as model I regression analysis. When the X's and Y's are both measured with error, one of a number of model II regression techniques will be more appropriate, and unfortunately "the appropriate method depends on the nature of the data" (Sokal and Rohlf, 1995). Sokal and Rohlf (1995) provide an excellent discussion on model II regression and the techniques of geometric mean regression (also called reduced major axis, standard major axis, or relation d'allometrie), slope of the major axis, Bartlett's three-group method, and Kendall's robust line-fit method. Additionally, Sokal and Rohlf (1995) discuss the Berkson case of model II regression where model I regression is appropriate.

It is suggested that the determination of the slope of Csoc-Cℓ pairs be performed using the geometric mean regression technique (Halfon, 1985; Sokal and Rohlf, 1995) because with this technique the slope of the regression is not dependent upon the scale of the X's and Y's used in the analysis. Additionally, Ricker (1973) has recommended that the geometric mean regression technique be used for determining functional relationships (i.e., slope) when "the variability is mostly natural... in X and Y"; the case, I believe, when sediment samples representative of the organism's actual exposure history are collected.
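As a minimal sketch of the geometric mean regression suggested above, the slope can be computed as the geometric mean of the least-squares slope of y on x and the reciprocal of the least-squares slope of x on y. The Python below is an illustration only: the function names are hypothetical, the data are made up, and the code assumes a non-zero covariance between x and y (otherwise the slope of x on y is zero and the ratio is undefined).

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx


def geometric_mean_regression(x, y):
    """Return (slope, intercept) of the geometric mean regression line.

    slope = sqrt(b_yx / d_xy), carrying the sign of b_yx, where b_yx is the
    slope of y on x and d_xy is the slope of x on y.
    Assumes the covariance of x and y is not zero.
    """
    b_yx = ols_slope(x, y)
    d_xy = ols_slope(y, x)
    slope = math.copysign(math.sqrt(b_yx / d_xy), b_yx)
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return slope, intercept


# Hypothetical Csoc (x) and Cl (y) values, for illustration only.
c_soc = [100.0, 200.0, 300.0, 400.0]
c_lipid = [190.0, 420.0, 580.0, 810.0]
slope, intercept = geometric_mean_regression(c_soc, c_lipid)
print(f"slope = {slope:.3f}, intercept = {intercept:.1f}")
```

Because both least-squares slopes share the sign of the covariance, their ratio is non-negative and the square root is defined; the resulting slope equals the ratio of the standard deviations of y and x, which is why it does not depend on which variable is treated as error-free.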
For the geometric mean regression technique, the slope of the geometric mean regression line is the geometric mean of the slopes of the following two linear least-squares regression lines:

$$ y = a + b''x \tag{7} $$

and

$$ x = c + dy \tag{8} $$

The slope of the geometric mean regression line is computed as the geometric mean of b'' and 1/d:

$$ b = (b'' / d)^{1/2} \tag{9} $$

The intercept a is computed as done in linear regression:

$$ a = \bar{Y} - b\bar{X} \tag{10} $$

For further details on the geometric mean regression technique, the reader is referred to Halfon (1985) and Sokal and Rohlf (1995).

An Excel add-in function for geometric mean regression can be downloaded from the following URL:

http://www.uottawa.ca/academic/arts/geographie/lpcweb/newlook/data_and_downloads/download/sawsoft/modelii/modelii.htm

RESPONSES TO QUESTIONS RAISED IN EXPECTED OUTCOMES

Do I fit a straight line through the data?

Yes. If the Csoc-Cℓ observations don't form a straight line, then one must figure out why the data diverge from the linear relationship. Reasons for the Csoc-Cℓ observations diverging from a straight line include the following (note that there are many more causes than those listed):

• The organisms in different Csoc-Cℓ pairs reside at different trophic levels in the food web.
• The organisms in different Csoc-Cℓ pairs have dramatically different diets even though they reside at the same trophic level. For example, for one pair, the organisms might consume primarily zooplankton while for other pairs, the organisms might consume primarily benthic invertebrates.
• The bioavailability of the chemical in the contaminated sediment varies substantially across the Csoc-Cℓ pairs.
• Across the sampling locations, inputs of the chemicals to the site differ substantially.
  For example, consider a harbor where organisms residing in the lower parts of the harbor are exposed to runoff and ground water seepage from an old industrial site while organisms residing in the upper parts of the harbor are not exposed to this discharge.
• The pairs include different populations of the same species. For example, in the Hudson River, there are resident and migratory striped bass populations, and chemical residues in the populations differ widely.

Do I plot my data on a log-log scale?

It is recommended that the data be plotted on arithmetic-arithmetic scales because in arithmetic-arithmetic space, the slope of the line is the BSAF when Csoc-Cℓ pairs are used. In general, the data, i.e., the Csoc-Cℓ pairs, are assumed to be scaled arithmetically, and thus, should be plotted on arithmetic-arithmetic scales.

As a note of clarification, in log-log scales, the slope of the regression line (log Cℓ regressed against log Csoc) is not the BSAF. Taking logarithms of Equation 6 gives Equation 11, which rearranges to Equation 12; the BSAF then appears in the intercept (log BSAF) rather than in the slope.

$$ \log C_\ell = \log [\, C_{soc} \times \mathrm{BSAF} \,] \tag{11} $$

$$ \log C_\ell = \mathrm{slope} \times \log C_{soc} + \log \mathrm{BSAF} \tag{12} $$

Do I force the line through the origin?

Yes, when doing regression with arithmetic-arithmetic scales. (If one is performing the regression with log-log scales, the origin does not exist because the logarithm of zero is undefined. Thus, the line cannot be forced through the origin.)

How do I handle non-detects?

I'm not sure of your definition of non-detects. I'll provide answers for both definitions: chemicals present at concentrations below the minimum detection limit (MDL) of the method and chemicals not detected at all, i.e., no response above instrumental noise. For the case where the chemical is present at concentrations below the MDL, use the uncensored value in the calculation; don't use the MDL value.
For the case where the chemical is not detected at all, Superfund typically uses 1/2 of the MDL. However, as discussed below, there are approaches for working with data below the MDL and when the chemical is not detected at all. Calculation of BSAFs using an arbitrary 1/2 of the MDL for concentrations in sediment and/or biota can result in spurious and non-predictive BSAFs. In each case (chemical present below the MDL and chemical not detected at all), the resulting values must be flagged, and different flags should be used for each case.

When plotting of the different Csoc-Cℓ pairs is done, different symbols/colors should be used for the above two flagged data types. Examine this plot to see if the flagged data align with the general trend of the Csoc-Cℓ pairs that are not flagged. Chemicals not detected at all and chemicals with concentrations below the MDL should each be treated separately. One probably has greater confidence in the uncensored flagged data (below the MDL) than in the chemicals not detected at all. This comparison/evaluation should be performed by doing the regression analysis without the flagged data, with the less-than-the-MDL flagged data included, and with the flagged data alone. Significance testing of the slopes (asking whether the slopes are different) should be done, and these comparisons should help in determining whether to include or exclude the flagged data in the final regression. Examination of the residual plots should be done and will help greatly in determining whether to include or exclude chemicals present at concentrations less than the MDL and/or chemicals not detected at all.

In general, chemicals not detected at all (i.e., for which 1/2 of the MDL is used) should be excluded from the analysis since these values are highly uncertain relative to the other Csoc-Cℓ pairs.
Additionally, the flagged data would, in high likelihood, be from sampling locations where less contamination existed and not from the site of planned active remediation.
The above discussion was centered on non-detects and their use in the regression analysis. There are statistical approaches for averaging with censored data, i.e., non-detects (El-Shaarawi and Dolan, 1989; Newman et al., 1989; Newman, 1995). These approaches can be used with normally and log-normally distributed data. It is recommended that unbiased means be calculated only if less than 20% of the reported values are non-detects (Berthouex and Brown, 1994).

How do I estimate the confidence interval around a prediction?
The standard error of the geometric mean regression slope can be approximated by the standard error of the linear least-squares regression slope (Sokal and Rohlf, 1995). Most linear least-squares regression programs (SAS) or spreadsheets (Lotus 1-2-3 and Excel) calculate the standard error of the slope.
The 95% confidence limits on the slope would be calculated using the Student's t value:

$$\text{Upper 95\% CI} = b + s_b \times t_{0.05[n-2]} \tag{13}$$

$$\text{Lower 95\% CI} = b - s_b \times t_{0.05[n-2]} \tag{14}$$

where b is the geometric mean regression slope, s_b is the standard error of the geometric mean regression slope, n is the total number of data points used in the geometric mean regression, and t_0.05 is the two-tailed Student's t for α = 0.05.
When calculating the geometric mean of the ratios of the Csoc-Cℓ pairs (i.e., BSAFs), the averaging process in log space provides the mean and standard deviation. The 95% confidence limits would be calculated in log space using the mean and standard deviation, and then the confidence limits would be transformed into arithmetic space. In arithmetic space, the 95% CI will be asymmetric.

Do I normalize by organic carbon and lipid?
Yes.
The BSAF is the ratio of the concentration in the biota on a lipid basis to the concentration in the sediment on an organic carbon basis.
By working with Csoc-Cℓ pairs (which are organic carbon and lipid normalized), one places these concentrations on a thermodynamic basis. By expressing the concentrations on a thermodynamic basis, the concentrations of the chemicals in sediment and tissue are corrected for differences in bioavailability and partitioning behavior. By using the thermodynamically based expressions, the Csoc-Cℓ pairs are expressed equivalently.

Do I use weighted regression?
There are two general cases. First, when the Csoc and Cℓ are individual observations (not averages), individual Csoc-Cℓ pairs should be given equal weights. Second, if the Csoc and Cℓ are averages, individual Csoc-Cℓ pairs should be given equal weights unless the Csoc and Cℓ variances are highly heterogeneous (p<0.001). If the variances are highly heterogeneous (very dissimilar), then perform both weighted (by the inverse of the variance) and unweighted regression and compare slopes. The heterogeneous variances might or might not have any appreciable effect on the slope. If appreciable effects exist on the slope, then the weighted regression model is preferred.

If I transform the data, do I need to use weighted regression?
See the answer to the previous question. The variances would need to be evaluated in log space for heterogeneity.

How do I take into account the home range of the biota whose tissue I measured?
As explained in the background, one must have knowledge of the organism's home range. With this information, sediment samples across the home range must be collected and analyzed, and the sediment samples must be representative of the organism's immediate life history.
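As a rough illustration of pairing a tissue sample with sediment data from the organism's home range, the sketch below averages Csoc over the sediment stations that fall inside the home range. The station coordinates, concentrations, the circular home range, and its 1.5 km radius are all hypothetical assumptions made for the example; a real home range would be defined from knowledge of the species and the site.

```python
import math

# Hypothetical sediment stations: (x_km, y_km, Csoc), with Csoc expressed
# on an organic carbon basis in the same mass units as the tissue data.
stations = [
    (0.0, 0.0, 500.0), (0.5, 0.2, 420.0), (1.0, -0.4, 380.0),
    (3.0, 2.5, 60.0), (5.0, 0.0, 25.0),
]

def mean_csoc_in_home_range(center, radius_km, stations):
    """Average Csoc over stations no farther than radius_km from center.

    A circular home range is an assumption made for illustration only.
    """
    cx, cy = center
    inside = [c for (x, y, c) in stations
              if math.hypot(x - cx, y - cy) <= radius_km]
    if not inside:
        raise ValueError("no sediment stations fall inside the home range")
    return sum(inside) / len(inside)

# Fish collected at (0.3, 0.0) km with an assumed 1.5 km home-range radius.
csoc = mean_csoc_in_home_range((0.3, 0.0), 1.5, stations)

c_lipid = 900.0          # hypothetical lipid-normalized tissue concentration
bsaf = c_lipid / csoc    # single paired-observation BSAF
print(f"mean Csoc in home range = {csoc:.1f}, BSAF = {bsaf:.2f}")
```

Only the three stations near the collection point contribute to the average here; the two distant, cleaner stations are excluded because the organism would not be expected to have been exposed to them.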
Accounting for the home range of the organism is done by averaging the analytical results for sediment samples collected within the organism's home range.

What if my r² is low and my data do not plot with the appearance of an increasing linear function?
When this type of behavior is observed in the plot of Csoc-Cℓ pairs, it strongly suggests either that different sampling locations have different underlying conditions and parameters (e.g., different food webs, different organism populations, differences in chemical bioavailability, different diets) or that the data have a very limited dynamic range. In these cases, one will need to determine the factors causing these differences. If one cannot resolve these differences, the same problems will also exist with other methods for predicting chemical residues, e.g., food web models, because these methods require this knowledge as well. In general, when this type of behavior is observed, the problem is in the data itself, and no statistical analysis method will circumvent the problem. Without resolving these differences, their effects will be reflected or incorporated into all calculations with the data.

How do I deal with outliers?
There are a number of different types of outliers. First, if the chemical was not detected at all and 1/2 of the MDL was used, one could easily set these values aside without much criticism, in essence making the argument that one has low confidence in the values. Second, if the chemical was flagged as being below the MDL and the uncensored value is reported, treating these values as outliers and setting them aside would be much harder. You would have to determine what level of confidence you place on values below the MDL.
In general, uncensored data below the MDL are included in the analysis unless there is an overwhelming reason to exclude the data, e.g., some type of methodological bias in the analytical technique. Third, the Csoc-Cℓ pair is very different from the general population of Csoc-Cℓ pairs. In this situation, always make sure the data are not miscalculated, transposed, or misidentified, and ensure that no other type of methodological error is associated with the data. If the data pair appears to be correct, statistical techniques are available for the testing of outliers.
Snedecor and Cochran (1980, p. 167-168) present a statistical method for linear regression where the regression is performed without the outlier, and then the outlier is tested as to whether it is within sampling error of the population. The test criterion is a t-value. Because the outlier is not chosen randomly, to ensure a 1 - α confidence, the calculated t-value is compared to the t-value from the t-table using α', where α' equals α divided by n. Probability values for testing for outliers should generally be conservative, e.g., α = 5% or α = 1%. With an n of 20, the critical t-value for an α of 5% would be found using an α' of 0.25% with the t-table.
SAS, a statistical analysis software package, provides outlier detection and testing algorithms within its regression model program.

Do I develop a separate regression for each compound in a mixture?
Yes. This is most desirable because individual chemicals have different chemical properties. The differing behavior is most often observed with PCBs, where fish appear to be slightly enriched with the higher chlorinated PCB congeners relative to the distribution existing in the sediments.

When the value of x (i.e., exposure point concentration in sediment) is uncertain (e.g., when biota migrate), how do I account for this in my regression?
The best method of accounting for organism migration is to design your sampling plan such that the organisms are collected just before they migrate back out of the site. This approach maximizes the time the organism spends at the site of interest and provides the best estimate of the residue in the organism based upon the organism's exposure in its immediate home range at the site.
Sampling design simulations (Burkhard, 2003) for the measurement of BSAFs (or Csoc-Cℓ paired observations for determinations of BSAFs) suggest that spatial variability in the concentrations of the chemical does not add large uncertainties to the measured BSAF beyond those caused by temporal variability of the chemical concentrations in the water. Further, random walk migration simulations suggested that BSAFs (or Csoc-Cℓ paired observations for determinations of BSAFs) can be measured with low uncertainty even when extreme spatial variation in concentrations exists at the field site, provided the measurements are performed in more contaminated locations of the site for higher Kow chemicals, i.e., Kow > 10⁵ (Burkhard, 2003). The requirement of performing the field measurements at the more contaminated locations within the site will limit the regression approach because the range of Csoc-Cℓ pairs will be small (see second paragraph of the Recommendations).
If the organisms spend a very short time at the site, e.g., the fish migrate through the site in a few days to a week, determination of BSAFs is not recommended even though the BSAF can be measured. The sediments from the site would not be reflective of the fish's recent exposure history.

Are there ways to improve my study design knowing what I know now about regression?
First, the importance of collecting sediment samples that are reflective of the organism's immediate home range cannot be overstated.
Spending time and resources to better define the relationship of the organisms to the sediments will greatly decrease the uncertainty associated with the resulting BSAFs. In addition, predictions using food web models, both steady-state and dynamic, will greatly improve because of the improved knowledge of the underlying relationship between the sediment and organism.
Second, it is important that composite samples reflective of the biota at the site of interest be collected. Clearly, collection and analysis of more organisms will provide a better measure of the average residue in the biota. However, biota samples consisting of mixed age classes are not recommended, e.g., juvenile and adult minnows, or one-year-old and three-year-old largemouth bass. Minimizing the differences in age (or size) will improve the quality of the biota samples and ultimately provide smaller variances for the biota residues. Typically, fishes of a given size (e.g., smallest fish >75% of the largest fish) or age group (e.g., 3-year-olds) are collected.
After sample collection and analysis, plans should be made to visually examine the data by making plots of Csoc-Cℓ paired observations and plots of BSAFs against Csoc. The Csoc values, Cℓ values, and BSAFs should be plotted on a GIS-type plot to determine if the values are correlated with geographical trends and conditions, e.g., the BSAFs increase with increasing distance away from the source on a river. Any additional information or understanding one can glean for the site will be advantageous in the remediation decision process.
As part of the overall study plan for successfully measuring a BSAF, time and resources should be allocated for resolving causes of non-linearity (when they exist) in the Csoc-Cℓ paired observations.
Resolving why the non-linearity occurs will greatly aid in understanding the complexities of the site and will provide decision makers and risk assessors a much better basis for assessing and evaluating remediation options.
Deriving a BSAF using regression analysis or by calculating the average of the individual BSAFs uses the same data. Hence, it is suggested that BSAFs be derived using both approaches. The added effort for the second analysis should be relatively small since much of the effort in performing the data analyses is organizing the data into a usable form for the calculations.

REFERENCES
There are many standard college-level textbooks on statistical analysis that include regression analysis. Almost all include discussion and examples of the linear least-squares regression technique. The geometric mean regression technique is often not covered in standard college-level textbooks. Halfon (1985) is an excellent reference on geometric mean regression. Sokal and Rohlf (1995) address the subject of model II regression, including geometric mean regression.
Ankley, G.T., P.M. Cook, A.R. Carlson et al. 1992. Bioaccumulation of PCBs from sediments by oligochaetes and fishes: Comparison of laboratory and field studies. Can. J. Fish. Aquat. Sci. 49:2080-2085.
Berthouex, P.M. and L.C. Brown. 1994. Statistics for Environmental Engineers. Lewis Publishers/CRC Press, Boca Raton, FL.
Bierman, V.J., Jr. 1990. Equilibrium partitioning and biomagnification of organic chemicals in benthic animals. Environ. Sci. Technol. 24:1407-1412.
Burkhard, L.P. 2003. Factors influencing the design of bioaccumulative factor and biota-sediment accumulation factor field studies. Environ. Toxicol. Chem. 22(2):351-360.
Burkhard, L.P., P.M. Cook and D.R. Mount. 2003. The relationship of bioaccumulative chemicals in water and sediment to residues in fish: A visualization approach. Environ.
Toxicol. Chem. 22(11):2822-2830.
Burkhard, L.P., P.M. Cook and M.T. Lukasewycz. 2004. Biota-sediment accumulation factors for polychlorinated biphenyls, dibenzo-p-dioxins, and dibenzofurans in southern Lake Michigan lake trout (Salvelinus namaycush). Environ. Sci. Technol. 38(20):5297-5305.
DiToro, D.M., C.S. Zarba, D.J. Hansen et al. 1991. Technical basis for establishing sediment quality criteria for nonionic organic chemicals using equilibrium partitioning. Environ. Toxicol. Chem. 10:1541-1583.
El-Shaarawi, A.H. and D.M. Dolan. 1989. Maximum likelihood estimation of water quality concentrations from censored data. Can. J. Fish. Aquat. Sci. 46(6):1033-1039.
Ferraro, S.P., H. Lee Jr., R.J. Ozretich and D.T. Specht. 1990. Predicting bioaccumulation potential: A test of a fugacity-based model. Arch. Environ. Contam. Toxicol. 19(3):386-394.
Halfon, E. 1985. Regression method in ecotoxicology: A better formulation using the geometric mean functional regression. Environ. Sci. Technol. 19:747-749.
Lake, J.L., N. Rubinstein and S. Pavignano. 1984. Predicting bioaccumulation: Development of a simple partitioning model for use as a screening tool in regulating ocean disposal of wastes. In: Fate and Effects of Sediment-Bound Chemicals in Aquatic Systems, K.L. Dickson, A.W. Maki and W.A. Brungs, Ed. Pergamon Press, New York, NY. p. 151-166.
Lake, J.L., N. Rubinstein, H. Lee II, C.A. Lake, J. Heltshe and S. Pavignano. 1990. Equilibrium partitioning and bioaccumulation of sediment-associated contaminants by infaunal organisms. Environ. Toxicol. Chem. 9:1095-1106.
McElroy, A.E. and J.C. Means. 1988. Factors affecting the bioavailability of hexachlorobiphenyls to benthic organisms. In: Aquatic Toxicology and Hazard Assessment, Vol. 10, W.J. Adams, G.A. Chapman and W.G. Landis, Ed. American Society for Testing and Materials, Philadelphia, PA. p. 149-158.
McFarland, V.A. and J.U.
Clarke. 1986. Testing bioavailability of polychlorinated biphenyls from sediments using a two-level approach. In: Proceedings of the US Army Engineers Committee on Water Quality, 6th Seminar, R.G. Wiley, Ed. Hydraulic Engineering Research Center, Davis, CA. p. 220-229.
Minns, C.K. 1995. Allometry of home range size in lake and river fishes. Can. J. Fish. Aquat. Sci. 52:1499-1508.
Mood, A.M., F.A. Graybill and D.C. Boes. 1974. Introduction to the Theory of Statistics, 3rd ed. McGraw-Hill, New York, NY.
Newman, M.C. 1995. Quantitative Methods in Aquatic Ecotoxicology. Lewis/CRC Press, Boca Raton, FL.
Newman, M.C., P.M. Dixon, B.B. Looney and J.E. Pinder III. 1989. Estimating mean and variance for environmental samples with below detection limit observations. Water Res. Bull. 25(4):905-916.
Ricker, W.E. 1973. Linear regression in fishery research. J. Fish. Res. Board Can. 30:409-434.
Snedecor, G.W. and W.G. Cochran. 1980. Statistical Methods, 7th ed. Iowa State University Press, Ames, IA. p. 167-168.
Sokal, R.R. and F.J. Rohlf. 1995. Biometry: The Principles and Practice of Statistics in Biological Research, 3rd ed. W.H. Freeman and Co., New York, NY.
Thomann, R.V., J.P. Connolly and T.F. Parkerton. 1992. An equilibrium model of organic chemical accumulation in aquatic food webs with sediment interaction. Environ. Toxicol. Chem. 11:615-629.
Tracy, G.A. and D.J. Hansen. 1996. Use of biota-sediment accumulation factors to assess similarity of nonionic organic chemical exposure to benthically-coupled organisms of differing trophic mode. Arch. Environ. Contam. Toxicol. 30(4):467-475.
Wong, C.S., P.D. Capel and L.H. Nowell. 2001. National-scale, field-based evaluation of the biota-sediment accumulation factor model. Environ. Sci. Technol. 35(9):1709-1715.
APPENDIX
ECOLOGICAL RISK ASSESSMENT SUPPORT CENTER REQUEST FORM

Problem Statement: What is the most appropriate method to estimate the Biota Sediment Accumulation Factor (BSAF) from paired observations of concentrations in biota and sediment?

Requestors: Sharon Thorns and Al Hanke, Region 4

Background: BSAF is a parameter describing bioaccumulation of sediment-associated organic compounds or metals into tissues of ecological receptors. In a typical experiment to measure bioaccumulation, the researcher collects co-located sediments and tissues over a gradient of contamination. Simple compared to bioaccumulation and trophic transfer models, it finds its use at Superfund sites to estimate progress toward achieving a protective tissue concentration as sediments become cleaner.

Expected Outcome: The expected outcome is a white paper addressing the following questions regarding the use of regression to obtain the most accurate estimate of BSAF:
• Do I fit a straight line?
• Do I plot my data on a log-log scale?
• Do I force the line through the origin?
• How do I handle non-detects?
• How do I estimate the confidence interval around a prediction?
• Do I normalize by organic carbon and lipid?
• Do I use weighted regression?
• If I transform the data, do I need to use weighted regression?
• How do I take into account the home range of the biota whose tissue I measured?
• What if my r² is low and my data do not plot with the appearance of an increasing linear function?
• How do I deal with outliers?
• Do I develop a separate regression for each compound in a mixture?
• When the value of x (i.e., exposure point concentration in sediment) is uncertain (e.g., when biota migrate), how do I account for this in my regression?
• Are there ways to improve my study design knowing what I know now about regression?
Where the topics are covered by standard books or web sites on statistics, they may be referenced. A few case studies may be useful to illustrate the concepts.
Additional Comments: Requestor can provide case studies.