EPA/600/R-06/047
                                                                ERASC-013
                                                                  May 2006
ESTIMATION OF BIOTA SEDIMENT ACCUMULATION FACTOR
     (BSAF) FROM PAIRED OBSERVATIONS OF CHEMICAL
         CONCENTRATIONS IN BIOTA AND SEDIMENT
                                  by

                           Lawrence Burkhard
                   U.S. Environmental Protection Agency
                    Office of Research and Development
         National Health and Environmental Effects Research Laboratory
                      Mid-Continent Ecology Division
                           Duluth, Minnesota
                 Ecological Risk Assessment Support Center
                    Office of Research and Development
                   U.S. Environmental Protection Agency
                             Cincinnati, OH

-------
                                      NOTICE
       This report is an external draft for review purposes only and does not constitute Agency
policy. Mention of trade names or commercial products does not constitute endorsement or
recommendation for use.
DRAFT: Do not cite or quote                ii

-------
                        TABLE OF CONTENTS


AUTHORS, CONTRIBUTORS AND REVIEWERS	iv

ACKNOWLEDGMENTS	iv

INTRODUCTION	1

RECOMMENDATIONS	1

DEFINITION OF BSAF 	4

MEASURING USEFUL Csoc-Cℓ PAIRS FOR CALCULATION OF BSAFs	6

CALCULATION OF BSAFs 	8

BASIS FOR BSAF REGRESSION APPROACH	10

THE REGRESSION APPROACH 	14

RESPONSES TO QUESTIONS RAISED IN EXPECTED OUTCOMES	15

REFERENCES  	25

APPENDIX: ECOLOGICAL RISK ASSESSMENT SUPPORT CENTER
REQUEST FORM	29

-------
                  AUTHORS, CONTRIBUTORS AND REVIEWERS
AUTHOR
Lawrence Burkhard
U.S. Environmental Protection Agency
Office of Research and Development
National Health and Environmental Effects Research Laboratory
Mid-Continent Ecology Division
Duluth, MN 55804
REVIEWERS

Keith Sappington
U.S. Environmental Protection Agency
Office of Research and Development
National Center for Environmental Assessment
Washington, DC 20460

Steven Ferraro
U.S. Environmental Protection Agency
Office of Research and Development
National Health and Environmental Effects Research Laboratory
Western Ecology Division
Newport, OR 97365
ACKNOWLEDGMENTS

      Programmatic review of the document was conducted by Dale Hoff of EPA Region 8, a
Trichair of EPA's Ecological Risk Assessment Forum. Jeff Swartout of EPA's National Center
for Environmental Assessment, ORD, kindly provided review of BSAF statistical approaches.

-------
INTRODUCTION

In March 2004, the Ecological Risk Assessment Forum (ERAF) submitted a request to ORD's
Ecological Risk Assessment Support Center (ERASC) relating to the estimation of
Biota-Sediment Accumulation Factors (BSAFs) (Appendix). The BSAF is a parameter describing
bioaccumulation of sediment-associated organic compounds or metals into tissues of ecological
receptors. The Problem Statement in the request was "What is the most appropriate method to
estimate the BSAF from paired observations of concentrations in biota and sediment?" The
Expected Outcome asked for answers to specific questions regarding the use of regression
analysis for estimating BSAFs for nonionic organic compounds. The specific questions are
addressed in the latter portion of this document. A statement on the most appropriate method to
estimate the BSAF is provided below. This document is focused solely on the determination of
BSAFs for nonionic organic chemicals and is primarily applicable to fish and higher-level
shellfish, e.g., crabs. The determination of BSAFs for metals is not discussed.




RECOMMENDATIONS

There are two methods for determining the BSAF from paired observations: 1) a regression
approach, whereby the BSAF is estimated by determining the slope of the Csoc-Cℓ line [Csoc is
the concentration of chemical in the sediment on an organic carbon basis (μg/kg organic
carbon) and Cℓ is the concentration of chemical in the organism on a lipid basis (μg/kg
lipid)], and 2) an averaging approach, whereby the BSAF is estimated by averaging the BSAFs
from the paired observations across the site. Both approaches use the same data. The second
approach, however, is generally the more appropriate method for estimating the BSAF because
regression analysis has these four limitations:







    1)  Regression analysis, whether model I (simple linear regression) or model II (geometric
        mean regression, major axis regression, Bartlett's three-group method, or Kendall's
        robust line-fit method (Sokal and Rohlf, 1995)), requires meeting parametric
        assumptions about the relationship between the X and Y variables.

    2)  Regression analysis, in order to be useful, requires a range of values in the X and Y
        variables.

    3)  When large ranges exist in the Csoc-Cℓ values (e.g., Csoc spans two orders of
        magnitude), weighting of the data in the regression analysis and/or transformation of
        the data might be required for proper analysis.

    4)  Although regression analysis can be done on data sets with limited numbers of Csoc-Cℓ
        pairs, determining the slope of the line fitting limited numbers of pairs can lead to
        highly uncertain slopes.
16
In contrast, the averaging approach (estimating the BSAF by averaging the BSAFs from each
Csoc-Cℓ pair) requires none of these conditions or assumptions. Further, unlike the regression
approach, the averaging approach can be performed with limited data.

Both the regression and averaging approaches require similar conditions (e.g., food web
structure, sediment/water column concentration quotients, chemical bioavailability, and diets
of the organisms) for each Csoc-Cℓ pair. (This can be problematic for Superfund and other
sites that have highly heterogeneous conditions.) Additionally, for both approaches, accuracy
and precision of the calculated BSAFs are a function of the sample size, i.e., the number of
Csoc-Cℓ pairs.

With the regression and averaging approaches, each Csoc-Cℓ pair is location specific and each
pair incorporates all of the conditions existing at the location. In order to use either
approach, the conditions must be similar across all locations. Mixing of Csoc-Cℓ paired
observations with different underlying conditions is not recommended and will, in all
likelihood, result in BSAFs with poor predictive accuracy.

-------
With the averaging approach, the distribution of the individual BSAFs (determined from each
Csoc-Cℓ pair) can be evaluated very easily; this evaluation is commonly done in statistical
analysis of data. Knowing the underlying distribution of the BSAFs allows the selection of the
most appropriate (unbiased) averaging technique. Further, with the individual BSAFs (Csoc-Cℓ
pairs), the homoscedasticity (equality) of the variances across the individual BSAFs can be
assessed. In cases where the variances are heteroscedastic (unequal), an appropriate weighted
averaging technique would be used, and in general, the weights would be the reciprocals of the
variances for the individual BSAFs. The averaging approach can also be easily implemented with
other weighting considerations such as the portion of the site represented by each individual
BSAF, e.g., some BSAFs might be reflective of three quarters of the site while the remaining
BSAFs are reflective of the other quarter of the site. The averaging approach also provides
the information on the distribution and variance of the final BSAF (grand mean), which is
required for one- and two-stage Monte Carlo uncertainty analyses.
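As a minimal sketch of the averaging options described above (not part of the original
analysis), the following Python fragment computes an arithmetic mean, a geometric mean (one
common choice when the individual BSAFs appear log-normally distributed), and a weighted mean
using either reciprocal-variance weights or site-area weights. The function names and all
numerical values are hypothetical and chosen only for illustration.

```python
import math

def arithmetic_mean(bsafs):
    """Unweighted mean of the individual BSAFs."""
    return sum(bsafs) / len(bsafs)

def geometric_mean(bsafs):
    """Mean computed on log-transformed BSAFs, then back-transformed."""
    return math.exp(sum(math.log(b) for b in bsafs) / len(bsafs))

def weighted_mean(bsafs, weights):
    """Weighted mean; the weights need not sum to one."""
    return sum(w * b for w, b in zip(weights, bsafs)) / sum(weights)

def inverse_variance_weights(variances):
    """Weights taken as the reciprocal of each BSAF's variance."""
    return [1.0 / v for v in variances]

# Hypothetical BSAFs from three paired observations, with their variances.
bsafs = [1.0, 2.0, 4.0]
variances = [0.25, 0.5, 1.0]

mean_arith = arithmetic_mean(bsafs)
mean_geom = geometric_mean(bsafs)
mean_invvar = weighted_mean(bsafs, inverse_variance_weights(variances))

# Hypothetical area weighting: the first two BSAFs together represent three
# quarters of the site and the third represents the remaining quarter.
mean_area = weighted_mean(bsafs, [0.375, 0.375, 0.25])
```

Which of these estimates is appropriate depends on the distribution and variances observed in
the individual BSAFs; the sketch only shows how each would be computed.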




There is great value in plotting Cℓ against Csoc; BSAFs against Csoc; and Cℓ, Csoc, and BSAFs
against geographical information. These plots should be done and evaluated for trends in the
data. They may provide key insights and understanding of the complexities existing at the site
of interest. The importance of resolving discrepancies within the data cannot be overstated
(e.g., Why are some BSAFs so different? Are there trends or dependencies upon concentrations
of chemicals in sediment or with geographical location within the site? Why don't the Csoc-Cℓ
pairs form a linear relationship?). Spending time and resources resolving these discrepancies
will be well worth the effort since the uncertainties associated with remediation decisions
will be smaller. Additionally, any discrepancies in the data at this level will carry over
into higher-level and more complex analyses since these analyses use this information.

The following sections provide a description of the BSAF along with its underlying
assumptions, a discussion on how to measure a useful BSAF, a discussion on the basis of the
regression approach, and answers to specific questions related to regression analysis.

DEFINITION OF BSAF

The BSAF is defined (Ankley et al., 1992) as

        \mathrm{BSAF} = \frac{C_o / f_\ell}{C_s / f_{soc}}        (1)
where Co is the chemical concentration in the organism (μg/kg wet weight), fℓ is the lipid
fraction of the organism (g lipid/g wet weight), Cs is the chemical concentration in surficial
sediment (μg/kg dry weight), and fsoc is the fraction of the sediment present as organic
carbon (g organic carbon/g dry weight). In general, BSAFs should be determined from spatially
and temporally coordinated fish and surficial sediment samples under conditions in which
recent loadings of the chemicals to the ecosystem are relatively unchanged (Burkhard et al.,
2003). The BSAF definition does not invoke or include the assumption of equilibrium conditions
for the chemical between the organism and sediment (Ankley et al., 1992; Thomann et al.,
1992). As shown by Thomann et al. (1992), BSAFs are appropriate for describing bioaccumulation
of sediment contaminants in aquatic food webs with non-equilibrium conditions between both the
sediment and fish, and sediment and its overlying water. Equilibrium is regarded as a
reference condition for describing degrees of disequilibrium, and thus, is not a requirement
for measurement, prediction, or application of BSAFs.
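The definition in Equation 1 can be sketched in a few lines of Python. This is only an
illustration of the arithmetic; the function name, argument names, and input values are
hypothetical.

```python
def bsaf(c_o, f_lipid, c_s, f_soc):
    """BSAF per Equation 1.

    c_o     -- chemical concentration in the organism (ug/kg wet weight)
    f_lipid -- lipid fraction of the organism (g lipid/g wet weight)
    c_s     -- chemical concentration in surficial sediment (ug/kg dry weight)
    f_soc   -- organic carbon fraction of the sediment (g organic carbon/g dry weight)
    """
    c_lipid = c_o / f_lipid   # lipid-normalized tissue concentration (ug/kg lipid)
    c_soc = c_s / f_soc       # carbon-normalized sediment concentration (ug/kg organic carbon)
    return c_lipid / c_soc

# Hypothetical example: 120 ug/kg wet weight in tissue with 4% lipid, and
# 30 ug/kg dry weight in sediment with 2% organic carbon.
example_bsaf = bsaf(c_o=120.0, f_lipid=0.04, c_s=30.0, f_soc=0.02)
```

In this made-up case the lipid-normalized residue is about twice the carbon-normalized
sediment concentration, so the BSAF is about 2.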




-------
With specific reference to benthic invertebrates, numerous investigators (Lake et al., 1984;
McElroy and Means, 1988; Bierman, 1990; Lake et al., 1990; Ferraro et al., 1990) have invoked
two assumptions regarding BSAFs: 1) equilibrium conditions and 2) no metabolism of the
chemical. These assumptions, when combined with EqP (equilibrium partitioning) theory (DiToro
et al., 1991), lead to the conclusion that the BSAF, for these specific conditions, is equal
to the partitioning relationship of the chemical between organic carbon in the sediment and
lipids of the organism. Depending upon the affinities of the nonpolar organic chemical for
lipid and sediment organic carbon, the BSAF, under these specific conditions, should be in the
range of 1 to 2 (McFarland and Clarke, 1986). For aquatic organisms tightly connected to the
sediments, like oligochaetes and other benthic invertebrates, experimental measurements (Lake
et al., 1990; Tracy and Hansen, 1996) are generally consistent with the theoretical value,
i.e., in the range of 1 to 2.




There are solid mechanistic reasons why fish should not be in equilibrium with their
sediments (Thomann et al., 1992). For fish, BSAFs incorporate wide ranges of influences
including biomagnification due to the trophic level of the fish; sediment-water column
chemical disequilibrium; the diet of the fish and its underlying food web; the fish's home
range; and chemical metabolism within the fish and its food web (Burkhard et al., 2003).
Suggestions that BSAFs for fish should be in the range of 1 to 2 by combining the definition
of the BSAF with the assumptions of equilibrium conditions and no metabolism are incorrect
(Wong et al., 2001). As explained above, measured BSAFs above or below 1 to 2 are entirely
reasonable for fish (Burkhard et al., 2003). BSAFs outside this range for fish do not violate
the general definition of BSAFs nor invalidate the usefulness of BSAFs in predicting chemical
residues in fish for sediment contaminants (Burkhard et al., 2004).




MEASURING USEFUL Csoc-Cℓ PAIRS FOR CALCULATION OF BSAFs

Probably the most important factor in measuring a BSAF with predictive power is the
requirement that the sediment samples analyzed be reflective of the immediate home range of
the fish. Depending upon the site, the degree of difficulty in defining the immediate home
range of the organism can vary widely. In situations where the movement of the organisms is
confined by the geography of the site, e.g., dams or falls, the home range of the organisms
can probably be defined fairly easily. When required, home ranges can be determined by
tagging/recapture, radio-telemetry, and/or ultrasonic telemetry studies at the site of
interest. Estimates of home ranges for freshwater fishes can be determined using the
allometric relationship (Minns, 1995):




        \ln H = -2.91 + 3.14\,HAB + 1.65 \ln L    or    \ln H = 3.33 + 2.98\,HAB + 0.58 \ln W




where H is the home range size (m²), HAB is 0 for rivers and 1 for lakes, W is body weight
(g), and L is body length (mm). For freshwater invertebrates (crabs) and for marine and
estuarine ecosystems, allometric relationships for home range have not been reported.
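For readers who want to evaluate the allometric relationship numerically, a minimal Python
sketch follows. It simply encodes the two expressions given above; the function names and the
example fish are hypothetical, and the result is only a screening estimate, not a substitute
for site-specific home range information.

```python
import math

def home_range_from_length(length_mm, is_lake):
    """Home range H (m^2) from body length L (mm); HAB is 0 for rivers, 1 for lakes."""
    hab = 1 if is_lake else 0
    return math.exp(-2.91 + 3.14 * hab + 1.65 * math.log(length_mm))

def home_range_from_weight(weight_g, is_lake):
    """Home range H (m^2) from body weight W (g); HAB is 0 for rivers, 1 for lakes."""
    hab = 1 if is_lake else 0
    return math.exp(3.33 + 2.98 * hab + 0.58 * math.log(weight_g))

# Hypothetical example: a 300 mm fish in a river and in a lake.
h_river = home_range_from_length(300.0, is_lake=False)
h_lake = home_range_from_length(300.0, is_lake=True)
```

Because HAB enters the exponent, the length-based lake estimate is larger than the river
estimate by a constant factor of exp(3.14) for a fish of the same length.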




Having a good understanding of the immediate home range of the species is important.
Organisms with smaller home ranges will, in all likelihood, be more representative of the
study site than those with large home ranges that extend well beyond the study site. Just
because a fish (or other aquatic organism) is caught at a sampling location, one cannot infer
that the chemical residue in the fish is due to the chemicals residing at the study site.
Knowledge of the fish's home range is the only way that one can establish the connection of
the fish to the sampling location. It is strongly recommended that local fisheries experts be
consulted during the sampling design phase of the field study to help in determining the
immediate home range and trophic level of the organisms at the site; local knowledge will be
extremely helpful. Although the above allometric relationship is available for estimating home
ranges, one should not necessarily assume that the "calculated" and "actual" immediate home
ranges for the organisms are the same; one will still need to do the legwork of establishing,
as best as one can, the immediate home ranges for the organisms at the site.




Once the home range of the species of interest is established, sediment samples reflective of
the species' home range need to be collected. It is important that the sediment samples
collected be representative of the sediments to which the organisms are exposed and not a
homogenized sediment core representing the entire bed of contaminated sediment. For most
organisms, the surficial sediments are most reflective of the organism's immediate exposure
history, and generally, smaller depths of the surficial layer, e.g., 0 to 2 cm, are preferred
over larger depths, e.g., 0 to 30 cm. For deeper burrowing organisms such as some clams and
polychaetes, slightly larger surficial depths, e.g., 0 to 5 cm, might be more reflective of
their recent exposure history.




Beyond establishing the home range of the organism and the appropriate sediment samples, the
collection and analysis of adequate numbers of organisms and sediment samples is required for
deriving unbiased estimates of the mean concentrations of chemicals with known variances. This
document will not address the subject of sample collection, compositing, and analysis. With
unbiased estimates of the mean concentrations, the BSAF for the specific site can be
calculated using Equation 1.

-------
In any study design, it is important that biota samples be collected and composited in size
or age classes. For fish, dietary composition changes substantially with size and age, and
these changes will result in differences in BSAFs among size and age classes. For forage fish,
common classes are young-of-the-year, juveniles, and adults, and for piscivorous fishes,
common classes are year classes, e.g., 2, 3, 6, and 10 years old. Mixing of fishes of
different size/age classes is not recommended because of the increased variance for the
average chemical residue in the organisms.

Biota samples for chemical analysis should never be composited by mixing different fish
species. Different fish species have different life histories and diets. BSAFs derived from
composite samples composed of different species will be highly biased by the individual
species. Further, resolving what the potential biases are for an individual species would
require the collection and analysis of that species.




When a Csoc-Cℓ pair (or BSAF) is measured for a specific chemical, the measured value
incorporates all conditions and parameters existing at the location of interest. The major
conditions and parameters incorporated into the Csoc-Cℓ pair (or BSAF) are 1) the distribution
of the chemical between the sediment and water column, 2) the relationship of the food web to
water and sediment, and 3) the length of the food web (or trophic level of the organism).




CALCULATION OF BSAFs

The BSAF is calculated from four measured variables (see Equation 1, repeated below): the
concentration of the chemical in the organism on a wet weight basis (Co), the lipid content of
the wet tissue (fℓ), the concentration of the chemical in the sediment on a dry weight basis
(Cs), and the organic carbon content of the dry sediment (fsoc).







        \mathrm{BSAF} = \frac{C_o / f_\ell}{C_s / f_{soc}}        (1)

A Csoc-Cℓ pair will, in many cases, be composed of multiple composite tissue samples and
multiple sediment samples (spanning the immediate home range of the organisms) for a sampling
location. In order to determine the BSAF for the Csoc-Cℓ pair, average concentrations in the
tissue and sediment, i.e., the numerator and denominator of Equation 1, need to be determined.
The lipid-normalized concentration of the chemical in each tissue sample should be determined,
and then these values should be averaged to determine the average chemical concentration for
the organisms. If the tissue samples have different numbers of organisms in each composite,
e.g., three fishes in one sample and five fishes in the second sample, a weighted average
concentration should be determined. For normally distributed residues and the two-sample fish
example, the weighted average concentration equals:


        C_{\ell\text{-avg}} = \frac{\sum_i (w_i \times C_{\ell\text{-}i})}{\sum_i w_i} = \frac{3 \times C_{\ell\text{-one}} + 5 \times C_{\ell\text{-two}}}{3 + 5}        (2)

where w_i is the number of organisms in composite i, Cℓ-i is the lipid-normalized
concentration of the chemical in composite i, and Cℓ-avg is the weighted average
lipid-normalized concentration in the tissues. The standard deviation of a weighted average
(s_avg) equals


        s_{avg} = \sqrt{\frac{\sum_i w_i \times (C_{\ell\text{-}i} - C_{\ell\text{-avg}})^2}{\sum_i w_i}}        (3)


-------
For log-normally distributed residues in the fish, the weighting would be done on the
log-transformed data. Sediment samples would be treated similarly: normalizing for organic
carbon and then calculating the average concentration of the chemical in the sediments.
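The weighted averaging of composites can be sketched as follows. This is a minimal
illustration with hypothetical names and values. The standard deviation is computed as the
square root of the weighted mean squared deviation divided by the sum of the weights, which is
one common definition and is an assumption here. For log-normal residues the same weighted
mean is applied to the log-transformed values and then back-transformed.

```python
import math

def weighted_mean(values, weights):
    """Weighted average; weights are the numbers of organisms per composite."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def weighted_sd(values, weights):
    """Weighted standard deviation (assumed form: sum of weights in the denominator)."""
    mean = weighted_mean(values, weights)
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / sum(weights)
    return math.sqrt(var)

def weighted_geometric_mean(values, weights):
    """Weighted mean of log-transformed values, back-transformed."""
    return math.exp(weighted_mean([math.log(v) for v in values], weights))

# Hypothetical lipid-normalized residues (ug/kg lipid) in two composites made
# up of three and five fish, respectively.
residues = [800.0, 1200.0]
n_fish = [3, 5]

c_l_avg = weighted_mean(residues, n_fish)           # (3*800 + 5*1200) / 8
s_avg = weighted_sd(residues, n_fish)
c_l_geo = weighted_geometric_mean(residues, n_fish)
```

The same functions would be applied to the organic carbon normalized sediment concentrations,
with weights chosen to reflect how the sediment samples were composited.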



The BSAF for the Csoc-Cℓ pair would then be determined by dividing Cℓ-avg by Csoc-avg. The
variance for the BSAF can be estimated using the equation (Mood et al., 1974):



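
        s_{BSAF}^2 = \frac{(s_{C_{\ell\text{-avg}}})^2 + \mathrm{BSAF}^2 (s_{C_{soc\text{-avg}}})^2 - 2 r\, s_{C_{\ell\text{-avg}}}\, s_{C_{soc\text{-avg}}}\, \mathrm{BSAF}}{(C_{soc\text{-avg}})^2}        (4)
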

where s_BSAF, s_Csoc-avg, and s_Cℓ-avg are the standard deviations for the BSAF, Csoc-avg, and
Cℓ-avg, respectively; and r is the correlation coefficient between Csoc-avg and Cℓ-avg.
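A short sketch of this variance estimate is given below. It assumes the standard first-order
approximation for the variance of a ratio, written in terms of the quantities defined above;
the function name and all input values are hypothetical.

```python
def bsaf_variance(c_l_avg, s_c_l, c_soc_avg, s_c_soc, r):
    """Approximate variance of BSAF = c_l_avg / c_soc_avg.

    s_c_l and s_c_soc are the standard deviations of the two averages and
    r is the correlation coefficient between them.
    """
    bsaf = c_l_avg / c_soc_avg
    numerator = (s_c_l ** 2
                 + bsaf ** 2 * s_c_soc ** 2
                 - 2.0 * r * s_c_l * s_c_soc * bsaf)
    return numerator / c_soc_avg ** 2

# Hypothetical averages: 1000 ug/kg lipid (SD 100) in tissue and
# 500 ug/kg organic carbon (SD 50) in sediment, uncorrelated.
var_uncorrelated = bsaf_variance(1000.0, 100.0, 500.0, 50.0, r=0.0)

# Same inputs with perfectly correlated averages; the variance collapses
# because tissue and sediment then vary in exact proportion.
var_correlated = bsaf_variance(1000.0, 100.0, 500.0, 50.0, r=1.0)
```

With these made-up inputs the BSAF is 2 and the uncorrelated variance is 0.08, i.e., a
standard deviation of roughly 0.28.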



For each Csoc-Cℓ pair, a BSAF is determined. As discussed previously, the average BSAF would
subsequently be determined from the individual BSAFs using the most appropriate (unbiased)
averaging technique based upon the underlying distribution of the BSAFs.



BASIS FOR BSAF REGRESSION APPROACH

Equation 1 can be rearranged:

        C_o / f_\ell = \mathrm{BSAF} \times C_s / f_{soc}        (5)

By substitution, equation 5 can be expressed as:

        C_\ell = \mathrm{BSAF} \times C_{soc}        (6)

where Csoc is the concentration of chemical in the sediment on an organic carbon basis (μg/kg
organic carbon) and Cℓ is the concentration of chemical in the organism on a lipid basis
(μg/kg lipid).

-------
Plotting Cℓ against Csoc results in the following illustrative plot (Graph A), where the
slope of the line is the BSAF. However, the slope of Co plotted against Cs (Graph B) is not
the BSAF because these two measures of chemical concentrations are not lipid and organic
carbon normalized. Use of the regression approach to derive the BSAF incorporates an implicit
assumption above and beyond those required for measuring a BSAF at a specific location. The
implicit assumption of the regression approach is that all Csoc-Cℓ pairs must have or
incorporate the same underlying ecological conditions and parameters.
[Graph A: Cℓ plotted against Csoc (μg/kg organic carbon); slope = Δy/Δx = BSAF.
 Graph B: Co plotted against Cs (μg/kg dry weight); slope = Δy/Δx ≠ BSAF.]

For a Superfund site, it is common to collect samples across the site with a number of
different sampling locations. For example, consider a New England stream with a series of
three dams, and assume that two-year-old carp and sediment are collected and analyzed in each
reservoir. Further assume that enough fish and sediment were collected so that representative
and unbiased mean concentrations were determined for each reservoir. Thus, three sets of
paired carp-sediment observations would be determined, one for each of the three reservoirs.

-------
These paired observations of Csoc and Cℓ can be plotted (Graphs C and D). In Graph C, the
pairs form a nearly linear relationship, suggesting that the underlying conditions for the
Csoc-Cℓ pairs are consistent across the samples and thus allow estimation of the BSAF using
the regression approach. In Graph D, the pairs form no easily defined linear relationship, and
in this case, there is too little variability in the Csoc-Cℓ pairs for the regression approach
to be useful in estimating the BSAF. In Graph E, a situation where four sets of paired
carp-sediment data were determined, three of the pairs form a nearly linear relationship, but
one pair is different from the other pairs. Depending upon how one draws the line, either the
triangle or square data in Graph E could be the different (or outlier) Csoc-Cℓ pair. In this
case, one or more of the Csoc-Cℓ pairs have different underlying conditions, and thus, it
would be inappropriate to estimate the BSAF using the regression approach.

As discussed above, each carp-sediment pair is location specific and each pair incorporates
all of the major conditions and parameters existing at the location. In order to use the
regression approach with pairs of Csoc-Cℓ observations, the major conditions and parameters
must be the same for all locations. This requirement is the implicit assumption incorporated
into the regression approach. Mixing of Csoc-Cℓ paired observations with different conditions
and parameters will result in Csoc-Cℓ plots where the Csoc-Cℓ pairs form a non-linear
relationship (e.g., possibly Graph E), and in all likelihood, a BSAF with poor predictive
power.[1]

For the above examples, if the BSAF for each pair of Csoc-Cℓ observations is plotted against
Csoc, the following graphs are obtained (Graphs CC, DD, and EE). The relationships among the
Csoc-Cℓ pairs in the above graphs remain in the graphs based upon the BSAFs; compare Graphs C
to CC, D to DD, and E to EE. In essence, by calculating the BSAF, one has mathematically
removed the concentration dependence shown in Graphs C, D, and E. For further comparison
purposes, the BSAF for each pair of Csoc-Cℓ observations is also plotted against Cℓ (Graphs
CCC, DDD, and EEE).

        [1] The mixing of Csoc-Cℓ paired observations with different conditions and parameters
        is not recommended for the averaging approach either. BSAFs with poor predictive power
        (i.e., accuracy) will, in all likelihood, result when different conditions and
        parameters exist across the individual Csoc-Cℓ pairs used in the analysis.




The graphs, i.e., C, D, E, CC, DD, EE, CCC, DDD, and EEE, are some of the plots recommended
for evaluating trends and underlying conditions associated with the Csoc-Cℓ pairs. We
recommend that these plots be completed prior to performing the final calculations for
determining the site-specific BSAF. These plots will help in identifying sources of variation
and error in the individual Csoc-Cℓ pairs and BSAF values.
[Graphs C, D, and E: Cℓ plotted against Csoc. Graphs CC, DD, and EE: BSAF plotted against
 Csoc. Graphs CCC, DDD, and EEE: BSAF plotted against Cℓ (μg/kg lipid).]
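The quantities behind these plots are simple to tabulate. The sketch below, using hypothetical
data and function names, computes the BSAF for each pair and a crude spread measure (largest
BSAF divided by smallest) that can flag a pair worth examining before any plot is drawn. The
spread measure is an illustrative screening aid and is not a criterion from this document.

```python
def pair_bsafs(pairs):
    """BSAF for each (Csoc, Cl) pair: lipid-normalized residue over carbon-normalized sediment."""
    return [c_l / c_soc for c_soc, c_l in pairs]

def spread(bsafs):
    """Ratio of the largest to the smallest individual BSAF."""
    return max(bsafs) / min(bsafs)

# Hypothetical pairs as (Csoc in ug/kg organic carbon, Cl in ug/kg lipid).
# The first three lie on a line through the origin, as in a Graph C situation.
linear_pairs = [(100.0, 200.0), (200.0, 400.0), (400.0, 800.0)]

# Adding a fourth pair with different underlying conditions, as in Graph E.
mixed_pairs = linear_pairs + [(300.0, 1800.0)]

bsafs_linear = pair_bsafs(linear_pairs)
bsafs_mixed = pair_bsafs(mixed_pairs)

# Rows of (Csoc, Cl, BSAF) that could be passed to any plotting tool.
table = [(c_soc, c_l, b) for (c_soc, c_l), b in zip(mixed_pairs, bsafs_mixed)]
```

For the linear pairs every BSAF is identical, whereas the added pair stands out clearly in the
tabulated BSAFs, mirroring the comparison of Graphs C and E with Graphs CC and EE.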

-------
THE REGRESSION APPROACH

A key consideration in using the regression approach is to realize that both Csoc and Cℓ are
measured with error. With the simple linear regression least-squares technique, one variable
(the Y's) is measured with error while the other variable (the X's) is fixed and has no error.
Simple linear regression is referred to as model I regression analysis. When the X's and Y's
are both measured with error, one of a number of model II regression techniques will be more
appropriate and, unfortunately, "the appropriate method depends on the nature of the data"
(Sokal and Rohlf, 1995). Sokal and Rohlf (1995) provide an excellent discussion on model II
regression and the techniques of geometric mean regression (also called reduced major axis,
standard major axis, or relation d'allometrie), slope of the major axis, Bartlett's
three-group method, and Kendall's robust line-fit method. Additionally, Sokal and Rohlf (1995)
discuss the Berkson case of model II regression where model I regression is appropriate.




It is suggested that the slope of the Csoc-Cℓ pairs be determined using the geometric mean
regression technique (Halfon, 1985; Sokal and Rohlf, 1995) because with this technique the
slope of the regression is not dependent upon the scale of the X's and Y's used in the
analysis. Additionally, Ricker (1973) has recommended that the geometric mean regression
technique be used for determining functional relationships (i.e., slope) when "the variability
is mostly natural... in X and Y"; the case, I believe, when sediment samples representative of
the organism's actual exposure history are collected.




For the geometric mean regression technique, the slope of the geometric mean regression line
is the geometric mean of the slopes of the following two linear regression least-squares
lines:

        y = a + b''x        (7)

and

        x = c + dy        (8)

The slope of the geometric mean regression line is computed as the geometric mean of b'' and
1/d:

        b = (b'' / d)^{1/2}        (9)

The intercept a is computed as done in linear regression:

        a = \bar{Y} - b\bar{X}        (10)

For further details on the geometric mean regression technique, the reader is referred to
Halfon (1985) and Sokal and Rohlf (1995).
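The geometric mean regression described above can be sketched in plain Python. The two
least-squares slopes are computed directly, and the geometric mean slope is given the sign of
the y-on-x slope; that sign convention is an assumption, since the equations above do not
address negative slopes. The function names and data are hypothetical, and the sketch does not
handle the degenerate case in which the covariance of x and y is zero.

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx

def geometric_mean_regression(x, y):
    """Return (intercept, slope) of the geometric mean regression line."""
    b_yx = ols_slope(x, y)      # slope of y on x (Equation 7)
    d = ols_slope(y, x)         # slope of x on y (Equation 8)
    # Geometric mean of b_yx and 1/d (Equation 9), signed like b_yx (assumption).
    b = math.copysign(math.sqrt(b_yx / d), b_yx)
    # Intercept from the means of y and x (Equation 10).
    a = sum(y) / len(y) - b * sum(x) / len(x)
    return a, b

# Hypothetical Csoc (x) and lipid-normalized residue (y) values with scatter.
c_soc = [1.0, 2.0, 3.0, 4.0]
c_l = [1.0, 3.0, 2.0, 4.0]
intercept, slope = geometric_mean_regression(c_soc, c_l)
```

For these made-up data the ordinary least-squares slope of y on x is 0.8, while the geometric
mean regression slope is 1.0, illustrating how the two techniques can differ when both
variables carry scatter.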




An Excel add-in function for geometric mean regression can be downloaded from the following
URL:

        http://www.uottawa.ca/academic/arts/geographie/lpcweb/newlook/data_and_downloads/download/sawsoft/modelii/modelii.htm




RESPONSES TO QUESTIONS RAISED IN EXPECTED OUTCOMES

Do I fit a straight line through the data?

Yes. If the Csoc-Cℓ observations do not form a straight line, then one must determine why the
data diverge from the linear relationship. Reasons for the Csoc-Cℓ observations diverging from
a straight line include the following (note that there are many more causes than those listed):

   •  The organisms in different Csoc-Cℓ pairs reside at different trophic levels in the
      food web.

   •  The organisms in different Csoc-Cℓ pairs have dramatically different diets even
      though they reside at the same trophic level. For example, for one pair, the
      organisms might consume primarily zooplankton, while for other pairs, the
      organisms might consume primarily benthic invertebrates.

   •  The bioavailability of the chemical in the contaminated sediment varies
      substantially across the Csoc-Cℓ pairs.

   •  Inputs of the chemicals to the site differ substantially across the sampling
      locations. For example, consider a harbor where organisms residing in the
      lower parts of the harbor are exposed to runoff and ground water seepage from
      an old industrial site, while organisms residing in the upper parts of the harbor
      are not exposed to this discharge.

   •  The pairs come from different populations of the same species. For example, in the
      Hudson River, there are resident and migratory striped bass populations, and chemical
      residues in the populations differ widely.

Do I plot my data on a log-log scale?

It is recommended that the data be plotted on arithmetic-arithmetic scales because, in
arithmetic-arithmetic space, the slope of the line is the BSAF when Csoc-Cℓ pairs are used.
In general, the data, i.e., the Csoc-Cℓ pairs, are assumed to be scaled arithmetically and
thus should be plotted on arithmetic-arithmetic scales.

As a note of clarification, on log-log scales the slope of the regression line (log Cℓ
regressed against log Csoc) is not the BSAF. See Equation 12, which is derived by rearranging
Equation 6 and then Equation 11; in log-log space the BSAF appears in the intercept term
(log BSAF) rather than in the slope.

        $\log C_{\ell} = \log [\, C_{soc} \times \mathrm{BSAF} \,]$                         (11)

        $\log C_{\ell} = \mathrm{slope} \times \log C_{soc} + \log \mathrm{BSAF}$           (12)

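The difference between the two scales can be checked numerically. The sketch below builds
hypothetical, noise-free Csoc-Cℓ pairs from an assumed BSAF of 2.5 and fits an ordinary
least-squares line in arithmetic space and in log-log space. All names and values are
illustrative assumptions, not data from this report.

```python
import math


def least_squares(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


bsaf_true = 2.5                           # hypothetical BSAF
c_soc = [1.0, 2.0, 5.0, 10.0, 50.0]       # hypothetical sediment values
c_lipid = [bsaf_true * c for c in c_soc]  # C-lipid = Csoc x BSAF, as inside Equation 11

arith_slope, _ = least_squares(c_soc, c_lipid)
log_slope, log_intercept = least_squares(
    [math.log10(c) for c in c_soc],
    [math.log10(c) for c in c_lipid],
)

print(arith_slope)          # close to 2.5: the slope is the BSAF
print(log_slope)            # close to 1.0: the slope is not the BSAF
print(10 ** log_intercept)  # close to 2.5: the BSAF sits in the intercept
```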
Do I force the line through the origin?

Yes, when performing the regression with arithmetic-arithmetic scales. (If one is performing
the regression with log-log scales, the origin does not exist because the logarithm of zero is
undefined. Thus, the line cannot be forced through the origin.)
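A line forced through the origin has a single parameter, the slope. The sketch below shows two
ways that slope might be computed: the ordinary least-squares estimate, and a geometric mean
analogue obtained by applying Equation 9 to the two through-origin least-squares slopes. The
second form is an assumption of this sketch; the report does not spell out a through-origin
formula. Function names and data are hypothetical.

```python
import math


def ls_slope_through_origin(x, y):
    """Least-squares slope of y = b x (no intercept)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("all x values are zero")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def gm_slope_through_origin(x, y):
    """Geometric mean of the y-on-x slope and the reciprocal of the x-on-y slope,
    both fitted through the origin (assumed analogue of Equation 9)."""
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0 or sxy == 0:
        raise ValueError("slope is undefined for these data")
    return math.copysign(math.sqrt(syy / sxx), sxy)


# Hypothetical Csoc-C-lipid pairs.
c_soc = [10.0, 20.0, 40.0, 80.0]
c_lipid = [21.0, 39.0, 82.0, 158.0]
print(ls_slope_through_origin(c_soc, c_lipid))
print(gm_slope_through_origin(c_soc, c_lipid))
```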

How do I handle non-detects?

I'm not sure of your definition of non-detects, so I'll provide answers for both definitions:
chemicals present at concentrations below the minimum detection limit (MDL) of the method, and
chemicals not detected at all, i.e., no response above instrumental noise. For the case where
the chemical is present at concentrations below the MDL, use the uncensored value in the
calculation; don't use the MDL value. For the case where the chemical is not detected at all,
Superfund typically uses 1/2 of the MDL. However, as discussed below, there are approaches for
working with data below the MDL and with chemicals not detected at all. Calculating BSAFs by
arbitrarily using 1/2 of the MDL for concentrations in sediment and/or biota can result in
spurious and non-predictive BSAFs. In each case (chemical present below the MDL and chemical
not detected at all), the resulting values must be flagged, and different flags should be used
for each case.



When the Csoc-Cℓ pairs are plotted, different symbols/colors should be used for the above two
flagged data types. Examine this plot to see whether the flagged data align with the general
trend of the Csoc-Cℓ pairs that are not flagged. Chemicals not detected at all and chemicals
with concentrations below the MDL should each be treated separately. One probably has greater
confidence in the uncensored flagged data (below the MDL) than in the values for chemicals not
detected at all. This comparison/evaluation should be performed by doing the regression
analysis without the flagged data, with the less-than-the-MDL flagged data included, and with
the flagged data alone. Significance testing of the slopes (asking whether the slopes are
different) should be done, and these comparisons should help in determining whether to include
or exclude the flagged data in the final regression. The residual plots should also be
examined; they will help greatly in determining whether to include or exclude chemicals present
at concentrations less than the MDL and/or chemicals not detected at all.
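The three regressions described above can be organized as in the following sketch. The flag
names, the records, and the helper `gm_slope` are hypothetical. The slope is the geometric mean
regression slope (with its sign taken from the covariance, an assumption), and the significance
test on the slope differences is left out.

```python
import math


def gm_slope(pairs):
    """Geometric mean regression slope for a list of (c_soc, c_lipid) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    return math.copysign(math.sqrt(syy / sxx), sxy)


# Hypothetical records: (c_soc, c_lipid, flag).
records = [
    (10.0, 21.0, "ok"), (20.0, 39.0, "ok"), (40.0, 82.0, "ok"), (80.0, 158.0, "ok"),
    (2.0, 4.5, "below_mdl"), (1.0, 1.5, "below_mdl"), (3.0, 7.0, "below_mdl"),
    (0.5, 0.5, "not_detected"),  # half-MDL substitution; kept out of every fit here
]


def subset(flags):
    """Select the pairs whose flag is in the given set."""
    return [(x, y) for x, y, flag in records if flag in flags]


slopes = {
    "unflagged only": gm_slope(subset({"ok"})),
    "unflagged + below MDL": gm_slope(subset({"ok", "below_mdl"})),
    "below MDL only": gm_slope(subset({"below_mdl"})),
}
for label, value in slopes.items():
    print(label, round(value, 3))
```

Slopes that agree across the three fits would support keeping the below-MDL data in the final
regression; large differences would call for the closer examination described above.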









In general, chemicals not detected at all (i.e., for which 1/2 of the MDL is used) should be
excluded from the analysis because these values are highly uncertain relative to the other
Csoc-Cℓ pairs. Additionally, the flagged data would, in high likelihood, be from sampling
locations where less contamination existed and not from the site of planned active remediation.









The above discussion centered on non-detects and their use in the regression analysis. There
are statistical approaches for averaging censored data, i.e., non-detects (El-Shaarawi and
Dolan, 1989; Newman et al., 1989; Newman, 1995). These approaches can be used with normally
and log-normally distributed data. It is recommended that unbiased means be calculated only if
less than 20% of the reported values are non-detects (Berthouex and Brown, 1994).








How do I estimate the confidence interval around a prediction?

The standard error of the geometric mean regression slope can be approximated by the standard
error of the linear least-squares regression slope (Sokal and Rohlf, 1995). Most linear
least-squares regression programs (e.g., SAS) and spreadsheets (e.g., Lotus 1-2-3 and Excel)
calculate the standard error of the slope.

The 95% confidence limits on the slope are calculated using the Student's t value:

        $\text{Upper 95\% CI} = b + s_b \times t_{0.05[n-2]}$                     (13)

        $\text{Lower 95\% CI} = b - s_b \times t_{0.05[n-2]}$                     (14)

where $b$ is the geometric mean regression slope, $s_b$ is the standard error of the geometric
mean regression slope, $n$ is the total number of data points used in the geometric mean
regression, and $t_{0.05[n-2]}$ is the two-tailed Student's t value for α = 0.05 with n - 2
degrees of freedom.

When calculating the geometric mean of the ratios of the Csoc-Cℓ pairs (i.e., BSAFs), the
averaging process in log space provides the mean and standard deviation. The 95% confidence
limits are calculated in log space using the mean and standard deviation, and then the CIs are
transformed into arithmetic space. In arithmetic space, the 95% CI will be asymmetric.
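Both calculations can be sketched in Python as follows. The critical t value is passed in by
the caller (for example, from a t-table) so that no statistics library is required. The function
names and data are hypothetical, and dividing the log-space standard deviation by the square root
of n to obtain limits on the mean is an assumption about the intended calculation.

```python
import math


def slope_confidence_limits(x, y, t_crit):
    """Equations 13 and 14: return (lower, b, upper) for the geometric mean slope.

    t_crit is the two-tailed Student's t value for n - 2 degrees of freedom.
    """
    n = len(x)
    if n < 3:
        raise ValueError("at least 3 points are needed")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = math.copysign(math.sqrt(syy / sxx), sxy)  # geometric mean regression slope
    # Standard error of the least-squares slope, used as the approximation for s_b.
    resid_ss = max(syy - sxy ** 2 / sxx, 0.0)
    s_b = math.sqrt(resid_ss / (n - 2) / sxx)
    return b - t_crit * s_b, b, b + t_crit * s_b


def geometric_mean_confidence_limits(bsafs, t_crit):
    """Return (lower, geometric mean, upper), computed in log space.

    t_crit is the two-tailed Student's t value for len(bsafs) - 1 degrees of freedom.
    """
    logs = [math.log(v) for v in bsafs]
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in logs) / (n - 1))
    half_width = t_crit * sd / math.sqrt(n)  # assumption: limits on the mean
    return math.exp(mean - half_width), math.exp(mean), math.exp(mean + half_width)


# Hypothetical data; 3.182 and 4.303 are tabulated two-tailed t values (alpha = 0.05)
# for 3 and 2 degrees of freedom, respectively.
lower, b, upper = slope_confidence_limits(
    [1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1], t_crit=3.182)
gm_lower, gm, gm_upper = geometric_mean_confidence_limits([1.0, 2.0, 4.0], t_crit=4.303)
print(lower, b, upper)
print(gm_lower, gm, gm_upper)
```

The second result illustrates the asymmetry noted above: the upper limit lies farther from the
geometric mean than the lower limit does.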

Do I normalize by organic carbon and lipid?

Yes. The BSAF is the ratio of the concentration in the biota on a lipid basis to the
concentration in the sediment on an organic carbon basis.

By working with Csoc-Cℓ pairs (which are organic carbon and lipid normalized), one places
these concentrations on a thermodynamic basis. By expressing the concentrations on a
thermodynamic basis, the concentrations of the chemicals in sediment and tissue are corrected
for differences in bioavailability and partitioning behavior. By using the thermodynamically
based expressions, the Csoc-Cℓ pairs are expressed equivalently.
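The normalization can be written out as a short calculation. In this sketch the function name,
argument names, and example values are hypothetical. It assumes that the tissue and sediment
concentrations are in the same mass units and that the lipid and organic carbon contents are
expressed as fractions on the same weight basis as the corresponding concentrations.

```python
def bsaf(c_tissue, lipid_fraction, c_sediment, oc_fraction):
    """Lipid-normalized tissue concentration over organic-carbon-normalized
    sediment concentration."""
    if lipid_fraction <= 0 or oc_fraction <= 0 or c_sediment <= 0:
        raise ValueError("fractions and sediment concentration must be positive")
    c_lipid = c_tissue / lipid_fraction  # concentration on a lipid basis
    c_soc = c_sediment / oc_fraction     # concentration on an organic carbon basis
    return c_lipid / c_soc


# Hypothetical sample: 50 ug/kg in tissue with 5% lipid,
# 200 ug/kg in sediment with 2% organic carbon.
print(bsaf(50.0, 0.05, 200.0, 0.02))
```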




Do I use weighted regression?

There are two general cases. First, when the Csoc and Cℓ values are individual observations
(not averages), the individual Csoc-Cℓ pairs should be given equal weights. Second, if the
Csoc and Cℓ values are averages, the individual Csoc-Cℓ pairs should be given equal weights
unless the Csoc and Cℓ variances are highly heterogeneous (p<0.001). If the variances are
highly heterogeneous (very dissimilar), then perform both weighted (by the inverse of the
variance) and unweighted regressions and compare the slopes. The heterogeneous variances might
or might not have an appreciable effect on the slope. If there is an appreciable effect on the
slope, then the weighted regression model is preferred.
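The weighted versus unweighted comparison might be set up as in the sketch below. It uses
ordinary (model I) least squares with weights equal to the inverse of the variance of each
averaged Cℓ value. Both choices are assumptions of this sketch, since the text does not say
which variance supplies the weights or how weighting combines with geometric mean regression.
Names and data are hypothetical.

```python
def weighted_slope(x, y, variances=None):
    """Least-squares slope of y on x; weights are 1/variance when variances are given."""
    if variances is None:
        weights = [1.0] * len(x)  # unweighted case
    else:
        weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    return sxy / sxx


# Hypothetical averaged pairs and the variance of each averaged C-lipid value.
c_soc = [10.0, 20.0, 40.0, 80.0]
c_lipid = [21.0, 39.0, 82.0, 158.0]
variances = [1.0, 4.0, 25.0, 400.0]

unweighted = weighted_slope(c_soc, c_lipid)
weighted = weighted_slope(c_soc, c_lipid, variances)
print(unweighted, weighted)
```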




If I transform the data, do I need to use weighted regression?

See the answer to the previous question. The variances would need to be evaluated in log space
for heterogeneity.




How do I take into account the home range of the biota whose tissue I measured?

As explained in the background, one must have knowledge of the organism's home range. With
this information, sediment samples must be collected across the home range and analyzed, and
the sediment samples must be representative of the organism's immediate life history. The home
range of the organism is accounted for by averaging the analytical results for the sediment
samples collected within the organism's home range.




What if my r² is low and my data do not plot with the appearance of an increasing linear
function?

When this type of behavior is observed in the plot of Csoc-Cℓ pairs, it is an extremely strong
suggestion that different sampling locations have different underlying conditions and
parameters (e.g., different food webs, different organism populations, differences in chemical
bioavailability, different diets, etc.) or that the data have a very limited dynamic range. In
these cases, one will need to determine the factors causing these differences. If one cannot
resolve these differences, the same problems will also exist with other methods for predicting
chemical residues, e.g., food web models, because these methods require this knowledge as
well. In general, when this type of behavior is observed, the problem is in the data
themselves, and no statistical analysis method will circumvent the problem. Without resolving
these differences, their effects will be reflected or incorporated into all calculations with
the data.




How do I deal with outliers?

There are a number of different types of outliers. First, if the chemical was not detected at
all and 1/2 of the MDL was used, one could easily set these values aside without much
criticism, in essence making the argument that one has low confidence in the values. Second,
if the chemical was flagged as being below the MDL and the uncensored value is reported,
treating these values as outliers and setting them aside would be much harder. You would have
to determine what level of confidence you place on values below the MDL. In general,
uncensored data below the MDL are included in the analysis unless there is an overwhelming
reason to exclude the data, e.g., some type of methodological bias in the analytical
technique. Third, the Csoc-Cℓ pair may be very different from the general population of
Csoc-Cℓ pairs. In this situation, always make sure the data are not miscalculated, transposed,
or misidentified, and ensure that no other type of methodological error is associated with the
data. If the data pair appears to be correct, statistical techniques are available for the
testing of outliers.









Snedecor and Cochran (1980, p. 167-168) present a statistical method for linear regression in
which the regression is performed without the outlier, and the outlier is then tested as to
whether it is within sampling error of the population. The test criterion is a t-value.
Because the outlier is not chosen randomly, to ensure a 1 - α confidence, the calculated
t-value is compared to the t-value from the t-table using α′, where α′ equals α divided by n.
Probability values for testing for outliers should generally be conservative, e.g., α = 5% or
α = 1%. With an n of 20, the critical t-value for an α of 5% would be found using an α′ of
0.25% with the t-table.
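A sketch of this kind of test is given below. It fits a least-squares line without the suspect
point and expresses the point's deviation from the fitted line in units of the standard error of
a new observation. This is one common form of the test and is an assumption here; the exact
formula should be checked against Snedecor and Cochran (1980). Function names and data are
hypothetical, and the critical t-value is still looked up in a t-table using α′.

```python
import math


def adjusted_alpha(alpha, n):
    """Alpha prime: alpha divided by the number of data points."""
    return alpha / n


def outlier_t_statistic(x, y, index):
    """t-statistic for the point at `index`, from a regression fitted without it."""
    xs = [v for i, v in enumerate(x) if i != index]
    ys = [v for i, v in enumerate(y) if i != index]
    m = len(xs)
    if m < 3:
        raise ValueError("at least 3 remaining points are needed")
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((v - x_bar) ** 2 for v in xs)
    syy = sum((v - y_bar) ** 2 for v in ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_ms = max(syy - slope * sxy, 0.0) / (m - 2)  # residual mean square
    x0, y0 = x[index], y[index]
    # Standard error of a single new observation at x0.
    se = math.sqrt(resid_ms * (1.0 + 1.0 / m + (x0 - x_bar) ** 2 / sxx))
    return (y0 - (intercept + slope * x0)) / se  # compare with t at m - 2 d.f.


# Hypothetical pairs; the last point is the suspected outlier.
c_soc = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
c_lipid = [2.0, 4.1, 5.9, 8.0, 10.1, 20.0]
t_value = outlier_t_statistic(c_soc, c_lipid, index=5)
alpha_prime = adjusted_alpha(0.05, 20)  # the n = 20 example from the text
print(t_value, alpha_prime)
```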









SAS, a software package for statistical analysis, provides outlier detection and testing
algorithms within its regression model program.




Do I develop a separate regression for each compound in a mixture?

Yes. This is most desirable because individual chemicals have different chemical properties.
The differing behavior is most often observed with PCBs, where fish appear to be slightly
enriched with the higher chlorinated PCB congeners relative to the distribution existing in
the sediments.




When the value of x (i.e., exposure point concentration in sediment) is uncertain (e.g., when
biota migrate), how do I account for this in my regression?

The best method of accounting for organism migration is to design your sampling plan so that
the organisms are collected just before they migrate back out of the site. This approach
maximizes the time the organism spends at the site of interest and provides the best estimate
of the residue in the organism based upon the organism's exposure in its immediate home range
at the site.









Sampling design simulations (Burkhard, 2003) for the measurement of BSAFs (or Csoc-Cℓ paired
observations for determinations of BSAFs) suggest that spatial variability in the
concentrations of the chemical does not add large uncertainties to the measured BSAF beyond
those caused by temporal variability of the chemical concentrations in the water. Further,
random walk migration simulations suggested that BSAFs (or Csoc-Cℓ paired observations for
determinations of BSAFs) can be measured with low uncertainty even when extreme spatial
differences in concentration exist at the field site, provided the measurements are performed
in the more contaminated locations of the site for higher Kow chemicals, i.e., Kow > 10^5
(Burkhard, 2003). The requirement of performing the field measurements at the more
contaminated locations within the site will limit the regression approach because the range of
Csoc-Cℓ pairs will be small (see second paragraph of the Recommendations).

If the organisms spend a very short time at the site, e.g., the fish migrate through the site
in a few days to a week, determination of BSAFs is not recommended even though the BSAF can be
measured. The sediments from the site would not be reflective of the fish's recent exposure
history.




Are there ways to improve my study design knowing what I know now about regression?

First, the importance of collecting sediment samples that are reflective of the organism's
immediate home range cannot be overstated. Spending time and resources to better define the
relationship of the organisms to the sediments will greatly decrease the uncertainty
associated with the resulting BSAFs. In addition, predictions using food web models, whether
steady-state or dynamic, will greatly improve because of the improved knowledge of the
underlying relationship between the sediment and organism.









Second, it is important that composite samples reflective of the biota at the site of interest
be collected. Clearly, collection and analysis of more organisms will provide a better measure
of the average residue in the biota. However, biota samples consisting of mixed age classes
are not recommended, e.g., juvenile and adult minnows, or one-year-old and three-year-old
largemouth bass. Minimizing the differences in age (or size) will improve the quality of the
biota samples and ultimately provide smaller variances for the biota residues. Typically,
fishes of a given size (e.g., smallest fish >75% of the largest fish) or age group (e.g.,
3-year-olds) are collected.

After sample collection and analysis, plans should be made to visually examine the data by
making plots of the Csoc-Cℓ paired observations and plots of BSAFs against Csoc. The Csocs,
Cℓs, and BSAFs should be plotted on a GIS-type plot to determine whether the values are
correlated with geographical trends and conditions, e.g., the BSAFs increase with increasing
distance from the source on a river. Any additional information or understanding one can glean
for the site will be advantageous in the remediation decision process.









As part of the overall study plan for successfully measuring a BSAF, time and resources should
be allocated for resolving causes of non-linearity (when they exist) in the Csoc-Cℓ paired
observations. Resolving these causes will greatly aid in understanding the complexities of the
site and will provide decision makers and risk assessors a much better basis for assessing and
evaluating remediation options.









Deriving a BSAF using regression analysis and deriving it by calculating the average of the
individual BSAFs use the same data. Hence, it is suggested that BSAFs be derived using both
approaches. The added effort for the second analysis should be relatively small, since much of
the effort in performing the data analyses is organizing the data into a usable form for the
calculations.




REFERENCES

There are many standard college-level textbooks on statistical analysis that include
regression analysis. Almost all include discussion and examples of the linear least-squares
regression technique. The geometric mean regression technique is often not addressed in
standard college-level textbooks. Halfon (1985) is an excellent reference on geometric mean
regression. Sokal and Rohlf (1995) address the subject of model II regression, including
geometric mean regression.

Ankley, G.T., P.M. Cook, A.R. Carlson et al. 1992. Bioaccumulation of PCBs from sediments
by oligochaetes and fishes: Comparison of laboratory and field studies. Can. J. Fish. Aquat. Sci.
49:2080-2085.

Berthouex, P.M. and L.C. Brown. 1994. Statistics for Environmental Engineers. Lewis
Publishers/CRC Press, Boca Raton, FL.

Bierman, V.J., Jr. 1990. Equilibrium partitioning and biomagnification of organic chemicals in
benthic animals. Environ. Sci. Technol. 24:1407-1412.

Burkhard, L.P. 2003. Factors influencing the design of bioaccumulative factor and
biota-sediment accumulation factor field studies. Environ. Toxicol. Chem. 22(2):351-360.

Burkhard, L.P., P.M. Cook and D.R. Mount. 2003. The relationship of bioaccumulative
chemicals in water and sediment to residues in fish: A visualization approach. Environ. Toxicol.
Chem. 22(11):2822-2830.

Burkhard, L.P., P.M. Cook and M.T. Lukasewycz. 2004. Biota-sediment accumulation factors
for polychlorinated biphenyls, dibenzo-p-dioxins, and dibenzofurans in southern Lake Michigan
lake trout (Salvelinus namaycush). Environ. Sci. Technol. 38(20):5297-5305.

DiToro, D.M., C.S. Zarba, D.J. Hansen et al. 1991. Technical basis for establishing sediment
quality criteria for nonionic organic chemicals using equilibrium partitioning. Environ. Toxicol.
Chem. 10:1541-1583.

El-Shaarawi, A.H. and D.M. Dolan. 1989. Maximum likelihood estimation of water quality
concentrations from censored data. Can. J. Fish. Aquat. Sci. 46(6):1033-1039.

Ferraro, S.P., H. Lee Jr., R.J. Ozretich and D.T. Specht. 1990. Predicting bioaccumulation
potential: a test of a fugacity-based model. Arch. Environ. Contam. Toxicol. 19(3):386-394.

Halfon, E. 1985. Regression method in ecotoxicology: A better formulation using the geometric
mean functional regression. Environ. Sci. Technol. 19:747-749.

Lake, J.L., N. Rubinstein and S. Pavignano. 1984. Predicting bioaccumulation: Development of
a simple partitioning model for use as a screening tool in regulating ocean disposal of wastes.
In: Fate and Effects of Sediment-Bound Chemicals in Aquatic Systems, K.L. Dickson, A.W.
Maki and W.A. Brungs, Ed. Pergamon Press, New York, NY. p. 151-166.

Lake, J.L., N. Rubinstein, H. Lee II, C.A. Lake, J. Heltshe and S. Pavignano. 1990. Equilibrium
partitioning and bioaccumulation of sediment-associated contaminants by infaunal organisms.
Environ. Toxicol. Chem. 9:1095-1106.

McElroy, A.E. and J.C. Means. 1988. Factors affecting the bioavailability of
hexachlorobiphenyls to benthic organisms. In: Aquatic Toxicology and Hazard Assessment,
Vol. 10, W.J. Adams, G.A. Chapman and W.G. Landis, Ed. American Society for Testing and
Materials, Philadelphia, PA. p. 149-158.

McFarland, V.A. and J.U. Clarke. 1986. Testing bioavailability of polychlorinated biphenyls
from sediments using a two-level approach. In: Proceedings of the US Army Engineers
Committee on Water Quality, 6th Seminar, R.G. Wiley, Ed. Hydraulic Engineering Research
Center, Davis, CA. p. 220-229.

Minns, C.K. 1995. Allometry of home range size in lake and river fishes. Can. J. Fish. Aquat.
Sci. 52:1499-1508.

Mood, A.M., F.A. Graybill and D.C. Boes. 1974. Introduction to the Theory of Statistics, 3rd ed.
McGraw-Hill, New York, NY.

Newman, M.C. 1995. Quantitative Methods in Aquatic Ecotoxicology. Lewis/CRC Press,
Boca Raton, FL.

Newman, M.C., P.M. Dixon, B.B. Looney and J.E. Pinder III. 1989. Estimating mean and
variance for environmental samples with below detection limit observations. Water Res. Bull.
25(4):905-916.

Ricker, W.E. 1973. Linear regression in fishery research. J. Fish. Res. Board Can. 30:409-434.

Snedecor, G.W. and W.G. Cochran. 1980. Statistical Methods, 7th ed. Iowa State University
Press, Ames, IA. p. 167-168.

Sokal, R.R. and F.J. Rohlf. 1995. Biometry: The Principles and Practice of Statistics in
Biological Research, 3rd ed. W.H. Freeman and Co., New York, NY.

Thomann, R.V., J.P. Connolly and T.F. Parkerton. 1992. An equilibrium model of organic
chemical accumulation in aquatic food webs with sediment interaction. Environ. Toxicol. Chem.
11:615-629.

Tracy, G.A. and D.J. Hansen. 1996. Use of biota-sediment accumulation factors to assess
similarity of nonionic organic chemical exposure to benthically-coupled organisms of differing
trophic mode. Arch. Environ. Contam. Toxicol. 30(4):467-475.

Wong, C.S., P.D. Capel and L.H. Nowell. 2001. National-scale, field based evaluation of the
biota-sediment accumulation factor model. Environ. Sci. Technol. 35(9):1709-1715.

                                         APPENDIX

      ECOLOGICAL RISK ASSESSMENT SUPPORT CENTER REQUEST FORM
Problem Statement: What is the most appropriate method to estimate the Biota Sediment
Accumulation Factor (BSAF) from paired observations of concentrations in biota and sediment?
Requestors: Sharon Thoms and Al Hanke, Region 4
Background: BSAF is a parameter describing bioaccumulation of sediment-associated organic compounds
or metals into tissues of ecological receptors. In a typical experiment to measure bioaccumulation the
researcher collects colocated sediments and tissues over a gradient of contamination.  Simple compared to
bioaccumulation and trophic transfer models, it finds its use at Superfund sites to estimate progress toward
achieving a protective  tissue concentration as sediments become cleaner.
Expected Outcome: The expected outcome is a white paper addressing the following questions regarding
the use of regression to obtain the most accurate estimate of BSAF:

Do I fit a straight line?
Do I plot my data on a log-log scale?
Do I force the line through the origin?
How do I handle non-detects?
How do I estimate the confidence interval around a prediction?
Do I normalize by organic carbon and lipid?
Do I use weighted regression?
If I transform the data, do I need to use weighted regression?
How do I take into account the home range of the biota whose tissue I measured?
What if my r2 is low and my data do not plot with the appearance of an increasing linear function?
How do I deal with outliers?
Do I develop a separate regression for each compound in a mixture?
When the value of x (i.e., exposure point concentration in sediment) is uncertain (e.g., when biota migrate),
   how do I account for this in my regression?
Are there ways to improve my study design knowing what I know now about regression?

Where the topics are covered by standard books or web sites on statistics, they may be referenced. A few
case studies may be useful to illustrate the concepts.
Additional Comments: Requestor can provide case studies.