EPA/600/R-06/047
                                                            ERASC-013F
                                                           February 2009
ESTIMATION OF BIOTA SEDIMENT ACCUMULATION FACTOR
     (BSAF) FROM PAIRED OBSERVATIONS OF CHEMICAL
         CONCENTRATIONS IN BIOTA AND SEDIMENT
                                 by

                          Lawrence Burkhard
                   U.S. Environmental Protection Agency
                    Office of Research and Development
         National Health and Environmental Effects Research Laboratory
                     Mid-Continent Ecology Division
                           Duluth, Minnesota
                 Ecological Risk Assessment Support Center
                    Office of Research and Development
                   U.S. Environmental Protection Agency
                            Cincinnati, OH

-------
                                      NOTICE
       This document has been subjected to the Agency's peer and administrative review and
has been approved for publication as an EPA document.  Mention of trade names or commercial
products does not constitute endorsement or recommendation for use.
This report should be cited as: Burkhard, L. 2009. Estimation of Biota Sediment Accumulation
Factor (BSAF) from Paired Observations of Chemical Concentrations in Biota and Sediment.
U.S. Environmental Protection Agency, Ecological Risk Assessment Support Center, Cincinnati,
OH. EPA/600/R-06/047.

-------
                       TABLE OF CONTENTS


AUTHORS, CONTRIBUTORS AND REVIEWERS

ACKNOWLEDGMENTS

INTRODUCTION

RECOMMENDATIONS

DEFINITION OF BSAF

MEASURING USEFUL Csoc-Cℓ PAIRS FOR CALCULATION OF BSAFs

CALCULATION OF BSAFs

BASIS FOR BSAF REGRESSION APPROACH

THE REGRESSION APPROACH

RESPONSES TO QUESTIONS RAISED IN EXPECTED OUTCOMES

REFERENCES

APPENDIX: ECOLOGICAL RISK ASSESSMENT SUPPORT CENTER REQUEST FORM

-------
                  AUTHORS, CONTRIBUTORS AND REVIEWERS
AUTHOR

Lawrence Burkhard
U.S. Environmental Protection Agency
Office of Research and Development
National Health and Environmental Effects Research Laboratory
Mid-Continent Ecology Division
Duluth, MN 55804

CONTRIBUTOR

Michael Kravitz
U.S. Environmental Protection Agency
Office of Research and Development
National Center for Environmental Assessment
Cincinnati, OH 45268

REVIEWERS OF EXTERNAL REVIEW DRAFT

Michael C. Newman
Virginia Institute of Marine Science
School of Marine Science
The College of William and Mary
Gloucester Point, VA 23062

David Glaser
Quantitative Environmental Analysis, LLC
Montvale, NJ  07645

Joan U. Clarke
U.S. Army Engineer Research and Development Center
Environmental Laboratory
Vicksburg, MS 39180

-------
                              ACKNOWLEDGMENTS

       The first draft of this document was internally (within EPA) reviewed by Keith
Sappington (EPA Office of Prevention, Pesticides and Toxic Substances) and Steven Ferraro
(Office of Research and Development [ORD]/National Health and Environmental Effects
Research Laboratory).  Jeff Swartout of ORD/National Center for Environmental Assessment
kindly provided review of BSAF statistical approaches. David Farrar (ORD/NCEA) offered
useful discussion of statistical issues. Programmatic review of the document was conducted by
the Trichairs of EPA's Ecological Risk Assessment Forum: Venessa Madden (EPA Region 7),
Sharon Thoms (EPA Region 4) and Marc Greenberg (Office of Solid Waste and Emergency
Response/Office of Superfund Remediation and Technology Innovation). Though all review
comments were considered in the preparation of this document, the peer reviewers (both internal
and external) do not necessarily  agree with all the conclusions contained herein.

-------
INTRODUCTION
       In March 2004, the Ecological Risk Assessment Forum (ERAF) submitted a request to
ORD's Ecological Risk Assessment Support Center (ERASC) relating to the estimation of
Biota-Sediment Accumulation Factors (BSAFs) (see Appendix). BSAF is a parameter
describing bioaccumulation of sediment-associated organic compounds or metals into tissues of
ecological receptors.  The purpose of this report is to provide a response to the ERAF request.
The Problem Statement in the request was "What is the most appropriate method to estimate the
BSAF from paired observations of concentrations in biota and sediment?" The Expected
Outcome asked for answers to  specific questions regarding the use of regression analysis for
estimating BSAFs for non-ionic organic compounds. The specific questions are addressed in the
latter portion of this document. A statement on the most appropriate method to estimate the
BSAF is provided below.  This document is focused solely on the determination of BSAFs for
non-ionic organic chemicals and is applicable to fish and benthic organisms, e.g., crabs and
bivalves. The determination of BSAFs for metals is not discussed.

RECOMMENDATIONS
       There are two methods for determining the BSAF from paired observations: (1) a
regression approach, whereby the BSAF is estimated by determining the slope of the Csoc-Cℓ line
(Csoc is the concentration of chemical in the sediment on an organic carbon basis [μg/kg organic
carbon] and Cℓ is the concentration of chemical in the organism on a lipid basis [μg/kg lipid])
and (2) an averaging approach, whereby the BSAF is estimated by averaging the BSAFs from
the paired observations across the site.  Both approaches use the same data. The second
approach is recommended as the more appropriate method for estimating the BSAF. Regression
analysis has  some limitations in comparison to the averaging approach:
   1)  The BSAF is,  in essence, a thermodynamic (or fugacity) ratio for the chemical of interest
       between the organism and sediment. Determination of the individual BSAFs from
       appropriately paired observations will enable the detection of differences (if they exist) in
       partitioning behavior among the paired observations for the site.  Regression analysis,
       which simply constructs a functional relationship between the Csoc-Cℓ pairs, does not
       readily allow the detection of such differences (if they exist). Remediation decisions
       often involve the use of sediment-organism relationships, and if fundamental differences
       in partitioning behavior exist across the site, remedial actions might be different for
       different portions of the site.
   2)  Regression analysis, whether model I (simple linear regression) or model II (geometric
       mean regression, major axis regression, Bartlett's three-group method, or Kendall's
       robust line-fit method [Sokal and Rohlf, 1995]), requires meeting parametric assumptions
       about the relationship between the X and Y variables.
   3)  Although regression analysis can be done on data sets with limited numbers of Csoc-Cℓ
       pairs, determining the slope of the line fitting limited numbers of pairs can lead to highly
       uncertain slopes.
   4)  Computation of accurate confidence limits can be problematic when regression analyses
       are performed using log-transformed Csoc-Cℓ data pairs (Ricker, 1973); log transformation is
       often required to linearize the residue data.
In contrast, the averaging approach (estimating the BSAF by averaging the BSAFs from each
Csoc-Cℓ pair) largely avoids the above limitations. With the averaging approach, the distribution
of the individual BSAFs (determined from each Csoc-Cℓ pair) can be evaluated very easily; this
evaluation is commonly done in statistical analysis of data. Knowing the underlying distribution
of the BSAFs allows the selection of the most appropriate (unbiased) averaging technique.
Further, with the individual BSAFs (Csoc-Cℓ pairs), the homoscedasticity (equality) of the
variances across the individual BSAFs can be assessed. In cases where the variances are
heteroscedastic (unequal), an appropriate weighted averaging technique would be used, and in
general, the weights would be the reciprocal of the variances for the individual BSAFs. The
averaging approach can also be easily implemented with other weighting considerations such as
portions of the site represented by individual BSAFs, e.g., some BSAFs might be reflective  of
three quarters of the site while the remaining BSAFs are reflective of the other quarter of the site.
The averaging approach also provides the information on the final BSAF (grand mean)
distribution and variance which are necessary for one and two stage Monte Carlo uncertainty
analyses.
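
As an illustration of the averaging approach described above, the following Python sketch computes an unweighted mean and a reciprocal-variance weighted mean of individual BSAFs. It assumes approximately normally distributed BSAFs; the function name and all numerical values are hypothetical and are not taken from any site.

```python
def average_bsaf(bsafs, variances=None):
    """Average individual BSAFs (illustrative helper, not from the report).

    With no variances the weights are all 1 (simple mean); otherwise each
    weight is the reciprocal of that BSAF's variance.
    """
    if variances is None:
        weights = [1.0] * len(bsafs)
    else:
        if any(v <= 0 for v in variances):
            raise ValueError("variances must be positive")
        weights = [1.0 / v for v in variances]
    return sum(w * b for w, b in zip(weights, bsafs)) / sum(weights)


# Hypothetical BSAFs from three Csoc-Cl pairs and their variances
bsafs = [1.2, 1.8, 1.5]
variances = [0.04, 0.16, 0.04]

print(average_bsaf(bsafs))             # equal weights
print(average_bsaf(bsafs, variances))  # less precise BSAFs count for less
```

Other weights, such as the fraction of the site represented by each BSAF, could be passed in the same way by replacing the reciprocal variances.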
       For Superfund sites and other sites, each Csoc-Cℓ pair is location specific, and each pair
incorporates all of the underlying ecological and chemical conditions existing at the sampling
location (e.g., food web structure, sediment/water column concentration quotients, chemical
bioavailability and diets/trophic levels of the organisms).  Regardless of approach (regression or
averaging), the conditions must be similar across all locations.  For sites with highly
heterogeneous conditions, meeting this requirement can be problematic.  Mixing of
Csoc-Cℓ paired observations with different underlying conditions is not recommended and will, in
all likelihood, result in BSAFs with poor predictive accuracy.

-------
       There is great value in plotting the Cℓ against Csoc; BSAFs against Csoc, Cℓ, sediment
organic carbon content, and organism lipid content; and Cℓ, Csoc and BSAFs against geographical
information.  These plots should be done and evaluated for trends in the data!  They may provide
key insights and understanding of the complexities existing at the site of interest. If the BSAFs
from the individual Csoc-Cℓ pairs are independent of the Csoc, Cℓ, sediment organic carbon
content, organism lipid content values, and geographical location, this would be strongly
suggestive of similar underlying ecological conditions for the Csoc-Cℓ pairs. When discrepancies
exist, the following evaluations  are suggested: (i) the closeness of the measured residues  or
sediment contaminant concentrations to Method Detection Limits (MDLs), (ii) the characteristics
of the sediment across the site (e.g., the types and amounts of organic carbon in the sediments
[biogenic, coal, coke, soot, tars], grain size, etc.), (iii) the co-occurring contaminants, (iv) the
diets/trophic levels of the organisms, (v) the lipid contents and health of the organisms
(exceptionally low lipids often indicate the organisms are stressed), and (vi) past remedial
actions. Depending upon what is learned from these evaluations, one would attempt to resolve
the Csoc-Cℓ pairs into units with similar underlying conditions.  Potential actions could involve
the segregation of the site into sub-units with Csoc-Cℓ pairs having similar conditions, or possibly,
discounting the importance of pairs where Csoc and/or Cℓ in the pairs are just above MDL. The
importance of resolving discrepancies within the data cannot be overstated. Spending time and
resources resolving these discrepancies will be well worth the effort since the uncertainties
associated with remediation decisions will be smaller.  Additionally, any discrepancies in the
data at this level will be translated into higher and more complex analyses since these analyses
use this information.
       The following sections provide a description of the BSAF  along with its underlying
assumptions, a discussion on how to measure a useful BSAF, a discussion on the basis of the
regression approach, and answers to specific questions related to regression analysis.

DEFINITION OF BSAF
       The BSAF is defined (Ankley et al., 1992) as

       \[ \mathrm{BSAF} = \frac{C_o / f_\ell}{C_s / f_{soc}} \]                    (Eq. 1)

where Co is the chemical concentration in the organism (μg/kg wet weight), fℓ is the lipid fraction
of the organism (g lipid/g wet weight), Cs is the chemical concentration in surficial sediment
(μg/kg dry weight) and fsoc is the fraction of the sediments as organic carbon (g organic carbon/g
dry weight). In general, BSAFs should be determined from spatially and temporally coordinated
organism and surficial sediment samples under conditions in which recent loadings of the
chemicals to ecosystem are relatively unchanged (Burkhard et al., 2003). The BSAF definition
does not invoke or include the assumption of equilibrium conditions for the chemical between
the organism and sediment (Ankley et al., 1992; Thomann et al., 1992). As shown by
Thomann et al. (1992), BSAFs are appropriate for describing bioaccumulation of sediment
contaminants in aquatic food webs with non-equilibrium conditions between both the sediment
and organism (fish in this case), and sediment and its overlying water.  Equilibrium is regarded
as a reference condition for describing degrees of disequilibrium, and thus, is not a requirement
for measurement, prediction, or application of BSAFs.
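
The definition in Equation 1 can be sketched in Python as follows. The function and argument names are illustrative choices, and the sample values are hypothetical; units follow the definitions given above (μg/kg wet weight, g lipid/g wet weight, μg/kg dry weight, g organic carbon/g dry weight).

```python
def bsaf(c_o, f_lipid, c_s, f_soc):
    """BSAF = (C_o / f_lipid) / (C_s / f_soc), following Equation 1."""
    if f_lipid <= 0 or f_soc <= 0 or c_s <= 0:
        raise ValueError("f_lipid, f_soc and c_s must be positive")
    c_lipid = c_o / f_lipid   # lipid-normalized tissue concentration
    c_soc = c_s / f_soc       # organic-carbon-normalized sediment concentration
    return c_lipid / c_soc


# Hypothetical sample: 50 ug/kg wet weight at 5% lipid,
# 20 ug/kg dry weight at 2% organic carbon
print(bsaf(c_o=50.0, f_lipid=0.05, c_s=20.0, f_soc=0.02))  # about 1.0
```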
       With specific reference to benthic invertebrates, numerous investigators (Lake et al.,
1984, 1990; McElroy and Means, 1988; Bierman, 1990; Ferraro et al.,  1990) have invoked two
assumptions regarding BSAFs: (1) equilibrium conditions and (2) no metabolism of the chemical.
These assumptions when combined with EqP (equilibrium partitioning) theory (DiToro et al.,
1991), lead to the conclusion that the BSAF, for these specific conditions, is equal to the
partitioning relationship of the chemical between organic carbon in the sediment and lipids of the
organism.  Depending upon  the affinities of the non-polar organic chemical for lipid and
sediment organic carbon, the BSAF, under these specific  conditions, should be in the range of
1 to 2 (McFarland and Clarke, 1986). For aquatic organisms tightly connected to the sediments
like oligochaetes and other benthic invertebrates, experimental measurements (Lake et al., 1990;
Tracy and Hansen, 1996) are generally consistent with the theoretical value, i.e., in the range of
1 to 2.
       There are solid mechanistic reasons why fish should not be in equilibrium with their
sediments (Thomann et al., 1992).  For fish, BSAFs incorporate wide ranges of influences
including biomagnification due to the trophic level of the fish, sediment-water column chemical
disequilibrium, the diet of the fish and its underlying food web, the fish's foraging range and
chemical metabolism within the fish and its food web (Burkhard et al., 2003). Suggestions that
BSAFs for fish should be in the range of 1 to 2 by combining the definition of the BSAF with the
assumptions of equilibrium conditions and no metabolism are incorrect (Wong et al., 2001).  As
explained above, measured BSAFs above or below 1 to 2 are entirely reasonable for fish
(Burkhard et al., 2003). BSAFs outside this range for fish do not violate the general definition of
BSAFs nor invalidate the usefulness of BSAFs in predicting chemical residues in fish for
sediment contaminants (Burkhard et al., 2004).

MEASURING USEFUL Csoc-Cℓ PAIRS FOR CALCULATION OF BSAFs
      Probably the most important factor in measuring a BSAF with predictive power is the
requirement that the sediment samples analyzed be reflective of the foraging range of the fish.
Depending upon the site, the degree of difficulty in defining the foraging range of the organism
can vary widely.  In situations where the movement of the organisms is confined by the
geography of the site, e.g., dams or falls, the foraging range of the organisms can probably be
defined fairly easily. When required, foraging ranges can be determined by tagging/recapture,
radio-telemetry and/or ultrasonic telemetry studies at the site of interest. Estimates of home
ranges (or foraging ranges) for freshwater fishes can be determined using the allometric
relationship (Minns, 1995):

      ln H = -2.91 + 3.14 HAB + 1.65 ln L   or   ln H = 3.33 + 2.98 HAB + 0.58 ln W

where H is the home range size (m²), HAB is 0 for rivers and 1 for lakes, W is body weight (g)
and L is body length (mm).  For freshwater invertebrates (such as crabs), marine and  estuarine
ecosystems, allometric relationships for home range have not been reported.
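
The two allometric relationships above can be evaluated with a short Python sketch. The function names are illustrative; the coefficients are those quoted from Minns (1995) in the text, and the example fish is hypothetical. The result is only an estimate of the home range.

```python
import math


def home_range_from_length(length_mm, is_lake):
    """Home range H (m^2) from body length L (mm): ln H = -2.91 + 3.14 HAB + 1.65 ln L."""
    hab = 1 if is_lake else 0
    return math.exp(-2.91 + 3.14 * hab + 1.65 * math.log(length_mm))


def home_range_from_weight(weight_g, is_lake):
    """Home range H (m^2) from body weight W (g): ln H = 3.33 + 2.98 HAB + 0.58 ln W."""
    hab = 1 if is_lake else 0
    return math.exp(3.33 + 2.98 * hab + 0.58 * math.log(weight_g))


# Hypothetical 300 mm fish in a river and in a lake
print(home_range_from_length(300.0, is_lake=False))
print(home_range_from_length(300.0, is_lake=True))
```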
      Having a good understanding of the foraging range of the species is important.
Organisms with smaller foraging ranges will, in all likelihood, be more representative of the
study site than those with large foraging ranges that extend well beyond the study site. Just
because a fish (or other aquatic organism) is caught at a sampling location, one cannot infer that
the chemical residue in the fish is due to the chemicals residing at the study site.  Knowledge of
the fish's foraging range is the only way that one can establish the connection of the fish to the
sampling location.  It is strongly recommended that local fisheries experts be consulted during
the sampling design phase of the field study to help in determining the foraging range and
trophic level of the organisms at the site; local knowledge will be extremely helpful.  It should be
noted that the above allometric relationship provides only an estimate of the actual foraging
range.
       Once the foraging range of the species of interest is established, sediment samples
reflective of the species' foraging range need to be collected.  It is important that the sediment
samples collected be representative of the sediments where the species normally forages and not
a homogenized sediment core representing the entire bed of contaminated sediment.  For most
organisms, the surficial sediments are most reflective of the organism's immediate
exposure/foraging history, and generally, smaller depths of the surficial layer, e.g., 0  to 2 cm, are
preferred over larger depths, e.g., 0 to 30 cm. For deeper burrowing organisms such  as some
clams and polychaetes, slightly larger surficial depths, e.g., 0 to 5 cm, might be more representative
of their recent exposure history.
       Beyond establishing the foraging range of the organism and the appropriate sediment
samples, the collection and analysis of adequate numbers of organisms and sediment samples is
necessary for deriving unbiased estimates of the mean concentrations of chemicals and their
variances. This document will not address the subject of sample collection, compositing and
analysis; see U.S. EPA (2002) for further information on these issues.  With unbiased estimates
of the mean concentrations, the BSAF for the specific site can be calculated using Equation 1.
       In any study design, it is important that biota samples be collected and composited in size
or age classes. For fish, dietary composition changes substantially with size and age, and these
changes will result in differences in BSAFs among size and age classes. For forage fish,
common classes are young-of-the-year, juveniles and adults, and for piscivorous fishes, common
classes are year classes, e.g., 2, 3, 6 and 10 years old. Mixing of fishes of different size/age
classes is not recommended because of the increased variance for the average chemical residue
in the organisms.
       Biota samples for chemical  analysis should never be composited by mixing different fish
species. Different fish species have different life histories and diets. BSAFs derived from
composite samples composed of different species will be highly biased by the individual species.
Further, resolving what the potential biases are for an individual species would require the
collection and analysis of that species.

-------
       When a Csoc-Cℓ pair (or BSAF) is measured for a specific chemical, the measured value
incorporates all conditions and parameters existing at the location of interest.  The major
conditions and parameters incorporated into the Csoc-Cℓ pair (or BSAF) are (1) the distribution of
the chemical between the sediment and water column, (2) the relationship of the food web to
water and sediment and (3) the trophic status (or position in the food web) of the species.

CALCULATION OF BSAFs
       The BSAF is calculated from four measured variables (see Equation 1, and Equation 2
below): concentration of the chemical in the organism on a wet weight basis (Co), the lipid
content of the wet tissue (fℓ), the concentration of the chemical in the sediment on a dry weight
basis (Cs) and the organic carbon content of the dry sediment (fsoc).

       \[ \mathrm{BSAF} = \frac{C_o / f_\ell}{C_s / f_{soc}} = \frac{C_\ell}{C_{soc}} \]                    (Eq. 2)

Depending upon the species and field sampling design, a Csoc-Cℓ pair might be composed of only
one sediment sample and one tissue sample. However, a Csoc-Cℓ pair might be composed of
multiple composite tissue samples and/or multiple sediment samples spanning the foraging range
of the organisms.  In either of these cases, samples could be analyzed in duplicate/replicate in the
laboratory, and thus, determining the Csoc and/or Cℓ values could involve the averaging of results
from replicate analyses and/or results from a number of different samples.

Tissue Samples
       For a specific tissue sample, the lipid normalized concentration of the chemical is
determined using the following equation

       \[ C_\ell = \frac{\frac{1}{n}\sum_{i=1}^{n} C_{o,i}}{\frac{1}{m}\sum_{j=1}^{m} f_{\ell,j}} \]                    (Eq. 3)

where n is the number of replicate chemical analyses and m is the number of replicate lipid
analyses. The variance of the Cℓ can be estimated using the equation (Mood et al., 1974):

       \[ (s_{C_\ell})^2 = \left(\frac{1}{f_\ell}\right)^2 \left[ (s_{C_o})^2 + C_\ell^2 (s_{f_\ell})^2 - 2 r \, s_{C_o} s_{f_\ell} C_\ell \right] \]                    (Eq. 4)

where s_Cℓ, s_Co and s_fℓ are the standard deviations for the Cℓ, Co and fℓ, respectively; and r is
the correlation coefficient between Co and fℓ. In cases where both lipid and chemical contents
are measured once, s_Cℓ cannot be computed.
       For a group of tissue samples for a specific Csoc-Cℓ pair, their individual Cℓs (computed
using Equation 3) are averaged using an appropriate statistical technique, dependent upon the
distribution of the individual Cℓs.  The resulting average is the average lipid normalized
concentration in the tissue for the specific Csoc-Cℓ pair. If the tissue samples have different
numbers of organisms in each composite, e.g., three fishes in one sample and five fishes in the
second sample, the determination of a weighted average concentration is suggested. Typically,
residues in aquatic organisms are normally  or log-normally distributed.

Normally Distributed Residues: For a group of tissue samples for a specific Csoc-Cℓ pair with
normally distributed residues, the weighted average lipid normalized concentration is computed
using the following equation, illustrated numerically using the two fish sample example from
above:

       \[ C_{\ell\text{-}avg} = \frac{\sum_i W_i \, C_{\ell,i}}{\sum_i W_i} = \frac{3 \times C_{\ell\text{-}one} + 5 \times C_{\ell\text{-}two}}{3 + 5} \]                    (Eq. 5)

where Wi is the number of organisms in composite i, Cℓ,i is the lipid normalized concentration of
the chemical in composite i and Cℓ-avg is the weighted average lipid normalized concentration in
the tissues. The standard deviation of a weighted average (s_Cℓ-avg) equals
                                                                                  (Eq. 6)
Note, a non-weighted average and standard deviation would be determined by setting the weights,
Wi, equal to 1.0.
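
The weighted average of Equation 5 can be sketched in Python for the two-composite example (three fish and five fish). The concentrations are hypothetical. The weighted standard deviation shown here uses the form documented for NIST Dataplot, which is an assumption made for this sketch; it reduces to the ordinary sample standard deviation when all weights equal 1.

```python
import math


def weighted_mean(values, weights):
    """Weighted average: sum(W_i * x_i) / sum(W_i)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_sd(values, weights):
    """Weighted standard deviation in the NIST Dataplot form (assumed here)."""
    pairs = [(v, w) for v, w in zip(values, weights) if w != 0]
    n = len(pairs)  # number of non-zero weights
    if n < 2:
        raise ValueError("need at least two samples with non-zero weight")
    mean = weighted_mean(values, weights)
    numerator = sum(w * (v - mean) ** 2 for v, w in pairs)
    denominator = (n - 1) * sum(w for _, w in pairs) / n
    return math.sqrt(numerator / denominator)


# Hypothetical lipid-normalized concentrations (ug/kg lipid) for
# composites of three fish and five fish
c_lipid = [400.0, 560.0]
n_fish = [3, 5]

print(weighted_mean(c_lipid, n_fish))  # (3*400 + 5*560) / (3 + 5) = 500.0
print(weighted_sd(c_lipid, n_fish))
```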

Log-Normally Distributed Residues: For a group of tissue samples for a specific Csoc-Cℓ pair
with log-normally distributed residues, a weighted average lipid-normalized concentration is
suggested with the number of organisms in the samples as the weights. The minimum variance
unbiased (MVU) estimators as described by Gilbert (1987) are suggested for log-normally
distributed residues. However, there are other appropriate estimators for log-normally
distributed data. MVU estimators are calculated by estimating the mean and variance of the
log-normal distribution of the chemical concentrations:

       \[ \bar{y} = \frac{\sum_{i=1}^{n} w_i \, y_i}{\sum_{i=1}^{n} w_i} \]                    (Eq. 7)

       \[ s_y^2 = \frac{\sum_{i=1}^{n} w_i \, (y_i - \bar{y})^2}{\frac{n-1}{n} \sum_{i=1}^{n} w_i} \]                    (Eq. 8)

where ȳ and s²y are the arithmetic mean and variance of the n transformed values yi = ln Cℓ,i, and
wi are the weights for the individual samples (NIST, 1996).
       The minimum variance unbiased estimator for the mean (μ̂), Cℓ-avg in this application, is

       \[ \hat{\mu} = \exp(\bar{y}) \, \psi_n\!\left(\frac{s_y^2}{2}\right) \]                    (Eq. 9)

where exp(ȳ) is the sample geometric mean, and ψn(t) (with t = s²y/2) is the infinite series:

       \[ \psi_n(t) = 1 + \frac{(n-1)\,t}{n} + \frac{(n-1)^3 t^2}{2!\, n^2 (n+1)} + \frac{(n-1)^5 t^3}{3!\, n^3 (n+1)(n+3)} + \cdots \]                    (Eq. 10)

and this infinite series converges quickly, e.g., four to six terms are normally required for
convergence.
       The minimum variance unbiased estimator for the variance of μ̂ (Cℓ-avg in this application) is:

       \[ s^2(\hat{\mu}) = \exp(2\bar{y}) \left\{ \left[ \psi_n\!\left(\frac{s_y^2}{2}\right) \right]^2 - \psi_n\!\left(\frac{s_y^2 (n-2)}{n-1}\right) \right\} \]                    (Eq. 11)

Note, a non-weighted average and standard deviation would be determined by setting the weights,
wi, equal to 1.
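
A minimal Python sketch of the MVU estimators of Gilbert (1987) follows, for the non-weighted case only (all weights equal to 1). The function names and sample data are hypothetical. The series ψn(t) is summed term by term until additional terms no longer change the total, using the fact that each term equals the previous one multiplied by (n-1)² t / (k n (n + 2k - 3)).

```python
import math


def psi_n(n, t, tol=1e-12, max_terms=100):
    """Series psi_n(t) = 1 + (n-1)t/n + (n-1)^3 t^2 / (2! n^2 (n+1)) + ..."""
    term = (n - 1) * t / n          # k = 1 term
    total = 1.0 + term
    for k in range(2, max_terms + 1):
        term *= (n - 1) ** 2 * t / (k * n * (n + 2 * k - 3))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def mvu_lognormal(values):
    """Return (MVU mean, MVU variance of the mean) for log-normal data."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    y = [math.log(v) for v in values]
    y_bar = sum(y) / n
    s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    psi_half = psi_n(n, s2 / 2.0)
    mean = math.exp(y_bar) * psi_half
    var_mean = math.exp(2.0 * y_bar) * (
        psi_half ** 2 - psi_n(n, s2 * (n - 2) / (n - 1))
    )
    return mean, var_mean


# Hypothetical lipid-normalized concentrations (ug/kg lipid)
mean, var_mean = mvu_lognormal([1.0, 2.0, 4.0, 8.0])
print(mean, var_mean)
```

For large n the series approaches exp(t), which gives a quick sanity check on the implementation.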
       Using the appropriate statistical averaging technique, the average lipid normalized
concentration of the chemical (Cℓ-avg) is determined for a group of tissue samples for a specific
Csoc-Cℓ pair.

Sediment Samples
       The sediment sample(s) associated with a specific Csoc-Cℓ pair would be treated like the
tissue samples as described above and the overall average organic carbon normalized
concentration of the chemical in the sediment(s) (Csoc-avg) would be determined. Weights in the
averaging process should be set equal to 1.

Calculating the BSAF
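       For a specific Csoc-Cℓ pair, the BSAF for the pair would be determined by dividing its
weighted average lipid normalized concentration of the chemical in the tissue (Cℓ-avg) by its
average organic carbon normalized concentration of the chemical in the sediment (Csoc-avg).

       \[ \mathrm{BSAF} = \frac{C_{\ell\text{-}avg}}{C_{soc\text{-}avg}} \]                    (Eq. 12)

The variance for the specific BSAF can be estimated using the equation (Mood et al., 1974):

       \[ (s_{BSAF})^2 = \left(\frac{1}{C_{soc\text{-}avg}}\right)^2 \left[ (s_{C_{\ell\text{-}avg}})^2 + \mathrm{BSAF}^2 (s_{C_{soc\text{-}avg}})^2 - 2 r \, s_{C_{\ell\text{-}avg}} s_{C_{soc\text{-}avg}} \mathrm{BSAF} \right] \]                    (Eq. 13)

where s_BSAF, s_Cℓ-avg and s_Csoc-avg are the standard deviations for the BSAF, Cℓ-avg and Csoc-avg,
respectively; and r is the correlation coefficient between Cℓ-avg and Csoc-avg.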
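
Equations 12 and 13 can be sketched in Python as below. The function name, argument names and numerical values are hypothetical. The correlation coefficient r defaults to zero here, which is an assumption made only for the example; a measured value should be supplied when available.

```python
import math


def bsaf_with_sd(c_l_avg, s_c_l, c_soc_avg, s_c_soc, r=0.0):
    """Return (BSAF, standard deviation of BSAF) per Equations 12 and 13."""
    if c_soc_avg <= 0:
        raise ValueError("c_soc_avg must be positive")
    ratio = c_l_avg / c_soc_avg                      # Eq. 12
    variance = (1.0 / c_soc_avg) ** 2 * (            # Eq. 13
        s_c_l ** 2
        + ratio ** 2 * s_c_soc ** 2
        - 2.0 * r * s_c_l * s_c_soc * ratio
    )
    # Guard against tiny negative values caused by floating-point rounding
    return ratio, math.sqrt(max(variance, 0.0))


# Hypothetical averages: 1500 +/- 150 ug/kg lipid in tissue and
# 1000 +/- 100 ug/kg organic carbon in sediment
value, sd = bsaf_with_sd(1500.0, 150.0, 1000.0, 100.0)
print(value, sd)
```

A positive correlation between tissue and sediment concentrations reduces the estimated variance, because the third term in Equation 13 is subtracted.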

BASIS FOR BSAF REGRESSION APPROACH
       In the following discussion, Csoc is the concentration of chemical in the sediment on an
organic carbon basis (μg/kg organic carbon) and Cℓ is the concentration of chemical in the
organism on a lipid basis (μg/kg lipid).
       Plotting of Csoc against Cℓ results in the following illustrative plot (Graph A), where the
slope of the line is the BSAF.  However, the slope of Cs plotted against Co (Graph B) is not the
BSAF because these two measures of chemical concentrations are not organic carbon and lipid
normalized.  Use of the regression approach to derive the BSAF incorporates an implicit
assumption above and beyond those required for measuring a BSAF at a specific location. The
implicit assumption of the regression approach is that all Csoc-Cℓ pairs must have or incorporate
the same underlying ecological conditions and parameters. Further, the regression approach
assumes that the relationship between Csoc and Cℓ is linear.
[Graph A: Cℓ (μg/kg lipid) plotted against Csoc (μg/kg organic carbon); slope = Δy/Δx = BSAF.
 Graph B: Co (μg/kg wet weight) plotted against Cs (μg/kg dry weight); slope = Δy/Δx ≠ BSAF.]
       For a Superfund site, it is common to collect samples across the site with a number of
different sampling locations. For example, consider a New England stream with a series of three
dams, and assume that 2-year-old carp and sediment are collected and analyzed in each reservoir.
Further assume that representative and unbiased mean concentrations of the chemical were
determined for fish and sediment in each reservoir. Thus, three sets of paired carp-sediment
observations would be determined, one for each of the three reservoirs.
       These paired observations of Csoc and Cℓ can be plotted (Graphs C & D). In Graph C, the
pairs form a nearly linear relationship suggesting that the underlying conditions for the Csoc-Cℓ
pairs are consistent across the samples and thus allow estimation of the BSAF using the
regression approach.  In Graph D, the pairs form no easily defined linear relationship, and in this
case, there is too little variability in the Csoc-Cℓ pairs for the regression approach to be useful in
estimating the BSAF. In Graph E, a situation where four sets of paired carp-sediment data were
determined, three of the pairs form a nearly linear relationship, but one pair is different from the
other pairs. Depending upon how one draws the line, either the triangle or square data in Graph
E could be the different (or outlier) Csoc-Cℓ pair. In this case, one or more of the Csoc-Cℓ pairs
likely have different underlying conditions, and, thus, it would be inappropriate to estimate the
BSAF using the regression approach.
       As discussed above, each carp-sediment pair is location specific and each pair
incorporates all of the major conditions and parameters existing at the location.  In order to use
the regression approach with pairs of Csoc-Cℓ observations, the major conditions and parameters
must be the same for all locations. This requirement is the implicit assumption incorporated into
the regression approach.  Mixing of Csoc-Cℓ paired observations with different conditions and
parameters will result in Csoc-Cℓ plots where the Csoc-Cℓ pairs will form a non-linear relationship
(e.g., possibly Graph E), and in all likelihood, a BSAF with poor predictive power.¹
       For the above examples, if the BSAF for each pair of Csoc-Cℓ observations is plotted
against Csoc, the following graphs are obtained (Graphs CC, DD and EE). The relationships
among the Csoc-Cℓ pairs in the above graphs remain in the graphs based upon the BSAFs;
compare Graphs C to CC, D to DD and E to EE.  In essence, by calculating the BSAF, one has
mathematically removed the concentration dependence shown in Graphs C, D and E. For further
comparison purposes, the BSAF for each pair of Csoc-Cℓ observations is also plotted against Cℓ
(Graphs CCC, DDD and EEE).
       The graphs, i.e., C, D, E, CC, DD, EE, CCC, DDD and EEE, are some of the plots
recommended for evaluating trends and underlying conditions associated with the Csoc-Cℓ pairs.
We recommend that these plots be completed prior to performing the final calculations for
determining the site-specific BSAF. These plots will help in identifying sources of variation and
error in the individual Csoc-Cℓ pairs and BSAF values.
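
The point that calculating the BSAF removes the concentration dependence can be illustrated with a small Python sketch. All values are hypothetical reservoir means and are not data from the report; the second data set simply shows how a pair with different underlying conditions stands out once individual BSAFs are computed.

```python
def pair_bsafs(c_soc, c_lipid):
    """BSAF for each Csoc-Cl pair (Cl divided by Csoc)."""
    return [cl / cs for cs, cl in zip(c_soc, c_lipid)]


# Hypothetical means for three reservoirs
c_soc = [100.0, 200.0, 400.0]            # ug/kg organic carbon
c_lipid_similar = [150.0, 300.0, 600.0]  # ug/kg lipid, proportional to Csoc
c_lipid_mixed = [150.0, 300.0, 1200.0]   # third pair departs from the others

print(pair_bsafs(c_soc, c_lipid_similar))  # [1.5, 1.5, 1.5]
print(pair_bsafs(c_soc, c_lipid_mixed))    # [1.5, 1.5, 3.0]
```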
¹ The mixing of Csoc-Cℓ paired observations with different conditions and parameters is not recommended for the
averaging approach either. BSAFs with poor predictive power (i.e., accuracy) will, in all likelihood, result when
different conditions and parameters exist across the individual Csoc-Cℓ pairs used in the analysis.

[Graphs C, D and E: Cℓ (μg/kg lipid) plotted against Csoc (μg/kg organic carbon).
 Graphs CC, DD and EE: BSAF plotted against Csoc (μg/kg organic carbon).
 Graphs CCC, DDD and EEE: BSAF plotted against Cℓ (μg/kg lipid).]

THE REGRESSION APPROACH
       A key consideration in using the regression approach is to realize that both Csoc and Cℓ
are measured with error.  With the simple linear regression least-squares technique, one variable
(the Ys) is measured with error while the other variable (the Xs) is fixed and has no error.
Simple linear regression is referred to as model I regression analysis. When the Xs and Ys are both
measured with error, one of a number of model II regression techniques will be more appropriate,
and unfortunately "the appropriate method depends on the nature of the data" (Sokal and Rohlf,
1995). Sokal and Rohlf (1995) provide an excellent discussion of model II regression and the
techniques of geometric mean regression (also called reduced major axis, standard major axis, or
relation d'allometrie), slope of the major axis, Bartlett's three-group method and Kendall's
robust line-fit method. Additionally, Sokal and Rohlf (1995) discuss the Berkson case of model
II regression where model I regression is appropriate.
       It is suggested that the determination of the slope of Csoc-Cℓ pairs be performed using the
geometric mean regression technique (Halfon, 1985; Sokal and Rohlf, 1995) because with this
technique the slope of the regression is not dependent upon the scale of the Xs and Ys used in
the analysis. Additionally, Ricker (1973) has recommended that the geometric mean regression
technique be used for determining functional relationships (i.e., slope) when "the variability is
mostly natural ... in X and Y"; the case when sediment samples representative of the organism's
actual exposure history are collected.
       For the geometric mean regression technique, the slope of the geometric mean regression
line is the geometric mean of the slopes of the following two linear regression least-squares lines:

                                    y = a + b"x                                  (Eq. 16)
and
                                    x = c + dy                                   (Eq. 17)

The slope of the geometric mean regression line is computed as the geometric mean of b" and
1/d:

                                    $b = \sqrt{b'' \cdot (1/d)}$                     (Eq. 18)

The intercept a is computed as done in linear regression:

                                    $a = \bar{y} - b\,\bar{x}$                       (Eq. 19)

For further details on the geometric mean regression technique, the reader is referred to Halfon
(1985) and Sokal and Rohlf (1995). In addition,  a Microsoft® Office Excel add-in function for
geometric mean regression can be downloaded from the following URL.
http://www.lpc.uottawa.ca/data/scripts/index.html.
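       As an illustration of Equations 16 through 19, the following minimal Python sketch computes the
geometric mean regression slope and intercept from paired observations. The function name and the
convention of giving the slope the sign of the covariance are assumptions made for this sketch; they
are not taken from the references cited above.

```python
import math
import statistics


def geometric_mean_regression(x, y):
    """Return (slope, intercept) of the geometric mean regression of y on x.

    Assumes at least two pairs and a non-zero covariance between x and y.
    """
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_yx = sxy / sxx  # least-squares slope of y = a + b"x (Eq. 16)
    d_xy = sxy / syy  # least-squares slope of x = c + dy (Eq. 17)
    # Eq. 18: geometric mean of b" and 1/d. The sign is taken from the
    # covariance (an assumption of this sketch).
    slope = math.copysign(math.sqrt(b_yx / d_xy), sxy)
    intercept = y_bar - slope * x_bar  # Eq. 19
    return slope, intercept
```

For data that fall exactly on a line, the two least-squares slopes agree and the geometric mean
regression slope equals the ordinary slope.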
       There are two different ways in which a regression model can be used. First, the
regression model can be used to determine the BSAF for the chemical and species of interest.
Second, the regression model can be used to predict residues in biota given a residue in sediment
or vice versa. When geometric mean regression and model I regression techniques are
performed using log transformed data and residues in biota or in sediment are predicted using the
ln-ln regression equation, the back transformed arithmetic values of the predictions (in log space)
are biased low (Newman, 1993). As shown in Equation 9, the variance of the log transformed data
needs to be included in the back calculation. The MVU estimators, previously discussed, or
"less efficient but simpler estimators" as presented by Gilbert (1987) are two approaches for
deriving potentially unbiased predictions of the residues.
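       The size of the back-transformation bias can be illustrated with a short Python sketch. It
uses the widely used simple correction of adding one-half of the log-space variance before
exponentiating; whether this matches the form of Equation 9 exactly is an assumption, and the
function name is hypothetical.

```python
import math


def back_transform(log_prediction, log_variance):
    """Return (naive, corrected) arithmetic-space values for a natural-log prediction.

    naive     = exp(prediction), which is biased low for log-normal data
    corrected = exp(prediction + variance / 2), the simple correction
    """
    naive = math.exp(log_prediction)
    corrected = math.exp(log_prediction + 0.5 * log_variance)
    return naive, corrected
```

The corrected value always exceeds the naive value when the log-space variance is positive, and
the gap widens as the variance increases.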
       Monte Carlo uncertainty techniques (U.S. EPA, 2001) are suggested for determining
uncertainties associated with BSAFs and/or predictions from the regression model approaches.
With this approach, uncertainties from measurement error in biota and sediment residues, biota
lipid content and sediment organic carbon content and from the fit of the regression model can be
incorporated into the BSAFs and/or predictions from the regression model.
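       A minimal Monte Carlo sketch in Python is given below for a single BSAF. It assumes
independent log-normal measurement errors on the four measured quantities; the error model, the
parameter names and the percentile-based interval are assumptions made for illustration and are
not prescribed by U.S. EPA (2001).

```python
import random
import statistics


def monte_carlo_bsaf(c_tissue, f_lipid, c_sed, f_oc, log_sd, n_draws=5000, seed=1):
    """Propagate measurement error into BSAF = (c_tissue / f_lipid) / (c_sed / f_oc).

    Each input is multiplied by an independent log-normal factor with median 1.
    `log_sd` maps each input name to its (hypothetical) log-space standard deviation.
    Returns (median, 2.5th percentile, 97.5th percentile) of the simulated BSAFs.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        ct = c_tissue * rng.lognormvariate(0.0, log_sd["c_tissue"])
        fl = f_lipid * rng.lognormvariate(0.0, log_sd["f_lipid"])
        cs = c_sed * rng.lognormvariate(0.0, log_sd["c_sed"])
        fo = f_oc * rng.lognormvariate(0.0, log_sd["f_oc"])
        draws.append((ct / fl) / (cs / fo))
    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return statistics.median(draws), lower, upper
```

Uncertainty from the fit of the regression model is not included in this sketch and would need to
be added as a further source of variation.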
       The fit of the regression model/line to the data can be assessed using cross-validation
techniques (Neter et al., 1996). Cross-validation involves the splitting of the data into a training
set and prediction set, and the prediction set is used to evaluate the reasonableness or predictive
ability of the model developed with the training set of data.  Fairly close agreement between the
PRESS statistic (prediction sum of squares) and SSE (error sum of squares) for a regression
model would suggest that its MSE (error mean square) would be a reasonable indicator of the
model's predictive ability (Neter et al., 1996).
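       The comparison of PRESS and SSE can be sketched in Python as follows. The sketch uses
ordinary (model I) least squares and leave-one-out refitting; the function names are hypothetical,
and the inputs are assumed to be Python lists.

```python
import statistics


def ols_fit(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return slope, y_bar - slope * x_bar


def sse_and_press(x, y):
    """Return (SSE, PRESS) for the least-squares line through lists x and y."""
    slope, intercept = ols_fit(x, y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict the held-out observation.
        s, a = ols_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + s * x[i])) ** 2
    return sse, press
```

PRESS is never smaller than SSE for a least-squares fit; a PRESS value much larger than SSE
indicates that individual observations are strongly influencing the fitted line.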

RESPONSES TO QUESTIONS RAISED IN EXPECTED OUTCOMES
Do I fit a straight line through the data?
       Yes. If the Csoc-Cℓ observations do not form a straight line, then one must figure out why
       the data diverge from the linear relationship.  Reasons for the Csoc-Cℓ observations diverging
       from a straight line include the following (note that there are many more causes than those listed):
      •  The organisms in different Csoc-Cℓ pairs reside at different trophic levels in the food
         web.
      •  The organisms in different Csoc-Cℓ pairs have dramatically different diets even though
         they reside at the same trophic level. For example, for one pair, the organisms might
         consume primarily zooplankton while for other pairs, the organisms might consume
         primarily benthic invertebrates.
      •  The bioavailability of the chemical in the contaminated sediment varies substantially
         across the Csoc-Cℓ pairs.
      •  Across the sampling locations, inputs of the chemicals to the site differ substantially.
         For example, consider a harbor where organisms residing in the lower parts of the
         harbor are exposed to runoff and ground water seepage from an old industrial site
         while organisms residing in the upper parts of the harbor are not exposed to this
         discharge.
      •  Different populations of the same species. For example, in the Hudson River, there
         are resident and migratory striped bass fish populations, and chemical residues in the
         populations differ widely.

Do I plot my data on a log-log scale?
       It is suggested that data should be plotted and examined on both arithmetic and log  scales
       in the interest of fully understanding the data set in exploratory analysis. Plotting of the
       data with arithmetic-arithmetic scales might be preferred because in arithmetic-arithmetic
       space, the slope of the line is the BSAF when Csoc-Cℓ pairs are used.

       In log-log scales, the slope of the regression line (log Cℓ regressed against log Csoc) is not
       the BSAF.  The logarithm of the BSAF is the intercept of the regression line:

                                 log Cℓ  =  log [Csoc x BSAF]                       (Eq. 20)

                            log Cℓ  = slope x log Csoc + log BSAF                   (Eq. 21)

Do I force the line through the origin?
       It is suggested that with arithmetic data, regression be performed initially with an
       intercept, and this intercept should be checked to determine if it is significantly different
       from zero.  If the intercept is not significantly different from zero, the regression should
       be redone with no intercept.  When the intercept is significant,  the slope of the regression
       line is not equal to the BSAF. In these cases, the regression line can be used to predict
       residues in biota or sediment given a residue in sediment or biota, respectively.  When the
       intercept is not used (regression line is forced through the origin), the slope of the
       regression line is the BSAF.

       If one is performing the regression with log-log scales, it is suggested that regression be
       performed with an intercept. The intercept is equal to the log of BSAF.
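
       The arithmetic-space procedure described above can be sketched in Python as follows. The
       sketch uses ordinary least squares, and the critical t-value is supplied by the user from a
       t-table for n - 2 degrees of freedom; the function names are hypothetical.

```python
import math
import statistics


def fit_with_intercept(x, y):
    """Least-squares fit; returns (slope, intercept, standard error of the intercept)."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    se_intercept = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))
    return slope, intercept, se_intercept


def bsaf_from_regression(x, y, t_crit):
    """Return the through-origin slope (the BSAF), or None if the intercept is significant."""
    _, intercept, se_intercept = fit_with_intercept(x, y)
    if abs(intercept) > t_crit * se_intercept:
        return None  # intercept differs from zero; the slope is not the BSAF
    # Regression forced through the origin: slope = sum(x*y) / sum(x*x).
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
```

       Here x holds the Csoc values and y the Cℓ values. A returned value of None signals that the
       regression line should be used only for prediction.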

How do I handle non-detects?
       Non-detects will be interpreted as analytes reported as being below the method detection
       limit (MDL) of the analytical method. Generally, these analytes are flagged with the "U"
       code and the amount reported is the MDL. For concentration values greater than the
       MDL and less than the practical quantification limit (QL), the concentration values are
       reported and these values are flagged typically with the "J" code. The "U" and "J" flags
       are most often defined as unknown/not-detected and estimated, respectively.  The QL is
       3-5 times the MDL (U.S. EPA, 1989).

       For cases where the amount reported is flagged with the "U" code, Superfund most often
       uses 1/2 of the MDL in subsequent calculations with the data (U.S. EPA, 1989). For
       values greater than the MDL and less than the QL, these values are generally reliable and
       can be used directly. In both cases, one should carefully track the effects of the flagged
       data points through their subsequent calculations.  Calculation of BSAFs using estimates
       derived from the MDL for concentrations in sediment and/or biota (i.e., simple-
       substitution methods with left-censored data [Helsel, 2005]) can result in spurious and
       non-predictive BSAFs (Lee and Helsel, 2007).
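
       A minimal Python sketch of the simple-substitution step is shown below; it keeps the flag
       with each value so that flagged data points can be tracked through later calculations. The
       record layout and function name are hypothetical, and, as noted above, substitution can
       produce spurious BSAFs.

```python
def substitute_nondetects(records):
    """Apply 1/2-MDL substitution to "U"-flagged values.

    `records` is a list of (value, flag) tuples, where flag is "U", "J" or "".
    For "U" records the reported value is assumed to be the MDL.
    Returns a list of (value_used, flag) tuples in the same order.
    """
    result = []
    for value, flag in records:
        used = 0.5 * value if flag == "U" else value
        result.append((used, flag))
    return result
```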

       When the different Csoc-Cℓ pairs are plotted, different symbols/colors should be
       used for the above two flagged data types. Examine this plot to see if the flagged data
       align with the general trend of the Csoc-Cℓ pairs that are not flagged.  Chemicals with the
       "U" and "J" flags should be treated separately. This comparison/evaluation should be
       performed by doing the regression analysis without the flagged data, without the "U"
       flagged data, and with the flagged data alone.  Significance testing of the slopes (asking
       whether the slopes are different) should be done and these comparisons should help in
       determining whether to include or exclude the flagged data in the final regression.
       Examination of the residual plots should be done and will help greatly in determining
       whether to include or exclude chemicals with unknown concentrations ("U" flagged)
       and/or with concentrations between the MDL and QL ("J" flagged).

       There are statistical approaches for averaging with censored data, i.e.,
       non-detects/unknown concentrations (El-Shaarawi and Dolan, 1989; Newman et al., 1989;
       Newman, 1995).  In a recent publication, Helsel (2005) provides a clearly written and
       very thorough presentation on practical solutions to this issue. These approaches can be
       used with normally and log-normally distributed data.

       It is recommended that unbiased means be calculated only if less than 20% of the
       reported values are reported as being non-detect (Berthouex and Brown, 1994).

       Some of the techniques/approaches suggested above are dependent upon the number of
       Csoc-Cℓ pairs.  With a limited number of Csoc-Cℓ pairs, e.g., 3-5, these approaches for
       handling data with unknown concentrations will be limited.

How do I estimate the confidence interval around a prediction?
       For normally distributed data: The standard error of the geometric mean regression slope
       can be approximated by the standard error of the linear least-squares regression slope
       (Sokal and Rohlf, 1995).  (Note, the slope is the BSAF). Most linear least-squares
       regression programs (SAS) or spreadsheets (IBM® Lotus® 1-2-3 and Microsoft® Office
       Excel) calculate the standard error of the slope. The 95% confidence intervals on the
       slope would be calculated using Student t-value:

                           Upper 95% CI = $b + s_b \times t_{0.05[n-2]}$                   (Eq. 22)

                           Lower 95% CI = $b - s_b \times t_{0.05[n-2]}$                   (Eq. 23)

       where b is the geometric mean regression slope, s_b is the standard error of the geometric
       mean regression slope, n is the total number of data points used in the geometric mean
       regression, and t_0.05[n-2] is the two-tailed Student t-value for α = 0.05 with n - 2
       degrees of freedom.
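
       Equations 22 and 23 can be sketched in Python as shown below. The standard error of the
       least-squares slope is used as the approximation described above, and the Student t-value
       is supplied by the user from a t-table; the function name is hypothetical.

```python
import math
import statistics


def slope_confidence_interval(x, y, t_crit):
    """Return (lower, slope, upper) for the geometric mean regression slope.

    `t_crit` is the two-tailed Student t-value for n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_ols = sxy / sxx
    sse = syy - b_ols * sxy                 # residual sum of squares of the OLS line
    s_b = math.sqrt(sse / (n - 2) / sxx)    # standard error of the OLS slope
    b_gm = math.copysign(math.sqrt(syy / sxx), sxy)  # geometric mean regression slope
    return b_gm - t_crit * s_b, b_gm, b_gm + t_crit * s_b
```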

       For log-normally distributed Csoc-Cℓ data, the intercept of the geometric mean regression
       of the log transformed data (y = ln(x)) is the ln(BSAF) (Equation 21). The variance of the
       geometric mean regression intercept can be found using the equation of Ricker (1973):

       [Equation 24: variance of the geometric mean regression intercept; see Ricker (1973)]   (Eq. 24)

       where the y1's are the natural logarithms of the Csoc values, the y2's are the natural
       logarithms of the Cℓ values, $s^2_{y_2}$ is the variance of the y2 values, $s^2_{int}$ is
       the variance of the intercept, $y_{1,avg}$ is the arithmetic mean of the y1 values, r is the
       Pearson correlation coefficient between y1 and y2, and b is the geometric mean regression
       slope. Calculation of the confidence limits for the intercept is problematic because "no
       methods of computing accurate confidence limits appear to have been developed" (Ricker,
       1973). It is suggested that the method of Land as reported by Gilbert (1987) be used to
       calculate the confidence limits.  Land's equations for the upper one-sided 100(1-α)% and
       lower one-sided 100α% confidence limits for the arithmetic space average are
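
                        $UL_{1-\alpha} = \exp\left(\bar{y} + 0.5\,s_y^2 + \frac{s_y H_{1-\alpha}}{\sqrt{n-1}}\right)$     (Eq. 25)

                        $LL_{\alpha} = \exp\left(\bar{y} + 0.5\,s_y^2 + \frac{s_y H_{\alpha}}{\sqrt{n-1}}\right)$         (Eq. 26)
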
       where H_{1-α} and H_α are obtained from tables provided by Land (see Gilbert, 1987).
       However, Singh et al. (U.S. EPA, 1997) report that for small sample sizes, Land's
       method can result in unacceptably large (and potentially small) values for the upper (and
       lower) one-sided confidence limits when the coefficient of variation is larger than 1.0.
       Singh et al. (U.S. EPA, 1997) recommend that other methods should be used for
       computing the confidence limits.
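
       Land's limits, exp(ybar + 0.5 s_y^2 + s_y H / sqrt(n - 1)), can be sketched in Python as
       follows. The H values must be read from Land's tables (see Gilbert, 1987) and are passed
       in by the user; the function name is hypothetical, and any H values used to try the
       function are placeholders rather than table entries.

```python
import math
import statistics


def land_limits(log_values, h_upper, h_lower):
    """Return (lower, upper) one-sided confidence limits for the arithmetic mean.

    `log_values` are natural-log transformed observations; `h_upper` and
    `h_lower` are Land's H statistics for 1 - alpha and alpha, respectively.
    """
    n = len(log_values)
    y_bar = statistics.fmean(log_values)
    s_y = statistics.stdev(log_values)  # sample standard deviation in log space
    center = y_bar + 0.5 * s_y ** 2
    upper = math.exp(center + s_y * h_upper / math.sqrt(n - 1))
    lower = math.exp(center + s_y * h_lower / math.sqrt(n - 1))
    return lower, upper
```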

       If the regression model is used to predict residues in biota for known residues in sediment
       or vice versa, confidence limits on the predicted residues can be estimated.  For
       regression lines developed using Model I regression techniques in arithmetic space,
       confidence limits can be readily derived using standard statistical software, e.g.,
       SAS/STAT software. For geometric mean regression performed in arithmetic space, the
       confidence limits are determined using the equations:

                           95% confidence limits = $y_i \pm s_I \times t_{0.05[n-2]}$        (Eq. 27)

                           s2 = 4(1 - r2) + b(\ - r)2(^ - Xmg)2                   (Eq. 28)

       where y_i is the predicted value, s_I is the standard deviation of the predicted value, s_y is
       the standard deviation of the Y variables used in the regression, X_avg is the average of the
       X variables used in the regression, r is the Pearson correlation coefficient, X_i is the
       observation which predicts the y_i value and b is the slope of the geometric mean
       regression line. As noted above, for geometric mean regression, calculation of the
       confidence limits is problematic because "no methods of computing accurate confidence
       limits appear to have been developed" (Ricker, 1973), and thus, the symmetrical
       approximation is used (Equation 27).

       For geometric mean regression performed with log transformed data, confidence limits
       would be calculated using Land's method (Equations 25 and 26) with the variance  of the
       predicted value (in log space) calculated using Equation 28 with the log transformed data.

Do I normalize by organic carbon and lipid?
       Yes.  The BSAF is the ratio of the concentration in the biota on a lipid basis to the
       concentration in the sediment on an organic carbon basis.

       By working with Csoc-Cℓ pairs (which are organic carbon and lipid normalized), one
       places these concentrations on a thermodynamic basis. By expressing the concentrations
       on a thermodynamic basis, the concentrations of the chemicals in sediment and tissue are
       corrected for differences in bioavailability and partitioning behavior.  By using the
       thermodynamic based expressions, the Csoc-Cℓ pairs are expressed equivalently.
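
       The normalization can be written out as a short Python sketch. The function and
       parameter names are hypothetical; the fractions of lipid and organic carbon are expressed
       as decimal fractions (e.g., 0.04 for 4%).

```python
def normalized_pair_and_bsaf(c_tissue, f_lipid, c_sediment, f_oc):
    """Return (C_l, C_soc, BSAF) from bulk concentrations.

    c_tissue   : chemical concentration in tissue (ug/kg wet weight)
    f_lipid    : lipid fraction of the tissue (kg lipid / kg wet weight)
    c_sediment : chemical concentration in sediment (ug/kg dry weight)
    f_oc       : organic carbon fraction of the sediment (kg OC / kg dry weight)
    """
    c_l = c_tissue / f_lipid      # ug/kg-lipid
    c_soc = c_sediment / f_oc     # ug/kg-organic carbon
    return c_l, c_soc, c_l / c_soc
```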

Do I use weighted regression?
       There are two general cases.  First, when the Csoc and Cℓ are individual observations (not
       averages), then individual Csoc-Cℓ pairs should be given equal weights. Second, if the
       Csoc and Cℓ are averages, then individual Csoc-Cℓ pairs should be given equal weights
       except if the Csoc and Cℓ variances are highly heterogeneous (p < 0.001).  If the variances
       are highly heterogeneous (very dissimilar), then perform both weighted (by the inverse of
       the variance) and unweighted regression and compare slopes. The heterogeneous
       variances might or might not have any appreciable effect on the  slope. If appreciable
       effects exist on the slope, then the weighted regression model is  preferred.
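
       A minimal Python sketch of the weighted fit is given below for comparison with the
       unweighted slope. It uses weighted least squares with user-supplied weights; taking each
       weight as the inverse of the variance of the corresponding averaged Cℓ value is an
       assumption of this sketch, and the function name is hypothetical.

```python
def weighted_least_squares(x, y, weights):
    """Return (slope, intercept) of the weighted least-squares line of y on x.

    With equal weights the result is the ordinary least-squares line.
    """
    w_sum = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```

       Running the function once with the inverse-variance weights and once with all weights
       equal to 1 gives the two slopes to be compared.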

If I transform the data, do I need to use weighted regression?
       See answer to previous  question.  The variances would need to be evaluated in log space
       for heterogeneity.

How do I take into account the home range of the biota whose tissue I measured?
       As explained earlier, one should have knowledge of the organism's foraging range.  With
       this information, sediment samples across the foraging range should be collected and
       analyzed, and the sediment samples should be representative of the organism's immediate
       life history.  Accounting for the foraging range of the organism is done by averaging the
       analytical results for sediment samples collected within the organism's foraging range.

What if my r2 is low and my data do not plot with the appearance of an increasing linear
function?
       Occurrence of this type of behavior in the plot of Csoc-Cℓ pairs strongly suggests that
       different sampling locations have either different underlying conditions and parameters
       (e.g., different food webs, different organism populations, differences in chemical
       bioavailability, different diets, etc.) or a very limited dynamic range.  In these cases, one
       will need to determine the factors causing these differences. If one cannot resolve these
       differences, the same problems will also exist with other methods for predicting chemical
       residues, e.g., food web models, because these methods require this knowledge as well.
       In general, when this type of behavior is observed, the problem is in the data itself, and
       no statistical analysis method will circumvent the problem. Without resolving these
       differences, the data should not be used.

How do I deal with outliers?
       For Csoc-Cℓ pairs that are very different from the general population of Csoc-Cℓ pairs (i.e.,
       appear to be outliers), always make sure that the data are not miscalculated, transposed,
       or misidentified. Further, ensure that no other types of methodological errors are
       associated with the data. If the data pairs appear to be numerically and methodologically
       correct, statistical techniques are available for the testing of the data pairs to determine if
       they are outliers.  If a data pair is found to be significantly different from the general
       population of Csoc-Cℓ pairs, the outlier should be excluded from the regression analysis,
       because it is likely indicative of different underlying ecological conditions.

       Snedecor and Cochran (1980, p. 167-168) and Neter et al. (1996, p. 374-375) present a
       statistical method for linear regression where the regression is performed without the
       outlier, and then the outlier is tested as to whether it is within sampling error of the
       population. The test criterion is a t-value. Because the outlier is not chosen randomly, to
       ensure a 1 - α confidence, the calculated t-value is compared to the t-value from the
       t-table using α', where α' equals α divided by n.  Probability values for testing for outliers
       should generally be conservative, e.g., α = 5% or α = 1%. With an n of 20, the critical
       t-value for an α of 5% would be found using an α' of 0.25% with the t-table.
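
       The test can be sketched in Python as follows. The regression is refit by ordinary least
       squares without the suspect pair, and the t-value is the prediction error for that pair
       divided by its standard error; this particular formulation and the function names are
       assumptions of the sketch, and the inputs are assumed to be Python lists.

```python
import math
import statistics


def adjusted_alpha(alpha, n):
    """Return alpha' = alpha / n for use with the t-table."""
    return alpha / n


def deleted_residual_t(x, y, i):
    """Return the t-value for pair i against the regression fit without pair i.

    Compare the absolute value with the t-table value at alpha' and
    (n - 3) degrees of freedom, where n = len(x).
    """
    xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    m = len(xs)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(xs, ys))
    mse = sse / (m - 2)
    # Standard error of predicting a new observation at x[i].
    se_pred = math.sqrt(mse * (1.0 + 1.0 / m + (x[i] - x_bar) ** 2 / sxx))
    return (y[i] - (intercept + slope * x[i])) / se_pred
```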

       There are other statistical techniques for outlier detection beyond those described above,
       i.e., M-estimators, S-estimators, least median of squares, and least trimmed squares, that
       provide significant advantages in the detection of outliers and leveraged data pairs.
       Although a bit dated in terms of the software applications used, Rousseeuw and Leroy
       (1987) provide an excellent introduction to and discussion of these techniques.  A brief but
       helpful description and application of these techniques has been provided by Chen (2002).
       These advanced outlier detection algorithms are available in numerous statistical packages
       including SAS/STAT software (SAS Institute, Inc.), SYSTAT software (SyStat Software,
       Inc.) and R Statistics software (freeware GNU package).

Do I develop a separate regression for each compound in a mixture?
       Yes.  This is most desirable because different chemicals have different chemical
       properties. For example, differing behavior is observed with PCBs where fish appear to
       be slightly enriched with the higher chlorinated PCB congeners relative to the distribution
       existing in the sediments.

When the value of x (i.e., exposure point concentration in sediment) is uncertain (e.g., when
biota migrate), how do I account for this in my regression?
       The best method of accounting for organism migration is to design your sampling plan
       for the organism such that the organisms are collected just before they migrate back out
       of the site. This approach maximizes time the organism spends at the site of interest, and
       provides the best estimate of the residue in the organism based upon the organism's
       exposure in its foraging range at the site.

       Sampling design simulations (Burkhard, 2003) for the measurement of BSAFs (or Csoc-Cℓ
       paired observations for determinations of BSAFs) suggest that spatial variability in the
       concentrations of the chemical does not add large uncertainties into the measured BSAF
       beyond those caused by temporal variability of the chemical concentrations in the water.
       Further, random walk migration simulations suggested that BSAFs (or Csoc-Cℓ paired
       observations for determinations of BSAFs) can be measured with low uncertainty even
       when extreme variability in spatial concentrations exists at the field site, provided the
       measurements are performed in more contaminated locations of the site for higher Kow
       chemicals, i.e., >10^5 (Burkhard, 2003). The requirement of performing the field
       measurements at the more contaminated locations within the site will limit the analysis of
       BSAF because the range of Csoc-Cℓ pairs will be small.

       If the organisms spend a very short time at the site, e.g., the fish migrate through the site
       in a few days to a week, determination of BSAFs is not recommended even though the
       BSAF can be measured.  The sediments from the site would not be reflective of the fish's
       recent exposure history.

Are there ways to improve my study design knowing what I know now about regression?
       First, the importance of collecting sediment samples that are reflective of the organism's
       foraging range cannot be overstated.  Spending time and resources to better define the
       relationship of the organisms to the sediments will greatly decrease the uncertainty
       associated with the resulting BSAFs. In addition, predictions using food web models,
       whether steady-state or dynamic, will greatly improve because of the improved knowledge
       of the underlying relationship between the sediment and organism.

       Second, it is important that composite samples reflective of the biota at the site of interest
       be collected. Clearly, collection and analysis of more organisms will provide a better
       measure of the average residue in the biota. However, biota samples consisting of mixed
       age classes are not recommended, e.g., juvenile and adult minnows, or 1-year-old and
       3-year-old largemouth bass. Minimizing the differences in age (or size) will  improve the
       quality of the biota samples and ultimately provide smaller variances for the biota
       residues.  Typically, fishes of given size (e.g., the weight of the smallest fish is not less
       than 75% of the weight of largest fish) or age group (e.g., 3-year olds) are collected.

       After sample collection and analysis, plans should be made to visually examine the data
       by making plots of Csoc-Cℓ paired observations and plots of BSAFs against Csoc. The
       Csocs, Cℓs and BSAFs should be plotted on a GIS type plot to determine if the values are
       correlated with geographical trends and conditions, e.g., the BSAFs increase with
       increasing distance away from the source on a river. Any additional information or
       understanding one can glean for the site will be advantageous in the remediation decision
       process.

       As part of the overall study plan for successfully measuring a BSAF, time and resources
       should be allocated for resolving causes of non-linearity (when they exist) in the Csoc-Cℓ
       paired observations.  Resolving these causes will greatly aid in understanding the complexities of
       the site, and provide decision makers and risk assessors a much better basis for assessing
       and evaluating remediation options.

REFERENCES

       There are many standard college level textbooks on statistical analysis which include
regression analysis.  Almost all include discussion and examples on the linear least-squares
regression technique. Coverage of the geometric mean regression analysis technique is often not
provided in standard college level textbooks.  Halfon (1985) is an excellent reference on
geometric mean regression.  Sokal and Rohlf (1995) address the subject of model II regression
including geometric mean regression. Neter et al. (1996) address the statistics of linear models.

Ankley, G.T., P.M. Cook, A.R. Carlson et al. 1992. Bioaccumulation of PCBs from sediments
by oligochaetes and fishes: Comparison of laboratory and field studies. Can. J. Fish. Aquat. Sci.
49:2080-2085.

Berthouex, P.M. and L.C. Brown. 1994. Statistics for Environmental Engineers. Lewis
Publishers/CRC Press, Boca Raton, FL.

Bierman, V.J., Jr.  1990.  Equilibrium partitioning and biomagnification of organic chemicals in
benthic animals. Environ. Sci. Technol.  24:1407-1412.

Burkhard, L.P. 2003.  Factors influencing the design of bioaccumulative factor and
biota-sediment accumulation factor field studies. Environ. Toxicol. Chem.  22(2):351-360.

Burkhard, L.P., P.M. Cook and D.R. Mount.  2003.  The relationship of bioaccumulative
chemicals in water and sediment to residues in fish: A visualization approach. Environ. Toxicol.
Chem. 22(11):2822-2830.

Burkhard, L.P., P.M. Cook and M.T. Lukasewycz.  2004. Biota-sediment accumulation factors
for polychlorinated biphenyls, dibenzo-p-dioxins, and dibenzofurans in southern Lake Michigan
lake trout (Salvelinus namaycush).  Environ. Sci. Technol. 38(20):5297-5305.

Campbell, C. 1982. Evaluating propagated and total error in chemical property estimates. In:
Handbook of Chemical Property Estimation Methods: Environmental Behavior of Organic
Compounds, Appendix C, W.J. Lyman, W.F. Reehl and D.H. Rosenblatt, Eds. McGraw-Hill,
New York, NY.

Chen, C. 2002. SUGI27 Proceedings. Paper 265-27, Robust Regression and Outlier Detection
with the ROBUSTREG Procedure, SAS Institute Inc., Cary, NC.  Available at
http://www2.sas.com/proceedings/sugi27/p265-27.pdf

Clarke J.U. and V.A. McFarland. 2000.  Uncertainty analysis for an equilibrium
partitioning-based estimator of polynuclear aromatic hydrocarbon bioaccumulation potential in
sediments. Environ. Toxicol. Chem.  19:360-367.

DiToro, D.M., C.S. Zarba, D.J. Hansen et al. 1991. Technical basis for establishing sediment
quality criteria for nonionic organic chemicals using equilibrium partitioning. Environ. Toxicol.
Chem. 10:1541-1583.

El-Shaarawi, A.H. and D.M. Dolan. 1989. Maximum likelihood estimation of water quality
concentrations from censored data. Can. J. Fish. Aquat. Sci.  46(6):1033-1039.

Ferraro, S.P., H. Lee Jr., R.J. Ozretich and D.T. Specht.  1990. Predicting bioaccumulation
potential: a test of a fugacity-based model. Arch. Environ. Contam. Toxicol.  19(3):386-394.

Gilbert, R.O.  1987. Statistical Methods for Environmental Pollution Monitoring. John Wiley &
Sons, New York, NY.

Halfon, E. 1985. Regression method in ecotoxicology: A better formulation using the geometric
mean functional regression. Environ. Sci. Technol. 19:747-749.

Helsel, D.R.  2005. Nondetects and Data Analysis: Statistics for Censored Environmental Data.
John Wiley & Sons, New York, NY.

Lake, J.L., N. Rubinstein and S. Pavignano. 1984. Predicting bioaccumulation: Development of
a simple partitioning model for use as a screening tool in regulating ocean disposal of wastes. In:
Fate and Effects of Sediment-Bound Chemicals in Aquatic Systems, K.L. Dickson, A.W. Maki
and W. A. Brungs, Eds. Pergamon Press, New York, NY. p. 151-166.

Lake, J.L., N. Rubinstein, H. Lee II, C.A.  Lake, J. Heltshe and S.  Pavignano.  1990.  Equilibrium
partitioning and bioaccumulation of sediment-associated contaminants by infaunal organisms.
Environ. Toxicol. Chem. 9:1095-1106.

Lee, L. and D. Helsel.  2007.  Statistical analysis of water-quality data containing multiple
detection limits II: S-language software for nonparametric distribution modeling and hypothesis
testing.  Comput. Geosci.  33(5):696-704.

McElroy, A.E. and J.C. Means.  1988. Factors affecting the bioavailability  of
hexachlorobiphenyls to benthic organisms.  In: Aquatic Toxicology and Hazard Assessment,  Vol.
10, W.J. Adams, G.A. Chapman and W.G. Landis, Eds.  American Society for Testing and
Materials, Philadelphia, PA. p.  149-158.

McFarland, V.A. and J.U. Clarke.  1986.  Testing bioavailability of polychlorinated biphenyls
from sediments using a two-level approach. In: Proceedings of the US Army Engineers
Committee on Water Quality, 6th Seminar, R.G. Wiley, Ed.  Hydraulic Engineering Research
Center, Davis, CA. p. 220-229.

Minns, C.K.  1995. Allometry of home range size in lake and river fishes. Can. J. Fish. Aquat.
Sci.  52:1499-1508.

Mood, A.M., F.A. Graybill and D.C. Boes.  1974.  Introduction to the Theory of Statistics, 3rd ed.
McGraw-Hill, New York, NY.

NIST (National Institute of Standards and Technology). 1996. NIST Dataplot. Available at
http://www.itl.nist.gov/div898/software/dataplot/.

Neter, J., M.H. Kutner, C.J. Nachtsheim and W. Wasserman. 1996. Applied Linear Statistical
Models, 4th ed. Irwin, Chicago, IL.

Newman, M.C. 1993.  Regression analysis of log-transformed data: statistical bias and its
correction.  Environ. Toxicol. Chem.  12:1129-1133.

Newman, M.C. 1995.  Quantitative Methods in Aquatic Ecotoxicology. Lewis/CRC Press,
Boca Raton, FL.

Newman, M.C., P.M. Dixon, B.B. Looney and J.E. Pinder III.  1989. Estimating mean and
variance for environmental samples with below detection limit observations.  Water Res. Bull.
25(4):905-916.

Ricker, W.E.  1973.  Linear regression in fishery research.  J. Fish. Res. Board Can.  30:409-434.

Rousseeuw, P.J. and A.M. Leroy.  1987.  Robust Regression and Outlier Detection. Wiley,
Hoboken, NJ.

Snedecor, G.W. and W.G. Cochran.  1980.  Statistical Methods, 7th ed. Iowa State University
Press, Ames, IA. p.  167-168.

Sokal, R.R. and F.J. Rohlf. 1995.  Biometry: The Principles and Practice of Statistics in
Biological Research, 3rd ed. W.H. Freeman and Co., New York, NY.

Thomann, R.V., J.P. Connolly and T.F. Parkerton. 1992.  An equilibrium model of organic
chemical accumulation in aquatic food webs with sediment interaction.  Environ. Toxicol. Chem.
11:615-629.

Tracy, G.A. and D.J. Hansen. 1996. Use of biota-sediment accumulation factors to assess
similarity of nonionic organic chemical exposure to benthically-coupled organisms of differing
trophic mode.  Arch. Environ. Contam. Toxicol. 30(4):467-475.

U.S. EPA.  1989. Risk Assessment Guidance for Superfund: Volume I. Human Health
Evaluation Manual (Part A). U.S. Environmental Protection Agency, Office of Emergency and
Remedial Response, Washington, DC. EPA/540/1-89/002.

U.S. EPA.  1997. The Lognormal Distribution in Environmental Applications. U.S.
Environmental Protection Agency, Washington, DC. EPA/600/S-97/006.

U.S. EPA.  2001. Risk Assessment Guidance for Superfund: Volume III - Part A, Process for
Conducting Probabilistic Risk Assessment. U.S. Environmental Protection Agency, Washington,
DC. EPA/540/R-02/002.

U.S. EPA.  2002. Guidance on Choosing a Sampling Design for Environmental Data Collection.
U.S. Environmental Protection Agency, Washington, DC.  EPA/240/R-02/005.

Wong, C.S., P.D. Capel and L.H. Nowell. 2001. National-scale, field-based evaluation of the
biota-sediment accumulation factor model. Environ. Sci. Technol. 35(9):1709-1715.

-------
                                     APPENDIX
     ECOLOGICAL RISK ASSESSMENT SUPPORT CENTER REQUEST FORM
Problem Statement: What is the most appropriate method to estimate the Biota Sediment
Accumulation Factor (BSAF) from paired observations of concentrations in biota and
sediment?
Requestors: Sharon Thoms and Al Hanke, Region 4
Background: BSAF is a parameter describing bioaccumulation of sediment-associated organic
compounds or metals into tissues of ecological receptors. In a typical experiment to measure
bioaccumulation, the researcher collects co-located sediments and tissues over a gradient of
contamination. Because the BSAF is simple compared to bioaccumulation and trophic transfer
models, it is used at Superfund sites to estimate progress toward achieving a protective tissue
concentration as sediments become cleaner.
Expected Outcome: The expected outcome is a white paper addressing the following questions
regarding the use of regression to obtain the most accurate estimate of BSAF:

Do I fit a straight line?
Do I plot my data on a log-log scale?
Do I force the line through the origin?
How do I handle non-detects?
How do I estimate the confidence interval around a prediction?
Do I normalize by organic carbon and lipid?
Do I use weighted regression?
If I transform the data, do I need to use weighted regression?
How do I take into account the home range of the biota whose tissue I measured?
What if my r² is low and my data do not plot with the appearance of an increasing linear
   function?
How do I deal with outliers?
Do I develop a separate regression for each compound in a mixture?
When the value of x (i.e., exposure point concentration in sediment) is uncertain (e.g., when
   biota migrate), how do I account for this in my regression?
Are there ways to improve my study design knowing what I know now about regression?

Where the topics are covered by standard books or web sites on statistics, they may be
   referenced. A few case studies may be useful to illustrate the concepts.
Additional Comments: Requestor can provide case studies.

-------