903R93006
                               CBP/TRS 76/93
                                 June 1993
         Incorporating Uncertainty
         Associated with Censored
             Water Quality Data in
                  Parametric Trend
                           Analysis
TD
225
.C54
162
copy 2
        Chesapeake Bay Program
                                I Printed on recycled paper

-------
   Incorporating Uncertainty
   Associated with Censored
     Water Quality Data in
        Parametric Trend
              Analysis
           Chesapeake Bay Program
               June 1993
    Produced under contract to the U.S. Environmental Protection Agency
             Contract No. 68-WO-0043
                       ri o y ^A R-c^ioiv in
                       *j o. iJ-> •"• ""-^  , •v-r.ir
                       o-^vl Center Tor .n-irv
Printed by the U.S. Environmental Protection Agency for the Chesapeake Bay Program

-------
                              ENDORSEMENT
   The  Chesapeake Bay Program Monitoring Subcommittee has reviewed
   the  assumptions and methods of data analysis used  in this  i ->port
   and  finds  them appropriate  for the  analysis conducted.   The
   findings of this report  are consistent with and supported  by the
   analytical techniques employed.
csc.SA2.6/92    Incorporating Uncertainty  in Parametric Trend Analysis • Page  i

-------

-------
                             TABLE  OF  CONTENTS






     EXECUTIVE SUMMARY  	   vii




     I.  INTRODUCTION	     1




     II.  SUBSTITUTION METHODS  	     2




     III.  WEIGHTING METHOD  	     3




     IV.  CONCLUSIONS	    13




     V.  RECOMMENDATIONS	    14




     VI-.  REFERENCES	.-_.	  .    16




     APPENDIX 1: Annotated  Bibliography   	    17




     APPENDIX 2: Proof  of upper bound on  variance  	    25
esc.SA2.6/92    Incorporating Uncertainty  in Parametric Trend Analysis • Page iii

-------

-------
                           LIST OF FIGURES
   Figure  1.  Sample  time  series  of  observed  mean  monthly
        orthophosphate    (P04F)    concentrations,    showing
        detection  limits	

   Figure  2.  Sample  time  series  of  observed  mean  monthly
        orthophosphate (P04F) concentrations with error bars.
        Below detection  data prior to March 1985 set  to  1/2
        the prevailing detection  limit	

   Figure  3.  Sample  time  series  of  observed  mean  monthly
        orthophosphate (P04F) concentrations, with additional
        censoring  levels  (ADL's)  shown	
                            LIST OF  TABLES

   Table  1.  Trend estimates  from weighted and  non-weighted
        regressions with different  levels  of  censoring,  with
        below  detection observations  set  to the  detection
        limit	    10

   Table  2.  Trend estimates  from weighted and  non-weighted
        regressions with different  levels  of  censoring,  with
        below  detection  observations set to  one-half  of  the
        detection limit	    11

   Table  3.  Trend estimates  from weighted and  non-weighted
        regressions with different  levels  of  censoring,  with
        below  detection observations set  to zero	    12
esc.SA2.6/92    Incorporating Uncertainty  in Parametric Trend Analysis •  Page v

-------
                            EXECUTIVE  SUMMARY
         Water  quality  data   are   often  collected  in  monitoring
     programs to servr as a basis  for  the estimation  of trends.  A
     problem arises  in trend  estimation  when data series  contain
     observations  reported  as below  detection  limit  (BDL).    The
     several published methods that deal  with BDL observations are
     generally  oriented  towards  obtaining   the   "best  value"  to
     substitute  for  the censored  values.    We suggest  a  weighted
     regression approach, as a modification of the usual substitution
     approach,  which incorporates the  fact that a detection limit is
     a measure of precision of the  data.  When the definition of a
     detection  limit  associated  with   a   datum  is   unknown,  a
     conservative lower bound  (with  a theoretical justification) for
     the precision of the observation  is given.  We illustrate this
     approach using water quality monitoring data collected by the
     Chesapeake Bay Program.

          The  results  show that  weighted regression  estimates of
     linear  trends  are much less   sensitive   to  the  method  of
     substitution for  BDL  values than unweighted  regression  trend
     estimates.   This  is  an  important  finding  since  the use of
     weighted  approaches,   by  lessening   the   bias   of   simple
     substitution methods, may obviate the  need  for  "best  value"
     substitution methods which  are more cumbersome  to implement.
     The results also  indicate that use of  weighted  regression in
     multiply censored data series eliminates the need to apply the
     highest detection  limit  to  all data in  the  series (when data
     below this highest limit exist in the data series) in order to
     avoid  trends  due to changing  detection  limits.   This  is an
     important result  since the  latter  practice  which  is generally
     used in (unweighted) trend estimation results in the loss of the
     advantage  of improved  detection limits.
esc.SA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis • Page  vii

-------

-------
   I. INTRODUCTION

        Water  quality  data  are  often  collected  in  monitoring
   programs to serve as  a basis for the estimation of trends.  Such
   time series  of  observations may be  censored  at  one detection
   limit,  or  at  several.   Change-s  in  detection limits during the
   course  of  a  monitoring program often  occur  as a  result  of
   intentional  improvements  to  the  level  of  detection or  by
   switches  in  the  chemical  analysis  methods  that  are  used.
   Varying detection  limits in a data  series  can also occur  as a
   result of quality assurance procedures that  specify periodic re-
   determinations of  the  detection limit.   Although data can be
   censored at both lower and upper limits, below detection limit
   (BDL)  observations  have  been   the   common   problem   we  have
   encountered  in   water  quality  data   from  the Chesapeake  Bay
   Program^.  In  general, seasonally low BDL  nutrient  concentrations
   may be typical of estuarine  and  marine monitoring situations.

        The   several   published   methods  that   deal   with   BDL
   observations are generally  oriented  towards obtaining the "best
   value"  to  substitute for the censored  values (1-9).   In the
   second   section,   we  give  a   brief  description   of  these
   substitution methods.   A more detailed  review can be  found in
   (4), which suggests that overall best performance  is  obtained by
   what are  termed  "robust methods" of  substitution.   The Monte
   Carlo  studies by  (6,1) support  the claim.   However,  these
   substitution methods are not easily  adapted to  correlated data.
   This   is   a   significant  drawback,  since   time  series  of
   environmental data are usually correlated.

        The American  Chemical  Society  Committee on  Environmental
   Improvement  (ACSCEI)  defined  the limit of  detection   (LOD)  as
   "the lowest  concentration  level that can be  determined to be
   statistically different  from  a  blank" (10).   They  recommended
   that the LOD  or  detection  limit be  calculated as 3 times the
   standard deviation of blank determinations.  A detection limit
   may re^er to  an instrument detection limit (IDL) based on signal
   to  noise  ratios  or  a  method detection limit (MOL)  based  on
   replicate determinations of a blank  or  field blanks (see  11).
   Once  the  detection  limit  is  determined,   it   is  used  as  a
   censoring  threshold  for   subsequent  measurements  and   the
   concentration levels observed below that limit are  reported as
   BDL.

        A detection  limit, defined  as  a multiple of the  standard
   deviation of  a number of measurements, thus contains  information
   about  the  precision  of  observations  above  and  below  the
   detection  limit.   We  present,   in  the  third section, a  new
   substitution  approach  that incorporates this fact.   In  some
   cases an analyst may  be  faced with data  containing observations
esc.SA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis  • Page 1

-------
   censored  at  one  or  more  detection  limits  whose  method  of
   calculation  is  not  known.     For  this  case,  we  derive  a
   conservative  estimate  of  the  variance  associated  with  a
   censoring threshold.


   II. SUBSTITUTION METHODS

        The accompanying annotated bibliography  (Appendix 1) gives
   a fairly extensive  survey of  the  current  state of methodology
   for dealing with BDL data.   The consensus among the authors is
   that "bad data is better than  no  data".   That is,  if possible
   one  should  obtain   the  actual  values  for   the  observations."
   reported as BDL.  However, this is usually  not  possible; indeed,
   the ACSCEI  (10)   recommends  reporting below  detects as  "not
   de.tected" with the associated limit of detection appended as a
   s-eparate  variable.    Sin-ce  values  for  BDL  observations  are
   necessary in many statistical situations, estimation methods for
   BDL data have been studied.

        The most  commonly used  method is  the method  of  simple
   substitution.  These  methods  consist  of  substituting a  single
   value   for   an   unknown  BDL   observation.     Following  the
   substitution, the usual statistical analysis is carried out as
   if the  fill-in  data  is real data.  The  fill-in values may be one
   of  the  following:   zero,  the  detection  limit  or  half  the
   detection limit.   The consensus of  the literature  (1-9) is that
   this simple  method  generally  leads to biased estimates.   The
   performance of  these simple substitutions in  Monte Carlo studies
   (6,1)   were  generally  poor.     However,   these  methods  are
   definitely the simplest to implement.

        A  second  class  of methods  are  based  on distributional
   assumptions.  Under  the assumption  that the observations follow
   a certain distribution  (such as log-normal)  the parameters are
   estimated  based  on  the  observed  concentrations  above  the
   detection limit and  th~ percentage  of BDL  data.  Either maximum
   likelihood or  probability plotting methods can be used.   The
   maximum likelihood method is more involved to implement than the
   plotting method.   The plotting method is also  complex when more
   than one detection  limit  is  involved.    These  methods  are
   described  in  (1,5).    This  is also  the  approach  of  tobit
   regression (8), in which maximum likelihood estimates of model
   parameters are obtained under a distributional assumption.

        A   combination  of  the   substitution   method   and  the
   distributional assumption  is   recommended by (4).   Under the
   distributional assumption,  the first few  (depending on how many
   BDL observations  are in the data set) percentiles are estimated.
   These values are  substituted  for  the  BDL  observations and the
   target  analysis   is  carried  out  as  usual.    This  method  is


esc.sA2.6/92     Incorporating Uncertainty in Parai. :tric Trend Analysis  • Page 2

-------
   computationally  involved,  especially when  there  are multiple
   detection limits; however, this approach consistently produced
   small estimation errors (4).

        The methods described in the literature so far  concern only
   uncorrelated  data.    Extensions  of these  methods  and their
   performances to  correlated  da,a  have not  been explored in the
   literature.   These  methods,   even   in  cases  in  which  it   is
   possible to extend  them  to account  for autocorrelation in the
   data, will be computationally  intensive.


   III. WEIGHTING METHOD

        The treatment  of BDL  data suggested  by in this report  is
   novel.    It  takes  advantage   of one  common  feature  of  the
   definition  of   a   detection   limit:    that  of  measurement
   uncertainty.  Thus we note that a detection  limit, in addition
   to  being  a  threshold  for   data   censoring,  also  contains
   information regarding variability in the data.  We, therefore,
   propose  to  incorporate  this  information  in  analysis  via
   weighting.   We  distinguish two  situations: one  in  which the
   detection limit calculation method is known, and one in which  it
   is unknown.

        Because detection limits are measures of uncertainty,  it  is
   clear  that  the  varying  detection  limits  in the  data  are   an
   indication  of  heteroscedasticity.     We   postulate  that  the
   standard deviation of the tth  observation  is st given by

                       s2t proportional to r2t

   where rt is a measure of the uncertainty in the tth  observation.
   The quantity rt could simply be 1/3  of the detection limit (if
   the  ACSCEI  definition  applies)   or some  other  number which
   reflects a knowledge of the uncertainty in the data.

        In  the  case  wheie   the   detection  limit  calculation   is
   unknown, a conservative estimate  of  the variance can be  arrived
   at as follows.  When a datum is reported as  below detection, its
   true value could lie  anywhere between  zero  and  the detection
   limit.  An estimate for the datum is given by the  mid-ooint  of
   this interval.   The  uncertainty in this estimate depends on the
   exact distribution of the  data,  which  is  unknown.   However, a
   upper bound exists for the variance  of  any  distribution  of data
   in this interval.  Technically, this can be  stated as follows:

        If  X is any random variable taking values in an interval
        [a,b] then the variance of X can not exceed (b-a)2/4.

   (See Appendix 2 for the proof.)
csc.SA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis • Page 3

-------
   Thus, in such a case, an  estimate  of  the  variance of the data
   below  the  detection  limit  is the  square  of  the  detection
   limit/4. This estimate of variance does not apply  to data above
   detection in this  case and other estimates  of uncertainty might
   be explored.

        If  we  conclude  that   the   detection  limit  or  above
   conservative estimate  appropriately  represents  the  changing
   variance  of multiply-censored data   series,   then  the  best
   estimation procedure for parametric trend estimation is weighted
   least squares,  since observations can be weighted according to
   the  uncertainty  associated  with  them  to  obtain appropriate
   variance estimates (see 12).  Thus  the observations are weighted
   by the  inverse of  the estimates of  their  standard deviations.
   In our approach, the BDL data are replaced  with y(t) - the value
   of 1/2  of  the  operant  detection  limit and observations  are
   weighted" by w( t ) = ( l/rt) ,  (see 12,  eq.  2.2,  p.  11).

        In general,  heteroscedasticity in the data can occur due
   to many factors,  including  those  due  to sampling variability.
   It is important to  recognize the need for a  composite measure of
   uncertainty  due   to  all   contributing  factors.    However,
   construction  of  such  a  model  requires  estimates  of  the
   individual  sources of variability  and an understanding of their
   inter-relationships, which may be  difficult to obtain.
   An example

        Since the  inception  of the Chesapeake  Bay  Program Water
   Quality Monitoring Program,  detection limits  for orthophosphate
   concentrations   sampled   at   mainstem  monitoring   stations
   successively  declined  due  to  modifications  in  the  water
   chemistry analysis  methods  (Figure 1).   In  this  example,  we
   apply   our   method  to   a   time  series   of  orthophosphate
   concentrations measured from June 1984  through February 1991 at
   a   monitoring   station   in   Maryland   waters   near   the
   Maryland/Virginia border.

        In the  field sampling procedure, grab samples were taken at
   different depths in the water  column at the station in order to
   characterize  the  depth  profile of  nutrient concentrations.
   These samples were  taken  at the surface, bottom  and  two mid-
   water depths which  depended upon the  density structure of the
   water column. We  selected samples measured above the pycnocline
   (AP) because this  series  appeared  to  contain a  downward trend
   and  inspection  of  the data  revealed  no further  significant
   autocorrelation in the data.
CSC.SA2.6/.;    Incorporating Uncertainty ~.i Parametric Trend Analysis • Page k

-------
   Figure   1.   Sample  time   series   of  observed   mean   monthly
   orthophosphate  (P04F)  concentrations, showing detection limits.
   Below detection data only occurred prior  to  March 1985  in this
   data  set  and   are  shown  set  to   the detection  limit.    The
   prevailing detection limit in each period  is  shown by the dashed
   line.
                CB5.3 AP P04F Mean Monthly Concentration
                 BDL observations set to the detection limit
P04F mg/l
   0.014

   0.013

   0.012-j*

   0.011

   0.010

   0.009

   0.008

   0.007-

   0.006-

   O.OOB

   0.004-
    I
   0.003-

   0.002-

   0.001

   0.000-
          I

          ^••••••f
                     o

                    0 0
                                           o oo
 I'l I | T-VT-f 1
J  0  F
u  c   •
n  t   b
                  J
                  U
                  n
^T1
 0
 c
 t
8
4
               B
               0
                      F
                      «
                      b

                      a
                      6
J
U
n
                      O
                      C
                      t
F
«
b

a
7
J
U
n
O
C
t
F
«
b

e
8
JO
UC
nt
F
«
b

e
8
JO
UC
nt
FJ
«U
bn

9
0
O
C
t
                                           g
                                           1
                                     Month
esc.SA2.6/92     Incorporating Uncertainty in Parametric Trend Analysis •  Page 5

-------
        During the winter, the station was sampled once a month; in
   other months,  it was sampled twice a month.  To obtain a monthly
   series   of  observations,   we   averaged   the   within-month
   concentrations to  obtain monthly values (see Figure 1).  In this
   data set,  within-month concentrations were either BDL or above
   detection at the  same  detection limit,  therefore obviating the
   problem of estimating  a monthly average from BDL and above-BDL
   data.  In  this plot, BDL  data  were  set  equal  to the detection
   limit; the only BDL data  in the series occurred prior to March
   1985 and all these values were BDL.

        The monthly  values  and  the  range of uncertainty,  r(t),
   associated with each monthly value  in  this  sample data set is
   shown in Figure 2.  In this plot,  BDL data were set equal to 1/2
   the detection limit.   We  calculated r(t) as 1/3 the  prevailing
   detection  limit   (see  Figure  1),  since detection  limits  of
   orthophosphate were determined as 3 times the standard deviation
   of duplicate samples (C.  Zimmermann, pers. comm.; N.  Fristche).

        To investigate the level  of censoring on our estimates, we
   created four additional data  sets  based on this original sample
   data.  This was  done by applying an artificial detection limit
   (ADL) to  the  data from March  1985  onward.   We  selected four
   ADL's to correspond to approximate additional  censoring levels
   of 5% (ADL=0.00240) , 15%  (ADL=0.00265),  30%  (ADL=0.00315) and
   50%  (ADL=0.00420)  of  the  data.   Thus,  each  of  these ADL's,
   applied  individually,  replaced the  real  detection  limits of
   0.0016 and 0.0006  that occurred from March  1985 onward.  These
   artificial data sets retained the real detection  limits  of 0.012
   and 0.007 prior to March  1985.  These contrived detection limit
   series  are shown   in  Figure  3  along with  the  observed  mean
   monthly concentrations  (BDL prior  to March 1985 are shown set to
   1/2 the detection  limit).

        To examine the effect of  the substitution  choice for BDL
   data, we  constructed  three categories of sample  data  sets by
   setting  Br>L  observations  equal  to  0,  to  1/2  the  prevailing
   detection limit,  or to the prevailing detection limit.

        To  illustrate the   effects  of  applying  the  weighting
   procedure, we modeled each sample series twice:  (1)  weighting
   each observation according to the prevailing detection limit (as
   shown in  Figures  1 and  3) and (2)  applying no  weights (which
   computationally  means  applying  the   same   weight)   to  the
   observations. Our  rationale for weighting all observations, not
   only  BDL  data,  in a time period according to  the  prevailing
   detection limit is based on the idea of data precision that is
   associated  with  detection  limits  (10):   a  datum  of  0.03
   determined with a  detection limit of .03 has the same level of
   uncertainty of measurement as a value of 0.015 associated with
   a detection limit  of 0.015 (100%; see 13,  p.82).
esc 3A2.6/92   Incorporating Uncer'ainty in Parametric Trend Analysis • Page 6

-------
   Figure  2.   Sample  time  series   of   observed  mean   monthly
   orthophosphate  (P04F) concentrations  with error  bars.   Below
   detection  data  prior to  March  1985 set  to  1/2 the  prevailing
   detection  limit.   The  error bars  around  each  observation  are
   equal to 1/3  the  prevailing detection limit.
                CB5.3 AP P04F Mean Monthly Concentration
                   with range of uncertainty about values
                  BDL observations =1/2 detection limit
P04F mg/l
0.014-
0.013
0.012-
0.011-
0.010-
0.009-
0.008-
0.007-
0.006-
O.OOB-
0,004-
0.003-
0.002-
0.001-
0.000-
J
U
n
t
t
, t t +
V H /, • • -
	 Ut T j h . T ^ + +++++ ++++++++ + + + +
•f
1 • ' p • I I • * • I • ' • i • • • | 1 • I | I i 1 | 1 I I | I 1 I | I I I | I i 1 p I I p I I | i i I | i 1 i p i i p 1 I | 1 1-rf i i i p i i [
OFJOFJOFJOFdOFJOFJOF
c«uc*uc«uceuceuceuca
tbntbntbntbntbntbntb
         e
         4
8
6
8
e
8
7
e
B
B
9
g
o
g
i
                                    Month
esc.SA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis •  Page 7

-------
   Figure  3.  Sample   time   series  of  observed  mean   monthly
   orthophosphate (P04F) concentrations, with additional censoring
   levels (ADL's)  shown.  ADL's used were 0.00240, 0.00265, 0.00315
   and 0.0042 mg/1.   Below detection data prior  to  March 1985 are
   shown  set to  the  prevailing  detection  limit.    The  actual
   detection limits  prior to  March 1985  are shown  by  a  single
   dashed line.
               CB5.3 AP P04F Mean Monthly Concentrations
               showing levels of ADLs applied March 1985 on
P04F mg/l
-- 0.014
0.013
0.012
0.011
0.010
0.008-
0.008-
0.007-
0.008-
0.000-
0.004-
0.003-
0.002-
0.001-
0.000-
«
1
I
1
4
O
00° o
&ooeeooo o o
o
0 ° 0 n 0 0 o
o o0 o oo o o »
„ 	 iftz^z-?io?>_irrri^_^_r^:i§j£:Qi::ij_5ip i 5"
O ~*CT
O
JOFJOFJOFJOFJOFJOFJOF
Jc«uc«uc«uc«uc«uceuc»
itbntbntbntbntbntbntb
988 88 8 0 9
(B6 78 9 0 1
                                   Month
esc.sA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis • Page 8

-------
        .We used  the  method described in  (14)  to  model  trends in
   each  of  the  example  series as  a seasonal  autoregression of
   concentration on time.  While this method consists of two steps,
   we terminated  our estimation after the  first  step since  this
   gave the results  of interest to us.

        Following  the  first   step  of  this method,  preliminary
   estimates of the trend  and  seasonal effects  were  obtained using
   ordinary least  squares  (OLS)  regression.   We accomplished the
   weighting procedure by multiplying y(t) and t by w(t) prior to
   the model step.  For the second  step,  a model for the residuals
   would  have  been  obtained  using  the  Box-Jenkins methodology.
   Using the selected model for the error  terms, revised estimates
   of trend  and  seasonal  effects  could  then be  obtained  using
   generalized least squares.   We  followed  this  method  prior to
   step two and  made computations  using  the AUTOREG procedure of
   SAS/ETS Version 6.

        The trend estimates, their  standard  errors,  and associated
   probability level obtained for each data  set using weighted and
   unweighted  (weights=l)  regression estimation  are tabulated in
   Tables 1-3 according to  the BDL substitution method.

        Tables 1, 2 and 3  show the  results from the  unweighted and
   weighted regressions for the three substitution  choices for BDL
   data and different  censoring  levels.   These results show  that
   the weighted  regression estimates of  trend coefficients  were
   less sensitive to the value used for substitution than were the
   unweighted   regression  estimates.      Estimates  of   trend
   coefficients from the weighted regressions ranged from -0.000013
   to -0.000024;  trend coefficients  from  unweighted regressions
   ranged  from -0.000002  to  -0.000038.   Specifically,  for  the
   original data  series,  the  trend  estimates from the  weighted
   regressions were -0.000021,  -0.0000022 and -0.000023 whereas the
   corresponding unweighted regression estimates were  -0.000019, -
   0.000009 and -0.000038.   Even at the level of 50%  censored data,
   trend estimates from the  weighted  regression were comparatively
   more robust.

        Although the occurrence of  BDL data  introduced bias in the
   both   weighted   and  unweighted   trend   estimates   in    each
   substitution  method,  the weighted regression  method  appeared
   more advantageous in another important aspect of  trend modelling
   of  environmental data  with  changing detection  limits.    By
   interpreting  the  detection limit  as  a measure  of uncertainty
   associated with  all data points,  the  practice  of  setting all
   data below the highest detection  limit of censored data in the
   data set to the highest  detection  limit to avoid  the estimation
   of spurious trends becomes unnecessary.   This is  illustrated by
   the stability of weighted trend estimates over censoring levels
esc.SAZ.6/92    Incorporating Uncertainty in Parametric Trend Analysis • Page 9

-------
    Table  1.  Trend   estimates  from  weighted  and   non-weighted
    regressions  with  different  levels of  censoring,  with  below
    detection observations set to  the  detection limit.   Estimated
    slope  (b),   standard  error   of   the  slope  and   associated
    probability  (p) are  indicated.
Censoring
level
original

+ 5 %


+ 15 %


+ 30 %


+ 50 %



b
s .e . (b)
P
b
s.e.(b)
P
b
s.e. (b)
P
b
s.e. (b)
P
b
s.e. ( b)
P
Weighted
-0.000023
0.000009
0.0243
-0.000023
0.000009
0.0208
-0.000024
0.000009
0.0177
-0.000023
0.000009
0.0135
-0.000022
0.000008
0.0068
Unweighted
-0.000038
0.000009
0.0001
-0.000036
0.000009
0.0002
-0.000036
0.000009
0.0002
-0.000036
0.000009
0.0002
-0.000030
0.000008
0.0002
esc.SA2.6/92    Incor, oratin? Uncertainty  in Parametric Trend Analysis • Page  10

-------
    Table  2.  Trend   estimates  from  weighted   and   non-weighted
    regressions  with  different  levels  of  censoring,  with  below
    detection observations set to one-half of  the  detection limit.
    Estimated slope (b),  standard error of the  slope and associated
    probability  (p) are indicated.
Censoring
level
original

+ 5 %


+ 15 %


+ 30 %


+ 50 %


Weighted
b
s.e.(b)
P
b
s.e . (b)
P
b
s.e. (b)
P
b
s.e. (b)
P
b
s.e.(b)
P
-0.
0.
0.
-0.
0.
0.
-0.
0.
0.
-0.
0.
0.
-0.
0.
0.
000022
000010
0317
000020
000010
0552
000018
000011 .
0952
000019
000011
0936
000019
000011
0974
Unweighted
-0
0
0
-0
0
0
-0
0
0
-0
0
0
-0
0
0
.000009
.000008
.3149
.000010
.000009
.2988
.000009
.000009
.3560
.000011
.000010
.2628
.000014
.000010
.1696
csc.SA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis • Page 11

-------
    Table  3.  Trend   estimates  from  weighted   and  non-weighted
    regressions  with  different  levels  of  censoring,  with  below
    detection  observations  set  to  zero.    Estimated  slope  (b),
    standard error  of  the slope and associated probability (p) are
    indicated.
Censoring
level
original

+ 5 %


+ 15 %


+ 30 %


+ 50 %



b
s.e.(b)
P
b
s.e . (b)
P
b
s.e. ( b )
P
b
s.e.(b)
P
b
s.e.(b)
P
Weighted
-0.000021
0.000010
0.0433
-0.000017
0.000012
0.1481
-0.000013
0.000013
0.3169
-0.000015
0.000015
0.3067
-0.000016
0.000016
0.3319
Unweighted
-0.000019
0.000011
0.0912
-0.000017
0.000012
0.1603
-0.000018
0.000013
0.1582
-0.000012
0.000014
0.3945
-0.000002
0.000015
0.9026
csc.SA2.6/92    Incorporating Uncertainty  in Parametric Trend Analysis  • Page 12

-------
    within each  substitution method.   Using  weighted  regression
    therefore avoids the loss in  improved  data quality  imposed by
    this latter practice.

         Use  of  weighted   regression   also  improved  other  data
    aspects.   When the data  series  contained  a high percentage of
    BDL data, a significant  amount  of  autocorrelaMon occurred in
    the model  residuals.   xhis  autocorrelation was  not  nearly as
    prevalent in residuals from our weighted regressions as in the
    residuals from the unweighted regression.

    IV. CONCLUSIONS

         We conducted a  comprehensive  search  of the literature on
    the treatment  of BDL data in  statistical water quality analysis.
    We found  several  useful articles written  by  Helsel,  Gilliom,
    Conn  and  others on  this topic.   The accompanying  annotated
    bibliography (Appendix  1) provides  a  more  detailed  summary of
    the articles.  The articles we found  discussed the computation of
    simple descriptive statistics  only.  These  papers  (except for 8
    which  concerns  tobit   regression)   did   not  discuss  trend
    estimation.   The literature  is sparse  on how to deal  with
    censored data when the data  are autocorrelated.

         In  this   paper  we proposed   a   weighting  scheme  which
    interprets the detection limit as a  measure  of uncertainty.  The
    approach  seems  reasonable  since detection  limits  are usually
    calculated  as  multiples of  the standard  deviation  of  blank
    determinations.    For  instances  in  which  detection  limit
    calculations are unknown, we present a weighting function that
    can be used which does not  rely on  distributional assumptions.
    A mathematical  result  shows  that it is the most conservative
    estimate of uncertainty, over all possible distributions.

         The  advantage  of  using  weighted  regression  rather  than
    unweighted regression in describing trends in our sample data
    set was threefold.   First, weighted  trend  estimates  were much
    less sensitive to the level  of censoring in the data than were
    unweighted trenc! estimates.    Second,  weighted trend estimates
    were much less sensitive to the  choice  of data substitution for
    BDL data  (i.e.  0,  1/2  the  detection  limit, or  the  detection
    limit) than were  unweighted  trend  estimates.     Finally,  the
    complicating  effects  of autocorrelation   weire  less  in  the
    weighted  regression  models  than in the unweighted  regression
    models.

         The results  also showed that the use of weighted regression
    eliminates the  need  to  censor  data to the maximum  detection
    limit when dealing with multiply censored  data series to avoid
    spurious  trends.      Thus,   trend  estimation  with  weighted
csc.sA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis  • Page 13

-------
    regression preserves the  effect of improved detection limits in
    data sets.

         Th.e performance of  the estimation procedure  based on the
    weighting scheme we have demonstrated could be further evaluated
    using Monte Carlo methods.  Such a  study would  provide a good
    topic for further research.  Questions that we have attempted to
    address  in a  general  way could be  moi.e extensively examined.
    For example,  the answer to the question of how many BDL data are
    permissible  in  an   analysis  is  affected  by  the  range  in
    concentration in  the data relative to the censored data levels,
    the distribution of  censored values in the data series (are they
    just at the beginning,  or do they occur seasonally,  etc.), and
    the amount of change in  above detect  data  concentrations over
    time .

         An  answer to the  question  concerning  the  best  choice of
    substitution  method is  also  data  dependent  and  application
    dependent, even in the estimation of simple quantities such as
    mean, median,  standard  deviation etc.   Relative performance of
    estimation methods  described in  (6)   for  the   mean,  standard
    deviation, median and interquartile range of actual river sample
    water quality data  series  of  25 observations depended on the
    level of censoring.   Most of the 8  methods  differed little at
    20%,  the lowest level of censoring they investigated.  At higher
    levels of  40%, 60%,  and  80%,  assuming a uniform distribution
    (UN)  or  a log-normal distribution (LR)  for BDL data produced the
    lowest  and similar  errors of  estimation  for   the mean  and
    standard  deviation.    For  the  estimation  of  the  median  and
    interquartile  range of  these  data  sets,  the  UN  method  was
    usually second best, but only greatly  at 80% censoring of the
    data,  to  a  maximum  likelihood  method   (LM).    Thus,  the
    uncomplicated assumption  of  a uniform distribution for BDL data
    in most cases  was quite adequate? for the estimation of a mean,
    this is equivalent to choosing 1/2  the detection limit for BDL
    data values.   These results differed  somewhat  from those the
    same  authors   achieved  on  simulated  data  sets  which  were
    generated from specific  types of distributions,  most of which
    were skewed to the  right, with  long tails  extending to higher
    concentrations  (1).    In  these  simulations,   the'  relative
    performance of LR ,  LM and UN methods  varied among cases, with
    the UN  method being only  really  poorer  at the  80% censoring
    level for estimation of median and interquartile range.
    V.  RECOMMENDATIONS

         It  is  difficult  to  advance  a  general  recommendation
    concerning the accommodation of BDL data  in  trend analysis of
    Chesapeake Bay Program water quali.ty monitoring data which will
    perform well in all  scenarios.  We  offer  the following simple
csc. SA.. 6/92    Incorporating Uncertainty in Parametric Trend Analysis • Page

-------
    recommendations for trend estimation based on our results, those
    of  papers  we  have consulted  and  the  characteristics  of  CBP
    mainstem data we have examined.  It  is important to keep in mind
    that no substitution method for BDL  data is completely accurate,
    and  that  the  degree  of  inaccuracy  imparted  by  substitution
    methods for  BDL data  to  estimated  statistics cannot  be known
    with real data  seri  s.

         1) Assume  a  uniform distribution of BDL  data  between the
    detection limit and  zero.  Substitute 1/2  the detection limit
    for BDL data.

         2) Use  weighted  regression, applying  a weighting  factor
    that represents uncertainty  in the  data.
esc.sA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis • Page 15

-------
    VI. REFERENCES

    (1) Gilliom, R.J.; Helsel, D.R. Water Resour. Res., 1986,22,135-
         146.

    (2) Gilliom,  R.j.; Hirsch, R.M.; Gilroy, S.  J. Environ. Sci.
             Technol., 1984, 18, 530-535.

    (3) Gleit, A.  Environ. Sci. Technol., 1985, 19, 1201-1206.

    (4) Helsel, D.R. Environ. Sci. Technol., 1990, 24, 1766-1774.

    (5) Helsel, D.R.;  Cohn, T.A. Water Resour.  Res.,  1988,  24, 1997-
         2004.

    (6) Helsel, D.R.;  Gilliom,  R.  J. Water Resour. Res., 1986, 147-
          155.

    (7) Porter, P.S.;  Ward, R.C.; Bell, H.F. Environ. Sci.
           Technol.,1988, 22, 856-861.

    (8) Robinson,  P. M.  Econometrica,  1982,  50, 27-41.

    (9) Travis, C. C.; Land, M. L. Environ.  Sci. Technol.,  1990,
           24,961-962.

    (10) ACS  Committee  on Environmental  Improvement,  1980, Anal.
    Chem.,        52, 2242-2249.

    (11) ACS  Committee  on Environmental  Improvement,  1983, Anal.
    Chem.,        55, 2210-2218.

    (12) Carroll,  R.J.;  Ruppert,  D.  Transformation and weighting  in
    	     Regression. Chapman and Hall: New York, New York, 1988.

    (13) Taylor,  J.K.  Quality Assurance of Chemical Measurements,
            Lewis  Publishers, Inc.: Chelsea, Michigan, 1987.

    (14) Nagaraj,  N. K.; Brunenmeister, S. L.  In Proceedings of  a
         Conference "New Perspectives in the Chesapeake system:  A
            research and Management Partnership", Baltimore, MD.
               Chesapeake Bay Consortium Pub.  No. 137, 355-369.
csc.SA2.6/92    Incorporating Uncertainty  in Parametric Trend Analysis • Page 16

-------
                              APPENDIX 1

                        Annotated Bibliography


    Gilliom, R. J. and Helsel, D. R. (1916)

    "Estimation  of distributional  parameters for  censored trace
    level water quality data. 1. Estimation techniques"

    Water Resources Research, 22 (2):135-146.

         Several   methods    were   evaluated    for   estimating
    distributional parameters for censored data sets  using only the
    uncensored observations. Their reliabilities were evaluated by
    a Monte Carlo experiment in which "small samples were generated
    from  a  wide  range of  parent  distributions  and  censored  at
    varying levels. Eight methods were  used  to  estimate the mean,
    standard deviation, median  and  interquartile  range. The eight
    methods we re:

    1. (ZE)  Censored observations were set equal to  zero

    2. (DL) Censored observations were  set equal  to the detection
    limit

    3.  (UN)  Censored  observations  were  assumed follow a uniform
    distribution between zero and detection limit.

    4. (NR)  Censored observations were assumed follow the zero-to-
    detection limit portion of a normal distribution which was fit
    to the uncensored observations using least squares  regression.

    5. (LR) Censored observations were  assumed  to follow zero-to-
    detection limit portion  of  the  log-normal distribution fit to
    the uncensored data by least squares regression. The method is
    identical to NR, except that concentrations were log-transformed
    prior to the analysis.

    6. (KM) Concentrations  are  assumed  to be normally  distributed
    with parameters estimated  from  the  uncensored observations by
    the   maximum   likelihood   method   for   a   censored   normal
    distribution.

    7.   (LM)   Concentrations  are  assumed   to   be  log-normally
    distributed  with   parameters estimated   from  the  uncensored
    observations by the  maximum likelihood method  for  a censored
    lognormal distribution.
csc.SA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis • Page  17

-------
    8.   (DT)  Censored  observations  are  assumed  to be  zero  and
    uncensored  observations  are   assumed  follow  a   lognormal
    distribution.  Estimates  of   parameters  of  the  overall  delta
    distribution  are   obtained   by  computing  maximum  likelihood
    estimates   of  the   uncensored   lognormal   portion   using
    relationships  between  these and the overall delta distribution.

         The evaluation of the reliability of estimation methods was
    based  on the root mean squared error (RMSE) computed from actual
    parameters underlying  the distribution.  The  average  RMSE,  over
    all  the  parent distributions, was expressed as fractions of the
    true  values.    The  methods   were  then  ranked  according  to
    increasing  value   for  the  RMSE,  for  various  parameter  and
    censoring level combination.  The best overall method was chosen
    by summing the ranks of RMSE's over all sample sizes,  censoring
    levels,  and parameters.   The method  with the smallest  sum of
    ranks  is declared  to  be the  best.

         Note  that  some   of  the  methods  considered have  certain
    distributional assumptions.  If a priori knowledge of  the  true
    distribution of  the data  is  known,  one  may simply consider the
    corresponding  method.  The paper also  gives results from a Monte
    Carlo  study where  certain sample characteristics are  used to
    classify the data  as  to what  the parent distribution  would be
    and  apply the  corresponding  method.

         The paper concludes  that the most robust estimation method
    for  minimizing errors  in estimates of  the mean, median, standard
    deviation and  interquartile  range of  censored data  was the log
    probability regression method (LR). This method  is based on the
    assumption  that   censored observations  follow  the  zero-to-
    censoring level portion of a  lognormal distribution obtained by
    a  least  squares  regression  between  logarithms  of  uncensored
    concentration  observations and their normal scores.  When method
    performance was evaluated separately for each  distributional
    parameter, LR  resulted in  the lowest RMSEs  for mean and standard
    deviation.   The  lognormal  maximum  likelihood  estimator  for
    censored data  (LM) produced  the  lowest  RMSE  for median  and
    interquartile  range.    The   authors  conclude  that  these  two
    methods   constitute  the best  procedures  for their  respe-ctive
    parameters.
    Gilliom, R. J.,  Hirsch, R.  M.,  and Gilroy,  E.  J.  (1984)

    "Effect of Censoring Trace-Level Water-Quality Data  on Trend-
    Detection Capability"

    Environmental Science and Technology,  18(7): 530-535.
esc.SA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis  • Page 18

-------
         This paper makes a  strong  case against reporting values as
    below detect values.  The article decidedly argues that the data
    analyst is better off with an unreliable data value than to be
    given a value as simply  below  detect.  The following are a few
    chosen paragraphs from the article highlighting its argument.

    Summary: Monte Carlo experiments were used to evalua'e whether
    trace-level water quality data  that are routinely censored (not
    reported)  contain  valuable  information  for  trend  detection.
    Measurements are commonly  censored if  they  fall  below a level
    associated with  some minimum acceptable  level  of  reliability
    (detection limit).  Trace-level  organic  data were simulated with
    best-  and worst-case  estimates  of  measurement  uncertainty,
    various  concentrations  and  degrees  of  linear  trend,  and
    different censoring  rules.  The resulting classes of data were
    subjected to a nonparametric statistical  test for  trend.  For
    all  classes  of data  evaluated,  trends were  most  effectively
    detected in uncensored data  as  compared  to  censored data even
    when the data censored were highly unreliable.  Thus, censoring
    data  at  any  concentration   level   may  eliminate  valuable
    information.   Whether or  not  valuable  information  for trend
    analysis is, in fact, eliminated by censoring of actual rather
    than simulated data depends on whether the analytical process is
    in statistical control and bias  is predictable for a particular
    type of chemical analysis.

         The purpose of  the  report is to  demonstrate  that trace-
    level data  that  are  routinely  censored may  contain  valuable
    information despite  their greater uncertainty, and that, given
    certain conditions, adopting a "no censoring"  rule increases the
    data analyst's ability to detect the actual trend  in  data.  This
    conclusion was reached by the Monte Carlo  experiment  reported in
    the article.   Trace-level  data were simulated with best- and
    worst-case estimates  of measurement uncertainty, various degrees
    of linear trend,  and different censoring rules.   The resulting
    class of data were subjected to a nonparametr ic statistical test
    for  trend.   The  effect  of different degrees of  censoring  is
    shown  by  comparing  the  power  of   (effectiveness)  of  trend
    detection,  as  determined  from Monte  Carlo   experiments,  for
    different classes of data with different censoring rules.

         For  all   classes  of  data  evaluated,   trends  were  most
    effectively detected  in  uncensored data as compared to censored
    data even when the  data were highly unreliable.  Censoring data
    at  any  concentration  level,  especially  given  our  lack  of
    knowledge  about  prevailing   concentrations,   may   eliminate
    valuable information. The  more  reliable  the data censored, the
    greater  the  information  lost   and  the  more detrimental  the
    effects of censoring.
esc.sA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis • Page 19

-------
         The results  of the present study lend  strong support to the
    argument, earlier  advanced by  others  in  Rhodes,  R.  C.  I  In
    "Environmetrics  81:   Selected  Papers";   SIAM  Institute  for
    Mathematics  and Society:  Philadelphia,  1981;  pp 157-162]
    and generally followed in new ASTM standards,  that data should
    not be  routinely censored by  laboratories.    Uncensored  data
    should always be  retained in permanent records available to data
    users even if  policy  makers  of a laboratory decide  that some
    censoring or other  form  of qualification is  necessary before
    general public  release of data. Measurement  data  should not be
    discarded unless  the lack  of  statistical  control  in  the
    measurement  process is clearly demonstrated.
    Gleit, Alan. (1985)

    "Estimation for Small Normal Data sets vith Detection Limits"

    Environmental Science and Technology,  19(12): 1201-1206.


         Summary: The paper identifies the detection limit problem
    as  a  problem of censoring.  The problem of optimal substitution
    for the values  reported as below  detection  limit is explored.
    The several  commonly used methods  are  considered.    Some  new
    methods,  such as using  EM-algorithm  to estimate the unavailable
    data;  using some optimal linear functions of order statistics,
    etc.  are  also considered.  The performance of these methods are
    compared,  via a  simulation study.   The  overall  winner was the
    method which uses  the  expected values as the  fill-in values.
    But it takes  a few iterations to obtain  the  substitution for
    each  below  detection limit value.  The  steps involved  in  the
    iteration are:

    (i) Select  any  initial  convenient  guesses  for  the  mean  and
    variance.

    (ii)  On the basis of our tentative current values for the mean
    and variance,  calculate the expected  values  for th'e  first  p
    order statistics conditional on being  less  than  L.  (Note that
    the p is the number  of  below detects  in  the  data and L is the
    detection limit.)

    (iii)  Now that  first  iterates are available for the below detect
    values the  initial  guess  for  the  mean  and  variance  can  be
    updated.

    (iv)  If  the current and updated  values are close  then STOP.
    Otherwise, go back to  step  (ii) using  the updated versions as
    current values.
csc.SA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis  • Page 20

-------
         The procedure  described above can be programmed  in SAS with
    the .help  of  MACRO  processing.  In  general,  the  number  of
    iterations required will depend on the accuracy desired.

         The most  significant conclusion  of  the article  is  that
    fill-in by constants (zero or  the  detection  limit  or  half the
    detection limit)  can be, in general, orders of magnitude worse.

         The methods discussed  are under  the  assumption  that the
    observations are  independent.  The change in the detection limit
    problem  is not addressed.   The  fill-in  by expectation method
    given  in  the  paper assumes  normality.   The most  significant
    drawback of the results,  as  far  as  its  applicability to the CBP
    water quality monitoring data is concerned, is the assumption of
    independence.  A  considerable amount  of theoretical work may be
    involved before  the methods  described  in  the  paper  may  be
    applied to correlated data.


    Helsel, D.  R.  (1990)

    "Less  than  Obvious:  Statistical  Treatment of  data below the
    detection limit"

    Environmental  Science and Technology, 24(12): 1766-1774.


         This  is  by  far  the  most  comprehensive article,  from  a
    statistical stand point and  therefore a must-read in this list.
    The paper discusses the most appropriate statistical procedures
    for handling data that  have  been reported as below detects.

         First, the appropriateness of various commonly used summary
    statistics in the  presence of  below  detect data  is  discussed.
    It points  out  that mean  and  standard deviation may  be quite
    sensitive to the  deletion or addition of even one observation.
    Since environmental data are typically skewed, mean and standard
    deviation make  poor measures of central value and variability.
    Alternative measures suggested  in  this article  are  median and
    interquartile  range.   Also  the geometric  mean,  the  mean  of
    logarithms  of  the data  also  serves well since it estimates the
    median (in original units) when  the  logarithms  are  symmetric.
    The  paper  points  out  an important  advantage  of  median  and
    interquartile  range when applied to censored data.   The sample
    median is completely known when up to, but less than, 50% of the
    data were reported as below  detects.  Similarly,  interquartile
    range for the  sample  is known when  up to,  but less than, 25% of
    the data is reported as  below  detects.   Thus, no  fix-ups are
    necessary to obtain these sample summary statistics.

         The paper  considers three  classes of estimation methods.
esc.sA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis  • Page 21

-------
    1.  Substitution methods: These methods substitute a single value
    for the unseen below detect  observations  in  the data.   These,
    generally,  lead  to biased estimation of mean, median,  standard
    deviation and interquartile range.

    2.  Distributional method:  An assumption  about the  underlying
    distribution  which  produced  the data  is  made.    Giver  the
    distribution, estimates of summary  statistics  that  best match
    the observed concentrations  above the  detection limit  and the
    percentage  of below detects  are  obtained.   Maximum  likelihood
    estimation  and method of probability plotting are  two  methods
    that can be used.  The two methods  are unbiased as sample size
    gets large, provided the  assumption  about the  distribution is
    correct.

    3.  Robust methods:  This is,  in fact, a combination  of methods
    1 and 2t  Under certain distributional assumption, the first few
    (depending  on how many  non-detects are in  the data) percentiles
    are estimated.  Then  these percentiles are used in the  place of
    the below  detect data  values  in  the ensuing calculations.  The
    author  reports,  with references, that  the robust  methods have
    consistently produced small errors.

         The author strongly recommends the use of robust methods in
    practice.  The implementation of the robust method in  commonly
    available software (SAS,  for  example)  is  described  here.  The
    method  is more involved if there are multiple detection limits
    present.  The FORTRAN  code to implement  this procedure may be
    obtained from the author.

         The references  cited by  the  author explore the  problem
    under  the assumption of independence.  Thus,  a  study extending
    these  ideas and  exploring their usefulness for correlated data
    is  called for.
    Helsel,  D. R. and Conn, T. A. (1988)

    "Estimation  of  Descriptive  Statistics  for Multiply  Censored
    Water Quality Data"

    Water Resources Research, 24(12): 1997-2004.
         This paper  extends the work of Gilliom and Helsel (1986) to
    the situation when multiple detection  limits are  present.   In
    addition to the  eight methods in that  paper, substitution of all
    below detects by a value of one half  their detection limit; an
    extension  of the  log  probability  method  (LR)  to  multiple
    detection  limit situation, namely   MR;  an  extension  of  the
    maximum  likelihood  method  for  censored  data  to  multiple
esc.SA2.6/92    Incorporating Uncertainty in Parametric Trend Analysi-  • Page 22

-------
    censoring limit  situation,  namely  MM;  and an adjusted maximum
    likelihood procedure [due to Tim Cohn discussed  in an USGS Tech
    Report] namely, AM.

         The major conclusions of the paper are:

    1. Estimates  of  descriptive statistics  for  multiply censored
    data may contain no  more  error  than those for singly censored
    data, if efficient methods are used.

    2. The AM and  MR methods  are found to  be substantially better
    than any of the simple substitution methods.


    Helsel, D. R. and Gilliom, R. J. (1986)

    "Estimation  of Distributional  Parameters for  Censored Trace
    Level Water Quality Data. 2. Verification and Applications"

    Water Resources Research, 22(2): 147-155.
         This is a  companion paper to Gilliom and Helsel  (1986); the
    approach is  basically  the  same.   Real  data  sets  were used  in
    place of simulated data sets.   Uncensored data sets with more
    than 50 observations for suspended sediment, total phosphorus,
    total Kjeldahl  nitrogen,  and nitrate  nitrogen concentrations
    were obtained from 313  stations  of the U.S. Geological Survey's
    National Stream Quality Accounting Network (NASQAN).  Most data
    sets were monthly samples taken during  1974-1981, resulting  in
    917 data sets having more than 50 observations and  no censoring.
    The  sample  values from the  uncensored  data were used  as the
    population parameters  in the  computation of RMSE.  The LR method
    produced the lowest RMSE for the moment parameters.


    Porter, P. S.,  Ward,  R. C., and Bell, H.  F.  (1988)

    "The Detection Limit"

    Environmental Science and Technology, 22(8): 856-861.

         This article argaes strongly that  it  is  _mportant give the
    actual measurement and  the margin of error rather than report  it
    a below detect.  They emphasize the loss  of  information due  to
    the practice of reporting the value as  below detect.
esc.SA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis • Page  23

-------
    Travis,  Curtis, C. and Land, Miriam  L.  (1990)

    "Estimating the mean of data sets with  nondetectable values"

    Environmental Science and Technology, 24(7):  961-962.

         This  paper  mentions  the  use  of  log-probit analysis  to
    estimate the mean,  when a possibly  a very high  percentage  of
    data sets are censored.  Several relevant references are listed.
    The technical  description of  the log-probit  analysis  is  not
    available here.  An example is available.
csc.sA2.6/92    Incorporating Uncertainty in Parametric Trend Analysis •  Page

-------
                               APPENDIX 2

                   Proof  of  upper  bound on variance


        In this appendix  we  give the  pro f of Result 1.

    RESULT 1

    Let X be a random variable taking values in the interval [a,b].
    Then variance of X does  not exceed (b-a)2/4.

    PROOF :

       Let Y=(X-a)/(b-a) .  Since Var ( X)= ( b-a ) 2 Var(Y),  it is enough
    to. show that  0 < Var(Y)  < 1/4.


        Since Y  takes  values  in the  interval  [0,1],  Y2 also takes
    values in the interval  [0,1] and  we have  that

                        0 j<  E(Y2)  <. E(Y) 1  1.

    Therefore ,

         Var(Y)               = E(Y2)  -  (E(Y) ]2

                              1     E(Y) - [E(Y) ]2
    Now the conclusion follows noting that maximum of the function
    p(l-p), where p  ranges  in  the  interval  [0,1], is 1/4.
esc.SA2.6/92    Incorporating Uncertainty in  Parametric Trend Analysis • Page 25

-------

-------