Technical Manual for Mcestimate


Technical Manual for MCestimate
            Matthew Etterson
     U.S. Environmental Protection Agency
           12 September 2013

-------
Contents
1. Introduction	3
  1.1. Example: Estimating causes of avian nest failure	3
  1.2. Example: Distinguishing causes of mortality in toxicity tests	3
  1.3. Example: Estimating the numbers of animals killed by contaminants and collisions	4
  1.4 Notation	5
2. Mathematical background (binomial nest survival)	6
  2.1 The Mayfield estimator	6
  2.2 Generalization of Mayfield assumption (1)	7
  2.3 Maximum likelihood estimation	8
  2.4 Generalization of Mayfield assumption (2)	8
  2.5 AIC and model selection	9
  2.6 Fledging rates	10
  2.7 Variance	10
     2.7.1 Variance of homogeneous daily survival rates	10

     2.7.2 Variance of heterogeneous daily survival rates	11

     2.7.3 Variance of fledging rates under homogeneous daily survival	11

     2.7.4 Variance of fledging rates under heterogeneous daily survival	11

  2.8 Confidence limits	12
  2.9 Binomial nest survival as a Markov chain	13
3. Competing risks (multinomial nest survival)	14
  3.1. Homogeneous failure rates	14
  3.2. Heterogenous failure rates	14
  3.3 Fledging rates and overall probabilities of failure	15
  3.5 Variance	16
     3.5.1. Daily probabilities	16

     3.5.2. Overall rates	16

  3.6. Confidence limits	17
4. Appendix: useful derivatives	18

-------
4.1. The derivatives of the binomial logit 18
4.1. The derivatives of the multinomial logit 19
5. Acknowledgements 20
6. Literature cited 21
1. Introduction
The analysis of competing risks has a long history in human health research, but has received limited
formal attention in wildlife ecology and ecotoxicology (Heisey and Patterson 2006). Informally, however,
the concept of competing risks has arisen many times in these disciplines (Abbott 1925, Ricklefs 1969).
A common and accessible motivating example for defining competing risks is the concept of cause-of-
death. Thus, when considering the possibility that an animal will suffer mortality in either acute or
chronic response to chemical exposure, one must also consider the possibility that the animal will die
due to disease, predation, or other causes, the occurrence of which will alter the rate at which animals
succumb to exposure to a toxicant. Although such cause-specific failure analyses remain an important
application, competing risks naturally arise in virtually all stochastic processes. The model described in
this technical manual provides a unified common framework for analysis of data arising from serial
observations on subjects in the presence of competing risks. The three examples below illustrate some
of the diverse contexts in which such data arise.

1.1. Example: Estimating causes of avian nest failure
Most studies of nest productivity recognize multiple causes of nest failure, but focus on nest survival as
the ecological process of interest. However, nests fail due to many causes, including nest predation
(Martin 1995), nest parasitism (Trine 1998), interspecific competition (Radunzel et al. 1997), adverse
weather (Martin 1992), nest abandonment (Traylor et al. 2004), nestling starvation (Martin 1992), and
egg failure (Westemeier et al. 1998). The estimation of cause-specific nest-failure rates is a problem in
competing risks (Etterson et al. 2007a, b).

1.2. Example: Distinguishing causes of mortality in toxicity tests
A common objective in wildlife toxicity testing is to estimate the mortality associated with a given
chemical exposure. Sound experimental design for toxicity tests requires the use of control treatments
with no chemical exposure, which are designed to allow separation of mortality due to intoxication from
either natural mortality or stress associated with experimental conditions, but not exposure (USEPA
1996). Proper estimation of the portion of mortality due to the toxicant alone is a problem in competing
risks and has long been of interest in toxicological research (Abbott 1925).

-------
1.3. Example: Estimating the numbers of animals killed by contaminants and
collisions
An unknown number of non-target animals are killed annually by pesticides in agroecosystems.
Similarly, an important source of mortality in many bird populations is collisions with anthropogenic
structures such as wind turbines, automobiles, windows, and power lines. In both cases, researchers
have tried to estimate the number of animals killed by counting carcasses, but many carcasses are
scavenged prior to discovery and some unscavenged carcasses remain undiscovered (Kostecke et al.
2001, Huso 2011). Carcass distribution trials, in which carcasses are placed in the field at known times
and locations, are a common method used to correct for imperfect detection and scavenging (DeVault
et al. 2003) and typically focus either on scavenging rates or on searcher efficiency (Morrison 2002). In
the former case, the average persistence time of carcasses post-distribution is generally the parameter
of interest. In the latter case, the parameter of interest is the detection probability, estimated as the
proportion of distributed carcasses found by searchers. Understanding the effects of the two processes
(scavenging and detection) in a simultaneous trial is a problem in competing risks.

An important barrier to the implementation of methods for estimating competing risks in animal
populations has been the lack of appropriate software. In human health risk assessment and
epidemiology, competing risks are traditionally modeled using extensions of classical survival analysis,
with hazard functions defined as continuous-time exponential functions (Pintillie 2006). The latter
methods focus on developing mathematical expressions for statistical distributions of survival times. In
contrast, survival analysis in wildlife sampling and demography has developed in discrete time, and has
focused on developing models for estimating probabilities of surviving a fixed period of time (typically in
days, though other temporal units are equally valid). Extension of discrete-time methods for competing
risks in animal populations has lagged behind the analogous methodological advances in survival
analysis that are available in the human health sciences.

The program MCestimate, which implements the algorithms described in this technical manual
within a user-friendly graphical user interface, is designed to provide some of this analytical capability
within the discrete temporal framework that is typically employed for the statistical analysis of wildlife
sampling data. The original motivation for the early development of MCestimate was the study of
competing risks in nest survival estimation (Example 1, above). Because this remains an important
application, and has a rich history in statistical and software development, I have continued to frame the
theoretical development of MCestimate in terms of nest survival estimation. Existing methods for
estimating nest survival parameters will also play an important role in helping to validate MCestimate.
To this end, I present some background in existing nest survival estimation models to which MCestimate
is equivalent under special circumstances. In these circumstances, MCestimate is expected to give the
same answers when applied to common data. Readers only interested in the competing risks algorithms
may wish to skip to Section 3, after browsing the notation that I (hopefully!) apply consistently
throughout this document.

-------
1.4 Notation
       S = the probability that an arbitrary nest in the target population is successful.

       t = the age (in days since the first egg is laid) that a nest fledges.

       N, n = sample size variables. May be used for nests (TV), observations (n) or exposure days (n),
               and exact meaning depends on context.

       Si = the probability that an arbitrary nest in the target population survives day /', where / is an
               ordinal date (by convention, / = 1 on January 1 of a non leap year).  The subscript is
               omitted when daily survival probability is constant (s = s, for all /').

       Wj = the number of times nest/ is visited. The subscript/ will often be omitted when it is obvious
               which nest is under consideration (for example when/ and w; co-occur as subscripts).

       Vj = reserved indexing variable for the w, visits to nest/ (as above,/ will often be assumed)

       ysjv = indicator variable for survival (ysjv = 1 if nest/ was alive on visit v.  ysjv = 0 otherwise).

       djV = the number of days separating visits v and v + 1 to nest/.

       Id (djv) = scalar indicator function returning 1 if d. = d, 0 otherwise.

       X; = vector of covariates to daily survival or failure on day /'.

       P = vector of linear coefficients to survival or failure probabilities.

       K= number of elements of X,.

       k = indexing variable to elements of X, and P (i.e., xik is the £th element of X, and ftk is the £th
               element of P).

       o(Vj) = function returning the ordinal date (/') of visit v to nest/.

       a(u) = function returning the ordinal date (/') on which a nest is aged u (in days since the first egg
               was laid).

       F = the number of causes of nest failure that are under study.

       /= indexing variable to  specific causes of failure (1 
-------
       I/(X;) = indicator function that returns a vector of zeros and ones indicating which elements of
              X; are covariates to fate/in a particular model.
2. Mathematical background (binomial nest survival)
This section provides much of the background material and history that led to the development of
MCestimate.  As such it draws heavily on the development of formal methods for estimating nest
survival rates, which is the primary context that motivated MCestimate. However, it is not intended to
be a complete history of nest survival, a much larger topic with a deeper history than can be covered
here. I also provide brief treatments of some standard mathematical techniques, such as maximum
likelihood and the delta method. In most of the following equations I have altered the notation used by
the original authors to facilitate the subsequent introduction of the Markov transition matrices in
Section 2.9.

2.1 The Mayfield estimator
The history of modern avian  nest survival estimation begins with the work of Harold Mayfield (1961,
1975).  Mayfield's original insight was the observation that, for most empirical samples of avian nests,
drawn from a defined population, the simple proportion of successful nests would likely overestimate
the probability that any given nest in the population would be successful.  The reason for this positive
bias is that many nests are not immediately discovered upon construction, but are nevertheless included
in the sample as they  are discovered, sometimes rather late in development. Therefore successful nests
are more likely to be included in a sample than failed nests. These ideas are expressed mathematically
below.

       For simplicity, assume that, if nest/ is successful, then its state on  the final visit (wj) is
"successful" (ysjw = 1),  whereas if a nest fails, its state on the final visit is "failed" (ys]-w = 0). Then the
                                                    N
number of successful  nests (Ns) in the population is Ns = 2_,y^w anc'tne simple proportion (P) of
                                                   ;=i
                     N
successful nests is P = —L.  Mayfield's (1961)  insight was that E  S  
-------
 (2.1)
    Equation (2.1) is commonly referred to as the Mayfield estimator.  Its logic can be dissected as
follows. Noting that N- Ns is the number of nests that failed and that a nest can fail only once, TV-TV,
also gives the total number of days on which a nest failed to survive. Similarly, the sum in the
denominator of the ratio in Eq. (2.1) is the total number of days that nests were exposed to failure over
all nests in the sample.  Therefore, the ratio on the right hand side of Eq. (2.1) gives the proportion of
the total number of exposure days that nests failed to survive, which is an estimator for the daily
probability of failure. Its complement is an estimator for the daily probability of survival, s.  At least
three important assumptions underlie Eq. (2.1):

    1.  The precise dates of nest failure are known exactly (often equated to the assumption that d]V = 1
       for ally, Vj).
    2.  The rate of nest survival/failure is constant within  and among nests.
    3.  The fates of nests (successful/failed) are correctly  assigned.

    While Eq. (2.1) is quite simple, and its assumptions are easily falsified, its impact on avian
productivity research can  hardly be overstated.  It now has a long history of direct application and
generalization  (Johnson 2007), of which  I review only a few highlights pertinent to the development of
MCestimate. The first generalization to  Mayfield's estimator (Johnson 1979) concerned assumption (1)
above, that the precise dates of nest failure are known exactly.  The second assumption, that nest
survival is constant within and among nests has been addressed by many, with the most comprehensive
treatment offered by Dinsmore et al. (2002). The third assumption, that fates are correctly assigned,
has received limited attention (Manolis et al. 2000,  Stanley 2004, Etterson and Stanley 2008).

2.2 Generalization of Mayfield assumption (1)
The first formal statistical  treatment of the Mayfield estimator was provided by Johnson (1979) who
generalized Eq. (1) to cases in which precise failure dates are unknown. In most cases, nests are not
monitored every day, so that when a nest fails, its precise  date of failure is known only to have occurred
within the interval spanned by two visits (which  I'll refer to hereafter as an  observation). Johnson (1979)
provided the following likelihood (though the notation is mine):

(2.2)   Is   nd  , nsd    =\ \\~"   \ sa "' l-sd " , where:
       nd = the total number of observations lasting d days: nd =
                                                           j=\ v=l
-------
       nsd = the number of observations of length d that are successful: nsd =

Johnson (1979) further showed that the Mayfield estimator (2.1) is the maximum likelihood estimate
(see below) for s when d]V = 1 for all observations on all nests in the sample.

2.3 Maximum likelihood estimation
Analysis of likelihood equations  such as (2.2) proceed according to a standard technique (Edwards
1992).  First, the log of the likelihood (/) over the full data set is taken, resulting in the log-likelihood (L).
Then the value of s at which L reaches its maximum is calculated. In rare cases this can be done
analytically, by taking the derivative of L with respect to s and  solving for the value of s at which dL/ds =
0. The resulting estimate of s is the maximum likelihood estimate (MLE), and is typically denoted s . In
general, MLEs cannot be found analytically and must be found using a computer to perform a numerical
search. The second derivative of the log-likelihood function evaluated ats  contains information about
the estimated sampling variance of s . In particular, the negative inverse of the second derivative of L
with respect to s, evaluated at s is an estimate of the sampling variance of  s conditional on the available
data.

       In  presenting the following likelihood functions, it is assumed that the MLEs are found using the
above procedure generalized to the case in which the MLE is not a single value (i.e., s ), but rather a
vector of values (P ).  The MLE is then the vector of values that jointly maximize L.  The estimated
covariance matrix of p , referred to below as  cov  P  , is the negative inverse of the matrix of second

derivatives evaluated  at p . The variances of the elements of p are along the diagonal of cov  P  . For
more information on maximum  likelihood estimation, see Edwards (1992), Williams et al.  (2002) or
Burnham and Anderson (2002).

2.4 Generalization of Mayfield assumption (2)
A major goal of research in avian nest ecology has been to investigate hypotheses concerning variation
in s . While this has been done  in numerous ways beginning with Green (1977) and Johnson (1979),
MCestimate uses methods similar to those that were articulated by Dinsmore et al. (2002), which draw
from the theory of generalized linear models (Dobson 2002) and  from model selection theory using
Akaike's Information Criterion (AIC, Burnham  and Anderson 2002).

       The probability that a nest, active on day /', survives to day / + 1, can be specified as a linear
function in the log-odds (or logit) of daily survival:
(2.3)    In  —*-
           1-5..
-------
In Eq. (3), jq, x2, ...,XK represent values of variables (covariates) thought to influence nest survival (st) on
day /'. For example, these could be ordinal date (/'), nest height above ground, age of the nest since the
first egg was laid, etc.  By convention, x1 = 1 and /?i represents an intercept parameter. Equations like
(2.3) are more succinctly expressed in matrix notation. Thus let:

       P = column-vector of K \\near coefficients (f}k)

       X; = row-vector of variables thought to influence st, where the first element of X, is always 1.

Then Eq. (2.3) can be rewritten:

          (  s   "i
(2.4)    In  —i- =XJI
          V-si)

The basic estimation problem is no longer to obtain maximum likelihood estimates for st, but rather for
P. This is done by using the inverse logit function (Eq. 2.5 below) and substituting the resulting
expression for st into the likelihood (2.6, below).

,2.5)    ,
            i + exp X;p

The basic likelihood function employed by Dinsmore et al. (2002) is a generalization of Johnson's (1979)
likelihood, Eq. (2.2) to allow nest survival to vary among days within a nest (again the notation is my
own, but the equation is that of Dinsmore et al. 2002).

                                (' o w-l -1
(2.6)    / P  Wi,ysiv,d         '
               j7sjv,jv
                                  i-o 1
2.5 AIC and model selection
Different hypotheses about influences on st can be formulated by specifying different variables in the
linear component of Eq. (2.3) and finding the MLEs as described above. The fitted models, representing
alternative hypotheses, can then be compared using Akaike's Information Criterion (AIC, Burnham and
Anderson 2002), where the smallest value of AIC corresponds to the best model (hypothesis) among all
models tested.
(2.7)    AIC = -2Z P  +2K

In Eq. (2.7), the log-likelihood (L) is evaluated at the maximum likelihood solution ( P ).  Both Dinsmore
et al. (2002) and MCestimate actually use a sample-size corrected version of AIC (AICC, Burnham and
Anderson 2002):
-------
(2.8)   AICC=-2Z
In Eq. (2.8) n is the sample size for nest-survival estimation. Dinsmore et al. (2002) used the total
number of observation days over all nests for n:
MCestimate uses a slightly modified calculation provided by Rotella et al. (2004) in which n is calculated
as the sum of the total number of successful observation days over all nests and the total number of
failed nests.
2.6 Fledging rates
Mayfield's original motivation, to better estimate the overall probability (S) that a nest will fledge at
least one young, remains an important goal in nest-survival estimation. When s is invariant, then:
(2.9)

When s is assumed to vary, then:

           a t -1
(2.10)  £=rp
           i=a 1

2.7 Variance

2.7.1 Variance of homogeneous daily survival rates
For a single estimated MLE (e.g., s ), its variance var s is estimated as the negative inverse of the
second derivative of the log-likelihood function, evaluated at  s (Edwards 1992).  For the Mayfield
estimator, this quantity is available as a simple formula:

                 s  1-s
(2.11)  var  s  =	:	,
v    '            n  w-1    '
                     x,
When the MLE must be estimated numerically, the variance is generally not available in simple form,
though it can be estimated numerically.
                                             10
-------
2.7.2 Variance of heterogeneous daily survival rates

When the estimated quantity is a vector of parameters (e.g., p ), the covariance matrix, cov P  , is
estimated as the negative inverse of the Hessian matrix (the matrix of second derivatives of the log-
likelihood, evaluated at p ). To get the variance of the sf from cov  P  we can use the delta method

(Seber 1982).  In the general case, let  6 represent an MLE (scalar or vector) and/( 6 ) represent some
function of 6 . Then the variance of thefunction/(6 ) is calculated from the estimated variance of 6 as:
(2.12)  var / 0   =
For the case of variable daily survival rates, st, 0 = p, and f  0 — f  P  is given by Eq. (2.5).
Therefore (for derivation of Eq. 2.13 below, see Section 4.1):

       df P    ds
(2.13)  - ^ = -J. = S.  l-s,  X;.,and:
                      '
df 0
dQ
cov 0
df 0
dQ
(2.14)  var st =st  l-s, X,.cov p  st l-st  Xj

2.7.3 Variance of fledging rates under homogeneous daily survival
The delta method can also be extended to overall fledging rates. It first requires calculation of the
derivative of the overall fledging rate (S) with respect to the estimated survival rates. In the simple
homogenous case (Eq. 2.9 above), the equation  is:
(2.15)  var S =
dS_
ds
var s
dS_
ds
= [is'"1]
                   var  s
2.7.4 Variance of fledging rates under heterogeneous daily survival
When the overall fledging rate is a function of variable survival rates (sf), which are in turn functions of
P, a relatively simple formula can still be found, but with a little more effort. First, from Eq. (2.10), note
that:

               a t -1
       In S  = X ln st
               i=a 1

Therefore:
                                            11
-------
        din  S    ^dln  s.
          dst      1=a !   dst

Therefore:
                         „«'-! i
         i  _  _ — C V* —
        dst       dst      t^i st '

Remembering Eq. (2.13) above, and applying the chain rule:

        do   do dsi   * ^-i
        — = -- = o /   -X.,-  1 — 5,
        dp   d?,dp    ,fl

Therefore, under variable 5. :

                  a / -1                  a / -1
(2.16)   var  S  = ££ X;.  l-st cov p S ^ x^ 1-*,-
                  i-a 1                   /-a 1

2.8 Confidence limits
Standard practice in linear modeling for generating confidence limits around estimated parameters is to
choose a tolerable Type I error rate (a) and find the  associated z-score from a standard normal
distribution corresponding to 100 (1- a}% coverage  (two-tailed). A common choice is z = 1.96,
corresponding to an approximate 95% coverage (a = 0.05). Confidence limits can then be generated as
s + zJvar s  .  However, both logic and experience suggest that this procedure will often produce
confidence limits outside the range [0, 1]. Thus, a better practice is to adopt the generalized linear
                                                                           /v
modeling approach to nest survival estimation (Eqs. 2.3 - 2.5, above) and use cov p  to generate
100(1- a)% confidence limits around the logit (Eq. 2.3). The resulting confidence limits can then be
transformed to the probability scale using the inverse logit (Eq. 2.5), and the resulting limits will always
lie in the interval [0, 1].  Because the mapping from the logit to the probability space is one-to-one, the
resulting coverage rate for the probabilities is equal  to that of the confidence interval on the logit (Rao
1973).  This procedure still requires the use of the delta method to calculate confidence limits on the

                 ~   ~         ~                  (  s  }     -      df P
logit. In this case 0 = p and / p  = logit 5.  = In — '—  =Xp,and - — = X  . Therefore:
                                               {1-sJ     '         dp

(2.17)   var  logit  5;.  =X;.cov P X;T.

Thus confidence limits [/, u\ around the logit can be  generated as follows:

                                             12
-------
(2.18)
Finally, back-transformed confidence limits around st , [L, U] are:
                  e*    e"
(2.19)   [L,U] =
2.9 Binomial nest survival as a Markov chain
Etterson and Bennett (2005) showed that the basic likelihood function of Johnson (1979) could be
formulated as a product of cell probabilities in Markov transition matrices. The notation used below
differs from Etterson and Bennett (2005), but the mathematical expressions are equivalent.  Let:
(2.20)   M =
             s   l-s
             0    1
and
           =rv   i-v
         jv   L-'S/V   Ssjv
Using the above notation a probability model for an arbitrary observation on a nest can be formulated
as follows:
(2.21)   Pr yhv+l\s,yiv,div  = YJM^Y*
The corresponding likelihood for all observations on all nests is:
(2.22)
             yjv,djv
Eq. (2.22) is equivalent to Eq. (2.2), except that the binomial coefficients are omitted (they don't affect
the solution). Similarly, when d]V = 1 for ally, v, the MLE corresponding to Eq. (2.22) is the original
Mayfield estimator (Eq. 2.1).

       To investigate variation in daily survival, we must have different matrices (M,) for each day,
where Sj is as defined in Eqs. (2.3 - 2.5). Thus:
               M =
                      0    1
The full likelihood over all observations on all nests is:
(2.23)
                                   fov+\-\
                                              13
-------
Eq.(2.23) is equivalent to Eq. (2.6).

Standard errors and confidence limits for these binomial (success/fail) models can be calculated exactly
as described in Sections 2.7 and 2.8.
3. Competing risks (multinomial nest survival)
With the Matrix notation of Markov chain transition matrices developed above, the generalization of
survival methods to handle competing risks is straightforward.  For simplicity, the equations below are
all presented with  two causes of failure, but all equations immediately generalize to an arbitrary number
of failure categories. The transition matrix for this probability model was originally published by
Etterson et al. (2007).

3.1. Homogeneous failure rates
(3.1)   M =
s  ml   m2
0   1    0
0   0    1
In Equation (3.1) and all subsequent similar matrices, note the constraints that:
       0
-------
(3.3)    X;y — If  X. «X., where the • operator represents the element-wise (dot) product.

By convention (as above) the first non-zero element of X,y will always be 1, corresponding to an
intercept parameter for that failure rate. Then, for a given failure rate, its multinomial logit (relative to
daily survival) is defined as:
(3.4)    In  -M = X^
          (  sf
The inverse function of (3.4) is:
(3.5)    my=
And
(3.6)    st=-
Again, the likelihood function is virtually identical to Eqs. (2.23):

             yjv,djv   -n
                                YT
                                1  j.v+l
                               V   V
In Eq. (3.7) M, is as defined in Eq. (3.1), except that s, m^ and m2 are replaced by Sj, mn and mi2, as
defined in Eqs. (3.4 and 3.5):
(3.8)    M,. =
s,   mn   mi2
0    1    0
0    0    1
3.3 Fledging rates and overall probabilities of failure
As with the binomial model it is of interest to know the overall probabilities that a nest will fledge or fail
due to the various identified causes. These are trivial to calculate under the Markov notation. In the
case of homogeneous failure rates (using the matrix M from Eq. 3.1), we can define a new matrix St:

(3.9)    S, =M'
                                              15
-------
The cell entries of St corresponding to particular probabilities in M give the overall probabilities that a
nest is in a given state after t days . Thus, for example, the entry in the first row and first column of St
gives the overall survival probability after t days. Similarly, the entry in the first row and second column
of St gives the overall probability that a nest has suffered the first fate (corresponding to/= 1) after t
days, and so on.

       For the case of heterogenous failure rates (mtf, M,), we use the product operator and define the
sequence of age-specific matrices.

            a t -1
(3.10)   S =F[M  =M ,M , M . ...M    ,
v     '    t   Xi   *     a 1   a 2   a 3    a t -1
            i=a 1
3.5 Variance
Variances in the competing risks models are calculated in similar fashion to the binomial models, using
the delta method (Eq. 2.12) and the estimated covariance matrix of MLEs.  However, the inverse
multinomial logit (expit, Eq. 3.4 3.5) complicates the calculation of the derivatives for the failure rates
(for derivation see Section 4.2).  For this section, I assume the generalized linear model framework is
used and provide the formulae only for those cases. Formulae for the homogeneous rates are obtained
as special cases in which X!y=l for all/

3.5.1. Daily probabilities
        dmf
(3-n)   -
For completeness it is worth noting the partial derivative with respect to the daily survival rate (st):
Application of the delta method (Eq. 2.12) is now possible, with 0 = p and f  0 —f P  is Eq. (3.4),

with derivatives as shown in Eqs (3.11 - 3.12).

3.5.2. Overall rates

Using Eq. (3.10):

(3.13)   Sj - Ma l , and, for arbitrary t > 1:


(3.14)   S^S^M^


                                              16
-------
First, given a matrix of the form (3.8), define - '- as the matrix of element-wise derivatives of M, with
respect to the £th element of P (fik). These can be obtained from Eqs. (3.11 3.12). Second, define: -
                                                                                        dpk
inductively as follows:

        d|§    dM-ai
(3.15)  - - = - and, for arbitrary (t > 1), we apply the product rule for matrices:
            TO
The matrix - - gives the required derivatives, which can be used with the delta method, to calculate
the desired variances.

3.6. Confidence limits
Confidence limits on the multinomial probabilities are more difficult to estimate than for binomial
probabilities. However, the multinomial case is a direct generalization of the binomial case. The
method used by MCestimate was described in detail by Sambamoorthi et al. (1994). In contrast to the
binomial model, in the multinomial case the expected coverage rate is not exact. Instead, it is a minimal
bound on the expected coverage rate. In other words, the true expected error rate is less than or equal
to the nominal error rate and the true expected coverage rate is greater than or equal to the nominal
coverage rate.  Thus the confidence intervals provided by MCestimate are, on average, conservative.

       As with the binomial case, generation of confidence limits begins with the logit. However, in the
multinomial case there are F logits that must be considered as well as the survival probability. The
logitsforthe estimated failure probabilities are given by Eq. (3.4).  For each of the logits, upper and
lower confidence limits can be calculated as described in Section 2.8 above.  Let L, and U, represent
column-vectors of .Flower and upper confidence limits around the logits calculated individually for each
logit.  Thus, for a specific fate,/:
(3.17)   L;/ =X!/p-zX!/cov P  Xj ,and:
(3.18)   U!/=X!/P +

In Equations (3.17 3.18) z is the two-tailed z-score from a standard normal distribution and L^and U,/
are the/h elements of L, and U,, respectively. Now, let Q represent an/ x 2 matrix with L, as column 1
and U, as column 2.  Further, let F, represent a 2^x/matrix consisting of all possible/tuples generated

                                              17
-------
by drawing one element from each row of C*. The 2^ rows of F, form the basis for deriving simultaneous
confidence intervals around the estimated probabilities. However, one additional step is needed. Let P,
represent a matrix of dimension 2^x (/+ 1) generated by applying the inverse multinomial logit (Eq. 3.5
and 3.6) to each row of F*. For  example, let F^r^/) represent the entry in the rth row and/h column of F*.
Then the rth row of P, is generated as follows:

(3.19)  P! r,l = - - - ,and:
(3.20)  P! rj
                           ,  r,f


                      exp  F r,f
                             F, rj
Sambamoorthi et al. (1994) showed that the column-wise minima and maxima of P, are conservative
(with respect to the nominally specified a) simultaneous confidence intervals on the estimated
probabilities (st and m^.


4. Appendix: useful derivatives

4.1. The derivatives of the binomial logit
         exp X,p
l_et s — _ £. _ 'Jl _
    '  1 + exp  X;p

Then, by the quotient rule:
                    s\                       £.
       1 + exp  X;p  — exp X,p   -exp  X;p — 1 + exp X,p
ds, _      F   'F  ap   F   'F          'dp          '
5p                       1 + exp X;p

Therefore:

ds.    1 + exp  X;p  exp  X;P X;.-exp  X;P exp  X;P X;.
                       1 + exp X,

And:
(4.D   ^=  exp Xf __ l- _ x,=,, i-,.  x,
       5P  1 + exp  X,p 1 + exp X,p   '    '
                                           18
-------
4.1. The derivatives of the multinomial logit

           exp X.,p
Let m =
    V     F
Then, by the quotient rule:
X. B  — exp  Xrp  -exp Xrp
 lg  I SB       lf           lf   spl
                                      X!gp
                                                            XJI
 sp
                       exp X;/p X^-exp  X;/
6mjf     exp X!7|
                              exp X!gp
                                                        X
                                 exp X!gp
 5wr     exp Xif
         g=i
             g=i
Noting that, by Eq. 3.3, X;, =1, X. -X.
3m,
                                         19
-------
And finally:


       dm.f         (         P
(4.2)   -X!7I  X,  -      X,  mi
To obtain the derivatives with respect to daily survival (st) under the multinomial logit, let

            1
Then, by the quotient rule:
 a
And:
 SP       r   ^
          l + £exp  X,,

Substituting back in the daily survival and failure probabilities:


       a*
5. Acknowledgements

M. Starus provided valuable editorial comments on this document.
                                            20
-------
6. Literature cited

Abbott, W.S. 1925. A method of computing the effectiveness of a pesticide. Journal of Economic
       Entomology 18:265-267.
DeVault, T.L, O.E. Rhodes, Jr. and J.A. Shivik. 2003. Scavenging by vertebrates: behavioral, ecological,
       and evolutionary perspectives on an important energy transfer pathway in terrestrial
       ecosystems. Oikos 102:255-234.
Dinsmore, S. J., G. C. White, and F. L Knopf. 2002. Advanced techniques for modeling avian nest survival.
       Ecology 83:3476-3488.
Etterson, M. and R. Bennett. 2005. Including transition probabilities in nest-survival estimation: A
       Mayfield Markov Chain. Ecology 86:1414-1421.
Etterson, M. and T. Stanley. 2008. Incorporating classification uncertainty in competing risks nest failure
       modeling. The Auk 125:687-699.
Etterson, M., B. Olsen, and R. Greenberg. 2007b. The analysis of covariates in multi-fate Markov chain
       nest failure models. Studies in Avian Biology 34:55-64.
Etterson, M., L. Nagy, and T. Robinson. 2007a. Partitioning risk among different sources of nest-failure.
       The Auk 124:432-443.
Hazier, K. R. 2004. Mayfield logistic regression: A practical approach for analysis of nest survival. The Auk
       121:707-716.
Heisey, D. M., and B. R. Patterson.  2006. A review  of methods to estimate cause-specific mortality in
       presence of competing risks. Journal of Wildlife Management 70:1544-1555.
Huso, M. M. P. 2011. An estimator of wildlife fatality from observed carcasses. Environmetrics 22:318-
       329.
Johnson, D. H.  1979. Estimating nest success: the Mayfield method and an alternative. The Auk 96:651-
       661.
Kostecke, R. M., G.M. Linz, and W.J. Bleier. 2001. Survival of avian carcasses and photographic evidence
       of predators and scavengers. Journal of Field Ornithology 72:439-447.
Manolis, J. C, D. E. Anderson, and F. J. Cuthbert. 2000. Uncertain fates in songbird studies and variation
       in Mayfield estimation. The Auk 117:615-626.
Martin, T. E. 1992. Breeding productivity considerations: What are the appropriate habitat features for
       management? Pages 455-473 in Ecology and Conservation of Neotropical Migrant Landbirds
       (J.M. Hagan and D.W. Johnston, Eds.). Smithsonian Institution Press, Washington, DC.
Martin, T.E. 1995. Avian life history evolution in relation to nest sites, nest predation, and food.
       Ecological Monographs 65:101-127.
Mayfield, H. F. 1961. Nesting success calculated from exposure. Wilson Bulletin 73:255-261.
Mayfield, H. F. 1975. Suggestions for calculating nest success. Wilson Bulletin 87:456-466.
Morrison, M.L 2002. Searcher bias and scavenging rates in bird-wind energy studies. National
       Renewable Laboratory Report NREL/SR-500-30876, Golden, CO.
       http://www.nrel.gov/wind/pdfs/30876.pdf. Accessed 26 April 2011.
Pintillie, M. 2006. Competing Risks: A practical perspective. John Wiley & Sons, Ltd. West Sussex, UK.

                                              21
-------
Rao, C.R. 1973. Linear Statistical Inference and its Applications, 2nd Ed. John Wiley & Sons, Inc. New York,
        NY.
Radunzel, L. A, D. M. Muschitz, V. M. Bauldry, and P. Arcese. 1997. A long-term study of the breeding
        success of Eastern Bluebirds by year and cavity type. Journal of Field Ornithology 68:7-18.
Ricklefs, R. E.  1969.  An Analysis of Nesting Mortality in Birds. Smithsonian Contributions to Zoology:9.
        Smithsonian Institution Press, Washington, DC.
Seber G. A. F. 1982.  The estimation of animal abundance and related  parameters, 2nd ed. Charles Griffin
        and Co., London.
Stanley, T. R.  2004a. When should Mayfield model data be discarded? Wilson Bulletin 116:267-269.
Traylor, J. J., R. T. Alisauskas, and F. P. Kehoe. 2004. Nesting ecology of White-winged Scoters (Melanitta
       fusca deglandi) at Redberry Lake, Saskatchewan. Auk 121:950-962.
Trine, C. L. 1998. Wood thrush population sinks and implications for the scale of regional conservation
        strategies. Conservation Biology 12:576-585.
Westemeier,  R. L., J. D. Brawn, S. A. Simpson, T. L. Esker, R. W. Jansen, J. W. Walk, E. L. Kershner, J. L.
        Bouzat, and K. N.  Paige. 1998. Tracking the long-term decline and recovery of an isolated
        population.  Science 282:1695-1698.
                                               22
-------