United States Environmental Protection Agency
Health Effects Research Laboratory
Research Triangle Park, NC 27711
Research and Development
EPA-600/2-81-010
January 1981

DISFIT: A Distribution Fitting System
-------
EPA 600/2-81-010
January 1981
DISFIT: A DISTRIBUTION FITTING SYSTEM
1. DISCRETE DISTRIBUTIONS
A User's Guide
Victor Hasselblad
Andrew G. Stead
Helen S. Anderson
BIOMETRY DIVISION
HEALTH EFFECTS RESEARCH LABORATORY
U. S. ENVIRONMENTAL PROTECTION AGENCY
RESEARCH TRIANGLE PARK, NORTH CAROLINA 27711
-------
DISCLAIMER
This report has been reviewed by the Health Effects Research Laboratory,
U.S. Environmental Protection Agency, and approved for publication. Approval
does not signify that the contents necessarily reflect the views and policies
of the U.S. Environmental Protection Agency, nor does mention of trade names
or commercial products constitute endorsement or recommendation for use.
-------
CONTENTS
1. Introduction 1
2. Methods of Estimation 3
3. Use of the Program 4
4. Constant Distribution 5
5. Binomial Distribution 8
6. Truncated Binomial Distribution 11
7. Mixture of Two Binomial Distributions 14
8. Beta-Binomial Distribution 17
9. Poisson Distribution 21
10. Truncated Poisson Distribution 24
11. Mixture of Two Poisson Distributions 27
12. Negative Binomial Distribution 30
13. Truncated Negative Binomial Distribution 34
14. Logarithmic Distribution 37
15. References 40
Appendix A. Index 43
Appendix B. Flow Chart of the Program 45
Appendix C. Error Messages 46
Appendix D. Fortran Program Listing 47
-------
ABSTRACT
The DISFIT system is a series of programs and subroutines to fit
distributions to data. This first volume describes the routines to fit
discrete distributions. The distributions included are the binomial,
truncated binomial, mixture of two binomials, beta binomial, Poisson,
truncated Poisson, mixture of two Poissons, negative binomial, truncated
negative binomial, and logarithmic. All parameters are estimated using
maximum likelihood techniques. Any of the parameters may be specified
instead of estimated. Variances or estimated asymptotic variances of
the parameter estimates are also given. Some tests of hypotheses are
possible using likelihood ratio tests. This guide contains the
descriptions, calling sequences, documentation, and examples for each
distribution. The program is written entirely in FORTRAN, and a listing
of the program is in the appendix.
-------
ACKNOWLEDGMENT
The authors would like to express their appreciation to Dave Mage and
Gene Lowrimore for their helpful suggestions. We are especially grateful
to Barbara Crabtree for her typing and editing of this guide.
-------
1. INTRODUCTION
Although distribution fitting methods abound in the statistical
literature, very few of these methods are found in the major statistical
packages. In particular, SPSS [27], BMD-P [10] and SAS [3] only give some
overall tests for normality. There are at least three specialized distribu-
tion fitting packages. Bouver and Bargmann [6] wrote a system to fit the
Pearson system of curves by the method of moments. The system of Alldredge
and Bolstad [1] fits the normal, lognormal, exponential, gamma, and uniform
distributions using maximum likelihood for some distributions, and moment
estimators for others. Morgan, Nelson, and Caporal [25] have written a
system that includes the normal, lognormal, Weibull, and binomial. This
system has the advantage that linear models can be fitted to the shape
parameter, and maximum likelihood methods are used. This feature limits
the kinds of distributions which are fitted, however. None of these
systems include a wide variety of distributions commonly appearing in the
literature.
The examples chosen for this distribution fitting package include some
of the classic ones, as well as those pertinent to the environmental field.
The analysis of both environmental monitoring and environmental health data
requires distribution fitting programs. For example, ambient air standards
are often based on the distribution of the "second maximum". Health surveys
often yield counts, and the distributional form of these counts may be
important for hypothesis testing.
This first report will be restricted to discrete distributions. Much
of the information on the distributions is found in "Discrete Distributions"
by Johnson and Kotz [19]. This reference is extremely comprehensive, so
only a small portion of its information is contained in this report. In a
manner similar to Johnson and Kotz [19], each section of this report discusses
a particular distribution. Included are definitions of the distribution,
estimating equations, goodness-of-fit tests, and actual examples. In general,
parameter estimates and their variances are given, but tests of hypotheses
are not usually given. Tests of hypotheses often can be made from the
likelihood functions provided. The appendix of the paper contains the
FORTRAN routines to perform the estimation and distribution fitting.
A "main" program is also included to read in the data, and to enable the
user to fit several different distributions to the same data.
The routines are written so that each parameter may be specified or
estimated. In some cases, specifying a parameter value may give a specific
distribution. For example, if N is set to 1, the negative binomial
distribution becomes the geometric distribution. There may be other cases
-------
where one of the parameters is known, and the other(s) need to be estimated.
We welcome suggestions and corrections to this report and its programs.
Two additional reports are planned. They will contain continuous
distributions and grouped continuous distributions respectively.
-------
2. METHODS OF ESTIMATION
Estimates in this paper are calculated by maximum likelihood. Any
estimates calculated by other means will be so indicated. There are, of
course, several reasons for using maximum likelihood estimates. The
estimates have several optimal properties including asymptotic consistency
and efficiency (Kendall and Stuart [20]). The comparison of the likelihood
function for different distributions can give some measure of goodness-of-fit.
In some specific cases, it is possible to test the fit of one distribution
versus another using likelihood ratio tests.
Maximum likelihood estimates also have certain problems. Some of these
are caused by small sample sizes, although distribution fitting usually
requires moderate to large sample sizes. The likelihood function for
mixtures of continuous distributions can have singularities corresponding
to each data point, as shown by Day [8]. The discrete distributions
considered in this paper do not have this particular problem. Finally, the
likelihood function may have more than one local maximum. This can occur
even with discrete distributions, and we can only hope that the initial
estimates lead to meaningful solutions in each case.
Distributions such as the gamma or beta, or functions of them such as
the negative binomial, require iterative methods to obtain maximum likelihood
estimates. The modified Gauss-Newton method is easy to compute, and has
the advantage of being a second order method. A description of the method
was given by Hartley [15]. In other cases the standard Newton-Raphson
method was used.
Maximum likelihood estimation is especially useful for problems of
"missing information". Missing information can take the form of censored
data, truncated distributions, grouping, and mixing of distributions.
All of these problems can be handled by using the "missing information
principle" (MIP) of Orchard and Noodbury [28]. This method will converge
under a wide variety of conditions, and the convergence is geometric as
has been shown by Dempster, Laird, and Rubin [9]. We have also used this
method in our routines.
-------
3. USE OF THE PROGRAM
The program to fit various discrete distributions was written entirely
in FORTRAN 77 on a Univac 1100 computer. It was written so that conversion
to other FORTRAN compilers should be relatively simple. The program
consists of a main program to read in the data, and a series of subroutines
one for each different distribution. The exact calling sequence for the
main program will depend on the job control language of the computer used.
The main program reads in the following cards:

   Variable   Description                                              Format
   TITLE      80 character title                                       20A4
   K          Value of the largest count                               I5
   FMT        Variable format statement using integer mode only        20A4
   NF(I)      K+1 counts, NF(I), I = 1, K+1, are read in, where        FMT
              NF(1) = no. of 0's, NF(2) = no. of 1's,
              NF(3) = no. of 2's, ..., and NF(K+1) = no. of
              values = K.

These counts, NF(I), will be denoted by f_i in the formulas in the text,
so that f_i denotes the number of occurrences of the value i. For an
explanation of a variable format statement, see any FORTRAN manual, or
the SPSS [27] manual.
This information is followed by a card for each distribution to be
fitted. The description of each card is given in the chapter corresponding
to the distribution requested.
The following example shows the cards just described.
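The card image that originally appeared here did not reproduce in this copy. The deck below is an illustrative reconstruction laid out according to the formats above; the title, the counts, and the choice of a POISSON card are hypothetical.

   EXAMPLE DATA SET                                (TITLE, format 20A4)
       4                                           (K, format I5)
   (5I5)                                           (FMT, a variable format)
      10   30   25   20   15                       (NF(I), I = 1, K+1, read with FMT)
   POISSON                                         (distribution card, columns 1-8)
                                                   (blank card ends this data set)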
A blank card following the distribution cards finishes the analyses of
the data set. The program will then attempt to read in additional data sets.
An end-of-file card will terminate the run.
-------
4. CONSTANT DISTRIBUTION
4.1 Definition
The "constant" distribution routine is really a summary routine. It
computes the first four sample moments:
M_1 = \sum_{i=0}^{K} i f_i / n,   where   n = \sum_{i=0}^{K} f_i                    (4.1)

M_j = \sum_{i=0}^{K} (i - M_1)^j f_i / n,   j = 2, 3, 4                             (4.2)
The routine also prints the frequency and cumulative frequency distribution.
It is also possible to specify the expected frequencies through
constants (thus the name of the routine), C_i's, i = 0, 1, ..., K. These
constants are normalized so that the sum of the C_i's = n. Thus the C_i's
can be expected counts, probabilities, or just proportional numbers.
If the C_i's are not specified, they are assumed to be equal. This
particular case is commonly called the discrete uniform or discrete
rectangular distribution.
4.2 Calling Sequence
The constant routine is called by
   | Col. 8  |  15  |  25   |  35   |  45   |  55   |
   | CONSTANT|  NP  |  C(1) |  C(2) |  C(3) |  C(4) |

If no constants are specified, then each C_i is set to n/(K+1). If more
than four constants are needed (K > 3), they are read in on additional
cards using the format (8F10.0). NP is the number of parameters estimated
from the data itself, and is often left blank.
-------
4.3 Tests for Goodness-of-Fit
A standard chi-square goodness-of-fit test is computed for the
observed frequencies:
X^2 = \sum_{i=0}^{K} (f_i - C_i)^2 / C_i                                            (4.3)
which has K-NP degrees of freedom. The cells are collapsed from the left
and right until the end cells have an expected value of 5. The degrees of
freedom will be adjusted for collapsing. The log-likelihood function is
also computed:
L = \sum_{i=0}^{K} f_i \log(C_i / n)                                                (4.4)
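To make the computations above concrete, here is a minimal sketch in Python (separate from the FORTRAN program) that evaluates equations (4.1)-(4.4) for a hypothetical set of counts and constants; the cell-collapsing rule used by the program is omitted.

import numpy as np

f = np.array([38.0, 20.0, 18.0, 5.0])        # hypothetical counts f_0..f_K
C = np.array([9.0, 3.0, 3.0, 1.0])           # constants C_0..C_K (here a 9:3:3:1 ratio)

n = f.sum()
i = np.arange(len(f))
C = C * n / C.sum()                          # normalize so the C_i sum to n

M1 = np.sum(i * f) / n                                      # equation (4.1)
M = [np.sum((i - M1) ** j * f) / n for j in (2, 3, 4)]      # equation (4.2)

X2 = np.sum((f - C) ** 2 / C)                # equation (4.3), without cell collapsing
L = np.sum(f * np.log(C / n))                # equation (4.4)

print(M1, M, X2, L)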
4.4 Example
De Winton and Bateson collected counts of Primula in a cross breeding
experiment. The results are given by Fisher [13]. The expectations
involving two independent Mendelian factors would produce counts in the
ratio of 9:3:3:1. Figure 4.1 shows that the fit using this ratio is not
good. If the first two counts came from a more viable plant, then the
ratios for the first two cells still would be 3:1 as would be the ratio for
the last two. The relative ratio of the first two cells to the last two
cells could be estimated from the data, giving the expected counts in
Figure 4.2. Since one parameter is being estimated from the data, this
is indicated on the control card corresponding to Figure 4.2. The result
is a satisfactory fit. A likelihood ratio test for the difference in fit
can be computed as -2(-613.34 + 608.82) = 9.04. This is asymptotically
distributed as a chi-square with one degree of freedom, which corresponds
to a p-value of .0026.
Input for Primula data:
[Input card listing not legible in this copy.]
-------
Outputs:
[Program output for the De Winton and Bateson Primula data, fitted with the 9:3:3:1 constants (Figure 4.1) and with the relative viability estimated from the data (Figure 4.2); the printed tables are not legible in this copy.]
Figure 4.1
Figure 4.2
-------
5. BINOMIAL DISTRIBUTION
5.1 History and Definition
The binomial distribution was derived by James Bernoulli in 1713. The
variable, x, has the possible values 0, 1, 2, ..., N, where the
probability of x is given by

Pr[x = j] = \binom{N}{j} p^j q^{N-j},   j = 0, 1, 2, ..., N                         (5.1)
subject to q = 1 - p and 0 < p < 1. The number N will be assumed to be
constant and known. It is assumed that the distribution is sampled n
times, so that the outcome i occurs f_i times. Thus the f_i's sum to n.
In some data, the value N varies for each sample. We have not allowed for
this possibility since it would require a very different data structure.
The estimation equations for this and related distributions are easily
generalized from the equations given.
5.2 Calling Sequence
The binomial distribution routine is called by
   | Col. 8  |  15  |  25   |
   | BINOMIAL|  N   |  P    |
where N is an integer, and P is the value of p to be fitted. If N
is not specified, it will be set to K, the value of the largest count. If
p is not specified, it will be estimated from the data.
5.3 Maximum Likelihood Estimate
The maximum likelihood estimate for p is
\hat{p} = \sum_{i=0}^{K} i f_i / (nN)                                               (5.2)
The estimate is also the minimum variance unbiased estimate. The variance of
p is estimated by
Var(\hat{p}) = \hat{p}(1 - \hat{p}) / (nN)                                          (5.3)
8
-------
5.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies
X^2 = \sum_{i=0}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (5.4)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the number
of cells after collapsing minus two. If p is specified, then only one will
be subtracted.
A second goodness-of-fit test is commonly known as the homogeneity test.
It is a test of the variance of the sample, and is discussed by Potthoff and
Whittinghill [30].
X^2 = \sum_{i=0}^{K} i^2 f_i / (N\hat{p}(1 - \hat{p})) - nN\hat{p}/(1 - \hat{p})    (5.5)
The statistic is distributed approximately as a chi-square with n-1 degrees of
freedom if p is estimated, or n degrees of freedom if p is specified.
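The following minimal Python sketch (not part of the FORTRAN program) evaluates equations (5.2), (5.3), and (5.5) as given above for a hypothetical table of counts; the chi-square fit test with cell collapsing is omitted.

import numpy as np

N = 12                                                 # number of trials per observation
f = np.array([2., 10., 30., 45., 40., 20., 3.])        # hypothetical counts f_0..f_K
i = np.arange(len(f))
n = f.sum()

p = np.sum(i * f) / (n * N)                            # equation (5.2)
var_p = p * (1 - p) / (n * N)                          # equation (5.3)

# homogeneity (variance) test, equation (5.5); approximately chi-square, n-1 d.f.
X2 = np.sum(i**2 * f) / (N * p * (1 - p)) - n * N * p / (1 - p)

print(p, var_p, X2)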
5.5 Example
Fisher [13] also gives counts from a dice experiment of Weldon. The
counts represent the number of 5's and 6's occurring in the toss of 12 dice.
The distribution of this number should be given by a binomial distribution
where p = 1/3. In fact, using a p of 1/3 gives a very poor fit (Figure
5.1). If instead p is estimated from the data, the estimated p of .3377
gives a very adequate fit. A likelihood ratio test of p = 1/3 gives an
asymptotic chi-square of -2(-50255.16 + 50241.65) = 27.02, which corresponds
to a p-value of less than .0001.
Input for Weldon's dice data:
[Input card listing not legible in this copy.]
Outputs:

[Output for Weldon's dice data fitted to a binomial distribution with p specified as 1/3; the printed table is not legible in this copy.]

Figure 5.1

[Output for Weldon's dice data fitted to a binomial distribution with p estimated from the data; the printed table is not legible in this copy.]

Figure 5.2
-------
6. TRUNCATED BINOMIAL DISTRIBUTION
6.1 History and Definition
The truncated binomial distribution occurs when sampling from the
binomial distribution where the outcome of "0" cannot be observed:
Pr[x = j] = \binom{N}{j} p^j q^{N-j} / (1 - q^N)                                    (6.1)

subject to q = 1 - p, 0 < p < 1, and j > 0. The distribution was expressed
in this form by Finney [11] in 1949. The distribution with just the zeros
truncated has also been called the positive binomial distribution, since the
truncation could include more than the zeros, or could occur on the right.
As with the binomial distribution, we assume that the distribution (6.1) is
sampled n times, so that the outcome i occurs f_i times. Thus the f_i's
sum to n.
6.2 Calling Sequence
The truncated binomial distribution routine is called by
   | Col. 8  |  15  |  25   |
   | TBINOMAL|  N   |  P    |
where N is an integer, and P is the value of p to be fitted. If N
is not specified, it will be set to K, the value of the largest count. If
p is not specified it will be estimated from the data. Note that the main
program requires that the number of "0" counts be read in, even though it
is ignored by this routine.
6.3 Maximum Likelihood Estimates
The estimate of p is calculated using the MIP (or EM algorithm) of
Dempster, Laird, and Rubin [9]. The initial estimate comes from Mantel
[22] in 1951:

\hat{p} = (\bar{x} - f_1/n) / (N - f_1/n)                                           (6.2)

where \bar{x} = \sum_{i=1}^{K} i f_i / n.
The MIP iteration scheme is then computed by estimating the "missing" cell,
f_0:
11
-------
\hat{f}_0 = \hat{q}^N n / (1 - \hat{q}^N)                                           (6.3)

where \hat{q} = 1 - \hat{p}. The parameter p is then reestimated:

\hat{p} = \sum_{i=1}^{K} i f_i (1 - \hat{q}^N) / (nN)                               (6.4)

Equations 6.3 and 6.4 define the iteration scheme. The iterations are
continued until the relative change in p is less than 10^{-5}.
The asymptotic variance of \hat{p} is estimated by

Var(\hat{p}) = \hat{p}\hat{q}(1 - \hat{q}^N)^2 / [nN(1 - \hat{q}^N - N\hat{p}\hat{q}^{N-1})]        (6.5)

which comes from the second partial derivatives of the likelihood function.
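A minimal Python sketch of the iteration defined by equations (6.2)-(6.5) is shown below; it is not the FORTRAN routine itself, and the counts are hypothetical.

import numpy as np

N = 10
f = np.array([25., 40., 30., 15., 6., 2.])     # hypothetical counts f_1..f_K ("0" not observable)
i = np.arange(1, len(f) + 1)
n = f.sum()
xbar = np.sum(i * f) / n

p = (xbar - f[0] / n) / (N - f[0] / n)         # initial estimate, equation (6.2)
for _ in range(200):
    q = 1.0 - p
    f0 = q**N * n / (1.0 - q**N)               # estimate of the missing cell, equation (6.3)
    p_new = np.sum(i * f) / ((n + f0) * N)     # equation (6.4), written via the completed sample
    if abs(p_new - p) < 1e-5:
        p = p_new
        break
    p = p_new

q = 1.0 - p
var_p = p * q * (1 - q**N)**2 / (n * N * (1 - q**N - N * p * q**(N - 1)))   # equation (6.5)
print(p, var_p)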
6.4 Test for Goodness-of-Fit
A standard goodness of fit test is computed for the observed
frequencies:
X^2 = \sum_{i=1}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (6.6)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the number
of cells after collapsing minus two. If P is specified, then only one will
be subtracted.
6.5 An Example
Finney and Varley [12] investigated the distributions of gall-fly
eggs and gall cells in the production of flowerheads of black knapweed in
1935 and 1936. This example is also discussed in section 10.5. Since the
maximum number of gall-fly eggs per flowerhead is about 15, and since no
flowerheads are observable with no eggs, a truncated binomial with N = 15 was
fitted to the data. The fit is not as good as the fit of the truncated
Poisson distribution (see section 10.5).
12
-------
Input for Varley's data:
[Input card listing not legible in this copy.]
Output:
[Output for Varley's gall-fly egg counts fitted to a truncated binomial distribution with N = 15; the printed table is not legible in this copy.]
Figure 6.1
13
-------
7. MIXTURE OF TWO BINOMIAL DISTRIBUTIONS
7.1 History and Definition
If samples are taken randomly from two different binomial distributions
with a common known N , then the result is a mixture of two binomial
distributions. If the probability of sampling the first binomial distribution
(with parameter p_1) is \alpha, the resulting distribution is

Pr[x = j] = \alpha \binom{N}{j} p_1^j q_1^{N-j} + (1 - \alpha) \binom{N}{j} p_2^j q_2^{N-j}
          = \alpha g_1(j) + (1 - \alpha) g_2(j),   j = 0, 1, ..., N                 (7.1)

subject to q_1 = 1 - p_1 and q_2 = 1 - p_2.
This distribution was described by Rider [32] and moment estimators were
given. The estimation of parameters from mixtures of distributions is
difficult by any method, and usually requires large sample sizes.
7.2 Calling Sequence
The binomial mixture routine is called by
   | Col. 8  |  15  |  25   |  35   |  45   |
   | BINOMIX |  N   |  P1   |  P2   |  A    |
where N is an integer, and P1, P2, and A are the values of p_1, p_2, and \alpha to
be fitted. If N is not specified it will be set to the value of the largest
count, K. If any of the other parameters are not specified, they will
be estimated from the data.
7.3 Maximum Likelihood Estimates
Arbitrary initial estimates are first calculated:
\hat{\alpha} = 1/2,   \hat{p}_1 = (3/4)\hat{p},   \hat{p}_2 = (4/3)\hat{p}          (7.2)

where \hat{p} is given by equation (5.2). The initial estimates are constrained
to the interval [.05, .95]. The (MIP or EM) iteration scheme has been given
by Hasselblad [17].
14
-------
\hat{p}_1 = \sum_{i=0}^{K} \hat{\alpha} g_1(i)\, i f_i / \{[\hat{\alpha} g_1(i) + (1-\hat{\alpha}) g_2(i)]\, nN\hat{\alpha}\}                    (7.3)

\hat{p}_2 = \sum_{i=0}^{K} (1-\hat{\alpha}) g_2(i)\, i f_i / \{[\hat{\alpha} g_1(i) + (1-\hat{\alpha}) g_2(i)]\, nN(1-\hat{\alpha})\}            (7.4)

and

\hat{\alpha} = \sum_{i=0}^{K} \hat{\alpha} g_1(i) f_i / \{[\hat{\alpha} g_1(i) + (1-\hat{\alpha}) g_2(i)]\, n\}                                  (7.5)

where g_1(i) and g_2(i) are as in equation (7.1), with the parameters replaced
by their estimates. The estimated asymptotic covariance matrix is calculated
from the second partial derivatives of the likelihood function, which are
given by Hasselblad [17].
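The following Python sketch (separate from the FORTRAN routine) carries out the iteration for hypothetical counts, writing the updates in the familiar "posterior weight" form, which has the same fixed point as equations (7.3)-(7.5).

import numpy as np
from scipy.stats import binom

N = 12
f = np.array([5., 20., 60., 110., 140., 130., 100., 80., 70., 60., 40., 20., 5.])  # hypothetical f_0..f_N
i = np.arange(len(f))
n = f.sum()

p = np.sum(i * f) / (n * N)
a, p1, p2 = 0.5, 0.75 * p, (4.0 / 3.0) * p       # initial estimates, equation (7.2)

for _ in range(500):
    g1 = binom.pmf(i, N, p1)
    g2 = binom.pmf(i, N, p2)
    w1 = a * g1 / (a * g1 + (1 - a) * g2)        # posterior weight for the first component
    a_new = np.sum(w1 * f) / n                                        # same fixed point as (7.5)
    p1_new = np.sum(w1 * i * f) / (N * np.sum(w1 * f))                # same fixed point as (7.3)
    p2_new = np.sum((1 - w1) * i * f) / (N * np.sum((1 - w1) * f))    # same fixed point as (7.4)
    if max(abs(a_new - a), abs(p1_new - p1), abs(p2_new - p2)) < 1e-6:
        a, p1, p2 = a_new, p1_new, p2_new
        break
    a, p1, p2 = a_new, p1_new, p2_new

print(p1, p2, a)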
7.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=0}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (7.6)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the number
of cells after collapsing minus four. If any of the parameters are
specified, the degrees of freedom will be adjusted.
7.5 An Example
Sokal and Rohlf [36] give the observed frequencies of the number of
males in 6115 sibships of size 12. The data are from Saxony from 1876 to
1885. A mixture of two binomial distributions gives a reasonable fit
(p=.0898) to the data, whereas a single binomial gives a very poor fit
(X^2 = 105.79, D.F. = 9, p < .0001). A likelihood ratio test of the difference
between the two distributions gives an asymptotic X^2 of -2(-12534.17 +
12492.54) = 83.26 for 2 degrees of freedom (p < .001).
Input for Sokal and Rohlf's data:
[Input card listing not legible in this copy.]
Output:
[Output for Sokal and Rohlf's sex-ratio data from Saxony fitted to a mixture of two binomial distributions (N = 12); the printed table is not legible in this copy.]

Figure 7.1
-------
8. THE BETA-BINOMIAL DISTRIBUTION
8.1 History and Definition
If the sampling occurs from a binomial distribution where the parameter
itself is the outcome of a beta distribution, then the resulting distribution
is a beta-binomial distribution. This distribution was described by Skellam
[35] and can be written
Pr[x = j] = \binom{N}{j} B(\alpha + j, \beta + N - j) / B(\alpha, \beta),   j = 0, 1, 2, ..., N        (8.1)

where B(\alpha, \beta) is the beta function. Equivalently,

Pr[x = j] = C\,\Gamma(\alpha + j)\Gamma(\beta + N - j) / [j!(N - j)!],
where C = N!\,\Gamma(\alpha + \beta) / [\Gamma(\alpha)\Gamma(\beta)\Gamma(\alpha + \beta + N)].        (8.2)
8.2 Calling Sequence
The beta-binomial distribution routine is called by
   | Col. 8  |  15  |  25   |  35   |
   | BETBINOM|  N   |  A    |  B    |
N is the parameter in (8.1), read in as an integer, and if not specified
is set to K, the value of the largest count. A and B are the values of \alpha
and \beta to be fitted. If either or both are not given, they are estimated
from the data.
8.3 Maximum Likelihood Estimates
The estimates of \alpha and \beta are calculated using a modified Gauss-
Newton method. The initial estimates are computed from the first two
factorial moments, R_1 and R_2:
17
-------
R_1 = \sum_{i=0}^{N} i f_i / n                                                      (8.3)

R_2 = \sum_{i=1}^{N} i(i - 1) f_i / (nR_1)                                          (8.4)

The initial estimates are

\hat{\alpha} = (R_1 R_2 - (N-1)R_1) / ((N-1)R_1 - N R_2)                            (8.5)

\hat{\beta} = \hat{\alpha}(N/R_1 - 1)                                               (8.6)
The Gauss-Newton iteration scheme requires the elements of the first
partial derivatives:
d_{1i} = \psi(\alpha + \beta) - \psi(\alpha) + \psi(\alpha + i) - \psi(\alpha + \beta + N)             (8.7)

d_{2i} = \psi(\alpha + \beta) - \psi(\beta) + \psi(\beta + N - i) - \psi(\alpha + \beta + N)           (8.8)

where \psi is the derivative of the log-gamma function (digamma function) and
\alpha and \beta are evaluated using their previous estimates. The first partial
derivatives are

D_j = \sum_{i=0}^{N} d_{ji} f_i,   j = 1, 2.                                        (8.9)

The matrix of cross products is

A_{jk} = \sum_{i=0}^{N} d_{ji} d_{ki} f_i,   j, k = 1, 2.                           (8.10)

The iteration scheme is

(\hat{\alpha}, \hat{\beta})_{new} = (\hat{\alpha}, \hat{\beta}) + A^{-1} D.         (8.11)

The estimates are constrained to be positive. If equation (8.11) does not
improve the likelihood, the correction is multiplied by 1/2 until the
likelihood is improved. The iteration scheme is continued until the change
in \alpha and \beta is less than .00001.
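A minimal Python sketch of the modified Gauss-Newton iteration of equations (8.3)-(8.11) is given below (it is not the FORTRAN routine); the counts are hypothetical, and the step-halving safeguard described above is omitted.

import numpy as np
from scipy.special import psi

N = 12
f = np.array([50., 30., 20., 12., 8., 6., 4., 3., 2., 1., 1., 1., 1.])   # hypothetical counts f_0..f_N
i = np.arange(N + 1)
n = f.sum()

R1 = np.sum(i * f) / n                                   # equation (8.3)
R2 = np.sum(i * (i - 1) * f) / (n * R1)                  # equation (8.4)
a = (R1 * R2 - (N - 1) * R1) / ((N - 1) * R1 - N * R2)   # equation (8.5)
b = a * (N / R1 - 1.0)                                   # equation (8.6)

for _ in range(200):
    d1 = psi(a + b) - psi(a) + psi(a + i) - psi(a + b + N)        # equation (8.7)
    d2 = psi(a + b) - psi(b) + psi(b + N - i) - psi(a + b + N)    # equation (8.8)
    D = np.array([np.sum(d1 * f), np.sum(d2 * f)])                # equation (8.9)
    A = np.array([[np.sum(d1 * d1 * f), np.sum(d1 * d2 * f)],
                  [np.sum(d1 * d2 * f), np.sum(d2 * d2 * f)]])    # cross products, equation (8.10)
    step = np.linalg.solve(A, D)                                  # equation (8.11)
    a, b = max(a + step[0], 1e-6), max(b + step[1], 1e-6)         # keep the estimates positive
    if np.max(np.abs(step)) < 1e-5:
        break

print(a, b)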
18
-------
8.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=0}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (8.12)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the
number of cells after collapsing minus three. If \alpha or \beta are specified,
then one degree of freedom will be added for each parameter specified.
8.5 Examples
Haseman and Soares [16] fit a beta-binomial distribution to fetal
mortality of mice. The N of the distribution is the number of implants.
Some 524 CF1S mice were tested, where N ranged from 1 to 20. Figures
8.1 and 8.2 give the fits for N's of 12 and 13, both of which give adequate
fits as measured by a chi-square test.
Input for Haseman and Soares data:
[Input card listing not legible in this copy.]
-------
Outputs:

[Output for the Haseman and Soares data with N = 12, fitted to a beta-binomial distribution; the printed table is not legible in this copy.]

Figure 8.1

[Output for the Haseman and Soares data with N = 13, fitted to a beta-binomial distribution; the printed table is not legible in this copy.]

Figure 8.2
-------
9. THE POISSON DISTRIBUTION
9.1 History and Definition
In 1837 Poisson derived the distribution which has since been given his
name. The Poisson distribution takes on the values 0, 1, 2, ..., where

Pr[x = j] = e^{-\lambda}\lambda^j / j!,   j = 0, 1, 2, ...,  \lambda > 0            (9.1)

It is assumed that the distribution is sampled n times so that the outcome
i occurs f_i times. Thus the f_i's sum to n.
9.2 Calling Sequence
The Poisson distribution routine is called by
   | Col. 8  |  25   |
   | POISSON |  L    |
where L (optional) is a specified value of \lambda to be fitted and tested for
goodness-of-fit. If not specified, \lambda is estimated from the data and then
fitted.
9.3 Maximum Likelihood Estimate
The maximum likelihood estimate for \lambda is

\hat{\lambda} = \sum_{i=0}^{K} i f_i / n                                            (9.2)

This estimate is also the minimum variance unbiased estimate. The variance
of this estimate is estimated by \hat{\lambda}/n.
9.4 Tests for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=0}^{K} (f_i - E_i)^2 / E_i                                            (9.3)
21
-------
where E_i = n\,Pr[x = i], and the value of \lambda is taken to be \hat{\lambda} unless specified
by L. The cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the
number of cells after collapsing minus two.
A second goodness-of-fit test is commonly known as the homogeneity
test. It is a test of the variance of the sample, and is discussed by
Potthoff and Whittinghill [31]. If \lambda is estimated, then

X^2 = \sum_{i=0}^{K} i^2 f_i / \hat{\lambda} - n\hat{\lambda}                       (9.4)

is approximately chi-square with n-1 degrees of freedom. If \lambda is specified, the
quantity

X^2 = \sum_{i=0}^{K} i^2 f_i / \lambda - 2\sum_{i=0}^{K} i f_i + n\lambda           (9.5)

is approximately chi-square with n degrees of freedom.
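The estimates and tests of this chapter reduce to a few sums; the following Python sketch (not part of the FORTRAN program) evaluates equations (9.2), (9.4), and (9.5) for hypothetical counts.

import numpy as np

f = np.array([2., 6., 14., 22., 30., 33., 30., 24., 18., 11., 6., 3., 1.])   # hypothetical counts f_0..f_K
i = np.arange(len(f))
n = f.sum()

lam = np.sum(i * f) / n                    # equation (9.2)
var_lam = lam / n                          # estimated variance of the estimate

X2_est = np.sum(i**2 * f) / lam - n * lam  # homogeneity test, equation (9.4); approx. chi-square, n-1 d.f.

lam0 = 5.0                                 # a specified value of lambda, for illustration
X2_spec = np.sum(i**2 * f) / lam0 - 2.0 * np.sum(i * f) + n * lam0   # equation (9.5); approx. chi-square, n d.f.

print(lam, var_lam, X2_est, X2_spec)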
9.5 An Example
Mutagenicity testing using the Ames test [2] has recently received a
great deal of interest. The test consists of counts of revertants per plate
for each dose of a given compound. The following example is the result of
196 tests at a zero dose (controls) for the strain TA1537 without metabolic
activation. The plates were all made on the same day. After a 48-hour
incubation period revertants per plate were counted manually. The fit is
adequate as measured by both the variance test and the standard chi-square
goodness-of-fit test.
Input for TA1537 Ames test data:
[Input card listing not fully legible in this copy.]
-------
Output:

[Output for the TA1537 revertant counts fitted to a Poisson distribution; the printed table is not legible in this copy.]
-------
10. TRUNCATED POISSON DISTRIBUTION
10.1 History and Definition
The truncated Poisson distribution occurs when sampling from a
Poisson distribution where the "0" outcome cannot be observed:
Pr[x = j] = e^{-\lambda}\lambda^j / [j!(1 - e^{-\lambda})],   j = 1, 2, 3, ...      (10.1)
The distribution was described as early as 1926 by McKendrick [23]. The
distribution has also been referred to as the positive Poisson distribution.
It is assumed that the distribution is sampled n times and that the
outcome i occurs f_i times. Thus the f_i's (i = 1, 2, 3, ...) sum to n.
10.2 Calling Sequence
The truncated Poisson distribution is called by
   | Col. 8   |  25   |
   | TPOISSON |  L    |
where L (optional) is a specified value of \lambda to be fitted and tested for
goodness-of-fit. If not specified, \lambda is estimated from the data, and then
fitted. Note that the main program requires that the number of "0" counts
be read in, even though it is ignored by this routine.
10.3 Maximum Likelihood Estimates
The estimate of \lambda is calculated using the MIP (or EM algorithm).
The missing information in this case is f_0 (the number of 0's). The initial
estimate was provided by Irwin [18], who ascribed it to McKendrick [23]:

\hat{f}_0 = \left(\sum_{i=1}^{K} i f_i\right)^2 / \sum_{i=1}^{K} i(i-1) f_i - n     (10.2)

The initial estimate of \lambda then becomes

\hat{\lambda} = n\bar{x} / (n + \hat{f}_0)                                          (10.3)

where \bar{x} is the arithmetic mean of the observed counts. The iteration
scheme is to reestimate the missing cell,

\hat{f}_0 = n e^{-\hat{\lambda}} / (1 - e^{-\hat{\lambda}})                         (10.4)

and then to reestimate \lambda from the completed sample,

\hat{\lambda} = n\bar{x} / (n + \hat{f}_0).                                         (10.5)

Equations 10.4 and 10.5 define the iteration scheme.
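A minimal Python sketch of the MIP iteration of equations (10.2)-(10.5) is shown below; it is not the FORTRAN routine, and the counts are hypothetical.

import numpy as np

f = np.array([40., 32., 20., 10., 5., 2., 1.])        # hypothetical counts f_1..f_K ("0" not observable)
i = np.arange(1, len(f) + 1)
n = f.sum()
xbar = np.sum(i * f) / n

f0 = np.sum(i * f)**2 / np.sum(i * (i - 1) * f) - n   # initial estimate of the missing cell, equation (10.2)
lam = n * xbar / (n + f0)                             # equation (10.3)

for _ in range(200):
    f0 = n * np.exp(-lam) / (1.0 - np.exp(-lam))      # re-estimate the missing cell, equation (10.4)
    lam_new = n * xbar / (n + f0)                     # equation (10.5)
    if abs(lam_new - lam) < 1e-5:
        lam = lam_new
        break
    lam = lam_new

print(lam)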
10.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed frequencies:

X^2 = \sum_{i=1}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (10.6)

and the cells are collapsed from the left and right until the end cells each
have an expected value of 5. The degrees of freedom will be the number of
cells after collapsing minus two. If \lambda is specified, then only one will
be subtracted.
10.5 An Example
Finney and Varley [12] investigated the distributions of gall-fly
eggs and gall cells in the production of flowerheads of black knapweed in
1935 and 1936. This example is also discussed in section 6.5. A truncated
Poisson distribution was fitted to the counts from each year (Figures 10.1
and 10.2). A fit of the two years combined gives a log-likelihood of
-443.1378. The likelihood ratio test for the same distribution in each
year is -2(-443.1378 + 276.5876 + 166.5481) = .004. This value
is approximately distributed as a chi-square with one degree of freedom,
indicating that there was no difference in the two years.
Input for Varley data:
[Input card listing not legible in this copy.]
-------
Outputs:
¥A8LE1f'S SJM.««-F».y E6G COUNTS FRON 1955
PIT TO A tRUNCATEO POISSOH DISTRIBUTION
TOTAL SAMPLE SIZE « 1*8
¥ARLEY*S 6ALL~FLf 186 C0UBTS f»OH 1*36
f IT TO A TSUNCATEO POISSON WSTjnSUTION
SAWUE
f STOOlf 0 VALUE OF LATOOA
ASWP. VAR, DP UrtBOA s
CHI-SQUARE 6000NE5S OF
VALUE
1
2
3
4
5
1>
g
9
10
11
lH
13*
OBSERVED
29
38
36
£3
8
f
2
1
0
0
j_
0
gxwEcreo
86, 00
36.97
35.06
24.93
14*18
6.73
2.73
.31
.69
.02
.01
.00
a> e.8446
.2196«-001
FJT TtST
CHI-S»!*, *
UJG-LlXEtlHOO) * -3,64*5431
Figure 10.2
26
-------
11. MIXTURE OF TWO POISSON DISTRIBUTIONS
11.1 History and Definition
If samples are taken randomly from two different Poisson distributions
the result is a mixture of two Poisson distributions. If the
probability of sampling the first Poisson distribution (with parameter
Xi) is a, the resulting distribution is
Pr[x = j] = \alpha e^{-\lambda_1}\lambda_1^j / j! + (1 - \alpha) e^{-\lambda_2}\lambda_2^j / j!
          = \alpha g_1(j) + (1 - \alpha) g_2(j),   j = 0, 1, 2, ...                 (11.1)
This distribution was described by Schilling [34] and moment estimators were
given. The estimation of parameters from mixtures of distributions is
difficult by any method, and usually requires large sample sizes.
11.2 Calling Sequence
The Poisson mixture routine is called by
   | Col. 8   |  25   |  35   |  45   |
   | POISSMIX |  L1   |  L2   |  A    |

where L1, L2, and A are the values of \lambda_1, \lambda_2, and \alpha to be fitted. If any
of the parameters are not specified, they will be estimated from the data.
11.3 Maximum Likelihood Estimates
Arbitrary initial estimates are first calculated:
\hat{\lambda}_1 = (3/4)\hat{\lambda}   and   \hat{\lambda}_2 = (4/3)\hat{\lambda}   (11.2)

where \hat{\lambda} is given by equation (9.2). The (MIP or EM) iteration scheme has
been given by Hasselblad [17].
27
-------
\hat{\lambda}_1 = \sum_{i=0}^{K} \hat{\alpha} g_1(i)\, i f_i / \{[\hat{\alpha} g_1(i) + (1-\hat{\alpha}) g_2(i)]\, n\hat{\alpha}\}                 (11.3)

\hat{\lambda}_2 = \sum_{i=0}^{K} (1-\hat{\alpha}) g_2(i)\, i f_i / \{[\hat{\alpha} g_1(i) + (1-\hat{\alpha}) g_2(i)]\, n(1-\hat{\alpha})\}         (11.4)

and

\hat{\alpha} = \sum_{i=0}^{K} \hat{\alpha} g_1(i) f_i / \{[\hat{\alpha} g_1(i) + (1-\hat{\alpha}) g_2(i)]\, n\}                                    (11.5)

where g_1(i) and g_2(i) are as in equation (11.1), with the parameters replaced
by their estimates. The estimated asymptotic covariance matrix is calculated
from the second partial derivatives of the likelihood function.
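The following Python sketch (separate from the FORTRAN routine) carries out the iteration for hypothetical counts, writing the updates in the familiar "posterior weight" form, which has the same fixed point as equations (11.3)-(11.5); starting \alpha at 1/2 is an assumption of the sketch, since the report gives initial values only for \lambda_1 and \lambda_2.

import numpy as np
from scipy.stats import poisson

f = np.array([1., 4., 10., 18., 22., 20., 16., 14., 12., 12., 10., 8., 6., 4., 2., 1.])  # hypothetical f_0..f_K
i = np.arange(len(f))
n = f.sum()

xbar = np.sum(i * f) / n
a, l1, l2 = 0.5, 0.75 * xbar, (4.0 / 3.0) * xbar      # initial estimates (alpha = 1/2 is an assumption)

for _ in range(1000):
    g1 = poisson.pmf(i, l1)
    g2 = poisson.pmf(i, l2)
    w1 = a * g1 / (a * g1 + (1 - a) * g2)             # posterior weight for the first component
    a_new = np.sum(w1 * f) / n                                        # same fixed point as (11.5)
    l1_new = np.sum(w1 * i * f) / np.sum(w1 * f)                      # same fixed point as (11.3)
    l2_new = np.sum((1 - w1) * i * f) / np.sum((1 - w1) * f)          # same fixed point as (11.4)
    if max(abs(a_new - a), abs(l1_new - l1), abs(l2_new - l2)) < 1e-6:
        a, l1, l2 = a_new, l1_new, l2_new
        break
    a, l1, l2 = a_new, l1_new, l2_new

print(l1, l2, a)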
11.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=0}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (11.6)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the
number of cells after collapsing minus four. If any of the parameters are
specified, the degrees of freedom will be adjusted.
11 .5 An Example
Additional Ames test data similar to that in section 9.5 was collected
by pooling the controls from several experiments. This data was fitted
to a mixture of two Poisson distributions, since the variance was too large
to fit a single Poisson distribution. Although it is hard to justify this
biologically, the mixture does provide an adequate fit as measured by the
standard chi-square goodness-of-fit test. The results are in Figure 11.1.
Input for TA1537 Ames test data:
[Input card listing not legible in this copy.]
-------
Output:

[Output for the pooled TA1537 revertant counts fitted to a mixture of two Poisson distributions; the printed table is not legible in this copy.]

Figure 11.1
-------
12. NEGATIVE BINOMIAL DISTRIBUTION
12.1 History and Definition
The negative binomial distribution was discussed by Pascal [29] and
Fermat. The earliest known publication was by Montmort [24] in 1714.
An excellent review of the distribution was given by Bartko [4]. The
distribution can be written in several forms, but we will use the following
one:

Pr[x = j] = \binom{N+j-1}{j} p^j (1 + p)^{-(N+j)},   j = 0, 1, 2, ...               (12.1)

where p > 0 and N > 0. For the special case where N is an integer, the
distribution is sometimes known as the Pascal distribution. Unlike the
binomial distribution, both N and p are parameters to be estimated.
For the case of N = 1, the distribution is known as the geometric
distribution. This may be fitted by specifying N = 1 on the control card.
12.2 Calling Sequence
The negative binomial distribution routine is called by
   | Col. 8   |  25   |  35   |
   | NEGBINOM |  P    |  N    |
where P and N are the (optional) values of the parameters p and N
to be fitted. If either or both are left blank, the parameters will be
estimated from the data.
12.3 Maximum Likelihood Estimates
Initial estimates can be obtained by the method of moments:
\hat{p} = s^2/\bar{x} - 1                                                           (12.2)

\hat{N} = \bar{x}^2 / (s^2 - \bar{x})                                               (12.3)
30
-------
where \bar{x} is the sample mean and s^2 is the sample variance. Maximum
likelihood estimates can be obtained by rearranging the first partial
derivatives of the log-likelihood function:

n \log(1 + \bar{x}/\hat{N}) - \sum_{i=1}^{K} S_i = 0,   where   S_i = (\hat{N} + i - 1)^{-1} \sum_{j=i}^{K} f_j.        (12.4)

This equation, involving just N, can be solved with a Newton-Raphson scheme.
Denote the left hand side of (12.4) by F; then

F' = -n\bar{x} / [\hat{N}(\hat{N} + \bar{x})] + \sum_{i=1}^{K} S'_i,   where   S'_i = (\hat{N} + i - 1)^{-2} \sum_{j=i}^{K} f_j.                 (12.5)

The iteration scheme is

\hat{N}_{new} = \hat{N} - F/F'.                                                     (12.6)
This procedure is continued until the change in N is less than .00001.
If this does not occur within 100 iterations, the routine indicates that
the procedure did not converge. The estimate for p is then
\hat{p} = \bar{x}/\hat{N}.                                                          (12.7)
The equations must be modified slightly if p is specified. If N
is specified, then just equation (12.7) is used.
The asymptotic variances of \hat{p} and \hat{N} are estimated from the inverse
of the matrix of second partial derivatives of the log-likelihood function:

\partial^2 L/\partial p^2 = n(\hat{N}\hat{p}^2 - 2\hat{p}\bar{x} - \bar{x}) / [\hat{p}^2(1 + \hat{p})^2]        (12.8)

\partial^2 L/\partial p\,\partial N = -n/(1 + \hat{p})                                                          (12.9)

\partial^2 L/\partial N^2 = \sum_{i=1}^{K} f_i [\psi'(\hat{N} + i) - \psi'(\hat{N})]                            (12.10)

where \psi' is the trigamma function.
31
-------
The two integer values of N closest to \hat{N} are tried for the integer
(Pascal) solution to the problem. The integer N giving the largest
log-likelihood is given as an alternative solution.
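A minimal Python sketch of the Newton-Raphson scheme of equations (12.2)-(12.7) is given below (the integer Pascal search and the asymptotic variances are omitted); it is not the FORTRAN routine, and the counts are hypothetical.

import numpy as np

f = np.array([120., 80., 50., 30., 18., 10., 6., 3., 2., 1.])   # hypothetical counts f_0..f_K
i = np.arange(len(f))
n = f.sum()
xbar = np.sum(i * f) / n
s2 = np.sum(f * (i - xbar) ** 2) / n

p = s2 / xbar - 1.0                  # moment estimate, equation (12.2)
N = xbar / p                         # moment estimate, equivalent to equation (12.3)
tail = np.cumsum(f[::-1])[::-1]      # tail[i] = f_i + f_{i+1} + ... + f_K

for _ in range(100):
    j = np.arange(1, len(f))                              # values 1..K
    S = tail[1:] / (N + j - 1.0)                          # S_i of equation (12.4)
    Sp = tail[1:] / (N + j - 1.0) ** 2                    # S'_i of equation (12.5)
    F = n * np.log(1.0 + xbar / N) - np.sum(S)            # equation (12.4)
    Fp = -n * xbar / (N * (N + xbar)) + np.sum(Sp)        # equation (12.5)
    N_new = N - F / Fp                                    # equation (12.6)
    if abs(N_new - N) < 1e-5:
        N = N_new
        break
    N = N_new

p = xbar / N                         # equation (12.7)
print(N, p)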
12.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=0}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (12.11)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the
number of cells after collapsing minus three. If p or N are specified,
then one degree of freedom will be added for each parameter specified.
12.5 Example
Love, et al. [21] measured the incidence of acute respiratory disease
in children. Counts of illnesses in children aged 6 to 12 from two
communities in New York are shown in Figures 12.1 and 12.2. Both fits are
adequate as measured by a standard chi-square goodness-of-fit test. A fit
of the two communities combined gives a log-likelihood of -598.2169. The
likelihood ratio test for similar distributions in each community is
-2(-598.2169 + 288.8186 + 304.1975) = 10.4. This value is approximately
distributed as a chi-square with 2 degrees of freedom, and therefore
indicates that the distribution of illnesses is not similar in the two
communities.
Input for Love's illness data:
[Input card listing not legible in this copy.]
-------
Outputs:

[Output for Love's acute respiratory illness counts from the first community, fitted to a negative binomial distribution; the printed table is not legible in this copy.]

Figure 12.1

[Output for Love's acute respiratory illness counts from the second community, fitted to a negative binomial distribution; the printed table is not legible in this copy.]

Figure 12.2
-------
13. THE TRUNCATED NEGATIVE BINOMIAL DISTRIBUTION
13.1 History and Definition
The truncated negative binomial distribution occurs when sampling from
a negative binomial distribution where the "0" outcome cannot be observed:
Pr[x = j] = \binom{N+j-1}{j} p^j (1+p)^{-(N+j)} / [1 - (1+p)^{-N}],   j = 1, 2, 3, ...        (13.1)

where p > 0 and N > 0. Examples of this distribution were given by
Sampford [33] and Brass [7]. The article by Sampford gives the maximum
likelihood equations. Wyshak [37] actually gives a computer subroutine
to estimate the parameters N and w, where w = 1/(1+p). The
subroutine also calculates the estimated asymptotic variances of N
and w. Our program modified these equations to obtain N and p,
along with their estimated variances.
13.2 Calling Sequence
The truncated negative binomial distribution is called by
   | Col. 8   |  25   |  35   |
   | TNEGBINM |  P    |  N    |
where P and N are the (optional) values of the parameters p and N
to be fitted. If either or both are left blank, the parameters will be
estimated from the data. Note that the main program requires that the
number of "0" counts be read in, even though it is ignored by this routine.
13.3 Maximum Likelihood Estimates
Wyshak [37] gives a FORTRAN algorithm for obtaining maximum
likelihood estimates using a Newton-Raphson method. For simplicity, the
parameter p is replaced by w, where w = 1/(1+p). Initial estimates
for w and N were given by Brass [7]:

\hat{w} = (\bar{x}/s^2)(1 - f_1/n),   and                                           (13.2)

\hat{N} = (\hat{w}\bar{x} - f_1/n) / (1 - \hat{w})                                  (13.3)
34
-------
where x is the sample mean and s2 is the sample variance. The first and
second partial derivatives of the log-likelihood were given by Wyshak
and are not repeated here. The covariance terms for p are obtained from
those of w using standard calculus formulas.
The iteration scheme is continued until the first derivatives with
respect to w and N are less than .0001. If this does not occur within
100 iterations, the routine indicates that the procedure did not converge.
This routine can fail to converge if the variance of the data is too small.
In such cases, the truncated Poisson distribution may provide a good fit
to the data.
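Wyshak's equations are not reproduced here, so the following Python sketch simply maximizes the truncated negative binomial log-likelihood of equation (13.1) numerically; it is an illustration only, not the Newton-Raphson algorithm used by the FORTRAN routine, and the counts are hypothetical.

import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize

f = np.array([32., 16., 10., 6., 4., 3., 2., 2., 1.])   # hypothetical counts f_1..f_K ("0" not observable)
i = np.arange(1, len(f) + 1)
n = f.sum()

def negloglik(theta):
    N, p = np.exp(theta)                 # work on the log scale to keep N and p positive
    logpmf = (gammaln(N + i) - gammaln(N) - gammaln(i + 1)
              + i * np.log(p) - (N + i) * np.log(1.0 + p)
              - np.log(1.0 - (1.0 + p) ** (-N)))         # log of equation (13.1)
    return -np.sum(f * logpmf)

res = minimize(negloglik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
N_hat, p_hat = np.exp(res.x)
print(N_hat, p_hat)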
13.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=1}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (13.4)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the
number of cells after collapsing minus three. If p or N are specified,
then one degree of freedom will be added for each parameter specified.
13.5 Example
Sampford [33] gives counts of chromosome breaks. Figure 13.1 shows the
fit of a truncated negative binomial distribution to this data. The standard
chi-square goodness-of-fit test shows that the data are adequately fitted by
this distribution. The data can also be described by a logarithmic distribution
(see section 14.5).
Input for Sampford's chromosome break data:
[Input card listing not legible in this copy.]
-------
Output:

[Output for Sampford's chromosome break counts fitted to a truncated negative binomial distribution; the printed table is not legible in this copy.]

Figure 13.1
-------
14. THE LOGARITHMIC DISTRIBUTION
14.1 History and Definition
The logarithmic distribution (more formally referred to as the
logarithmic series distribution) is a single parameter distribution. The
distribution takes on the values 1, 2, 3, ..., where

Pr[x = j] = -\theta^j / [j \log(1 - \theta)],   j = 1, 2, 3, ...,  0 < \theta < 1.  (14.1)

The distribution was derived by Fisher, Corbet, and Williams [14]; a review
is given by Nelson and David [26].

14.2 Calling Sequence

The logarithmic distribution routine is called by

   | Col. 8   |  25   |
   | LOGARITH |  T    |

where T (optional) is a specified value of \theta to be fitted and tested
for goodness-of-fit. If not specified, \theta is estimated from the data and
then fitted. Note that the main program requires that the number of "0"
counts be read in, even though it is ignored by this routine.
14.3 Maximum Likelihood Estimates
The estimate of \theta is calculated using a Newton-Raphson iteration
scheme. The initial estimate comes from Birch [5]:

\hat{\theta} = 1 - \{1 + [(5/3 - (\log \bar{x})/16)(\bar{x} - 1) + 2] \log \bar{x}\}^{-1}.        (14.2)

The Newton-Raphson iteration scheme is

\hat{\theta}_{new} = \hat{\theta} - [\bar{x} + \hat{\theta}/((1-\hat{\theta})\log(1-\hat{\theta}))]
                     \cdot [(1-\hat{\theta})^2 \log^2(1-\hat{\theta}) / (\log(1-\hat{\theta}) + \hat{\theta})].        (14.3)
37
-------
The iteration scheme is continued until the change in \theta is less than
.00001. The asymptotic variance of \hat{\theta} is estimated by

Var(\hat{\theta}) = \hat{\theta}^2 / [n\bar{x}(1/(1-\hat{\theta}) - \bar{x})].      (14.4)
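A minimal Python sketch of the Newton-Raphson scheme of equations (14.2)-(14.4) is given below; it is not the FORTRAN routine, and the counts are hypothetical.

import numpy as np

f = np.array([60., 25., 12., 6., 3., 2., 1.])    # hypothetical counts f_1..f_K
i = np.arange(1, len(f) + 1)
n = f.sum()
xbar = np.sum(i * f) / n

lx = np.log(xbar)
theta = 1.0 - 1.0 / (1.0 + ((5.0 / 3.0 - lx / 16.0) * (xbar - 1.0) + 2.0) * lx)   # equation (14.2)

for _ in range(100):
    lt = np.log(1.0 - theta)
    step = (xbar + theta / ((1.0 - theta) * lt)) * ((1.0 - theta) ** 2 * lt ** 2 / (lt + theta))   # equation (14.3)
    theta_new = theta - step
    if abs(theta_new - theta) < 1e-5:
        theta = theta_new
        break
    theta = theta_new

var_theta = theta ** 2 / (n * xbar * (1.0 / (1.0 - theta) - xbar))   # equation (14.4)
print(theta, var_theta)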
14.4 Test for Goodness-of-fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=1}^{K} (f_i - E_i)^2 / E_i                                            (14.5)

where E_i = n\,Pr[x = i], and the value of \theta is taken to be \hat{\theta} unless
specified by T. The cells are collapsed from the left and right until the
end cells each have an expected value of 5. The degrees of freedom will be
the number of cells after collapsing minus two. If \theta is specified, an
additional degree of freedom will be added.
14.5 Example
The data of Sampford [33] given in section 13.5 can also be fitted by
a logarithmic distribution. The output is shown in Figure 14.1.
Input for Sampford's chromosome break data:
[Input card listing not legible in this copy.]
-------
Output:

[Output for Sampford's chromosome break counts fitted to a logarithmic distribution; the printed table is not legible in this copy.]

Figure 14.1
-------
15. REFERENCES
1. Alldredge, J. R., and D. D. Bolstad (1977). GDIST: A Computer Code for
Analysis of Statistical Distributions of Physical Data, United States
Department of Commerce, NTIS TN33.071, 57 pp.
2. Ames, B. N., J. McCann, and E. Yamasaki (1975). Methods for Detecting
Carcinogens and Mutagens with the Salmonella/Mammalian-Microsome
Mutagenicity Test, Mutation Research, 31, pp 347-364.
3. Barr, A. J., J. H. Goodnight, J. P. Sail, and J. T. Helwig (1976).
A User's Guide to SAS 76, SAS Institute, Incorporated, Raleigh, North
Carolina, 330 pp.
4. Bartko, J. J. (1961). The Negative Binomial Distribution: A Review
of Properties and Applications, The Virginia Journal of Science (New
Series), 12, pp 18-37.
5. Birch, M. W. (1963). An Algorithm for the Logarithmic Series
Distribution, Biometrics, 19, pp 651-653.
6. Bouver, H., and R. E. Bargmann (1975). Computational Algorithms and
Modules for the Evaluation of Statistical Distribution Functions,
Univ. of Georgia, Dept. of Statistics, Athens, Georgia.
7. Brass, W. (1958). Simplified Methods of Fitting the Truncated Negative
Binomial Distribution, Biometrika, 45, pp 59-68.
8. Day, N. E. (1969). Estimating the Components of a Mixture of Normal
Distributions, Biometrika, 56, 3, pp 463-484.
9. Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum
Likelihood from Incomplete Data Via the EM Algorithm, Journal of the
Royal Statistical Society B, 39, pp 1-38.
10. Dixon, W. J. (1975). Biomedical Computer Programs (BMD-P), University
of California Press, Berkeley, California, 792 pp.
11. Finney, D. J. (1949). The Truncated Binomial Distribution, Annals of
Eugenics, London, England, 14, pp 319-328.
12. Finney, D. J., and G. C. Varley (1955). An Example of the Truncated
Poisson Distribution, Biometrics, 11, pp 387-394.
40
-------
13. Fisher, R. A. (1958). Statistical Methods for Research Workers,
Thirteenth Revised Edition, Hafner Publishing Company, Incorporated,
New York, New York, 356 pp.
14. Fisher, R. A., A. S. Corbet, and C. B. Williams (1943). The Relation
Between the Number of Species and the Number of Individuals in a
Random Sample of an Animal Population, Journal of Animal Ecology, 12,
pp 42-57.
15. Hartley, H. 0. (1961). The Modified Gauss-Newton Method for Fitting
of Non-Linear Regression Functions by Least Squares, Technometrics,
3,2, pp 269-280.
16. Haseman, J. K., and E. R. Soares (1976). The Distribution of Fetal
Deaths in Control Mice and Its Implications on Statistical Tests for
Dominant Lethal Effects, Mutation Research, 41, pp 277-288.
17. Hasselblad, V. (1969). Estimation of Finite Mixtures of Distributions
from the Exponential Family, Journal of the American Statistical
Association, 64, pp 1459-1471.
18. Irwin, J. O. (1959). On the Estimation of the Mean of a Poisson
Distribution with the Zero Class Missing, Biometrics, 15, pp 324-326.
19. Johnson, N. L., and S. Kotz (1969). Discrete Distributions, John
Wiley and Sons, Incorporated, Somerset, New Jersey, 328 pp.
20. Kendall, M. G., and A. Stuart (1961). The Advanced Theory of
Statistics, Volume 2, Hafner Publishing Company, New York, New York,
676 pp.
21. Love, G. J., S. Lan, C. M. Shy, and R. J. Struba (1980). The Incidence
and Severity of Acute Respiratory Illness in Families Exposed to
Different Levels of Air Pollution, New York Metropolitan Area, 1971-
1972, Dept. of Epidemiology, School of Public Health, Univ. of N. C.,
Chapel Hill, North Carolina.
22. Mantel, N. (1951). Evaluation of a Class of Diagnostic Tests,
Biometrics, 7, pp 240-246.
23. McKendrick, A. G. (1926). Applications of Mathematics to Medical
Problems, Proceedings of Edinburgh Mathematical Society, 44, pp 98-130.
24. Montmort, P. R. (1714). Essai D'Analyse Sur Les Jeux de Hasards,
Paris, France.
25. Morgan, C., W. Nelson, and P. Caporal (1979). General Maximum
Likelihood Fitting with STATPAC, General Electric Technical Report.
41
-------
26. Nelson, W. C., and H. A. David (1967). The Logarithmic Distribution:
A Review, Virginia Journal of Science, 18, pp 95-102.
27. Nie, N. H., C. H. Hull, J. G. Jenkins, K. Steinbrenner, and D. H.
Brent (1975). Statistical Package for the Social Sciences, Second
Edition, McGraw Hill Book Company, New York, New York, 673 pp.
28. Orchard, T., and M. A. Woodbury (1972). A Missing Information Principle:
Theory and Applications, Proceedings of the 6th Berkeley Symposium on
Mathematical Statistics and Probability, 1, pp 697-715.
29. Pascal, B. (1679). Varia Opera Mathematica D. Petri de Fermat
(Tolossae).
30. Potthoff, R. F., and M. Whittinghill (1966). Testing for Homogeneity,
I. The binomial and multinomial distributions, Biometrika, 53,
pp 167-182.
31. Potthoff, R. F., and M. Whittinghill (1966). Testing for Homogeneity,
II. The Poisson distribution, Biometrika, 53, pp 183-190.
32. Rider, P. R. (1961). Estimating the Parameters of Mixed Poisson,
Binomial, and Weibull Distributions by the Method of Moments, Bulletin
of International Statistical Institute, Vol. 39, pp 225-232.
33. Sampford, M. R. (1955). The Truncated Negative Binomial Distribution,
Biometrika, 42, pp 58-69.
34. Schilling, W. (1947). A Frequency Distribution Represented on the
Sum of Two Poisson Distributions, Journal of American Statistical
Association, 42, pp 407-424.
35. Skellam, J. G. (1948). A Probability Distribution Derived from the
Binomial Distribution by Regarding the Probability of Success as
Variable Between the Sets of Trials, Journal of the Royal Statistical
Society, Series B, London, 10, pp 257-261.
36. Sokal, R. R. and F. J. Rohlf (1973). Introduction to Biostatistics,
Freeman and Company, San Francisco, California, pp 66-67.
37. Wyshak, G. (1974). A Program for Estimating the Parameters of the
Truncated Negative Binomial Distribution, Journal of Applied
Statistics, 23, 1, pp 87-91.
42
-------
APPENDIX A.
INDEX
Ames test data, 22, 28
Bernoulli, James, 8
Beta-binomial distribution, 17
Beta distribution, 3, 17
Binomial distribution, 1, 8, 15, 17, 30
Chromosome break data, 35, 38
Constant distribution, 5
Cumulative frequency distribution, 5
Dice data, Weldon's, 9
Digamma function, 18
Discrete rectangular distribution, 5
Discrete uniform distribution, 5
EM algorithm, 11, 14, 24, 27
Exponential distribution, 1
Frequency distribution, 5
Gall-fly eggs, 12, 25
Gamma distribution, 1
Gauss-Newton method, 3, 17
Geometric distribution, 1, 30
Goodness of fit, 1, 6, 9, 12, 15, 19, 21, 25, 28, 32, 35, 38
Homogeneity test 9, 22
Illness data, 32
Likelihood ratio test, 3, 6, 9, 15, 25, 32
Logarithmic distribution, 35, 37
Lognormal distribution, 1
Maximum likelihood, 3
Mendelian factors, 6
Mice data, 19
Missing information principle, 3, 11, 14, 24, 27
Mixture of two binomials, 14
Mixture of two Poissons, 27
Moment estimates, 14, 27, 30
Moments, sample, 5
Negative binomial distribution, 3, 30, 34
Newton-Raphson method, 3, 31, 34, 37
Normal distribution, 1
Pascal distribution, 30, 32
Pearson curves, 1
43
-------
Poisson distribution, 21, 28
Positive binomial distribution, 11
Positive Poisson distribution, 24
Primula counts, 6
Rectangular distribution, discrete, 5
Sample moments, 5
Sex distribution data, 15
Singularities, 3
Sokal and Rohlf's data, 15
Truncated binomial distribution, 11
Truncated negative binomial, 34, 37
Truncated Poisson distribution, 12, 24
Uniform distribution, 1
Uniform distribution, discrete, 5
Variance test, 22
Varley's data, 12, 25
Weibull, 1
Weldon's dice data, 9
44
-------
APPENDIX B. FLOW CHART OF THE DISCRETE PROGRAM

[The flow chart did not reproduce legibly. Its structure is as follows: the
main program DISCRETE reads the data and calls one distribution-fitting
subroutine per control card -- CONST, BINOML, TBINOM, BINOMX, BETBIN,
POISSN, TPOISN, POISMX, NEGBIN, TNEGBN, or LOGRTH. These subroutines in
turn call the utility routines GDFIT (goodness of fit), CNORM, PSI, and
ALGMA.]
45
-------
APPENDIX C. ERROR MESSAGES
INVALID COMMAND:
The name of the distribution fitted does not match any names in the
program.
INVALID PARAMETER SPECIFICATION:
The values for one or more of the parameters (XP's) are outside the
allowable limits. These limits are given in the definition section
of each distribution.
PROCEDURE DID NOT CONVERGE:
This message can occur in any of the routines which require iterative
methods to obtain the estimates. There is no easy way to correct the
problem, although the estimates given by the program still may be
useful. We would appreciate receiving such examples so that we can
improve our procedures in the future.
INSUFFICIENT EXPECTED FREQUENCIES:
The goodness-of-fit routine (GDFIT) did not find sufficient expected
frequencies to perform a goodness-of-fit test.
UNEXPECTED END OF FILE ENCOUNTERED:
An end-of-file was encountered while reading a card other than a
title card. Check sequence of input cards.
46
-------
APPENDIX D.
LISTING OF THE PROGRAMS
C DISCRETE DISTRIBUTION FITTING PACKAGE REVISED 9-24-80
C PROGRAMMERS'- ANDERSON, HASSELBLAD
C
C MAIN PROGRAM
C
C THE FOLLOWING ARE VARIABLES IN COMMON:
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY.
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C
DIMENSION FMT(20)
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
CHARACTER*8 NC,NCON(12)
DATA NCON/'CONSTANT','BINOMIAL','TBINOMAL','BINOMIX ','BETBINOM',
1 'POISSON ','TPOISSON','POISSMIX','NEGBINOM','TNEGBINM','LOGARITH'
2 ,'        '/
DATA ND/12/
2 CONTINUE
NTOT = 0
DO 3 I = 1, 200
NF(I) = 0
3 CONTINUE
C
C NC IS THE NAME OF THE DISTRIBUTION TO BE FITTED
C READ IN THE CONTROL CARDS
C
READ (5,5,END=910) TITLE
5 FORMAT(20A4)
READ (5,10,END=920) K
10 FORMAT(I5)
IF(K.GE.200) GO TO 930
KP = K + 1
READ (5,5,END=920) FMT
READ (5,FMT,END=920) (NF(I),I=1,KP)
DO 4 I = 1, KP
NTOT = NTOT + NF(I)
4 CONTINUE
6 CONTINUE
READ (5,15,END=920) NC, INT, XP
15 FORMAT(A8,I7,4F10.0)
DO 7 I = 1, ND
IF(NC.NE.NCON(I)) GO TO 7
GO TO (11,12,13,14,16,17,18,19,21,22,23,2) I
7 CONTINUE
WRITE(6,55) NC, K, XP
55 FORMAT('0',10X,'INVALID COMMAND: ',A8,I5,4F10.4)
GO TO 6
11 CONTINUE
CALL CONST
47
-------
GO TO 6
12 CONTINUE
CALL BINOML
GO TO 6
13 CONTINUE
CALL TBINOM
GO TO 6
14 CONTINUE
CALL BINOMX
GO TO 6
16 CONTINUE
CALL BETBIN
GO TO 6
17 CONTINUE
CALL POISSN
GO TO 6
18 CONTINUE
CALL TPOISN
GO TO 6
19 CONTINUE
CALL POISMX
GO TO 6
21 CONTINUE
CALL NEGBIN
GO TO 6
22 CONTINUE
CALL TNEGBN
GO TO 6
23 CONTINUE
CALL LOGRTH
GO TO 6
910 CONTINUE
STOP
920 CONTINUE
WRITE (6,925)
925 FORMAT('1',10X,'UNEXPECTED END-OF-FILE ENCOUNTERED')
STOP
930 CONTINUE
WRITE (6,935) K
935 FORMAT('1',10X,'VALUE OF K TOO LARGE:',I7)
STOP
END
C CNORM REVISED 5- 9-80
C PROGRAMMER: HASSELBLAD
C COMPUTES THE NORMAL PROBABILITY INTEGRAL FROM MINUS INFINITY TO X
C REFERENCE: APPROX. FOR DIG. COMP. - HASTINGS, P. 169
FUNCTION CNORM(X)
DATA S2P,P,A1,A2,A3,A4,A5,S2/1.12837917 ,.3275911,.225836846,
1 -.252128668,1.25969513,-1.28782245 , .94064607,.707106781/
XN = 1./(1. + P*ABS(X*S2))
PHI = S2P*EXP(-X*X/2.)
PHIS = 1. - XN*(A1+XN*(A2+XN*(A3+XN*(A4+XN*A5))))*PHI
TERM = .5*PHIS
IF(X.LT.0.) TERM = -TERM
CNORM = 0.5 + TERM
RETURN
END
48
-------
C CONSTANT DISTRIBUTION FITTING ROUTINE INCLUDING SAMPLE MOMENTS
C PROGRAMMERS: HASSELBLAD, ANDERSON REVISED 5- 8-80
C
SUBROUTINE CONST
COMMON TITLE(20),NFC200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY.
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED.
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C P IS THE ESTIMATED PARAMETER OF THE BINOMIAL DISTRIBUTION: B(N,P)
C
WRITE (6,10) TITLE
10 FORMAT('1',10X,20A4)
M = K + 1
TOT = NTOT
NFL = 0
PSUM = 0.
DO 4 I = 1, 4
E(I) = XP(I)
IF(E(I).GT.0.) NFL = 1
4 CONTINUE
IF(NFL.EQ.0) GO TO 6
IF(M.LE.4) GO TO 7
READ (5,5) (E(I),I=5,M)
5 FORMAT(8F10.0)
GO TO 7
6 CONTINUE
C
C ASSUME A UNIFORM DISTRIBUTION IF NOT SPECIFIED.
C
DO 11 I = 1, M
E(I) = 1.
11 CONTINUE
7 CONTINUE
C SUM UP EXPECTED FREQUENCIES SO THEY CAN BE NORMALIZED.
C
DO 12 I = 1, M
IF(E(I).LT.0.) GO TO 900
PSUM = PSUM + E(I)
12 CONTINUE
XLK = 0.
C NORMALIZE EXPECTED FREQUENCIES AND COMPUTE LOG-LIKELIHOOD.
C
DO 8 I = 1, M
E(I) = E(I)/PSUM
IF(E(I).GT.0.) XLK = XLK + NF(I)*ALOG(E(I))
E(I) = E(I)*TOT
8 CONTINUE
IF = 1
IF(E(1).EQ.0.) IF = 2
XM1 = 0.
XM2 = 0.
XM3 = 0.
XM4 = 0.
NP = INT
C
C COMPUTE SAMPLE MOMENTS
C
DO 2 I = 1, M
X = I - 1
XM1 = XM1 + NF(I)*X
49
-------
2 CONTINUE
XM1 = XM1/TOT
DO 3 I = 1, M
X = I - 1 - XM1
X2 = X*X
X3 = X2*X
X4 = X2*X2
XM2 = XM2 + NF(I)*X2
XM3 = XM3 + NF(I)*X3
XM4 = XM4 + NF(I)*X4
3 CONTINUE
XM2 = XM2/TOT
XM3 = XM3/TOT
XM4 = XM4/TOT
C
C WRITE OUT RESULTS
C
WRITE (6,15) XM1
15 FORMAT(1H0,10X,'SAMPLE MEAN =',5X,F14.6)
WRITE (6,20) XM2
20 FORMAT(1H ,10X,'SAMPLE VARIANCE =',F15.6)
WRITE (6,25) XM3
25 FORMAT(1H ,10X,'THIRD MOMENT =',4X,F14.6)
WRITE (6,30) XM4
30 FORMAT(1H ,10X,'FOURTH MOMENT =',3X,F14.6)
WRITE (6,40)
40 FORMAT('0',10X,'VALUE COUNT PERCENT CUMULATIVE CUMULATIVE')
WRITE (6,45)
45 FORMAT(' ',37X,'COUNT',4X,'PERCENT')
NCUM = 0
CUMP = 0.
DO 9 I = 1, M
J = I - 1
PCT = NF(I)*100/TOT
NCUM = NCUM + NF(I)
CUMP = CUMP + PCT
IF(E(I).GT.0..OR.NF(I).GT.0) WRITE (6,50) J,NF(I),PCT,NCUM,CUMP
50 FORMAT(' ',10X,I5,I7,F9.2,I11,F11.2)
9 CONTINUE
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,35) XLK
35 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) INT, (XP(I),I=1,4)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/11X,I7,4F11.5)
IF(M.GT.4) WRITE (6,910) (E(I),I=5,M)
910 FORMAT(' ',10X,8F11.5)
RETURN
END
50
-------
C SUBROUTINE TO FIT A BINOMIAL DISTRIBUTION REVISED 9-25-80
C PROGRAMMERS: HASSELBLAD, ANDERSON
C
SUBROUTINE BINOML
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY.
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C P IS THE ESTIMATED PARAMETER OF THE BINOMIAL DISTRIBUTION: B(N,P)
C
IF = 1
WRITE (6,30) TITLE
30 FORMAT('1',10X,20A4)
WRITE (6,5)
5 FORMAT ('0',10X,'FIT TO A BINOMIAL DISTRIBUTION')
WRITE (6,40)
40 FORMAT('0',10X,'P(X,N) = N!P*xX(l-P)xx(N-X)/(X!
-------
C SUBROUTINE TO FIT A TRUNCATED BINOMIAL DISTRIBUTION
C PROGRAMMERS: HASSELBLAD, ANDERSON REVISED 10-16-80
C
SUBROUTINE TBINOM
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C PT IS THE NEW ESTIMATE OF THE PARAMETER P
C P IS THE OLD ESTIMATE OF THE PARAMETER P
C ZN IS THE ESTIMATE OF THE UNOBSERVED CELL F(0)
C
WRITE (6,30) TITLE
30 FORMAT('1',10X,20A4)
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A TRUNCATED BINOMIAL DISTRIBUTION')
C
C CHECK FOR IMPROPER PARAMETER VALUES.
C
IF(XP(1).LT.0.0.OR.XP(1).GT.1.0) GO TO 900
IF (INT.LT.0) GO TO 900
KP = K + 1
NP = 1
IF(XP(1).GT.0.) NP = 0
N = MAX(INT,K)
M = MIN(KP+1,N+1)
IF = 2
CN = N
NT = 0
P = XP(1)
XM1 = 0.
DO 2 I = 2, KP
CI = I - 1
NT = NT + NF(I)
XM1 = XM1 + CI*NF(I)
2 CONTINUE
TOT = NT
XM1 = XM1/TOT
C
C COMPUTE THE INITIAL ESTIMATE OF P IF NOT SPECIFIED.
C
IF(XP(1).GT.O.) GO TO 6
P = (XM1 - NF(2)/TOT)/(CN - NF(2)/TOT)
WRITE (6,10) NT, N
10 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6,', N =',I4)
C
C BEGIN ITERATION PROCEDURE
C
DO 4 IT = 1, 50
QN = (1. - P)**N
ZN = QN*TOT/(1. - QN)
PT = 0.
DO 3 I = 2, KP
CI = I - 1
PT = PT + CI*NF(I)/((TOT + ZN)*CN)
3 CONTINUE
IF(ABS(PT-P).LE.0.00001) GO TO 7
P = PT
4 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
7 CONTINUE
P = PT
6 CONTINUE
IF(XP(1).EQ.0.) WRITE (6,40) P
40 FORMAT('0',10X,'ESTIMATED VALUE OF P =',F6.4)
IF(XP(1).GT.0.) WRITE (6,45) P
45 FORMAT('0',10X,'SPECIFIED VALUE OF P =',F6.4)
XL = 0.
PT = P
QT = 1. - PT
ALP = ALOG(PT)
ALQ = ALOG(QT)
XDC = 2.
XNC = CN - 1.
CUM = 0.
QN = 1. - QT**N
TERM = ALOG(CN) + ALP + XNC*ALQ - ALOG(QN)
C
C COMPUTE EXPECTED COUNTS, ASYMPTOTIC VARIANCE
C
E(1) = 0.
VAR = -TOT*(CN*(CN - 1.)*QT**(N - 2)/QN + (QT**(N - 1)*CN)**2/QN)
DO 8 I = 2, M
CI = I - 1.
QI = QT**(I - 1)
VAR = VAR + CI*NF(I)/PT**2 + (CN - CI)*NF(I)/QT**2
E(I) = EXP(TERM)*TOT
CUM = CUM + E(I)
XL = XL + NF(I)*TERM
IF(I.NE.M) TERM = TERM + ALP - ALQ + ALOG(XNC) - ALOG(XDC)
IF(XNC.GT.1.) XNC = XNC - 1.
XDC = XDC + 1.
8 CONTINUE
VAR = l./VAR
IF(XP(1).EQ.0.) WRITE (6,50) VAR
50 FORMAT(' ',10X,'ASYMP. VAR. OF P =',E13.5)
C
C COMPUTE GOODNESS OF FIT TESTS.
C
CALL GDFIT
WRITE (6,20) XL
20 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) INT, XP(1)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/11X,I7,F11.5)
RETURN
END
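For reference, TBINOM fits the zero-truncated binomial
$$P(X = x) = \frac{\binom{N}{x}p^{x}q^{N-x}}{1-q^{N}}, \qquad q = 1-p,\quad x = 1,\ldots,N,$$
and the iteration above can be read as repeatedly estimating the count in the unobserved zero cell, $Z = n\,q^{N}/(1-q^{N})$, and then updating
$$p^{(t+1)} = \frac{\sum_x x\,n_x}{N\,(n+Z)} .$$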
C SUBROUTINE TO FIT A BETA BINOMIAL DISTRIBUTION REVISED 4-21-80
C PROGRAMMER: HASSELBLAD
C
SUBROUTINE BETBIN
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C M IS THE LENGTH OF THE E ARRAY
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C ALPHA AND BETA ARE THE PARAMETERS TO BE ESTIMATED (GIVEN AS A
C AND B IN THE FORMULA OF FORMAT 10)
C
DATA EPS/0.00001/
WRITE (6,30) TITLE
30 FORMAT('1',10X,20A4)
C
C CHECK FOR IMPROPER PARAMETER VALUES
C
IF(XP(1).LT.0..OR.XP(2).LT.0.) GO TO 900
IF = 1
N = MAX(INT,K)
CN = N
KP = K + 1
M = MIN(KP+1,N+1)
TOT = NTOT
NP = 2
IF(XP(1).GT.0.) NP = NP - 1
IF(XP(2).GT.0.) NP = NP - 1
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A BETA-BINOMIAL DISTRIBUTION')
WRITE (6,10)
10 FORMAT('0',10X,'P(X) = N!(A+X-1)!(B+N-X-1)!(A+B-1)!/'/18X,
1 '(X!(N-X)!(A-1)!(B-1)!(A+B+N-1)!)')
WRITE (6,15) NTOT, N
15 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6,', N =',I4)
C
C COMPUTE FACTORIAL MOMENTS FOR INITIAL ESTIMATES
C
R1 = 0.
R2 = 0.
DO 6 I = 1, KP
CI = I - 1
R1 = R1 + CI*NF(I)/TOT
R2 = R2 + CI*(CI - 1.)*NF(I)/TOT
6 CONTINUE
R2 = R2/R1
C
C ESTIMATE THE INITIAL VALUES FOR ALPHA AND BETA
C
ALPHA = (R1*R2 - (CN - 1.)*R1)/((CN - 1.)*R1 - CN*R2)
IF(XP(1).GT.0.) ALPHA = XP(1)
BETA = ALPHA*(CN/R1 - 1.)
IF(XP(2).GT.0.) BETA = XP(2)
IF(XP(1).GT.0..AND.XP(2).GT.0.) GO TO 16
C BEGIN GAUSS-NEWTON ITERATION SCHEME
C
DO 12 IT = 1, 100
ALBE = ALPHA + BETA
DELTA = 1.
C COMPUTE CURRENT LIKELIHOOD FUNCTION
C
C
C COMPUTE ELEMENTS OF FIRST PARTIAL DERIVATIVES AND
C SUM THEIR CROSS PRODUCTS
C
XLK1 = TOT*(ALGMA(CN + 1.) + ALGMA(ALBE) - ALGMA(ALPHA)
1 - ALGMA(BETA) - ALGMA(ALBE + CN))
T = PSI(ALBE) - PSI(ALBE + CN)
T1 = T - PSI(ALPHA)
T2 = T - PSI(BETA)
D1 = 0.
D2 = 0.
D11 = 0.
D12 = 0.
D22 = 0.
DO 2 I = 1, KP
CI = I - 1
XLK1 = XLK1 + NF(I)*(ALGMA(ALPHA + CI) + ALGMA(BETA + CN - CI)
1 - ALGMA(CI + 1.) - ALGMA(CN - CI + 1.))
E1 = T1 + PSI(ALPHA + CI)
E2 = T2 + PSI(BETA + CN - CI)
D1 = D1 + E1*NF(I)
D2 = D2 + E2*NF(I)
D11 = D11 + E1**2*NF(I)
D12 = D12 + E1*E2*NF(I)
D22 = D22 + E2**2*NF(I)
2 CONTINUE
C
C CHECK FOR SPECIFIED PARAMETERS
C
IF(XP(1).GT.0..OR.XP(2).GT.0.) D12 = 0.
IF(XP(1).GT.0.) D1 = 0.
IF(XP(1).GT.0.) D11 = 1.
IF(XP(2).GT.0.) D2 = 0.
IF(XP(2).GT.0.) D22 = 1.
DET = D22*D11 - D12*D12
TEMP = D11
D11 = D22/DET
D22 = TEMP/DET
D12 = -D12/DET
DELA = D11*D1 + D12*D2
DELB = D22*D2 + D12*D1
C
C COMPUTE CONSTRAINED NEW ESTIMATES AND THEIR LIKELIHOOD
C THE ESTIMATES ARE CONSTRAINED TO BE GREATER THAN EPS
C
ALPHP = AMAX1(ALPHA + DELA,EPS)
BETAP = AMAX1(BETA + DELB,EPS)
ALBEP = ALPHP + BETAP
XLK2 = TOT*(ALGMA(CN + 1.) + ALGMA(ALBEP) - ALGMA(ALPHP)
1 - ALGMA(BETAP) - ALGMA(ALBEP + CN))
DO 3 I = 1, KP
CI = I - 1
XLK2 = XLK2 + NF(I)*(ALGMA(ALPHP + CI) + ALGMA(BETAP+CN-CI)
1 - ALGMA(CI + 1.) - ALGMA(CN - CI + 1.))
3 CONTINUE
C
C FIND BEST ESTIMATES USING A CHOPPING METHOD
C
7 CONTINUE
DELTA = DELTA*0.5
XLK3 = XLK2
C
C IF CHOPPING GETS TOO SMALL, QUIT
C
IF(DELTA.LE.0.000001.AND.IT.GT.1) GO TO 16
IF(DELTA.LE.0.000001) GO TO 18
ALPHP = AMAX1(ALPHA + DELTA*DELA,EPS)
BETAP = AMAX1(BETA + DELTA*DELB,EPS)
ALBEP = ALPHP + BETAP
XLK2 = TOT*(ALGMA(CN + 1.) + ALGMA(ALBEP) - ALGMA(ALPHP)
1 - ALGMA(BETAP) - ALGMA(ALBEP + CN))
DO 4 I = 1, KP
CI = I - 1
XLK2 = XLK2 + NF(I)*(ALGMA(ALPHP + CI) + ALGMA(BETAP+CN-CI)
1 - ALGMA(CI + 1.) - ALGMA(CN - CI + 1.))
4 CONTINUE
IF(XLK2.LE.XLK1) GO TO 7
C
C OPTIMIZE STEP LENGTH
C
DOPT = (XLK3 - 4.*XLK2 + 3.*XLK1)/(2.*(XLK3 - 2.*XLK2 + XLK1))
IF(DOPT.GE.1..OR.DOPT.LE.0.) DOPT = 1.
DELTA = DELTA*DOPT
C
C SAVE NEW IMPROVED ESTIMATES
C
ALPHA = AMAX1(ALPHA + DELTA*DELA,EPS)
BETA = AMAX1(BETA + DELTA*DELB,EPS)
C
C CHECK FOR CONVERGENCE - A CHANGE IN PARAMETERS OF LESS
C THAN EPS
C
IF(DELTA*(ABS(DELA) + ABS(DELB)).LE.EPS) GO TO 16
12 CONTINUE
18 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
16 CONTINUE
C
C COMPUTE FINAL LIKELIHOOD, EXPECTED COUNTS
C
XLK = 0.
TERM = ALGMA(BETA + CN) - ALGMA(ALPHA + BETA + CN) +
1 ALGMA(ALPHA + BETA) - ALGMA(BETA)
DO 17 I = 1, M
CI = I - 1
E(I) = TOT*EXP(TERM)
XLK = XLK + NF(I)*TERM
TERM = TERM + ALOG(ALPHA + CI) - ALOG(CI + 1.)
IF(I.NE.M) TERM = TERM + ALOG(CN - CI) - ALOG(BETA+CN-CI-1.)
17 CONTINUE
IF(XP(2).GT.0.) D22 = 0.
IF(XP(1).GT.0.) D11 = 0.
C
C WRITE OUT ESTIMATES AND THEIR VARIANCES
C
IF(XP(1).GT.0.) WRITE (6,20) ALPHA
20 FORMAT('0',10X,'SPECIFIED VALUE OF ALPHA =',F12.5)
IF(XP(1).EQ.0.) WRITE (6,25) ALPHA
25 FORMAT('0',10X,'ESTIMATED VALUE OF ALPHA =',F12.5)
IF(XP(2).GT.0.) WRITE (6,35) BETA
35 FORMAT(' ',10X,'SPECIFIED VALUE OF BETA =',F12.5)
IF(XP(2).EQ.0.) WRITE (6,40) BETA
40 FORMAT(' ',10X,'ESTIMATED VALUE OF BETA =',F12.5)
IF(XP(1).GT.0..AND.XP(2).GT.0.) GO TO 19
WRITE (6,45)
45 FORMAT('0',10X,'ASYMPTOTIC COVARIANCE MATRIX OF ESTIMATES')
WRITE (6,50)
50 FORMAT(' ',21X,'ALPHA',8X,'BETA')
WRITE (6,55) D11, D12, D12, D22
55 FORMAT(' ',10X,'ALPHA',2E13.5/11X,'BETA ',2E13.5)
19 CONTINUE
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,65) XLK
65 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) INT, XP(1), XP(2)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/11X,I7,4F11.5)
RETURN
END
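For reference, the beta-binomial probability function fitted by BETBIN, written in terms of the log-gamma function ALGMA used above, is
$$P(X=x) = \binom{N}{x}\,\frac{B(\alpha+x,\ \beta+N-x)}{B(\alpha,\beta)} = \frac{\Gamma(N+1)\,\Gamma(\alpha+\beta)\,\Gamma(\alpha+x)\,\Gamma(\beta+N-x)}{\Gamma(x+1)\,\Gamma(N-x+1)\,\Gamma(\alpha)\,\Gamma(\beta)\,\Gamma(\alpha+\beta+N)},\qquad x = 0,1,\ldots,N.$$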
C CHISQ REVISED 5- 9-80
C PROGRAMMER: HASSELBLAD
C COMPUTES THE CHI-SQUARE INTEGRAL TO X FOR N DEGREES OF FREEDOM
C REFERENCE: HANDBOOK OF MATHEMATICAL FUNCTIONS
C
REAL FUNCTION CHISQ(X,N)
SUM = 0.
IF(X.GT.0.0.AND.N.GT.0) GO TO 2
CHISQ = 1.
RETURN
2 CONTINUE
IF(N.GT.50) GO TO 6
IF((N/2)*2.NE.N) GO TO 11
IF(N.LE.2) GO TO 3
M = N/2 - 1
TERM = 1.
DO 4 I = 1, M
CI = 2.*FLOAT(I)
TERM = TERM*X/CI
SUM = SUM + TERM
4 CONTINUE
3 CONTINUE
CHISQ = EXP(-X/2.0)*(1.0+SUM)
RETURN
11 CONTINUE
TERM = 1.0/X
IF(N.LE.1) GO TO 12
M = (N-1)/2
DO 13 I = 1, M
CI = 2.*FLOAT(I) - 1.
TERM = TERM*X/CI
SUM = SUM + TERM
13 CONTINUE
12 CONTINUE
XS = SQRT(X)
CHISQ = 2. - 2.*CNORM(XS) + XS*EXP(-X/2.)*.7978846*SUM
RETURN
6 CONTINUE
CHISQ = 1.-CNORM(((X/N)**.3333333-1.+2./(9.*N))/SQRT(2./(9.*N)))
RETURN
END
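For reference, the large-sample branch of CHISQ (more than 50 degrees of freedom) appears to use the Wilson-Hilferty normal approximation
$$Q(\chi^2 \mid n) \approx 1 - \Phi\!\left(\frac{(\chi^2/n)^{1/3} - 1 + 2/(9n)}{\sqrt{2/(9n)}}\right),$$
where $\Phi$ is the standard normal distribution function supplied by CNORM; the small-sample branches sum the usual finite series for even and odd degrees of freedom.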
C SUBROUTINE TO FIT A MIXTURE OF TWO BINOMIAL DISTRIBUTIONS
c PROGRAMMER: HASSELBLAD REVISED 9-25-80
c
SUBROUTINE BINOMX
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C PI IS THE ESTIMATED PARAMETER OF THE FIRST BINOMIAL DISTRIBUTION
C P2 IS THE ESTIMATED PARAMETER OF THE SECOND BINOMIAL DISTRIBUTION
C ALPHA IS THE ESTIMATED PROPORTION CORRESPONDING TO THE FIRST
C BINOMIAL DISTRIBUTION
C
IF = 1
WRITE (6,20) TITLE
20 FORMATC1' ,10X,20A4)
C
C CHECK FOR IMPROPER PARAMETER VALUES
C
IF(XP(1).LT.0..OR.XP(1).GT.1.) GO TO 900
IF(XP(2).LT.0..OR.XP(2).GT.1.) GO TO 900
IF(XP(3).LT.0..OR.XP(3).GT.1.) GO TO 900
N = MAX(INT,K)
CN = N
KP = K + 1
M = MIN(KP+1,N+1)
NP = 3
IF(XP(1).GT.0.) NP = NP - 1
IF(XP(2).GT.0.) NP = NP - 1
IF(XP(3).GT.0.) NP = NP - 1
P = 0.
C
C COMPUTE ESTIMATE OF AVERAGE P
C
DO 2 I = 1, KP
P = P + (I-1)*NF(I)
2 CONTINUE
TOT = NTOT
P = P/(TOT*CN)
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A MIXTURE OF TWO BINOMIAL DISTRIBUTIONS')
WRITE (6,15)
15 FORMAT('0',10X,'P(X) = N!(ALPHA(P1**X)(1-P1)**(N-X) +'/15X,
1 '(1-ALPHA)(P2**X)(1-P2)**(N-X))/(X!(N-X)!)')
WRITE (6,10) NTOT, N
10 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6,', N =',I4)
C
C COMPUTE INITIAL ESTIMATES
C
ALPHA = 0.5
IF(XP(3).GT.0.) ALPHA = XP(3)
P1 = 0.75*P
IF(P1.LE.0.05) P1 = 0.05
IF(XP(1).GT.0.) P1 = XP(1)
P2 = 1.333*P
IF(P2.GE.0.95) P2 = 0.95
IF(XP(2).GT.0.) P2 = XP(2)
IF(XP(1).GT.0..AND.XP(2).GT.0..AND.XP(3).GT.0.) GO TO 8
C
C BEGIN MIP-EM ITERATION SCHEME
C
DO 6 IT = 1, 1000
X1 = 0.
X2 = 0.
XS = 0.
Q1 = 1. - P1
Q2 = 1. - P2
ALP1 = ALOG(P1)
ALQ1 = ALOG(Q1)
ALP2 = ALOG(P2)
ALQ2 = ALOG(Q2)
XDC = 1.
XNC = CN
TERM1 = CN*ALQ1
TERM2 = CN*ALQ2
DO 7 I = 1, KP
CI = I - 1
D1 = EXP(TERM1)
D2 = EXP(TERM2)
D = ALPHA*D1 + (1. - ALPHA)*D2
X1 = X1 + D1*CI*NF(I)/D
X2 = X2 + D2*CI*NF(I)/D
XS = XS + ALPHA*D1*NF(I)/D
IF(I.NE.M) TERM1 = TERM1 + ALP1 - ALQ1 + ALOG(XNC) - ALOG(XDC)
IF(I.NE.M) TERM2 = TERM2 + ALP2 - ALQ2 + ALOG(XNC) - ALOG(XDC)
IF(XNC.GT.1.) XNC = XNC - 1.
XDC = XDC + 1.
7 CONTINUE
X1 = X1/(TOT*CN)
X2 = X2/(TOT*CN)
XS = XS/TOT
IF(XP(1).GT.0.) X1 = XP(1)
IF(XP(2).GT.0.) X2 = XP(2)
IF(XP(3).GT.0.) XS = XP(3)
C
C TEST FOR CONVERGENCE
C
IF(ABS(P1-X1) + ABS(P2-X2) + ABS(ALPHA-XS).LE.0.0001) GO TO 4
P1 = X1
P2 = X2
ALPHA = XS
6 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
4 CONTINUE
P1 = X1
P2 = X2
ALPHA = XS
8 CONTINUE
C
C COMPUTE FINAL LIKELIHOOD, EXPECTED FREQUENCIES,
C AND SECOND PARTIAL DERIVATIVES
C
XLK = 0.
Q1 = 1. - P1
Q2 = 1. - P2
ALP1 = ALOG(P1)
ALQ1 = ALOG(Q1)
ALP2 = ALOG(P2)
ALQ2 = ALOG(Q2)
XDC = 1.
XNC = CN
TERM1 = CN*ALQ1
TERM2 = CN*ALQ2
ESUM = 0.
D11 = 0.
D12 = 0.
D22 = 0.
D13 = 0.
D23 = 0.
D33 = 0.
DO 9 I = 1, KP
CI = I - 1
D1 = EXP(TERM1)
D2 = EXP(TERM2)
D = ALPHA*D1 + (1. - ALPHA)*D2
E(I) = TOT*D
ESUM = ESUM + E(I)
XLK = XLK + NF(I)*ALOG(D)
TERM1 = TERM1 + ALP1 - ALQ1 + ALOG(XNC) - ALOG(XDC)
TERM2 = TERM2 + ALP2 - ALQ2 + ALOG(XNC) - ALOG(XDC)
IF(XNC.GT.1.) XNC = XNC - 1.
XDC = XDC + 1.
T1 = NF(I)/D
T2 = -T1/D
T3 = D1*ALPHA
T4 = D2*(1. - ALPHA)
T5 = CI/P1 - (CN - CI)/(1. - P1)
T6 = CI/P2 - (CN - CI)/(1. - P2)
D11 = D11 + T2*T3**2*T5**2 + T1*T3*(T5**2 - CI/P1**2 -
1 (CN - CI)/(1. - P1)**2)
D22 = D22 + T2*T4**2*T6**2 + T1*T4*(T6**2 - CI/P2**2 -
1 (CN - CI)/(1. - P2)**2)
D12 = D12 + T2*T3*T4*T5*T6
D13 = D13 + T2*T3*T5*(D1 - D2)
D23 = D23 + T2*T4*T6*(D1 - D2)
D33 = D33 + T2*(D1 - D2)**2
9 CONTINUE
E(KP) = E(KP) + TOT - ESUM
C
C ALLOW FOR SPECIFIED PARAMETERS
C
IF(XP(1).GT.0.) D11 = 1.
IF(XP(2).GT.0.) D22 = 1.
IF(XP(3).GT.0.) D33 = 1.
IF(XP(1).GT.0..OR.XP(2).GT.0.) D12 = 0.
IF(XP(1).GT.0..OR.XP(3).GT.0.) D13 = 0.
IF(XP(2).GT.0..OR.XP(3).GT.0.) D23 = 0.
DET = D11*D22*D33 + 2.*D12*D23*D13 - D13**2*D22 - D23**2*D11 -
1 D12**2*D33
C
C WRITE OUT ESTIMATES AND THEIR VARIANCES
C
IF(XP(1).GT.0.) WRITE (6,25) P1
25 FORMAT('0',10X,'SPECIFIED VALUE OF P1 =',F11.5)
IF(XP(1).EQ.0.) WRITE (6,30) P1
30 FORMAT('0',10X,'ESTIMATED VALUE OF P1 =',F11.5)
IF(XP(2).GT.0.) WRITE (6,35) P2
35 FORMAT(' ',10X,'SPECIFIED VALUE OF P2 =',F11.5)
IF(XP(2).EQ.0.) WRITE (6,40) P2
40 FORMAT(' ',10X,'ESTIMATED VALUE OF P2 =',F11.5)
IF(XP(3).GT.0.) WRITE (6,45) ALPHA
45 FORMAT(' ',10X,'SPECIFIED VALUE OF ALPHA =',F8.5)
IF(XP(3).EQ.0.) WRITE (6,50) ALPHA
50 FORMAT(' ',10X,'ESTIMATED VALUE OF ALPHA =',F8.5)
IF(XP(1).GT.0..AND.XP(2).GT.0..AND.XP(3).GT.0.) GO TO 11
V11 = (D23**2 - D22*D33)/DET
V12 = (D12*D33 - D13*D23)/DET
V13 = (D13*D22 - D12*D23)/DET
V22 = (D13**2 - D11*D33)/DET
V23 = (D11*D23 - D13*D12)/DET
V33 = (D12**2 - D11*D22)/DET
IF(XP(1).GT.0.) V11 = 0.
IF(XP(2).GT.0.) V22 = 0.
IF(XP(3).GT.0.) V33 = 0.
WRITE (6,55)
55 FORMAT('0',10X,'ASYMPTOTIC COVARIANCE MATRIX OF ESTIMATES')
WRITE (6,60)
60 FORMAT(' ',21X,' P1  ',8X,' P2  ',8X,'ALPHA')
WRITE (6,65) V11, V12, V13, V12, V22, V23, V13, V23, V33
65 FORMAT(' ',10X,'P1   ',3E13.5/11X,'P2   ',3E13.5/11X,'ALPHA',
1 3E13.5)
11 CONTINUE
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,70) XLK
70 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) INT, XP(l), XP(2), XP(3)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/11X,I7,4F11.5)
RETURN
END
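For reference, BINOMX fits the two-component binomial mixture
$$f(x) = \binom{N}{x}\Big[\alpha\,p_1^{x}(1-p_1)^{N-x} + (1-\alpha)\,p_2^{x}(1-p_2)^{N-x}\Big],$$
and, as in the companion Poisson-mixture routine, one pass of the iteration above updates the parameters by
$$p_j' = \frac{1}{nN}\sum_x n_x\,x\,\frac{f_j(x)}{f(x)}, \qquad \alpha' = \frac{1}{n}\sum_x n_x\,\frac{\alpha\,f_1(x)}{f(x)},$$
where $f_j$ is the $j$-th component density and $n$ is the total sample size.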
C SUBROUTINE TO FIT A POISSON DISTRIBUTION REVISED 9-25-80
C PROGRAMMERS: HASSELBLAD, ANDERSON
SUBROUTINE POISSN
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C XLMB IS THE ESTIMATE OF THE PARAMETER LAMBDA
C
WRITE (6,30) TITLE
30 FORMAT ('1',10X,20A4)
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A POISSON DISTRIBUTION')
WRITE (6,40)
40 FORMAT('0',10X,'P(X) = EXP(-LAMBDA)LAMBDA**X/X!')
IF(XP(1).LT.0.) GO TO 900
IF = 1
KP = K + 1
M = K + 2
NP = 1
IF(XP(1).GT.0.) NP = 0
TOT = NTOT
XLMB = XP(1)
IF(XP(1).GT.O.) GO TO 4
C
C COMPUTE ESTIMATE OF LAMBDA
C
XM2 = 0.
DO 2 I = 1, KP
CI = I - 1
XLMB = XLMB + CI*NF(I)
XM2 = XM2 + CI**2*NF(I)
2 CONTINUE
XLMB = XLMB/TOT
4 CONTINUE
WRITE (6,10) NTOT
10 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6)
IF(XP(1).EQ.0.) WRITE (6,15) XLMB
15 FORMAT('0',10X,'ESTIMATED VALUE OF LAMBDA =',F9.4)
IF(XP(1).GT.0.) WRITE (6,20) XLMB
20 FORMAT('0',10X,'SPECIFIED VALUE OF LAMBDA =',F9.4)
VAR = XLMB/TOT
IF(XP(1).EQ.O.) WRITE (6,25) VAR
25 FORMATC ',10X,'VARIANCE OF LAMBDA =',E13.5)
XLK = 0.
ALP = ALOG(XLMB)
CUM = 0.
XDC = 1.
TERM = -XLMB
C
C COMPUTE EXPECTED COUNTS, LOG-LIKELIHOOD
C
DO 3 I = 1, KP
E(I) = EXP(TERM)*TOT
CUM = CUM + E(I)
XLK = XLK + NF(I)*TERM
TERM = TERM + ALP - ALOG(XDC)
XDC = XDC + 1.
3 CONTINUE
E(M) = TOT - CUM
C
C COMPUTE GOODNESS OF FIT TESTS
C
NDF = NTOT - 1
VART = XM2/XLMB - TOT*XLMB
PVAL = CHISQ(VART,NDF)
IF(XP(1).EQ.0.) WRITE (6,45) VART, PVAL
45 FORMAT('0',10X,'VARIANCE TEST =',F8.2,', P =',F6.4)
CALL GDFIT
WRITE (6,35) XLK
35 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) XP(1)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFIED'/11X,F11.5)
RETURN
END
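The sketch below is a minimal, hypothetical driver. It is not part of DISFIT and is not the control-card program described earlier in this guide; it only illustrates one way the COMMON variables documented above could be filled before calling POISSN. The observed counts are invented for illustration.
C A MINIMAL, HYPOTHETICAL DRIVER, NOT PART OF DISFIT.  IT ONLY SHOWS
C ONE WAY THE COMMON VARIABLES DOCUMENTED ABOVE MIGHT BE FILLED
C BEFORE CALLING POISSN.  THE COUNTS BELOW ARE INVENTED.
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C READ A TITLE CARD AND CLEAR THE OPTIONAL CONSTANTS AND COUNTS
READ (5,10) TITLE
10 FORMAT(20A4)
DO 1 I = 1, 200
NF(I) = 0
1 CONTINUE
DO 2 I = 1, 4
XP(I) = 0.
2 CONTINUE
C OBSERVED COUNTS FOR THE VALUES 0, 1, 2, AND 3 (INVENTED DATA)
NF(1) = 12
NF(2) = 18
NF(3) = 9
NF(4) = 3
C K IS THE LARGEST OBSERVED VALUE, NTOT THE TOTAL COUNT
K = 3
NTOT = 42
INT = 0
CALL POISSN
STOP
END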
C SUBROUTINE TO FIT A TRUNCATED POISSON DISTRIBUTION
c PROGRAMMERS: HASSELBLAD, ANDERSON REVISED 9-25-80
c
SUBROUTINE TPOISN
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C XLMB IS THE CURRENT ESTIMATE OF THE PARAMETER LAMBDA
C XLMBP IS THE NEW ESTIMATE OF THE PARAMETER LAMBDA
C ZN IS THE ESTIMATE OF THE UNOBSERVED CELL F(0)
C
WRITE (6,30) TITLE
30 FORMAT('1',10X,20A4)
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A TRUNCATED POISSON DISTRIBUTION')
WRITE (6,20)
20 FORMAT('0',10X,'P(X) = EXP(-LAMBDA)LAMBDA**X/(X!(1-EXP(-LAMBDA)))'
1)
KP = K + 1
M = K + 2
IF = 2
NT = 0
NP = 1
IF(XP(1).GT.O.) NP = 0
IF(XP(1).LT.O.) GO TO 900
XM1 = 0.
XM2 = 0.
C
C COMPUTE ESTIMATE OF LAMBDA AND FCO)
C
XLMB = XP(1)
DO 2 I = 2, KP
CI = I - 1
NT = NT + NF(I)
XM1 = XM1 + CIXNF(I)
XM2 = XM2 + CI*(CI - 1.)*NF(I)
2 CONTINUE
TOT = NT
ZN = XM1**2/XM2 - TOT
WRITE (6,10) NT
10 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6)
IF(XP(1).GT.O.) GO TO 6
XLMB = XM1/(ZN + TOT)
C
C BEGIN ITERATION PROCEDURE
C
DO 4 IT = 1, 50
EXLMB = EXP(-XLMB)
ZN = EXLMB*TOT/(1. - EXLMB)
XLMBP = XM1/(ZN + TOT)
IF(ABS(XLMBP - XLMB)/AMAX1(XLMB,1.).LE.0.00001) GO TO 7
XLMB = XLMBP
4 CONTINUE
WRITE (6,40)
40 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
7 CONTINUE
XLMB = XLMBP
6 CONTINUE
IF(XP(1).EQ.O.) WRITE (6,15) XLMB
15 FORMAT('0',10X,'ESTIMATED VALUE OF LAMBDA =',F8.4)
IF(XP(1).GT.O.) WRITE (6,25) XLMB
25 FORMAT('0',10X,'SPECIFIED VALUE OF LAMBDA =',F8.4)
C
C COMPUTE ESTIMATED VARIANCE OF ESTIMATE
C
VAR = XLMB**2/(XM1*(XLMB + 1. - XM1/TOT))
IF(XP(1).EQ.0.) WRITE (6,45) VAR
45 FORMAT(' ',10X,'ASYMP. VAR. OF LAMBDA =',E13.5)
XL = 0.
ALP = ALOG(XLMB)
CUM = 0.
XDC = 2.
TERM = ALP - XLMB - ALOG(1. - EXP(-XLMB))
C
C COMPUTE EXPECTED COUNTS
C
E(1) = 0.
DO 3 I = 2, KP
E(I) = EXP(TERM)*TOT
CUM = CUM + E(I)
XL = XL + NF(I)*TERM
TERM = TERM + ALP - ALOG(XDC)
XDC = XDC + 1.
3 CONTINUE
E(M) = TOT - CUM
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,60) XL
60 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) XP(1)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/11X,F11.5)
RETURN
END
C PSI REVISED 7-14-80
C PROGRAMMER: HASSELBLAD
C COMPUTES THE PSI FUNCTION (ALSO KNOWN AS THE DIGAMMA FUNCTION):
C PSI(Z) = D(LN(GAMMA(Z)))/DZ.
C
C THE PROGRAM USES AN ASYMPTOTIC FORMULA AS GIVEN ON P. 259 OF
C "HANDBOOK OF MATHEMATICAL FUNCTIONS".
C
REAL FUNCTION PSI(Z)
X = Z
S = 0.
2 CONTINUE
IF(X.GE.5.) GO TO 3
S = S + 1./X
X = X + 1.
GO TO 2
3 PSI = ALOG(X) - .5/X - (.08333333 - (.0083333333 -
1 .0031349208/(X*X))/(X*X))/(X*X) - S
RETURN
END
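For reference, PSI evaluates the digamma function
$$\psi(z) = \frac{d}{dz}\ln\Gamma(z),$$
raising the argument above 5 with the recurrence $\psi(z) = \psi(z+1) - 1/z$ (the sum S above) and then evaluating a truncated form of the asymptotic series
$$\psi(z) \sim \ln z - \frac{1}{2z} - \frac{1}{12z^2} + \frac{1}{120z^4} - \cdots$$
from p. 259 of the Handbook of Mathematical Functions; the numerical coefficients in the listing appear to be the authors' truncated fit of that series.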
C SUBROUTINE TO FIT A MIXTURE OF TWO POISSON DISTRIBUTIONS
c PROGRAMMER: HASSELBLAD REVISED 4-30-so
c
SUBROUTINE POISMX
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C XLMI IS THE ESTIMATE OF THE PARAMETER LAMBDA1
C XLM2 IS THE ESTIMATE OF THE PARAMETER LAMBDA2
C ALPHA IS THE ESTIMATED PROPORTION CORRESPONDING TO THE FIRST
C POISSON DISTRIBUTION
IF = 1
WRITE (6,20) TITLE
20 FORMAT('1',10X,20A4)
C
C CHECK FOR IMPROPER PARAMETER VALUES
C
IF(XP(1).LT.0.) GO TO 900
IF(XP(2).LT.0.) GO TO 900
IF(XP(3).LT.0..OR.XP(3).GT.1.) GO TO 900
KP = K + 1
M = K + 2
NP = 3
IF(XP(1).GT.0.) NP = NP - 1
IF(XP(2).GT.0.) NP = NP - 1
IF(XP(3).GT.0.) NP = NP - 1
XLM = 0.
C
C COMPUTE ESTIMATE OF AVERAGE P
C
DO 2 I = 1, KP
XLM = XLM + (I-1)*NF(I)
2 CONTINUE
TOT = NTOT
XLM = XLM/TOT
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A MIXTURE OF TWO POISSON DISTRIBUTIONS')
WRITE (6,15)
15 FORMAT('0',10X,'P(X) = ALPHA*EXP(-LAMBDA1)LAMBDA1**X/X! +'/15X,
1 '(1-ALPHA)EXP(-LAMBDA2)LAMBDA2**X/X!')
WRITE (6,10) NTOT
10 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6)
C
C COMPUTE INITIAL ESTIMATES
C
ALPHA = 0.5
IF(XP(3).GT.O.) ALPHA = XP(3)
XLM1 = 0.75*XLM
IF(XP(1).GT.0.) XLM1 = XP(1)
XLM2 = 1.3333*XLM
IF(XP(2).GT.O.) XLM2 = XP(2)
IF(XP(1).GT.O..AND.XP(2).GT.O..AND.XP(3).GT.O.) GO TO 8
C
C BEGIN MIP-EM ITERATION SCHEME
C
DO 6 IT = 1, 2000
X1 = 0.
X2 = 0.
XS = 0.
ALM1 = ALOG(XLM1)
ALM2 = ALOG(XLM2)
XDC = 1.
TERM1 = -XLM1
TERM2 = -XLM2
DO 7 I = 1, KP
CI = I - 1
D1 = EXP(TERM1)
D2 = EXP(TERM2)
D = ALPHA*D1 + (1. - ALPHA)*D2
X1 = X1 + D1*CI*NF(I)/D
X2 = X2 + D2*CI*NF(I)/D
XS = XS + ALPHA*D1*NF(I)/D
TERM1 = TERM1 + ALM1 - ALOG(XDC)
TERM2 = TERM2 + ALM2 - ALOG(XDC)
XDC = XDC + 1.
7 CONTINUE
X1 = X1/TOT
X2 = X2/TOT
XS = XS/TOT
IF(XP(1).GT.0.) X1 = XP(1)
IF(XP(2).GT.0.) X2 = XP(2)
IF(XP(3).GT.0.) XS = XP(3)
C
C TEST FOR CONVERGENCE
C
IF(ABS(XLM1-X1)+ABS(XLM2-X2)+ABS(ALPHA-XS).LE.0.0001) GO TO 4
XLM1 = X1
XLM2 = X2
ALPHA = XS
6 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
4 CONTINUE
XLM1 = X1
XLM2 = X2
ALPHA = XS
8 CONTINUE
C
C COMPUTE FINAL LIKELIHOOD, EXPECTED FREQUENCIES,
C AND SECOND PARTIAL DERIVATIVES
C
XLK = 0.
ALM1 = ALOG(XLM1)
ALM2 = ALOG(XLM2)
XDC = 1.
TERM1 = -XLM1
TERM2 = -XLM2
ESUM = 0.
D11 = 0.
D12 = 0.
D22 = 0.
D13 = 0.
D23 = 0.
D33 = 0.
DO 9 I = 1, KP
CI = I - 1
D1 = EXP(TERM1)
D2 = EXP(TERM2)
D = ALPHA*D1 + (1. - ALPHA)*D2
E(I) = TOT*D
ESUM = ESUM + E(I)
XLK = XLK + NF(I)*ALOG(D)
TERM1 = TERM1 + ALM1 - ALOG(XDC)
TERM2 = TERM2 + ALM2 - ALOG(XDC)
XDC = XDC + 1.
T1 = NF(I)/D
T2 = -T1/D
T3 = D1*ALPHA
T4 = D2*(1. - ALPHA)
T5 = (CI/XLM1 - 1.)
T6 = (CI/XLM2 - 1.)
D11 = D11 + T2*T3**2*T5**2 + T1*T3*(T5**2 - CI/XLM1**2)
D22 = D22 + T2*T4**2*T6**2 + T1*T4*(T6**2 - CI/XLM2**2)
D12 = D12 + T2*T3*T4*T5*T6
D13 = D13 + T2*T3*T5*(D1 - D2)
D23 = D23 + T2*T4*T6*(D1 - D2)
D33 = D33 + T2*(D1 - D2)**2
9 CONTINUE
E(M) = TOT - ESUM
C
C ALLOW FOR SPECIFIED PARAMETERS
C
IF(XP(1).GT.0.) D11 = 1.
IF(XP(2).GT.0.) D22 = 1.
IF(XP(3).GT.0.) D33 = 1.
IF(XP(1).GT.0..OR.XP(2).GT.0.) D12 = 0.
IF(XP(1).GT.0..OR.XP(3).GT.0.) D13 = 0.
IF(XP(2).GT.0..OR.XP(3).GT.0.) D23 = 0.
DET = D11*D22*D33 + 2.*D12*D23*D13 - D13**2*D22 - D23**2*D11 -
1 D12**2*D33
C
C WRITE OUT ESTIMATES AND THEIR VARIANCES
C
IF(XP(1).GT.0.) WRITE (6,25) XLM1
25 FORMAT('0',10X,'SPECIFIED VALUE OF LAMBDA1 =',F11.5)
IF(XP(1).EQ.0.) WRITE (6,30) XLM1
30 FORMAT('0',10X,'ESTIMATED VALUE OF LAMBDA1 =',F11.5)
IF(XP(2).GT.0.) WRITE (6,35) XLM2
35 FORMAT(' ',10X,'SPECIFIED VALUE OF LAMBDA2 =',F11.5)
IF(XP(2).EQ.0.) WRITE (6,40) XLM2
40 FORMAT(' ',10X,'ESTIMATED VALUE OF LAMBDA2 =',F11.5)
IF(XP(3).GT.0.) WRITE (6,45) ALPHA
45 FORMAT(' ',10X,'SPECIFIED VALUE OF ALPHA =',F13.5)
IF(XP(3).EQ.0.) WRITE (6,50) ALPHA
50 FORMAT(' ',10X,'ESTIMATED VALUE OF ALPHA =',F13.5)
IF(XP(1).GT.0..AND.XP(2).GT.0..AND.XP(3).GT.0.) GO TO 11
V11 = (D23**2 - D22*D33)/DET
V12 = (D12*D33 - D13*D23)/DET
V13 = (D13*D22 - D12*D23)/DET
V22 = (D13**2 - D11*D33)/DET
V23 = (D11*D23 - D13*D12)/DET
V33 = (D12**2 - D11*D22)/DET
IF(XP(1).GT.0.) V11 = 0.
IF(XP(2).GT.0.) V22 = 0.
IF(XP(3).GT.0.) V33 = 0.
WRITE (6,55)
55 FORMAT('0',10X,'ASYMPTOTIC COVARIANCE MATRIX OF ESTIMATES')
WRITE (6,60)
60 FORMAT(' ',22X,'LAMBDA1',6X,'LAMBDA2',6X,'ALPHA')
WRITE (6,65) V11, V12, V13, V12, V22, V23, V13, V23, V33
65 FORMAT(' ',10X,'LAMBDA1',3E13.5/11X,'LAMBDA2',3E13.5/11X,'ALPHA',
1 2X,3E13.5)
11 CONTINUE
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,70) XLK
70 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) INT, XP(1), XP(2), XP(3)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/11X,I7,4F11.5)
RETURN
END
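For reference, POISMX fits
$$f(x) = \alpha\,\frac{e^{-\lambda_1}\lambda_1^{x}}{x!} + (1-\alpha)\,\frac{e^{-\lambda_2}\lambda_2^{x}}{x!},$$
and each pass of the iteration above updates the parameters by
$$\lambda_j' = \frac{1}{n}\sum_x n_x\,x\,\frac{f_j(x)}{f(x)}, \qquad \alpha' = \frac{1}{n}\sum_x n_x\,\frac{\alpha\,f_1(x)}{f(x)},$$
where $f_j$ is the $j$-th Poisson component and $n$ is the total sample size.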
C SUBROUTINE TO FIT A NEGATIVE BINOMIAL DISTRIBUTION
C PROGRAMMER: HASSELBLAD                             REVISED 9-30-80
C
SUBROUTINE NEGBIN
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C P IS THE ESTIMATE OF THE PARAMETER P
C XN IS THE ESTIMATE OF THE PARAMETER N
C XNP IS THE IMPROVED ESTIMATE OF THE PARAMETER N
C Nl, N2, CN1, CN2, XLK1, XLK2, TERM1, TERM2, ALT1, ALT2 ARE USED
C TO FIND THE OPTIMAL INTEGER SOLUTION FOR N
C
WRITE (6,30) TITLE
30 FORMAT ( ' 1 ' ,10X,20A4)
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A NEGATIVE BINOMIAL DISTRIBUTION')
WRITE (6,10)
10 FORMAT('0',10X,'P(X) = (N+X-1)!P**X/(X!(N-1)!(1+P)**(N+X))')
WRITE (6,15) NTOT
15 FORMAT('0',10X, 'TOTAL SAMPLE SIZE =',I6)
IF(XP(1).LT.0..OR.XP(2).LT.0.) GO TO 900
IF = 1
KP = K + 1
M = K + 2
NP = 2
IF(XP(1).GT.0.) NP = NP - 1
IF(XP(2).GT.0.) NP = NP - 1
TOT = NTOT
C
C COMPUTE MOMENTS
C
XM1 = 0.
XM2 = 0.
DO 2 I = 1, KP
CI = I - 1
XM1 = XM1 + CI*NF(I)
XM2 = XM2 + CI**2*NF(I)
2 CONTINUE
XM1 = XM1/TOT
V = XM2/TOT - XM1**2
C
C COMPUTE INITIAL ESTIMATES
C
P = XP(1)
XN = XP(2)
IF(XP(1).GT.0..AND.XP(2).GT.0.) GO TO 3
IF(XP(1).EQ.0..AND.V.LE.XM1) GO TO 920
IF(XP(1).EQ.0.) P = V/XM1 - 1.
IF(XP(2).EQ.0.) XN = XM1/P
IF(XP(2).GT.0.) GO TO 3
C
C USE NEWTON-RAPHSON ITERATION SCHEME FOR N IF NOT SPECIFIED
C
DO 7 IT = 1, 100
SUM = 0.
SUM1 = 0.
S = 0.
S1 = 0.
TERM = XN - 1.
DO 6 I = 1, K
TERM = TERM + 1.
S = S + 1./TERM
S1 = S1 + 1./TERM**2
SUM = SUM + S*NF(I+1)
SUM1 = SUM1 + S1*NF(I+1)
6 CONTINUE
IF(XP(1).EQ.0.) F = TOT*ALOG(1. + XM1/XN) - SUM
IF(XP(1).GT.0.) F = TOT*ALOG(1. + P) - SUM
IF(XP(1).EQ.0.) F1 = -(TOT*XM1)/(XN**2 + XM1*XN) + SUM1
IF(XP(1).GT.0.) F1 = SUM1
XNP = XN - F/F1
IF(XNP.LE.0.) GO TO 9
IF(ABS((XNP - XN)/XN).LE.0.0001) GO TO 8
XN = XNP
7 CONTINUE
9 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
8 CONTINUE
XN = XNP
IF(XP(1).EQ.0.) P = XM1/XN
3 CONTINUE
N1 = XN
N2 = N1 + 1
IF(N1.LE.0) N1 = 1
CN1 = N1
CN2 = N2
P1 = XP(1)
P2 = XP(1)
IF(XP(1).EQ.0.) P1 = XM1/CN1
IF(XP(1).EQ.0.) P2 = XM1/CN2
C
C COMPUTE LIKELIHOOD, EXPECTED COUNTS
C
XLK = 0.
XLK1 = 0.
XLK2 = 0.
ESUM = 0.
XNC = XN
XNC1 = CN1
XNC2 = CN2
Q = 1. + P
Q1 = 1. + P1
Q2 = 1. + P2
ALT = ALOG(P) - ALOG(Q)
ALT1 = ALOG(P1) - ALOG(Q1)
ALT2 = ALOG(P2) - ALOG(Q2)
TERM = -XN*ALOG(Q)
TERM1 = -CN1*ALOG(Q1)
TERM2 = -CN2*ALOG(Q2)
DO 17 I = 1, KP
CI = I - 1
E(I) = TOT*EXP(TERM)
ESUM = ESUM + E(I)
XLK = XLK + NF(I)*TERM
XLK1 = XLK1 + NF(I)*TERM1
XLK2 = XLK2 + NF(I)*TERM2
TERM = TERM + ALT + ALOG(XNC) - ALOG(CI + 1.)
TERM1 = TERM1 + ALT1 + ALOG(XNC1) - ALOG(CI + 1.)
TERM2 = TERM2 + ALT2 + ALOG(XNC2) - ALOG(CI + 1.)
XNC = XNC + 1.
XNC1 = XNC1 + 1.
XNC2 = XNC2 + 1.
17 CONTINUE
E(M) = TOT - ESUM
D11 = -TOT*(XN*P**2 - 2.*P*XM1 - XM1)/(P*Q)**2
D12 = TOT/Q
D22 = SUM1
c
C WRITE OUT ESTIMATES AND THEIR VARIANCES
C
IF(XP(1).GT.0.) WRITE (6,20) P
20 FORMAT('0',10X,'SPECIFIED VALUE OF P =',F12.5)
IF(XP(1).EQ.0.) WRITE (6,25) P
25 FORMAT('0',10X,'ESTIMATED VALUE OF P =',F12.5)
IF(XP(2).GT.0.) WRITE (6,35) XN
35 FORMAT(' ',10X,'SPECIFIED VALUE OF N =',F12.5)
IF(XP(2).EQ.0.) WRITE (6,40) XN
40 FORMAT(' ',10X,'ESTIMATED VALUE OF N =',F12.5)
IF(XP(1).GT.O..AND.XP(2).GT.O.) GO TO 19
IF(XP(1).GT.O.) Dll = 1.
IF(XP(2).GT.O.) D22 = 1.
IF(XP(1).GT.O..OR.XP(2).GT.O.) D12 = 0.
DET = D11XD22 - D12XX2
TEMP = Dll
Dll = D22/DET
D22 = TEMP/DET
D12 = -D12/DET
IF(XP(1).GT.O.) Dll = 0.
IF(XP(2).GT.O.) D22 = 0.
WRITE (6,45)
45 FORMAT('0',10X,'ASYMPTOTIC COVARIANCE MATRIX OF ESTIMATES')
WRITE (6,50)
50 FORMAT('0',21X,'P',12X,'N')
WRITE (6,55) Dll, D12, D12, D22
55 FORMATC ' ,10X, 'P' ,2X,2E13.5/11X, 'N' ,2X,2E13.5)
19 CONTINUE
IF(XP(2).GT.O.) GO TO 22
C
C FIND INTEGER SOLUTION
C
IF(XLK2.LE.XLK1) GO TO 21
PI = P2
Nl = N2
XLK1 = XLK2
21 CONTINUE
WRITE (6,70) PI, Nl, XLK1
70 FORMAT('0',10X,'SOLUTION FOR INTEGER N (PASCAL DISTRIBUTION):'/
1 11X,'P =',F8.5,', N =',15,', LOG-LIKELIHOOD =',F12.4)
22 CONTINUE
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,65) XLK
65 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) XP(1), XP(2)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/18X,2F11.5)
RETURN
920 CONTINUE
WRITE (6,925) V, XM1
925 FORMAT('0',10X,'VARIANCE (',F10.5,') < MEAN (',F10.5,')'/
1 11X,'NO FIT ATTEMPTED')
RETURN
END
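For reference, NEGBIN uses the parameterization printed by format 10,
$$P(x) = \frac{(N+x-1)!}{x!\,(N-1)!}\,\frac{P^{x}}{(1+P)^{N+x}}, \qquad E[X] = NP, \quad \mathrm{Var}[X] = NP(1+P),$$
so the method-of-moments starting values are $P_0 = s^2/\bar{x} - 1$ and $N_0 = \bar{x}/P_0$, and a fit is only attempted when the sample variance exceeds the sample mean.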
C SUBROUTINE TO FIT A TRUNCATED NEGATIVE BINOMIAL DISTRIBUTION
C PROGRAMMERS: ANDERSON, HASSELBLAD                  REVISED 10-20-80
C
SUBROUTINE TNEGBN
DIMENSION R(200)
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C     AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C XN IS THE ESTIMATE OF THE PARAMETER N.
C W IS THE ESTIMATE OF THE PARAMETER W, WHERE P = (1 - W)/W
C P IS THE ESTIMATE OF THE PARAMETER P
C SW IS THE FIRST PARTIAL WITH RESPECT TO W
C SN IS THE FIRST PARTIAL WITH RESPECT TO N
C R IS THE ARRAY OF CUMULATIVE FREQUENCIES
C
WRITE (6,5) TITLE
5 FORMAT('1',10X,20A4)
WRITE (6,35)
35 FORMAT('0',10X,'FIT TO A TRUNCATED NEGATIVE BINOMIAL')
WRITE (6,15)
15 FORMAT('0',10X,'P(X) = (N+X-1)!P**X/(X!(N-1)!(1+P)**(N+X)(1-',
1 '(1+P)**(-N)))')
WRITE (6,25) NTOT
25 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6)
IF(XP(1).LT.0..OR.XP(2).LT.0.) GO TO 900
IF = 2
NT = 0
KP = K + 1
M = K + 2
NP = 2
IF(XP(1).GT.0.) NP = NP - 1
IF(XP(2).GT.0.) NP = NP - 1
XM = 0.
DO 2 I = 2, KP
CI = I - 1
XM = XM + NF(I)*CI
NT = NT + NF(I)
2 CONTINUE
TOT = NT
XM = XM/TOT
Q = 0.
QS = 1. - XM
DO 3 I = 2, KP
Q = Q + NF(I)*QS**2
QS = QS + 1.
3 CONTINUE
S = NF(2)/TOT
C
C COMPUTE INITIAL ESTIMATES
C
W = XM*(1. - S)*(TOT - 1.)/Q
IF(W.LE.0.1) W = 0.1
IF(W.GE.0.9) W = 0.9
XN = (W*XM - S)/(1. - W)
IF(XN.LE.0.5) XN = 0.5
S = TOT*XM
P = (1. - W)/W
IF(XP(1).GT.0.) P = XP(1)
IF(XP(2).GT.0.) XN = XP(2)
IF(XP(1).GT.0.) W = 1./(1. + P)
C
C FORM THE CUMULATIVE DISTRIBUTION OF OBSERVATIONS
C
R(KP) = NF(KP)
DO 4 I = 1, K
J = KP - I
R(J) = R(J+1) + NF(J)
4 CONTINUE
IF(XP(1).GT.0..AND.XP(2).GT.0.) GO TO 6
C
C ITERATE USING A NEWTON-RAPHSON ITERATION SCHEME
C
DO 8 IT = 1, 100
D11 = 0.
D22 = 0.
Q = XN
DO 9 I = 2, KP
QS = R(I)/Q
D11 = D11 + QS
D22 = D22 + QS/Q
Q = Q + 1.
9 CONTINUE
XK = TOT*XN
WM = 1. - W
Q = W**XN
QS = 1. - Q
AL = ALOG(W)
SW = XK/(W*QS) - S/WM
SN = TOT*AL/QS + D11
IF(XP(1).GT.0.) SW = 0.
IF(XP(2).GT.0.) SN = 0.
QM = QS**2
C
C COMPUTE TERMS OF INFORMATION MATRIX
C
D12 = W*QM
D11 = XK*(1. - (XN + 1.)*Q)/(W*D12) + S/WM**2
D22 = D22 - TOT*AL**2*Q/QM
D12 = TOT*(1. - (1. - XN*AL)*Q)/D12
IF(XP(1).GT.0..OR.XP(2).GT.0.) D12 = 0.
DET = D11*D22 - D12**2
IF(DET.LE.0.) DET = D11*D22
VW = D22/DET
VN = D11/DET
COV = D12/DET
C
C CHECK FOR NEGATIVE VARIANCE ESTIMATES
C
IF(VW.LT.0.) GO TO 7
IF(VN.LT.0.) GO TO 7
C
C ADJUST N AND W FOR NEXT ITERATION
C
WP = W
XNP = XN
IF(XP(1).EQ.0.) W = W + SW*VW + SN*COV
IF(XP(2).EQ.0.) XN = XN + SW*COV + SN*VN
IF(W.GT.1.) W = (1. + WP)/2.
IF(W.LE.0.) W = WP/2.
IF(XN.LE.0.) XN = XNP/2.
C
C CHECK FOR CONVERGENCE
C
IF((ABS(SW) + ABS(SN)).LT.0.0001) GO TO 6
8 CONTINUE
7 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
6 CONTINUE
IF(XP(1).EQ.0.) P = (1. - W)/W
C COMPUTE LIKELIHOOD, EXPECTED COUNTS
C
XLK = 0.
ESUM = 0.
XNC = XN
TERM = XN*ALOG(W) + ALOG(XN) + ALOG(1. - W) - ALOG(1. - W**XN)
DO 12 I = 2, KP
CI = I - 1
E(I) = TOT*EXP(TERM)
ESUM = ESUM + E(I)
XLK = XLK + NF(I)*TERM
XNC = XNC + 1.
TERM = TERM + ALOG(1. - W) + ALOG(XNC) - ALOG(CI + 1.)
12 CONTINUE
E(M) = TOT - ESUM
C
C WRITE OUT ESTIMATES AND THEIR VARIANCES
C
IF(XP(1).GT.0.) WRITE (6,60) P
60 FORMAT('0',10X,'SPECIFIED VALUE OF P =',F12.5)
IF(XP(1).EQ.0.) WRITE (6,65) P
65 FORMAT('0',10X,'ESTIMATED VALUE OF P =',F12.5)
IF(XP(2).GT.0.) WRITE (6,70) XN
70 FORMAT(' ',10X,'SPECIFIED VALUE OF N =',F12.5)
IF(XP(2).EQ.0.) WRITE (6,75) XN
75 FORMAT(' ',10X,'ESTIMATED VALUE OF N =',F12.5)
IF(XP(1).GT.0..AND.XP(2).GT.0.) GO TO 13
D11 = D11*W**4 + 2.*SW*W**3
D12 = -D12*W**2
IF(XP(1).GT.0.) D11 = 1.
IF(XP(2).GT.0.) D22 = 1.
IF(XP(1).GT.0..OR.XP(2).GT.0.) D12 = 0.
DET = D11*D22 - D12**2
TEMP = D11
D11 = D22/DET
D22 = TEMP/DET
D12 = D12/DET
IF(XP(1).GT.0.) D11 = 0.
IF(XP(2).GT.0.) D22 = 0.
WRITE (6,80)
80 FORMAT('0',10X,'ASYMPTOTIC COVARIANCE MATRIX OF ESTIMATES')
WRITE (6,85)
85 FORMAT('0',21X,'P',12X,'N')
WRITE (6,90) D11, D12, D12, D22
90 FORMAT(' ',10X,'P',2X,2E13.5/11X,'N',2X,2E13.5)
13 CONTINUE
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,40) XLK
40 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) XP(1), XP(2)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/18X,2F11.5)
RETURN
END
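For reference, TNEGBN works with $W = 1/(1+P)$, in terms of which the zero-truncated negative binomial named in format 15 above is
$$P(x) = \frac{(N+x-1)!}{x!\,(N-1)!}\,\frac{(1-W)^{x}\,W^{N}}{1-W^{N}}, \qquad x = 1, 2, \ldots$$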
C SUBROUTINE TO FIT A LOGARITHMIC DISTRIBUTION       REVISED 7-30-80
C PROGRAMMERS: ANDERSON, HASSELBLAD
SUBROUTINE LOGRTH
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C     AND IF = 2 FOR TRUNCATED AND LOGARITHMIC DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C
WRITE (6,30) TITLE
30 FORMAT('1',10X,20A4)
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A LOGARITHMIC DISTRIBUTION')
KP = K + 1
NP = 1
IF(XP(1).GT.0.) NP = 0
M = K + 2
IF = 2
NT = 0
NUM = 0
C
C COMPUTE SAMPLE SIZE AND MEAN
C
XM1 = 0.
DO 2 I = 2, KP
CI = I - 1
NT = NT + NF(I)
XM1 = XM1 + NF(I)*CI
2 CONTINUE
TOT = NT
XM1 = XM1/TOT
WRITE (6,10) NT
10 FORMAT('0',10X, 'TOTAL SAMPLE SIZE =',I6)
C
C CHECK FOR IMPROPER PARAMETER VALUE
C
THETA = XP(1)
IF(XP(1).GT.O.) GO TO 3
TEMP = ALOG(XMl)
C
C COMPUTE INITIAL ESTIMATE
C
THETA = 1. - 1./(1. + ((5./3. - TEMP/16.)*(XM1 - 1.) + 2.)*TEMP)
C
C USE A NEWTON-RAPHSON ITERATION SCHEME
C
DO 4 I = 1, 50
TEMP = 1. - THETA
THETP = THETA - (XM1 + THETA/(TEMP*ALOG(TEMP)))*(((TEMP*
1 ALOG(TEMP))**2)/(ALOG(TEMP) + THETA))
IF(ABS(THETP-THETA).LE.0.00001) GO TO 6
THETA = THETP
4 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
6 CONTINUE
THETA = THETP
3 TEMP = 1. - THETA
IF(XP(1).EQ.0.) WRITE (6,15) THETA
15 FORMAT('0',10X,'ESTIMATED VALUE OF THETA =',F8.4)
IF(XP(1).GT.0.) WRITE (6,25) THETA
25 FORMAT('0',10X,'SPECIFIED VALUE OF THETA =',F8.4)
C COMPUTE ESTIMATED VARIANCE OF ESTIMATE
C
ALT = -ALOG(1. - THETA)
VAR = THETA*TEMP**2*ALOG(TEMP)/(ALT*(ALOG(TEMP) + THETA))
IF(XP(1).EQ.0.) WRITE (6,45) VAR
45 FORMAT(' ',10X,'ASYMP. VAR. OF THETA =',E13.5)
CUM = 0.
C
C COMPUTE EXPECTED COUNTS, LOG-LIKELIHOOD
C
E(1) = 0.
XL = 0.
DO 7 I = 2, KP
CI = I - 1
TERM = THETA**(I - 1)/(CI*ALT)
E(I) = TERM*TOT
XL = XL + NF(I)*ALOG(TERM)
CUM = CUM + E(I)
7 CONTINUE
E(M) = TOT - CUM
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,20) XL
20 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
END
C SUBROUTINE TO COMPUTE A CHI-SQUARE GOODNESS OF FIT TEST
C PROGRAMMER: HASSELBLAD                             REVISED 10-20-80
SUBROUTINE GDFIT
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C THIS ROUTINE COMPUTES A CHI-SQUARE GOODNESS OF FIT TEST BASED ON
C THE OBSERVED AND EXPECTED FREQUENCIES.  THE CELLS ARE COLLAPSED
C FROM THE RIGHT AND LEFT TO GET A MINIMUM EXPECTED FREQUENCY OF
C EXMIN.
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C
C
DATA EXMIN /5./
C
C EXMIN IS THE MINIMUM EXPECTED FREQUENCY ALLOWED AFTER COLLAPSING.
C
NDF = 1 - NP
KP = K + 1
TOT = 0.
DO 9 I = IF, M
TOT = TOT + E(I)
9 CONTINUE
ESUM = 0.
NSUM = 0
WRITE (6,5)
5 FORMAT('0',13X,'CHI-SQUARE GOODNESS OF FIT TEST'/'0',10X,
1 'VALUE  OBSERVED  EXPECTED  CHI-SQUARE')
C
C SUM INITIAL EXPECTED FREQUENCIES UNTIL THEY REACH EXMIN
C
DO 2 I = IF, M
J = I
IT = I - 1
ESUM = ESUM + Ed)
NSUM = NSUM + NFd)
IF(ESUM.GE.EXMIN) GO TO 3
WRITE (6,10) IT, NF(I), E(I)
10 FORMAT(' ',10X,I3,I9,F12.2,7X,'--')
2 CONTINUE
GO TO 900
3 CONTINUE
CHIS = (NSUM - ESUM)**2/ESUM
TCHS = CHIS
WRITE (6,15) IT, NF(J), E(J), CHIS
15 FORMAT(' ',10X,I3,I9,F12.2,F10.2)
L = J + 1
C WRITE OUT INTERMEDIATE COUNTS UNTIL REMAINING EXPECTED FREQUENCIES
C ARE LESS THAN EXMIN.
C
DO 4 I = L, M
J = I
IT = I - 1
ESUM = ESUM + E(I)
NSUM = NSUM + NF(I)
IF(TOT - ESUM.LT.EXMIN) GO TO 6
NDF = NDF + 1
CHIS = (NF(I) - E(I))**2/E(I)
TCHS = TCHS + CHIS
WRITE (6,15) IT, NF(J), E(J), CHIS
4 CONTINUE
6 CONTINUE
ESUM = TOT - ESUM + E(J)
CHIS = (NTOT - NSUM + NF(J) - ESUM)**2/ESUM
TCHS = TCHS + CHIS
IF(J.LT.M.OR.KP.EQ.M) WRITE (6,15) IT, NF(J), E(J), CHIS
IF(J.EQ.M.AND.KP.LT.M) WRITE (6,25) IT, NF(J), E(J), CHIS
25 FORMAT(' ',10X,I3,'+',I8,F12.2,F10.2)
IF(J.GE.M) GO TO 7
L = J + 1
IF(J.GE.M) GO TO 7
L = J + 1
C
C WRITE OUT REMAINING EXPECTED FREQUENCIES AND COUNTS
C
DO 8 I = L, M
J = I
IT = I - 1
IF(J.LT.M.OR.KP.EQ.M) WRITE (6,10) IT, NF(I), E(I)
IF(J.EQ.M.AND.KP.LT.M) WRITE (6,30) IT, NF(I), E(I)
30 FORMAT(' ',10X,I3,'+',I8,F12.2,7X,'--')
8 CONTINUE
7 CONTINUE
PVAL = CHISQ(TCHS,NDF)
WRITE (6,20) TCHS, NDF, PVAL
20 FORMAT('0',10X,'CHI-SQUARE =',F8.2,', D.F. =',I3,', P =',F6.4)
RETURN
900 CONTINUE
WRITE (6,905)
905 FORMAT('0',10X,'INSUFFICIENT EXPECTED FREQUENCIES')
RETURN
END
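For reference, after cells are pooled from each end until every pooled cell has expected frequency of at least EXMIN (set to 5 above), GDFIT reports
$$X^2 = \sum_j \frac{(O_j - E_j)^2}{E_j},$$
referred to a chi-square distribution whose degrees of freedom are intended to be the number of pooled cells minus one minus the number of estimated parameters (NP).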
TECHNICAL REPORT DATA
(Please read instructions on the reverse before completing)
1. REPORT NO.: EPA-600/2-81-010
3. RECIPIENT'S ACCESSION NO.:
4. TITLE AND SUBTITLE: Disfit: A Distribution Fitting System
   1. Discrete Distributions
5. REPORT DATE: January 1981 (issuing date)
6. PERFORMING ORGANIZATION CODE:
7. AUTHOR(S): Victor Hasselblad, Andy Stead, Helen Anderson
8. PERFORMING ORGANIZATION REPORT NO.:
9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Biometry Division
   Health Effects Research Laboratory
   U.S. Environmental Protection Agency
   Research Triangle Park, NC 27711
10. PROGRAM ELEMENT NO.: 1AA817
11. CONTRACT/GRANT NO.:
12. SPONSORING AGENCY NAME AND ADDRESS:
   Office of Research and Development
   Health Effects Research Laboratory
   U.S. Environmental Protection Agency
   Research Triangle Park, NC 27711
13. TYPE OF REPORT AND PERIOD COVERED: User's Guide
14. SPONSORING AGENCY CODE: EPA 600/11
15. SUPPLEMENTARY NOTES:
16. ABSTRACT:
   The DISFIT system is a series of programs and subroutines to fit distributions
   to data. This first volume describes the routines to fit discrete distributions.
   The distributions included are the binomial, truncated binomial, mixture of two
   binomials, beta binomial, Poisson, truncated Poisson, mixture of two Poissons,
   negative binomial, truncated negative binomial, and logarithmic. All parameters
   are estimated using maximum likelihood techniques. Any of the parameters may be
   specified instead of estimated. Variances or estimated asymptotic variances of
   the parameter estimates are also given. Some tests of hypotheses are possible
   using likelihood ratio tests. This guide contains the descriptions, calling
   sequences, documentation, and examples for each distribution. The program is
   written entirely in FORTRAN, and a listing of the program is in the appendix.
17. KEY WORDS AND DOCUMENT ANALYSIS
   a. DESCRIPTORS   b. IDENTIFIERS/OPEN ENDED TERMS   c. COSATI Field/Group
18. DISTRIBUTION STATEMENT: RELEASE TO PUBLIC
19. SECURITY CLASS (This Report): UNCLASSIFIED
20. SECURITY CLASS (This page): UNCLASSIFIED
21. NO. OF PAGES: 85
22. PRICE:
EPA Form 2220-1 (9-73)
* U.S. GOVERNMENT PRINTING OFFICE: 1981-757-064/0236