United States Environmental Protection Agency
Health Effects Research Laboratory
Research Triangle Park, NC 27711
Research and Development
EPA-600/2-81-010
January 1981

DISFIT: A Distribution Fitting System
-------
EPA 600/2-81-010
January 1981
DISFIT: A DISTRIBUTION FITTING SYSTEM
1. DISCRETE DISTRIBUTIONS
A User's Guide
Victor Hasselblad
Andrew G. Stead
Helen S. Anderson
BIOMETRY DIVISION
HEALTH EFFECTS RESEARCH LABORATORY
U. S. ENVIRONMENTAL PROTECTION AGENCY
RESEARCH TRIANGLE PARK, NORTH CAROLINA 27711
-------
DISCLAIMER
This report has been reviewed by the Health Effects Research Laboratory,
U.S. Environmental Protection Agency, and approved for publication. Approval
does not signify that the contents necessarily reflect the views and policies
of the U.S. Environmental Protection Agency, nor does mention of trade names
or commercial products constitute endorsement or recommendation for use.
-------
CONTENTS
1. Introduction 1
2. Methods of Estimation 3
3. Use of the Program 4
4. Constant Distribution 5
5. Binomial Distribution 8
6. Truncated Binomial Distribution 11
7. Mixture of Two Binomial Distributions 14
8. Beta-Binomial Distribution 17
9. Poisson Distribution 21
10. Truncated Poisson Distribution 24
11. Mixture of Two Poisson Distributions 27
12. Negative Binomial Distribution 30
13. Truncated Negative Binomial Distribution 34
14. Logarithmic Distribution 37
15. References 40
Appendix A. Index 43
Appendix B. Flow Chart of the Program 45
Appendix C. Error Messages 46
Appendix D. Fortran Program Listing 47
-------
ABSTRACT
The DISFIT system is a series of programs and subroutines to fit
distributions to data. This first volume describes the routines to fit
discrete distributions. The distributions included are the binomial,
truncated binomial, mixture of two binomials, beta binomial, Poisson,
truncated Poisson, mixture of two Poissons, negative binomial, truncated
negative binomial, and logarithmic. All parameters are estimated using
maximum likelihood techniques. Any of the parameters may be specified
instead of estimated. Variances or estimated asymptotic variances of
the parameter estimates are also given. Some tests of hypotheses are
possible using likelihood ratio tests. This guide contains the
descriptions, calling sequences, documentation, and examples for each
distribution. The program is written entirely in FORTRAN, and a listing
of the program is in the appendix.
-------
ACKNOWLEDGMENT
The authors would like to express their appreciation to Dave Mage and
Gene Lowrimore for their helpful suggestions. We are especially grateful
to Barbara Crabtree for her typing and editing of this guide.
-------
1. INTRODUCTION
Although distribution fitting methods abound in the statistical
literature, very few of these methods are found in the major statistical
packages. In particular, SPSS [27], BMD-P [10] and SAS [3] only give some
overall tests for normality. There are at least three specialized distribu-
tion fitting packages. Bouver and Bargmann [6] wrote a system to fit the
Pearson system of curves by the method of moments. The system of Alldredge
and Bolstad [1] fits the normal, lognormal, exponential, gamma, and uniform
distributions using maximum likelihood for some distributions, and moment
estimators for others. Morgan, Nelson, and Caporal [25] have written a
system that includes the normal, lognormal, Weibull, and binomial. This
system has the advantage that linear models can be fitted to the shape
parameter, and maximum likelihood methods are used. This feature limits
the kinds of distributions which are fitted, however. None of these
systems include a wide variety of distributions commonly appearing in the
literature.
The examples chosen for this distribution fitting package include some
of the classic ones, as well as those pertinent to the environmental field.
The analysis of both environmental monitoring and environmental health data
requires distribution fitting programs. For example, ambient air standards
are often based on the distribution of the "second maximum". Health surveys
often yield counts, and the distributional form of these counts may be
important for hypothesis testing.
This first report will be restricted to discrete distributions. Much
of the information on the distributions is found in "Discrete Distributions"
by Johnson and Kotz [19]. This reference is extremely comprehensive, so
only a small portion of its information is contained in this report. In a
manner similar to Johnson and Kotz [19], each section of this report discusses
a particular distribution. Included are definitions of the distribution,
estimating equations, goodness-of-fit tests, and actual examples. In general,
parameter estimates and their variances are given, but tests of hypotheses
are not usually given. Tests of hypotheses often can be made from the
likelihood functions provided. The appendix of the paper contains the
FORTRAN routines to perform the estimation and distribution fitting.
A "main" program is also included to read in the data, and to enable the
user to fit several different distributions to the same data.
The routines are written so that each parameter may be specified or
estimated. In some cases, specifying a parameter value may give a specific
distribution. For example, if N is set to 1, the negative binomial
distribution becomes the geometric distribution. There may be other cases
-------
where one of the parameters is known, and the other(s) need to be estimated.
We welcome suggestions and corrections to this report and its programs.
Two additional reports are planned. They will contain continuous
distributions and grouped continuous distributions respectively.
-------
2. METHODS OF ESTIMATION
Estimates in this paper are calculated by maximum likelihood. Any
estimates calculated by other means will be so indicated. There are, of
course, several reasons for using maximum likelihood estimates. The
estimates have several optimal properties including asymptotic consistency
and efficiency (Kendall and Stuart [20]). The comparison of the likelihood
function for different distributions can give some measure of goodness-of-fit.
In some specific cases, it is possible to test the fit of one distribution
versus another using likelihood ratio tests.
Maximum likelihood estimates also have certain problems. Some of these
are caused by small sample sizes, although distribution fitting usually
requires moderate to large sample sizes. The likelihood function for
mixtures of continuous distributions can have singularities corresponding
to each data point, as shown by Day [8]. The discrete distributions
considered in this paper do not have this particular problem. Finally, the
likelihood function may have more than one local maximum. This can occur
even with discrete distributions, and we can only hope that the initial
estimates lead to meaningful solutions in each case.
Distributions such as the gamma or beta, or functions of them such as
the negative binomial, require iterative methods to obtain maximum likelihood
estimates. The modified Gauss-Newton method is easy to compute, and has
the advantage of being a second order method. A description of the method
was given by Hartley [15]. In other cases the standard Newton-Raphson
method was used.
Maximum likelihood estimation is especially useful for problems of
"missing information". Missing information can take the form of censored
data, truncated distributions, grouping, and mixing of distributions.
All of these problems can be handled by using the "missing information
principle" (MIP) of Orchard and Noodbury [28]. This method will converge
under a wide variety of conditions, and the convergence is geometric as
has been shown by Dempster, Laird, and Rubin [9]. We have also used this
method in our routines.
-------
3. USE OF THE PROGRAM
The program to fit various discrete distributions was written entirely
in FORTRAN 77 on a Univac 1100 computer. It was written so that conversion
to other FORTRAN compilers should be relatively simple. The program
consists of a main program to read in the data, and a series of subroutines
one for each different distribution. The exact calling sequence for the
main program will depend on the job control language of the computer used.
The main program reads in the following cards:

   Variable   Description                                              Format
   TITLE      80 character title                                       20A4
   K          Value of the largest count                               I5
   FMT        Variable format statement using integer mode only        20A4
   NF(I)      K+1 counts, NF(I), I = 1, K+1, are read in, where        FMT
              NF(1) = no. of 0's, NF(2) = no. of 1's,
              NF(3) = no. of 2's, ..., and NF(K+1) = no. of
              values = K.

These counts, NF(I), will be denoted by f_i in the formulas in the text,
so that f_i denotes the number of occurrences of the value i. For an
explanation of a variable format statement, see any FORTRAN manual, or
the SPSS [27] manual.
This information is followed by a card for each distribution to be
fitted. The description of each card is given in the chapter corresponding
to the distribution requested.
The following example shows the cards just described.
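The card image that originally appeared here did not reproduce in this copy. The deck below is an illustrative reconstruction laid out according to the formats above; the title, the counts, and the choice of a POISSON card are hypothetical.

   EXAMPLE DATA SET                                (TITLE, format 20A4)
       4                                           (K, format I5)
   (5I5)                                           (FMT, a variable format)
      10   30   25   20   15                       (NF(I), I = 1, K+1, read with FMT)
   POISSON                                         (distribution card, columns 1-8)
                                                   (blank card ends this data set)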
A blank card following the distribution cards finishes the analyses of
the data set. The program will then attempt to read in additional data sets.
An end-of-file card will terminate the run.
-------
4. CONSTANT DISTRIBUTION
4.1 Definition
The "constant" distribution routine is really a summary routine. It
computes the first four sample moments:
M_1 = \sum_{i=0}^{K} i f_i / n,   where   n = \sum_{i=0}^{K} f_i                    (4.1)

M_j = \sum_{i=0}^{K} (i - M_1)^j f_i / n,   j = 2, 3, 4                             (4.2)
The routine also prints the frequency and cumulative frequency distribution.
It is also possible to specify the expected frequencies through
constants (thus the name of the routine), C_i's, i = 0, 1, ..., K. These
constants are normalized so that the sum of the C_i's = n. Thus the C_i's
can be expected counts, probabilities, or just proportional numbers.
If the C_i's are not specified, they are assumed to be equal. This
particular case is commonly called the discrete uniform or discrete
rectangular distribution.
4.2 Calling Sequence
The constant routine is called by
   | Col. 8  |  15  |  25   |  35   |  45   |  55   |
   | CONSTANT|  NP  |  C(1) |  C(2) |  C(3) |  C(4) |

If no constants are specified, then each C_i is set to n/(K+1). If more
than four constants are needed (K > 3), they are read in on additional
cards using the format (8F10.0). NP is the number of parameters estimated
from the data itself, and is often left blank.
-------
4.3 Tests for Goodness-of-Fit
A standard chi-square goodness-of-fit test is computed for the
observed frequencies:
X^2 = \sum_{i=0}^{K} (f_i - C_i)^2 / C_i                                            (4.3)
which has K-NP degrees of freedom. The cells are collapsed from the left
and right until the end cells have an expected value of 5. The degrees of
freedom will be adjusted for collapsing. The log-likelihood function is
also computed:
L = \sum_{i=0}^{K} f_i \log(C_i / n)                                                (4.4)
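To make the computations above concrete, here is a minimal sketch in Python (separate from the FORTRAN program) that evaluates equations (4.1)-(4.4) for a hypothetical set of counts and constants; the cell-collapsing rule used by the program is omitted.

import numpy as np

f = np.array([38.0, 20.0, 18.0, 5.0])        # hypothetical counts f_0..f_K
C = np.array([9.0, 3.0, 3.0, 1.0])           # constants C_0..C_K (here a 9:3:3:1 ratio)

n = f.sum()
i = np.arange(len(f))
C = C * n / C.sum()                          # normalize so the C_i sum to n

M1 = np.sum(i * f) / n                                      # equation (4.1)
M = [np.sum((i - M1) ** j * f) / n for j in (2, 3, 4)]      # equation (4.2)

X2 = np.sum((f - C) ** 2 / C)                # equation (4.3), without cell collapsing
L = np.sum(f * np.log(C / n))                # equation (4.4)

print(M1, M, X2, L)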
4.4 Example
De Winton and Bateson collected counts of Primula in a cross breeding
experiment. The results are given by Fisher [13]. The expectations
involving two independent Mendelian factors would produce counts in the
ratio of 9:3:3:1. Figure 4.1 shows that the fit using this ratio is not
good. If the first two counts came from a more viable plant, then the
ratios for the first two cells still would be 3:1 as would be the ratio for
the last two. The relative ratio of the first two cells to the last two
cells could be estimated from the data, giving the expected counts in
Figure 4.2. Since one parameter is being estimated from the data, this
is indicated on the control card corresponding to Figure 4.2. The result
is a satisfactory fit. A likelihood ratio test for the difference in fit
can be computed as -2(-613.34 + 608.82) = 9.04. This is asymptotically
distributed as a chi-square with one degree of freedom, which corresponds
to a p-value of .0026.
Input for Primula data:
[Input card listing not legible in this copy.]
-------
Outputs:
[Program output for the De Winton and Bateson Primula data, fitted with the 9:3:3:1 constants (Figure 4.1) and with the relative viability estimated from the data (Figure 4.2); the printed tables are not legible in this copy.]
Figure 4.1
Figure 4.2
-------
5. BINOMIAL DISTRIBUTION
5.1 History and Definition
The binomial distribution was derived by James Bernoulli in 1713. The
variable, x, has the possible values 0, 1, 2, ..., N, where the
probability of x is given by

Pr[x = j] = \binom{N}{j} p^j q^{N-j},   j = 0, 1, 2, ..., N                         (5.1)
subject to q = 1 - p and 0 < p < 1. The number N will be assumed to be
constant and known. It is assumed that the distribution is sampled n
times, so that the outcome i occurs f_i times. Thus the f_i's sum to n.
In some data, the value N varies for each sample. We have not allowed for
this possibility since it would require a very different data structure.
The estimation equations for this and related distributions are easily
generalized from the equations given.
5.2 Calling Sequence
The binomial distribution routine is called by
   | Col. 8  |  15  |  25   |
   | BINOMIAL|  N   |  P    |
where N is an integer, and P is the value of p to be fitted. If N
is not specified, it will be set to K, the value of the largest count. If
p is not specified, it will be estimated from the data.
5.3 Maximum Likelihood Estimate
The maximum likelihood estimate for p is
\hat{p} = \sum_{i=0}^{K} i f_i / (nN)                                               (5.2)
The estimate is also the minimum variance unbiased estimate. The variance of
p is estimated by
Var(\hat{p}) = \hat{p}(1 - \hat{p}) / (nN)                                          (5.3)
8
-------
5.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies
X^2 = \sum_{i=0}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (5.4)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the number
of cells after collapsing minus two. If p is specified, then only one will
be subtracted.
A second goodness-of-fit test is commonly known as the homogeneity test.
It is a test of the variance of the sample, and is discussed by Potthoff and
Whittinghill [30].
X^2 = \sum_{i=0}^{K} i^2 f_i / (N\hat{p}(1 - \hat{p})) - nN\hat{p}/(1 - \hat{p})    (5.5)
The statistic is distributed approximately as a chi-square with n-1 degrees of
freedom if p is estimated, or n degrees of freedom if p is specified.
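The following minimal Python sketch (not part of the FORTRAN program) evaluates equations (5.2), (5.3), and (5.5) as given above for a hypothetical table of counts; the chi-square fit test with cell collapsing is omitted.

import numpy as np

N = 12                                                 # number of trials per observation
f = np.array([2., 10., 30., 45., 40., 20., 3.])        # hypothetical counts f_0..f_K
i = np.arange(len(f))
n = f.sum()

p = np.sum(i * f) / (n * N)                            # equation (5.2)
var_p = p * (1 - p) / (n * N)                          # equation (5.3)

# homogeneity (variance) test, equation (5.5); approximately chi-square, n-1 d.f.
X2 = np.sum(i**2 * f) / (N * p * (1 - p)) - n * N * p / (1 - p)

print(p, var_p, X2)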
5.5 Example
Fisher [13] also gives counts from a dice experiment of Weldon. The
counts represent the number of 5's and 6's occurring in the toss of 12 dice.
The distribution of this number should be given by a binomial distribution
where p = 1/3. In fact, using a p of 1/3 gives a very poor fit (Figure
5.1). If instead p is estimated from the data, the estimated p of .3377
gives a very adequate fit. A likelihood ratio test of p = 1/3 gives an
asymptotic chi-square of -2(-50255.16 + 50241.65) = 27.02, which corresponds
to a p-value of less than .0001.
Input for Weldon's dice data:
[Input card listing not legible in this copy.]
Outputs:

[Output for Weldon's dice data fitted to a binomial distribution with p specified as 1/3; the printed table is not legible in this copy.]

Figure 5.1

[Output for Weldon's dice data fitted to a binomial distribution with p estimated from the data; the printed table is not legible in this copy.]

Figure 5.2
-------
6. TRUNCATED BINOMIAL DISTRIBUTION
6.1 History and Definition
The truncated binomial distribution occurs when sampling from the
binomial distribution where the outcome of "0" cannot be observed:
Pr[x = j] = \binom{N}{j} p^j q^{N-j} / (1 - q^N)                                    (6.1)

subject to q = 1 - p, 0 < p < 1, and j > 0. The distribution was expressed
in this form by Finney [11] in 1949. The distribution with just the zeros
truncated has also been called the positive binomial distribution, since the
truncation could include more than the zeros, or could occur on the right.
As with the binomial distribution, we assume that the distribution (6.1) is
sampled n times, so that the outcome i occurs f_i times. Thus the f_i's
sum to n.
6.2 Calling Sequence
The truncated binomial distribution routine is called by
   | Col. 8  |  15  |  25   |
   | TBINOMAL|  N   |  P    |
where N is an integer, and P is the value of p to be fitted. If N
is not specified, it will be set to K, the value of the largest count. If
p is not specified it will be estimated from the data. Note that the main
program requires that the number of "0" counts be read in, even though it
is ignored by this routine.
6.3 Maximum Likelihood Estimates
The estimate of p is calculated using the MIP (or EM algorithm) of
Dempster, Laird, and Rubin [9]. The initial estimate comes from Mantel
[22] in 1951:

\hat{p} = (\bar{x} - f_1/n) / (N - f_1/n)                                           (6.2)

where \bar{x} = \sum_{i=1}^{K} i f_i / n.
The MIP iteration scheme is then computed by estimating the "missing" cell,
f_0:
11
-------
\hat{f}_0 = \hat{q}^N n / (1 - \hat{q}^N)                                           (6.3)

where \hat{q} = 1 - \hat{p}. The parameter p is then reestimated:

\hat{p} = \sum_{i=1}^{K} i f_i (1 - \hat{q}^N) / (nN)                               (6.4)

Equations 6.3 and 6.4 define the iteration scheme. The iterations are
continued until the relative change in p is less than 10^{-5}.
The asymptotic variance of \hat{p} is estimated by

Var(\hat{p}) = \hat{p}\hat{q}(1 - \hat{q}^N)^2 / [nN(1 - \hat{q}^N - N\hat{p}\hat{q}^{N-1})]        (6.5)

which comes from the second partial derivatives of the likelihood function.
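A minimal Python sketch of the iteration defined by equations (6.2)-(6.5) is shown below; it is not the FORTRAN routine itself, and the counts are hypothetical.

import numpy as np

N = 10
f = np.array([25., 40., 30., 15., 6., 2.])     # hypothetical counts f_1..f_K ("0" not observable)
i = np.arange(1, len(f) + 1)
n = f.sum()
xbar = np.sum(i * f) / n

p = (xbar - f[0] / n) / (N - f[0] / n)         # initial estimate, equation (6.2)
for _ in range(200):
    q = 1.0 - p
    f0 = q**N * n / (1.0 - q**N)               # estimate of the missing cell, equation (6.3)
    p_new = np.sum(i * f) / ((n + f0) * N)     # equation (6.4), written via the completed sample
    if abs(p_new - p) < 1e-5:
        p = p_new
        break
    p = p_new

q = 1.0 - p
var_p = p * q * (1 - q**N)**2 / (n * N * (1 - q**N - N * p * q**(N - 1)))   # equation (6.5)
print(p, var_p)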
6.4 Test for Goodness-of-Fit
A standard goodness of fit test is computed for the observed
frequencies:
X^2 = \sum_{i=1}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (6.6)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the number
of cells after collapsing minus two. If P is specified, then only one will
be subtracted.
6.5 An Example
Finney and Varley [12] investigated the distributions of gall-fly
eggs and gall cells in the production of flowerheads of black knapweed in
1935 and 1936. This example is also discussed in section 10.5. Since the
maximum number of gall-fly eggs per flowerhead is about 15, and since no
flowerheads are observable with no eggs, a truncated binomial with N = 15 was
fitted to the data. The fit is not as good as the fit of the truncated
Poisson distribution (see section 10.5).
12
-------
Input for Varley's data:
[Input card listing not legible in this copy.]
Output:
[Output for Varley's gall-fly egg counts fitted to a truncated binomial distribution with N = 15; the printed table is not legible in this copy.]
Figure 6.1
13
-------
7. MIXTURE OF TWO BINOMIAL DISTRIBUTIONS
7.1 History and Definition
If samples are taken randomly from two different binomial distributions
with a common known N , then the result is a mixture of two binomial
distributions. If the probability of sampling the first binomial distribution
(with parameter p_1) is \alpha, the resulting distribution is

Pr[x = j] = \alpha \binom{N}{j} p_1^j q_1^{N-j} + (1 - \alpha) \binom{N}{j} p_2^j q_2^{N-j}
          = \alpha g_1(j) + (1 - \alpha) g_2(j),   j = 0, 1, ..., N                 (7.1)

subject to q_1 = 1 - p_1 and q_2 = 1 - p_2.
This distribution was described by Rider [32] and moment estimators were
given. The estimation of parameters from mixtures of distributions is
difficult by any method, and usually requires large sample sizes.
7.2 Calling Sequence
The binomial mixture routine is called by
   | Col. 8  |  15  |  25   |  35   |  45   |
   | BINOMIX |  N   |  P1   |  P2   |  A    |
where N is an integer, and P1, P2, and A are the values of p_1, p_2, and \alpha to
be fitted. If N is not specified it will be set to the value of the largest
count, K. If any of the other parameters are not specified, they will
be estimated from the data.
7.3 Maximum Likelihood Estimates
Arbitrary initial estimates are first calculated:
\hat{\alpha} = 1/2,   \hat{p}_1 = (3/4)\hat{p},   \hat{p}_2 = (4/3)\hat{p}          (7.2)

where \hat{p} is given by equation (5.2). The initial estimates are constrained
to the interval [.05, .95]. The (MIP or EM) iteration scheme has been given
by Hasselblad [17].
14
-------
\hat{p}_1 = \sum_{i=0}^{K} \hat{\alpha} g_1(i)\, i f_i / \{[\hat{\alpha} g_1(i) + (1-\hat{\alpha}) g_2(i)]\, nN\hat{\alpha}\}                    (7.3)

\hat{p}_2 = \sum_{i=0}^{K} (1-\hat{\alpha}) g_2(i)\, i f_i / \{[\hat{\alpha} g_1(i) + (1-\hat{\alpha}) g_2(i)]\, nN(1-\hat{\alpha})\}            (7.4)

and

\hat{\alpha} = \sum_{i=0}^{K} \hat{\alpha} g_1(i) f_i / \{[\hat{\alpha} g_1(i) + (1-\hat{\alpha}) g_2(i)]\, n\}                                  (7.5)

where g_1(i) and g_2(i) are as in equation (7.1), with the parameters replaced
by their estimates. The estimated asymptotic covariance matrix is calculated
from the second partial derivatives of the likelihood function, which are
given by Hasselblad [17].
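The following Python sketch (separate from the FORTRAN routine) carries out the iteration for hypothetical counts, writing the updates in the familiar "posterior weight" form, which has the same fixed point as equations (7.3)-(7.5).

import numpy as np
from scipy.stats import binom

N = 12
f = np.array([5., 20., 60., 110., 140., 130., 100., 80., 70., 60., 40., 20., 5.])  # hypothetical f_0..f_N
i = np.arange(len(f))
n = f.sum()

p = np.sum(i * f) / (n * N)
a, p1, p2 = 0.5, 0.75 * p, (4.0 / 3.0) * p       # initial estimates, equation (7.2)

for _ in range(500):
    g1 = binom.pmf(i, N, p1)
    g2 = binom.pmf(i, N, p2)
    w1 = a * g1 / (a * g1 + (1 - a) * g2)        # posterior weight for the first component
    a_new = np.sum(w1 * f) / n                                        # same fixed point as (7.5)
    p1_new = np.sum(w1 * i * f) / (N * np.sum(w1 * f))                # same fixed point as (7.3)
    p2_new = np.sum((1 - w1) * i * f) / (N * np.sum((1 - w1) * f))    # same fixed point as (7.4)
    if max(abs(a_new - a), abs(p1_new - p1), abs(p2_new - p2)) < 1e-6:
        a, p1, p2 = a_new, p1_new, p2_new
        break
    a, p1, p2 = a_new, p1_new, p2_new

print(p1, p2, a)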
7.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=0}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (7.6)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the number
of cells after collapsing minus four. If any of the parameters are
specified, the degrees of freedom will be adjusted.
7.5 An Example
Sokal and Rohlf [36] give the observed frequencies of the number of
males in 6115 sibships of size 12. The data are from Saxony from 1876 to
1885. A mixture of two binomial distributions gives a reasonable fit
(p=.0898) to the data, whereas a single binomial gives a very poor fit
(X^2 = 105.79, D.F. = 9, p < .0001). A likelihood ratio test of the difference
between the two distributions gives an asymptotic X^2 of -2(-12534.17 +
12492.54) = 83.26 for 2 degrees of freedom (p < .001).
Input for Sokal and Rohlf's data:
[Input card listing not legible in this copy.]
Output:
[Output for Sokal and Rohlf's sex-ratio data from Saxony fitted to a mixture of two binomial distributions (N = 12); the printed table is not legible in this copy.]

Figure 7.1
-------
8. THE BETA-BINOMIAL DISTRIBUTION
8.1 History and Definition
If the sampling occurs from a binomial distribution where the parameter
itself is the outcome of a beta distribution, then the resulting distribution
is a beta-binomial distribution. This distribution was described by Skellam
[35] and can be written
Pr[x = j] = \binom{N}{j} B(\alpha + j, \beta + N - j) / B(\alpha, \beta),   j = 0, 1, 2, ..., N        (8.1)

where B(\alpha, \beta) is the beta function. Equivalently,

Pr[x = j] = C\,\Gamma(\alpha + j)\Gamma(\beta + N - j) / [j!(N - j)!],
where C = N!\,\Gamma(\alpha + \beta) / [\Gamma(\alpha)\Gamma(\beta)\Gamma(\alpha + \beta + N)].        (8.2)
8.2 Calling Sequence
The beta-binomial distribution routine is called by
   | Col. 8  |  15  |  25   |  35   |
   | BETBINOM|  N   |  A    |  B    |
N is the parameter in (8.1), read in as an integer, and if not specified
is set to K, the value of the largest count. A and B are the values of \alpha
and \beta to be fitted. If either or both are not given, they are estimated
from the data.
8.3 Maximum Likelihood Estimates
The estimates of \alpha and \beta are calculated using a modified Gauss-
Newton method. The initial estimates are computed from the first two
factorial moments, R_1 and R_2:
17
-------
R_1 = \sum_{i=0}^{N} i f_i / n                                                      (8.3)

R_2 = \sum_{i=1}^{N} i(i - 1) f_i / (nR_1)                                          (8.4)

The initial estimates are

\hat{\alpha} = (R_1 R_2 - (N-1)R_1) / ((N-1)R_1 - N R_2)                            (8.5)

\hat{\beta} = \hat{\alpha}(N/R_1 - 1)                                               (8.6)
The Gauss-Newton iteration scheme requires the elements of the first
partial derivatives:
d_{1i} = \psi(\alpha + \beta) - \psi(\alpha) + \psi(\alpha + i) - \psi(\alpha + \beta + N)             (8.7)

d_{2i} = \psi(\alpha + \beta) - \psi(\beta) + \psi(\beta + N - i) - \psi(\alpha + \beta + N)           (8.8)

where \psi is the derivative of the log-gamma function (digamma function) and
\alpha and \beta are evaluated using their previous estimates. The first partial
derivatives are

D_j = \sum_{i=0}^{N} d_{ji} f_i,   j = 1, 2.                                        (8.9)

The matrix of cross products is

A_{jk} = \sum_{i=0}^{N} d_{ji} d_{ki} f_i,   j, k = 1, 2.                           (8.10)

The iteration scheme is

(\hat{\alpha}, \hat{\beta})_{new} = (\hat{\alpha}, \hat{\beta}) + A^{-1} D.         (8.11)

The estimates are constrained to be positive. If equation (8.11) does not
improve the likelihood, the correction is multiplied by 1/2 until the
likelihood is improved. The iteration scheme is continued until the change
in \alpha and \beta is less than .00001.
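A minimal Python sketch of the modified Gauss-Newton iteration of equations (8.3)-(8.11) is given below (it is not the FORTRAN routine); the counts are hypothetical, and the step-halving safeguard described above is omitted.

import numpy as np
from scipy.special import psi

N = 12
f = np.array([50., 30., 20., 12., 8., 6., 4., 3., 2., 1., 1., 1., 1.])   # hypothetical counts f_0..f_N
i = np.arange(N + 1)
n = f.sum()

R1 = np.sum(i * f) / n                                   # equation (8.3)
R2 = np.sum(i * (i - 1) * f) / (n * R1)                  # equation (8.4)
a = (R1 * R2 - (N - 1) * R1) / ((N - 1) * R1 - N * R2)   # equation (8.5)
b = a * (N / R1 - 1.0)                                   # equation (8.6)

for _ in range(200):
    d1 = psi(a + b) - psi(a) + psi(a + i) - psi(a + b + N)        # equation (8.7)
    d2 = psi(a + b) - psi(b) + psi(b + N - i) - psi(a + b + N)    # equation (8.8)
    D = np.array([np.sum(d1 * f), np.sum(d2 * f)])                # equation (8.9)
    A = np.array([[np.sum(d1 * d1 * f), np.sum(d1 * d2 * f)],
                  [np.sum(d1 * d2 * f), np.sum(d2 * d2 * f)]])    # cross products, equation (8.10)
    step = np.linalg.solve(A, D)                                  # equation (8.11)
    a, b = max(a + step[0], 1e-6), max(b + step[1], 1e-6)         # keep the estimates positive
    if np.max(np.abs(step)) < 1e-5:
        break

print(a, b)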
18
-------
8.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=0}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (8.12)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the
number of cells after collapsing minus three. If \alpha or \beta are specified,
then one degree of freedom will be added for each parameter specified.
8.5 Examples
Haseman and Soares [16] fit a beta-binomial distribution to fetal
mortality of mice. The N of the distribution is the number of implants.
Some 524 CF1S mice were tested, where N ranged from 1 to 20. Figures
8.1 and 8.2 give the fits for N's of 12 and 13, both of which give adequate
fits as measured by a chi-square test.
Input for Haseman and Soares data:
[Input card listing not legible in this copy.]
-------
Outputs:

[Output for the Haseman and Soares data with N = 12, fitted to a beta-binomial distribution; the printed table is not legible in this copy.]

Figure 8.1

[Output for the Haseman and Soares data with N = 13, fitted to a beta-binomial distribution; the printed table is not legible in this copy.]

Figure 8.2
-------
9. THE POISSON DISTRIBUTION
9.1 History and Definition
In 1837 Poisson derived the distribution which has since been given his
name. The Poisson distribution takes on the values 0, 1, 2, ..., where

Pr[x = j] = e^{-\lambda}\lambda^j / j!,   j = 0, 1, 2, ...,  \lambda > 0            (9.1)

It is assumed that the distribution is sampled n times so that the outcome
i occurs f_i times. Thus the f_i's sum to n.
9.2 Calling Sequence
The Poisson distribution routine is called by
   | Col. 8  |  25   |
   | POISSON |  L    |
where L (optional) is a specified value of \lambda to be fitted and tested for
goodness-of-fit. If not specified, \lambda is estimated from the data and then
fitted.
9.3 Maximum Likelihood Estimate
The maximum likelihood estimate for \lambda is

\hat{\lambda} = \sum_{i=0}^{K} i f_i / n                                            (9.2)

This estimate is also the minimum variance unbiased estimate. The variance
of this estimate is estimated by \hat{\lambda}/n.
9.4 Tests for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=0}^{K} (f_i - E_i)^2 / E_i                                            (9.3)
21
-------
where E_i = n\,Pr[x = i], and the value of \lambda is taken to be \hat{\lambda} unless specified
by L. The cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the
number of cells after collapsing minus two.
A second goodness-of-fit test is commonly known as the homogeneity
test. It is a test of the variance of the sample, and is discussed by
Potthoff and Whittinghill [31]. If \lambda is estimated, then

X^2 = \sum_{i=0}^{K} i^2 f_i / \hat{\lambda} - n\hat{\lambda}                       (9.4)

is approximately chi-square with n-1 degrees of freedom. If \lambda is specified, the
quantity

X^2 = \sum_{i=0}^{K} i^2 f_i / \lambda - 2\sum_{i=0}^{K} i f_i + n\lambda           (9.5)

is approximately chi-square with n degrees of freedom.
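The estimates and tests of this chapter reduce to a few sums; the following Python sketch (not part of the FORTRAN program) evaluates equations (9.2), (9.4), and (9.5) for hypothetical counts.

import numpy as np

f = np.array([2., 6., 14., 22., 30., 33., 30., 24., 18., 11., 6., 3., 1.])   # hypothetical counts f_0..f_K
i = np.arange(len(f))
n = f.sum()

lam = np.sum(i * f) / n                    # equation (9.2)
var_lam = lam / n                          # estimated variance of the estimate

X2_est = np.sum(i**2 * f) / lam - n * lam  # homogeneity test, equation (9.4); approx. chi-square, n-1 d.f.

lam0 = 5.0                                 # a specified value of lambda, for illustration
X2_spec = np.sum(i**2 * f) / lam0 - 2.0 * np.sum(i * f) + n * lam0   # equation (9.5); approx. chi-square, n d.f.

print(lam, var_lam, X2_est, X2_spec)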
9.5 An Example
Mutagenicity testing using the Ames test [2] has recently received a
great deal of interest. The test consists of counts of revertants per plate
for each dose of a given compound. The following example is the result of
196 tests at a zero dose (controls) for the strain TA1537 without metabolic
activation. The plates were all made on the same day. After a 48-hour
incubation period revertants per plate were counted manually. The fit is
adequate as measured by both the variance test and the standard chi-square
goodness-of-fit test.
Input for TA1537 Ames test data:
[Input card listing not fully legible in this copy.]
-------
Output:

[Output for the TA1537 revertant counts fitted to a Poisson distribution; the printed table is not legible in this copy.]
-------
10. TRUNCATED POISSON DISTRIBUTION
10.1 History and Definition
The truncated Poisson distribution occurs when sampling from a
Poisson distribution where the "0" outcome cannot be observed:
Pr[x = j] = e^{-\lambda}\lambda^j / [j!(1 - e^{-\lambda})],   j = 1, 2, 3, ...      (10.1)
The distribution was described as early as 1926 by McKendrick [23]. The
distribution has also been referred to as the positive Poisson distribution.
It is assumed that the distribution is sampled n times and that the
outcome i occurs f_i times. Thus the f_i's (i = 1, 2, 3, ...) sum to n.
10.2 Calling Sequence
The truncated Poisson distribution is called by
   | Col. 8   |  25   |
   | TPOISSON |  L    |
where L (optional) is a specified value of \lambda to be fitted and tested for
goodness-of-fit. If not specified, \lambda is estimated from the data, and then
fitted. Note that the main program requires that the number of "0" counts
be read in, even though it is ignored by this routine.
10.3 Maximum Likelihood Estimates
The estimate of \lambda is calculated using the MIP (or EM algorithm).
The missing information in this case is f_0 (the number of 0's). The initial
estimate was provided by Irwin [18], who ascribed it to McKendrick [23]:

\hat{f}_0 = \left(\sum_{i=1}^{K} i f_i\right)^2 / \sum_{i=1}^{K} i(i-1) f_i - n     (10.2)

The initial estimate of \lambda then becomes

\hat{\lambda} = n\bar{x} / (n + \hat{f}_0)                                          (10.3)

where \bar{x} is the arithmetic mean of the observed counts. The iteration
scheme is to reestimate the missing cell,

\hat{f}_0 = n e^{-\hat{\lambda}} / (1 - e^{-\hat{\lambda}})                         (10.4)

and then to reestimate \lambda from the completed sample,

\hat{\lambda} = n\bar{x} / (n + \hat{f}_0).                                         (10.5)

Equations 10.4 and 10.5 define the iteration scheme.
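A minimal Python sketch of the MIP iteration of equations (10.2)-(10.5) is shown below; it is not the FORTRAN routine, and the counts are hypothetical.

import numpy as np

f = np.array([40., 32., 20., 10., 5., 2., 1.])        # hypothetical counts f_1..f_K ("0" not observable)
i = np.arange(1, len(f) + 1)
n = f.sum()
xbar = np.sum(i * f) / n

f0 = np.sum(i * f)**2 / np.sum(i * (i - 1) * f) - n   # initial estimate of the missing cell, equation (10.2)
lam = n * xbar / (n + f0)                             # equation (10.3)

for _ in range(200):
    f0 = n * np.exp(-lam) / (1.0 - np.exp(-lam))      # re-estimate the missing cell, equation (10.4)
    lam_new = n * xbar / (n + f0)                     # equation (10.5)
    if abs(lam_new - lam) < 1e-5:
        lam = lam_new
        break
    lam = lam_new

print(lam)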
10.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed frequencies:

X^2 = \sum_{i=1}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (10.6)

and the cells are collapsed from the left and right until the end cells each
have an expected value of 5. The degrees of freedom will be the number of
cells after collapsing minus two. If \lambda is specified, then only one will
be subtracted.
10.5 An Example
Finney and Varley [12] investigated the distributions of gall-fly
eggs and gall cells in the production of flowerheads of black knapweed in
1935 and 1936. This example is also discussed in section 6.5. A truncated
Poisson distribution was fitted to the counts from each year (Figures 10.1
and 10.2). A fit of the two years combined gives a log-likelihood of
-443.1378. The likelihood ratio test for the same distribution in each
year is -2(-443.1378 + 276.5876 + 166.5481) = .004. This value
is approximately distributed as a chi-square with one degree of freedom,
indicating that there was no difference in the two years.
Input for Varley data:
[Input card listing not legible in this copy.]
-------
Outputs:
¥A8LE1f'S SJM.««-F».y E6G COUNTS FRON 1955
PIT TO A tRUNCATEO POISSOH DISTRIBUTION
TOTAL SAMPLE SIZE « 1*8
¥ARLEY*S 6ALL~FLf 186 C0UBTS f»OH 1*36
f IT TO A TSUNCATEO POISSON WSTjnSUTION
SAWUE
f STOOlf 0 VALUE OF LATOOA
ASWP. VAR, DP UrtBOA s
CHI-SQUARE 6000NE5S OF
VALUE
1
2
3
4
5
1>
g
9
10
11
lH
13*
OBSERVED
29
38
36
£3
8
f
2
1
0
0
j_
0
gxwEcreo
86, 00
36.97
35.06
24.93
14*18
6.73
2.73
.31
.69
.02
.01
.00
a> e.8446
.2196«-001
FJT TtST
CHI-S»!*, *
UJG-LlXEtlHOO) * -3,64*5431
Figure 10.2
26
-------
11. MIXTURE OF TWO POISSON DISTRIBUTIONS
11.1 History and Definition
If samples are taken randomly from two different Poisson distributions
the result is a mixture of two Poisson distributions. If the
probability of sampling the first Poisson distribution (with parameter
Xi) is a, the resulting distribution is
Pr[x = j] = \alpha e^{-\lambda_1}\lambda_1^j / j! + (1 - \alpha) e^{-\lambda_2}\lambda_2^j / j!
          = \alpha g_1(j) + (1 - \alpha) g_2(j),   j = 0, 1, 2, ...                 (11.1)
This distribution was described by Schilling [34] and moment estimators were
given. The estimation of parameters from mixtures of distributions is
difficult by any method, and usually requires large sample sizes.
11.2 Calling Sequence
The Poisson mixture routine is called by
   | Col. 8   |  25   |  35   |  45   |
   | POISSMIX |  L1   |  L2   |  A    |

where L1, L2, and A are the values of \lambda_1, \lambda_2, and \alpha to be fitted. If any
of the parameters are not specified, they will be estimated from the data.
11.3 Maximum Likelihood Estimates
Arbitrary initial estimates are first calculated:
\hat{\lambda}_1 = (3/4)\hat{\lambda}   and   \hat{\lambda}_2 = (4/3)\hat{\lambda}   (11.2)

where \hat{\lambda} is given by equation (9.2). The (MIP or EM) iteration scheme has
been given by Hasselblad [17].
27
-------
\hat{\lambda}_1 = \sum_{i=0}^{K} \hat{\alpha} g_1(i)\, i f_i / \{[\hat{\alpha} g_1(i) + (1-\hat{\alpha}) g_2(i)]\, n\hat{\alpha}\}                 (11.3)

\hat{\lambda}_2 = \sum_{i=0}^{K} (1-\hat{\alpha}) g_2(i)\, i f_i / \{[\hat{\alpha} g_1(i) + (1-\hat{\alpha}) g_2(i)]\, n(1-\hat{\alpha})\}         (11.4)

and

\hat{\alpha} = \sum_{i=0}^{K} \hat{\alpha} g_1(i) f_i / \{[\hat{\alpha} g_1(i) + (1-\hat{\alpha}) g_2(i)]\, n\}                                    (11.5)

where g_1(i) and g_2(i) are as in equation (11.1), with the parameters replaced
by their estimates. The estimated asymptotic covariance matrix is calculated
from the second partial derivatives of the likelihood function.
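The following Python sketch (separate from the FORTRAN routine) carries out the iteration for hypothetical counts, writing the updates in the familiar "posterior weight" form, which has the same fixed point as equations (11.3)-(11.5); starting \alpha at 1/2 is an assumption of the sketch, since the report gives initial values only for \lambda_1 and \lambda_2.

import numpy as np
from scipy.stats import poisson

f = np.array([1., 4., 10., 18., 22., 20., 16., 14., 12., 12., 10., 8., 6., 4., 2., 1.])  # hypothetical f_0..f_K
i = np.arange(len(f))
n = f.sum()

xbar = np.sum(i * f) / n
a, l1, l2 = 0.5, 0.75 * xbar, (4.0 / 3.0) * xbar      # initial estimates (alpha = 1/2 is an assumption)

for _ in range(1000):
    g1 = poisson.pmf(i, l1)
    g2 = poisson.pmf(i, l2)
    w1 = a * g1 / (a * g1 + (1 - a) * g2)             # posterior weight for the first component
    a_new = np.sum(w1 * f) / n                                        # same fixed point as (11.5)
    l1_new = np.sum(w1 * i * f) / np.sum(w1 * f)                      # same fixed point as (11.3)
    l2_new = np.sum((1 - w1) * i * f) / np.sum((1 - w1) * f)          # same fixed point as (11.4)
    if max(abs(a_new - a), abs(l1_new - l1), abs(l2_new - l2)) < 1e-6:
        a, l1, l2 = a_new, l1_new, l2_new
        break
    a, l1, l2 = a_new, l1_new, l2_new

print(l1, l2, a)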
11.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=0}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (11.6)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the
number of cells after collapsing minus four. If any of the parameters are
specified, the degrees of freedom will be adjusted.
11 .5 An Example
Additional Ames test data similar to that in section 9.5 was collected
by pooling the controls from several experiments. This data was fitted
to a mixture of two Poisson distributions, since the variance was too large
to fit a single Poisson distribution. Although it is hard to justify this
biologically, the mixture does provide an adequate fit as measured by the
standard chi-square goodness-of-fit test. The results are in Figure 11.1.
Input for TA1537 Ames test data:
[Input card listing not legible in this copy.]
-------
Output:

[Output for the pooled TA1537 revertant counts fitted to a mixture of two Poisson distributions; the printed table is not legible in this copy.]

Figure 11.1
-------
12. NEGATIVE BINOMIAL DISTRIBUTION
12.1 History and Definition
The negative binomial distribution was discussed by Pascal [29] and
Fermat. The earliest known publication was by Montmort [24] in 1714.
An excellent review of the distribution was given by Bartko [4]. The
distribution can be written in several forms, but we will use the following
one:

Pr[x = j] = \binom{N+j-1}{j} p^j (1 + p)^{-(N+j)},   j = 0, 1, 2, ...               (12.1)

where p > 0 and N > 0. For the special case where N is an integer, the
distribution is sometimes known as the Pascal distribution. Unlike the
binomial distribution, both N and p are parameters to be estimated.
For the case of N = 1, the distribution is known as the geometric
distribution. This may be fitted by specifying N = 1 on the control card.
12.2 Calling Sequence
The negative binomial distribution routine is called by
   | Col. 8   |  25   |  35   |
   | NEGBINOM |  P    |  N    |
where P and N are the (optional) values of the parameters p and N
to be fitted. If either or both are left blank, the parameters will be
estimated from the data.
12.3 Maximum Likelihood Estimates
Initial estimates can be obtained by the method of moments:
\hat{p} = s^2/\bar{x} - 1                                                           (12.2)

\hat{N} = \bar{x}^2 / (s^2 - \bar{x})                                               (12.3)
30
-------
where \bar{x} is the sample mean and s^2 is the sample variance. Maximum
likelihood estimates can be obtained by rearranging the first partial
derivatives of the log-likelihood function:

n \log(1 + \bar{x}/\hat{N}) - \sum_{i=1}^{K} S_i = 0,   where   S_i = (\hat{N} + i - 1)^{-1} \sum_{j=i}^{K} f_j.        (12.4)

This equation, involving just N, can be solved with a Newton-Raphson scheme.
Denote the left hand side of (12.4) by F; then

F' = -n\bar{x} / [\hat{N}(\hat{N} + \bar{x})] + \sum_{i=1}^{K} S'_i,   where   S'_i = (\hat{N} + i - 1)^{-2} \sum_{j=i}^{K} f_j.                 (12.5)

The iteration scheme is

\hat{N}_{new} = \hat{N} - F/F'.                                                     (12.6)
This procedure is continued until the change in N is less than .00001.
If this does not occur within 100 iterations, the routine indicates that
the procedure did not converge. The estimate for p is then
\hat{p} = \bar{x}/\hat{N}.                                                          (12.7)
The equations must be modified slightly if p is specified. If N
is specified, then just equation (12.7) is used.
The asymptotic variances of \hat{p} and \hat{N} are estimated from the inverse
of the matrix of second partial derivatives of the log-likelihood function:

\partial^2 L/\partial p^2 = n(\hat{N}\hat{p}^2 - 2\hat{p}\bar{x} - \bar{x}) / [\hat{p}^2(1 + \hat{p})^2]        (12.8)

\partial^2 L/\partial p\,\partial N = -n/(1 + \hat{p})                                                          (12.9)

\partial^2 L/\partial N^2 = \sum_{i=1}^{K} f_i [\psi'(\hat{N} + i) - \psi'(\hat{N})]                            (12.10)

where \psi' is the trigamma function.
31
-------
The two integer values of N closest to \hat{N} are tried for the integer
(Pascal) solution to the problem. The integer N giving the largest
log-likelihood is given as an alternative solution.
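A minimal Python sketch of the Newton-Raphson scheme of equations (12.2)-(12.7) is given below (the integer Pascal search and the asymptotic variances are omitted); it is not the FORTRAN routine, and the counts are hypothetical.

import numpy as np

f = np.array([120., 80., 50., 30., 18., 10., 6., 3., 2., 1.])   # hypothetical counts f_0..f_K
i = np.arange(len(f))
n = f.sum()
xbar = np.sum(i * f) / n
s2 = np.sum(f * (i - xbar) ** 2) / n

p = s2 / xbar - 1.0                  # moment estimate, equation (12.2)
N = xbar / p                         # moment estimate, equivalent to equation (12.3)
tail = np.cumsum(f[::-1])[::-1]      # tail[i] = f_i + f_{i+1} + ... + f_K

for _ in range(100):
    j = np.arange(1, len(f))                              # values 1..K
    S = tail[1:] / (N + j - 1.0)                          # S_i of equation (12.4)
    Sp = tail[1:] / (N + j - 1.0) ** 2                    # S'_i of equation (12.5)
    F = n * np.log(1.0 + xbar / N) - np.sum(S)            # equation (12.4)
    Fp = -n * xbar / (N * (N + xbar)) + np.sum(Sp)        # equation (12.5)
    N_new = N - F / Fp                                    # equation (12.6)
    if abs(N_new - N) < 1e-5:
        N = N_new
        break
    N = N_new

p = xbar / N                         # equation (12.7)
print(N, p)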
12.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=0}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (12.11)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the
number of cells after collapsing minus three. If p or N are specified,
then one degree of freedom will be added for each parameter specified.
12.5 Example
Love, et al. [21] measured the incidence of acute respiratory disease
in children. Counts of illnesses in children aged 6 to 12 from two
communities in New York are shown in Figures 12.1 and 12.2. Both fits are
adequate as measured by a standard chi-square goodness-of-fit test. A fit
of the two communities combined gives a log-likelihood of -598.2169. The
likelihood ratio test for similar distributions in each community is
-2(-598.2169 + 288.8186 + 304.1975) = 10.4. This value is approximately
distributed as a chi-square with 2 degrees of freedom, and therefore
indicates that the distribution of illnesses is not similar in the two
communities.
Input for Love's illness data:
[Input card listing not legible in this copy.]
-------
Outputs:

[Output for Love's acute respiratory illness counts from the first community, fitted to a negative binomial distribution; the printed table is not legible in this copy.]

Figure 12.1

[Output for Love's acute respiratory illness counts from the second community, fitted to a negative binomial distribution; the printed table is not legible in this copy.]

Figure 12.2
-------
13. THE TRUNCATED NEGATIVE BINOMIAL DISTRIBUTION
13.1 History and Definition
The truncated negative binomial distribution occurs when sampling from
a negative binomial distribution where the "0" outcome cannot be observed:
Pr[x = j] = \binom{N+j-1}{j} p^j (1+p)^{-(N+j)} / [1 - (1+p)^{-N}],   j = 1, 2, 3, ...        (13.1)

where p > 0 and N > 0. Examples of this distribution were given by
Sampford [33] and Brass [7]. The article by Sampford gives the maximum
likelihood equations. Wyshak [37] actually gives a computer subroutine
to estimate the parameters N and w, where w = 1/(1+p). The
subroutine also calculates the estimated asymptotic variances of N
and w. Our program modified these equations to obtain N and p,
along with their estimated variances.
13.2 Calling Sequence
The truncated negative binomial distribution is called by
   | Col. 8   |  25   |  35   |
   | TNEGBINM |  P    |  N    |
where P and N are the (optional) values of the parameters p and N
to be fitted. If either or both are left blank, the parameters will be
estimated from the data. Note that the main program requires that the
number of "0" counts be read in, even though it is ignored by this routine.
13.3 Maximum Likelihood Estimates
Wyshak [37] gives a FORTRAN algorithm for obtaining maximum
likelihood estimates using a Newton-Raphson method. For simplicity, the
parameter p is replaced by w, where w = 1/(1+p). Initial estimates
for w and N were given by Brass [7]:

\hat{w} = (\bar{x}/s^2)(1 - f_1/n),   and                                           (13.2)

\hat{N} = (\hat{w}\bar{x} - f_1/n) / (1 - \hat{w})                                  (13.3)
34
-------
where x is the sample mean and s2 is the sample variance. The first and
second partial derivatives of the log-likelihood were given by Wyshak
and are not repeated here. The covariance terms for p are obtained from
those of w using standard calculus formulas.
The iteration scheme is continued until the first derivatives with
respect to w and N are less than .0001. If this does not occur within
100 iterations, the routine indicates that the procedure did not converge.
This routine can fail to converge if the variance of the data is too small.
In such cases, the truncated Poisson distribution may provide a good fit
to the data.
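Wyshak's equations are not reproduced here, so the following Python sketch simply maximizes the truncated negative binomial log-likelihood of equation (13.1) numerically; it is an illustration only, not the Newton-Raphson algorithm used by the FORTRAN routine, and the counts are hypothetical.

import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize

f = np.array([32., 16., 10., 6., 4., 3., 2., 2., 1.])   # hypothetical counts f_1..f_K ("0" not observable)
i = np.arange(1, len(f) + 1)
n = f.sum()

def negloglik(theta):
    N, p = np.exp(theta)                 # work on the log scale to keep N and p positive
    logpmf = (gammaln(N + i) - gammaln(N) - gammaln(i + 1)
              + i * np.log(p) - (N + i) * np.log(1.0 + p)
              - np.log(1.0 - (1.0 + p) ** (-N)))         # log of equation (13.1)
    return -np.sum(f * logpmf)

res = minimize(negloglik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
N_hat, p_hat = np.exp(res.x)
print(N_hat, p_hat)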
13.4 Test for Goodness-of-Fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=1}^{K} (f_i - E_i)^2 / E_i,   where   E_i = n\,Pr[x = i]              (13.4)
and the cells are collapsed from the left and right until the end cells
each have an expected value of 5. The degrees of freedom will be the
number of cells after collapsing minus three. If p or N are specified,
then one degree of freedom will be added for each parameter specified.
13.5 Example
Sampford [33] gives counts of chromosome breaks. Figure 13.1 shows the
fit of a truncated negative binomial distribution to this data. The standard
chi-square goodness-of-fit test shows that the data are adequately fitted by
this distribution. The data can also be described by a logarithmic distribution
(see section 14.5).
Input for Sampford's chromosome break data:
[Input card listing not legible in this copy.]
-------
Output:

[Output for Sampford's chromosome break counts fitted to a truncated negative binomial distribution; the printed table is not legible in this copy.]

Figure 13.1
-------
14. THE LOGARITHMIC DISTRIBUTION
14.1 History and Definition
The logarithmic distribution (more formally referred to as the
logarithmic series distribution) is a single parameter distribution. The
distribution takes on the values 1, 2, 3, ..., where

Pr[x = j] = -\theta^j / [j \log(1 - \theta)],   j = 1, 2, 3, ...,  0 < \theta < 1.  (14.1)

The distribution was derived by Fisher, Corbet, and Williams [14]; a review
is given by Nelson and David [26].

14.2 Calling Sequence

The logarithmic distribution routine is called by

   | Col. 8   |  25   |
   | LOGARITH |  T    |

where T (optional) is a specified value of \theta to be fitted and tested
for goodness-of-fit. If not specified, \theta is estimated from the data and
then fitted. Note that the main program requires that the number of "0"
counts be read in, even though it is ignored by this routine.
14.3 Maximum Likelihood Estimates
The estimate of \theta is calculated using a Newton-Raphson iteration
scheme. The initial estimate comes from Birch [5]:

\hat{\theta} = 1 - \{1 + [(5/3 - (\log \bar{x})/16)(\bar{x} - 1) + 2] \log \bar{x}\}^{-1}.        (14.2)

The Newton-Raphson iteration scheme is

\hat{\theta}_{new} = \hat{\theta} - [\bar{x} + \hat{\theta}/((1-\hat{\theta})\log(1-\hat{\theta}))]
                     \cdot [(1-\hat{\theta})^2 \log^2(1-\hat{\theta}) / (\log(1-\hat{\theta}) + \hat{\theta})].        (14.3)
37
-------
The iteration scheme is continued until the change in \theta is less than
.00001. The asymptotic variance of \hat{\theta} is estimated by

Var(\hat{\theta}) = \hat{\theta}^2 / [n\bar{x}(1/(1-\hat{\theta}) - \bar{x})].      (14.4)
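A minimal Python sketch of the Newton-Raphson scheme of equations (14.2)-(14.4) is given below; it is not the FORTRAN routine, and the counts are hypothetical.

import numpy as np

f = np.array([60., 25., 12., 6., 3., 2., 1.])    # hypothetical counts f_1..f_K
i = np.arange(1, len(f) + 1)
n = f.sum()
xbar = np.sum(i * f) / n

lx = np.log(xbar)
theta = 1.0 - 1.0 / (1.0 + ((5.0 / 3.0 - lx / 16.0) * (xbar - 1.0) + 2.0) * lx)   # equation (14.2)

for _ in range(100):
    lt = np.log(1.0 - theta)
    step = (xbar + theta / ((1.0 - theta) * lt)) * ((1.0 - theta) ** 2 * lt ** 2 / (lt + theta))   # equation (14.3)
    theta_new = theta - step
    if abs(theta_new - theta) < 1e-5:
        theta = theta_new
        break
    theta = theta_new

var_theta = theta ** 2 / (n * xbar * (1.0 / (1.0 - theta) - xbar))   # equation (14.4)
print(theta, var_theta)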
14.4 Test for Goodness-of-fit
A standard goodness-of-fit test is computed for the observed
frequencies:
X^2 = \sum_{i=1}^{K} (f_i - E_i)^2 / E_i                                            (14.5)

where E_i = n\,Pr[x = i], and the value of \theta is taken to be \hat{\theta} unless
specified by T. The cells are collapsed from the left and right until the
end cells each have an expected value of 5. The degrees of freedom will be
the number of cells after collapsing minus two. If \theta is specified, an
additional degree of freedom will be added.
14.5 Example
The data of Sampford [33] given in section 13.5 can also be fitted by
a logarithmic distribution. The output is shown in Figure 14.1.
Input for Sampford's chromosome break data:
[Input card listing not legible in this copy.]
-------
Output:

[Output for Sampford's chromosome break counts fitted to a logarithmic distribution; the printed table is not legible in this copy.]

Figure 14.1
-------
15. REFERENCES
1. Alldredge, J. R., and D. D. Bolstad (1977). GDIST: A Computer Code for
Analysis of Statistical Distributions of Physical Data, United States
Department of Commerce, NTIS TN33.071, 57 pp.
2. Ames, B. N., J. McCann, and E. Yamasaki (1975). Methods for Detecting
Carcinogens and Mutagens with the Salmonella/Mammalian-Microsome
Mutagenicity Test, Mutation Research, 31, pp 347-364.
3. Barr, A. J., J. H. Goodnight, J. P. Sail, and J. T. Helwig (1976).
A User's Guide to SAS 76, SAS Institute, Incorporated, Raleigh, North
Carolina, 330 pp.
4. Bartko, J. J. (1961). The Negative Binomial Distribution: A Review
of Properties and Applications, The Virginia Journal of Science (New
Series), 12, pp 18-37.
5. Birch, M. W. (1963). An Algorithm for the Logarithmic Series
Distribution, Biometrics, 19, pp 651-653.
6. Bouver, H., and R. E. Bargmann (1975). Computational Algorithms and
Modules for the Evaluation of Statistical Distribution Functions,
Univ. of Georgia, Dept. of Statistics, Athens, Georgia.
7. Brass, W. (1958). Simplified Methods of Fitting the Truncated Negative
Binomial Distribution, Biometrika, 45, pp 59-68.
8. Day, N. E. (1969). Estimating the Components of a Mixture of Normal
Distributions, Biometrika, 56, 3, pp 463-484.
9. Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum
Likelihood from Incomplete Data Via the EM Algorithm, Journal of the
Royal Statistical Society B, 39, pp 1-38.
10. Dixon, W. J. (1975). Biomedical Computer Programs (BMD-P), University
of California Press, Berkeley, California, 792 pp.
11. Finney, D. J. (1949). The Truncated Binomial Distribution, Annals of
Eugenics, London, England, 14, pp 319-328.
12. Finney, D. J., and G. C. Varley (1955). An Example of the Truncated
Poisson Distribution, Biometrics, 11, pp 387-394.
40
-------
13. Fisher, R. A. (1958). Statistical Methods for Research Workers,
Thirteenth Revised Edition, Hafner Publishing Company, Incorporated,
New York, New York, 356 pp.
14. Fisher, R. A., A. S. Corbet, and C. B. Williams (1943). The Relation
Between the Number of Species and the Number of Individuals in a
Random Sample of an Animal Population, Journal of Animal Ecology, 12,
pp 42-57.
15. Hartley, H. 0. (1961). The Modified Gauss-Newton Method for Fitting
of Non-Linear Regression Functions by Least Squares, Technometrics,
3,2, pp 269-280.
16. Haseman, J. K., and E. R. Soares (1976). The Distribution of Fetal
Deaths in Control Mice and Its Implications on Statistical Tests for
Dominant Lethal Effects, Mutation Research, 41, pp 277-288.
17. Hasselblad, V. (1969). Estimation of Finite Mixtures of Distributions
from the Exponential Family, Journal of the American Statistical
Association, 64, pp 1459-1471.
18. Irwin, J. O. (1959). On the Estimation of the Mean of a Poisson
Distribution with the Zero Class Missing, Biometrics, 15, pp 324-326.
19. Johnson, N. L., and S. Kotz (1969). Discrete Distributions, John
Wiley and Sons, Incorporated, Somerset, New Jersey, 328 pp.
20. Kendall, M. G., and A. Stuart (1961). The Advanced Theory of
Statistics, Volume 2, Hafner Publishing Company, New York, New York,
676 pp.
21. Love, G. J., S. Lan, C. M. Shy, and R. J. Struba (1980). The Incidence
and Severity of Acute Respiratory Illness in Families Exposed to
Different Levels of Air Pollution, New York Metropolitan Area, 1971-
1972, Dept. of Epidemiology, School of Public Health, Univ. of N. C.,
Chapel Hill, North Carolina.
22. Mantel, N. (1951). Evaluation of a Class of Diagnostic Tests,
Biometrics, 7, pp 240-246.
23. McKendrick, A. G. (1926). Applications of Mathematics to Medical
Problems, Proceedings of Edinburgh Mathematical Society, 44, pp 98-130.
24. Montmort, P. R. (1714). Essai D'Analyse Sur Les Jeux de Hasards,
Paris, France.
25. Morgan, C., W. Nelson, and P. Caporal (1979). General Maximum
Likelihood Fitting with STATPAC, General Electric Technical Report.
41
-------
26. Nelson, W. C., and H. A. David (1967). The Logarithmic Distribution:
A Review, Virginia Journal of Science, 18, pp 95-102.
27. Nie, N. H., C. H. Hull, J. G. Jenkins, K. Steinbrenner, and D. H.
Brent (1975). Statistical Package for the Social Sciences, Second
Edition, McGraw Hill Book Company, New York, New York, 673 pp.
28. Orchard, T., and M. A. Woodbury (1972). A Missing Information Principle:
Theory and Applications, Proceedings of the 6th Berkeley Symposium on
Mathematical Statistics and Probability, 1, pp 697-715.
29. Pascal, B. (1679). Varia Opera Mathematica D. Petri de Fermat
(Tolossae).
30. Potthoff, R. F., and M. Whittinghill (1966). Testing for Homogeneity,
I. The binomial and multinomial distributions, Biometrika, 53,
pp 167-182.
31. Potthoff, R. F., and M. Whittinghill (1966). Testing for Homogeneity,
II. The Poisson distribution, Biometrika, 53, pp 183-190.
32. Rider, P. R. (1961). Estimating the Parameters of Mixed Poisson,
Binomial, and Weibull Distributions by the Method of Moments, Bulletin
of International Statistical Institute, Vol. 39, pp 225-232.
33. Sampford, M. R. (1955). The Truncated Negative Binomial Distribution,
Biometrika, 42, pp 58-69.
34. Schilling, W. (1947). A Frequency Distribution Represented on the
Sum of Two Poisson Distributions, Journal of American Statistical
Association, 42, pp 407-424.
35. Skellam, J. G. (1948). A Probability Distribution Derived from the
Binomial Distribution by Regarding the Probability of Success as
Variable Between the Sets of Trials, Journal of the Royal Statistical
Society, Series B, London, 10, pp 257-261.
36. Sokal, R. R. and F. J. Rohlf (1973). Introduction to Biostatistics,
Freeman and Company, San Francisco, California, pp 66-67.
37. Wyshak, G. (1974). A Program for Estimating the Parameters of the
Truncated Negative Binomial Distribution, Journal of Applied
Statistics, 23, 1, pp 87-91.
42
-------
APPENDIX A.
INDEX
Ames test data, 22, 28
Bernoulli, James, 8
Beta-binomial distribution, 17
Beta distribution, 3, 17
Binomial distribution, 1, 8, 15, 17, 30
Chromosome break data, 35, 38
Constant distribution, 5
Cumulative frequency distribution, 5
Dice data, Weldon's, 9
Digamma function, 18
Discrete rectangular distribution, 5
Discrete uniform distribution, 5
EM algorithm, 11, 14, 24, 27
Exponential distribution, 1
Frequency distribution, 5
Gall-fly eggs, 12, 25
Gamma distribution, 1
Gauss-Newton method, 3, 17
Geometric distribution, 1, 30
Goodness of fit, 1, 6, 9, 12, 15, 19, 21, 25, 28, 32, 35, 38
Homogeneity test 9, 22
Illness data, 32
Likelihood ratio test, 3, 6, 9, 15, 25, 32
Logarithmic distribution, 35, 37
Lognormal distribution, 1
Maximum likelihood, 3
Mendelian factors, 6
Mice data, 19
Missing information principle, 3, 11, 14, 24, 27
Mixture of two binomials, 14
Mixture of two Poissons, 27
Moment estimates, 14, 27, 30
Moments, sample, 5
Negative binomial distribution, 3, 30, 34
Newton-Raphson method, 3, 31, 34, 37
Normal distribution, 1
Pascal distribution, 30, 32
Pearson curves, 1
43
-------
Poisson distribution, 21, 28
Positive binomial distribution, 11
Positive Poisson distribution, 24
Primula counts, 6
Rectangular distribution, discrete, 5
Sample moments, 5
Sex distribution data, 15
Singularities, 3
Sokal and Rohlf's data, 15
Truncated binomial distribution, 11
Truncated negative binomial, 34, 37
Truncated Poisson distribution, 12, 24
Uniform distribution, 1
Uniform distribution, discrete, 5
Variance test, 22
Varley's data, 12, 25
Weibull, 1
Weldon's dice data, 9
44
-------
APPENDIX B. FLOW CHART OF THE DISCRETE PROGRAM

[The flow chart did not reproduce legibly. Its structure is as follows: the
main program DISCRETE reads the data and calls one distribution-fitting
subroutine per control card -- CONST, BINOML, TBINOM, BINOMX, BETBIN,
POISSN, TPOISN, POISMX, NEGBIN, TNEGBN, or LOGRTH. These subroutines in
turn call the utility routines GDFIT (goodness of fit), CNORM, PSI, and
ALGMA.]
45
-------
APPENDIX C. ERROR MESSAGES
INVALID COMMAND:
The name of the distribution fitted does not match any names in the
program.
INVALID PARAMETER SPECIFICATION:
The values for one or more of the parameters (XP's) are outside the
allowable limits. These limits are given in the definition section
of each distribution.
PROCEDURE DID NOT CONVERGE:
This message can occur in any of the routines which require iterative
methods to obtain the estimates. There is no easy way to correct the
problem, although the estimates given by the program still may be
useful. We would appreciate receiving such examples so that we can
improve our procedures in the future.
INSUFFICIENT EXPECTED FREQUENCIES:
The goodness-of-fit routine (GDFIT) did not find sufficient expected
frequencies to perform a goodness-of-fit test.
UNEXPECTED END OF FILE ENCOUNTERED:
An end-of-file was encountered while reading a card other than a
title card. Check sequence of input cards.
46
-------
APPENDIX D.
LISTING OF THE PROGRAMS
C DISCRETE DISTRIBUTION FITTING PACKAGE REVISED 9-24-80
C PROGRAMMERS'- ANDERSON, HASSELBLAD
C
C MAIN PROGRAM
C
C THE FOLLOWING ARE VARIABLES IN COMMON:
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY.
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C
DIMENSION FMT(20)
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
CHARACTER*8 NC,NCON(12)
DATA NCON/'CONSTANT','BINOMIAL','TBINOMAL','BINOMIX ','BETBINOM',
1 'POISSON ','TPOISSON','POISSMIX','NEGBINOM','TNEGBINM','LOGARITH'
2 ,'        '/
DATA ND/12/
2 CONTINUE
NTOT = 0
DO 3 I = 1, 200
NF(I) = 0
3 CONTINUE
C
C NC IS THE NAME OF THE DISTRIBUTION TO BE FITTED
C READ IN THE CONTROL CARDS
C
READ (5,5,END=910) TITLE
5 FORMAT(20A4)
READ (5,10,END=920) K
10 FORMAT(I5)
IF(K.GE.200) GO TO 930
KP = K + 1
READ (5,5,END=920) FMT
READ (5,FMT,END=920) (NF(I),I=1,KP)
DO 4 I = 1, KP
NTOT = NTOT + NF(I)
4 CONTINUE
6 CONTINUE
READ (5,15,END=920) NC, INT, XP
15 FORMAT(A8,I7,4F10.0)
DO 7 I = 1, ND
IF(NC.NE.NCON(I)) GO TO 7
GO TO (11,12,13,14,16,17,18,19,21,22,23,2) I
7 CONTINUE
WRITE(6,55) NC, K, XP
55 FORMAT('0',10X,'INVALID COMMAND: ',A8,I5,4F10.4)
GO TO 6
11 CONTINUE
CALL CONST
47
-------
GO TO 6
12 CONTINUE
CALL BINOML
GO TO 6
13 CONTINUE
CALL TBINOM
GO TO 6
14 CONTINUE
CALL BINOMX
GO TO 6
16 CONTINUE
CALL BETBIN
GO TO 6
17 CONTINUE
CALL POISSN
GO TO 6
18 CONTINUE
CALL TPOISN
GO TO 6
19 CONTINUE
CALL POISMX
GO TO 6
21 CONTINUE
CALL NEGBIN
GO TO 6
22 CONTINUE
CALL TNEGBN
GO TO 6
23 CONTINUE
CALL LOGRTH
GO TO 6
910 CONTINUE
STOP
920 CONTINUE
WRITE (6,925)
925 FORMAT('1',10X,'UNEXPECTED END-OF-FILE ENCOUNTERED')
STOP
930 CONTINUE
WRITE (6,935) K
935 FORMAT('1',10X,'VALUE OF K TOO LARGE:',I7)
STOP
END
C CNORM REVISED 5- 9-80
C PROGRAMMER: HASSELBLAD
C COMPUTES THE NORMAL PROBABILITY INTEGRAL FROM MINUS INFINITY TO X
C REFERENCE: APPROX. FOR DIG. COMP. - HASTINGS, P. 169
FUNCTION CNORM(X)
DATA S2P,P,A1,A2,A3,A4,A5,S2/1.12837917 ,.3275911,.225836846,
1 -.252128668,1.25969513,-1.28782245 , .94064607,.707106781/
XN = 1./(1. + P*ABS(X*S2))
PHI = S2P*EXP(-X*X/2.)
PHIS = 1. - XN*(A1+XN*(A2+XN*(A3+XN*(A4+XN*A5))))*PHI
TERM = .5*PHIS
IF(X.LT.0.) TERM = -TERM
CNORM = 0.5 + TERM
RETURN
END
48
-------
C CONSTANT DISTRIBUTION FITTING ROUTINE INCLUDING SAMPLE MOMENTS
C PROGRAMMERS: HASSELBLAD, ANDERSON REVISED 5- 8-80
C
SUBROUTINE CONST
COMMON TITLE(20),NFC200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY.
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED.
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C P IS THE ESTIMATED PARAMETER OF THE BINOMIAL DISTRIBUTION: B(N,P)
C
WRITE (6,10) TITLE
10 FORMAT('1',10X,20A4)
M = K + 1
TOT = NTOT
NFL = 0
PSUM = 0.
DO 4 I = 1, 4
E(I) = XP(I)
IF(E(I).GT.0.) NFL = 1
4 CONTINUE
IF(NFL.EQ.0) GO TO 6
IF(M.LE.4) GO TO 7
READ (5,5) (E(I),I=5,M)
5 FORMAT(8F10.0)
GO TO 7
6 CONTINUE
C
C ASSUME A UNIFORM DISTRIBUTION IF NOT SPECIFIED.
C
DO 11 I = 1, M
E(I) = 1.
11 CONTINUE
7 CONTINUE
C SUM UP EXPECTED FREQUENCIES SO THEY CAN BE NORMALIZED.
C
DO 12 I = 1, M
IF(E(I).LT.0.) GO TO 900
PSUM = PSUM + E(I)
12 CONTINUE
XLK = 0.
C NORMALIZE EXPECTED FREQUENCIES AND COMPUTE LOG-LIKELIHOOD.
C
DO 8 I = 1, M
E(I) = E(I)/PSUM
IF(E(I).GT.0.) XLK = XLK + NF(I)*ALOG(E(I))
E(I) = E(I)*TOT
8 CONTINUE
IF = 1
IF(E(1).EQ.0.) IF = 2
XM1 = 0.
XM2 = 0.
XM3 = 0.
XM4 = 0.
NP = INT
C
C COMPUTE SAMPLE MOMENTS
C
DO 2 I = 1, M
X = I - 1
XM1 = XM1 + NF(I)*X
49
-------
2 CONTINUE
XM1 = XM1/TOT
DO 3 I = 1, M
X = I - 1 - XM1
X2 = X*X
X3 = X2*X
X4 = X2*X2
XM2 = XM2 + NF(I)*X2
XM3 = XM3 + NF(I)*X3
XM4 = XM4 + NF(I)*X4
3 CONTINUE
XM2 = XM2/TOT
XM3 = XM3/TOT
XM4 = XM4/TOT
C
C WRITE OUT RESULTS
C
WRITE (6,15) XM1
15 FORMAT(1H0,10X,'SAMPLE MEAN =',5X,F14.6)
WRITE (6,20) XM2
20 FORMAT(1H ,10X,'SAMPLE VARIANCE =',F15.6)
WRITE (6,25) XM3
25 FORMAT(1H ,10X,'THIRD MOMENT =',4X,F14.6)
WRITE (6,30) XM4
30 FORMAT(1H ,10X,'FOURTH MOMENT =',3X,F14.6)
WRITE (6,40)
40 FORMAT('0',10X,'VALUE COUNT PERCENT CUMULATIVE CUMULATIVE')
WRITE (6,45)
45 FORMAT(' ',37X,'COUNT',4X,'PERCENT')
NCUM = 0
CUMP = 0.
DO 9 I = 1, M
J = I - 1
PCT = NF(I)*100/TOT
NCUM = NCUM + NF(I)
CUMP = CUMP + PCT
IF(E(I).GT.0..OR.NF(I).GT.0) WRITE (6,50) J,NF(I),PCT,NCUM,CUMP
50 FORMAT(' ',10X,I5,I7,F9.2,I11,F11.2)
9 CONTINUE
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,35) XLK
35 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) INT, (XP(I),I=1,4)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/11X,I7,4F11.5)
IF(M.GT.4) WRITE (6,910) (E(I),I=5,M)
910 FORMAT(' ',10X,8F11.5)
RETURN
END
50
-------
C SUBROUTINE TO FIT A BINOMIAL DISTRIBUTION REVISED 9-25-80
C PROGRAMMERS: HASSELBLAD, ANDERSON
C
SUBROUTINE BINOML
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY.
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C P IS THE ESTIMATED PARAMETER OF THE BINOMIAL DISTRIBUTION: B(N,P)
C
IF = 1
WRITE (6,30) TITLE
30 FORMAT('1',10X,20A4)
WRITE (6,5)
5 FORMAT ('0',10X,'FIT TO A BINOMIAL DISTRIBUTION')
WRITE (6,40)
40 FORMAT('0',10X,'P(X,N) = N!P*xX(l-P)xx(N-X)/(X!
-------
C SUBROUTINE TO FIT A TRUNCATED BINOMIAL DISTRIBUTION
C PROGRAMMERS: HASSELBLAD, ANDERSON REVISED 10-16-80
C
SUBROUTINE TBINOM
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C PT IS THE NEW ESTIMATE OF THE PARAMETER P
C P IS THE OLD ESTIMATE OF THE PARAMETER P
C ZN IS THE ESTIMATE OF THE UNOBSERVED CELL F(0)
C
WRITE (6,30) TITLE
30 FORMAT('1',10X,20A4)
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A TRUNCATED BINOMIAL DISTRIBUTION')
C
C CHECK FOR IMPROPER PARAMETER VALUES.
C
IF(XP(1).LT.0.0.OR.XP(1).GT.1.0) GO TO 900
IF (INT.LT.0) GO TO 900
KP = K + 1
NP = 1
IF(XP(1).GT.0.) NP = 0
N = MAX(INT,K)
M = MIN(KP+1,N+1)
IF = 2
CN = N
NT = 0
P = XP(1)
XM1 = 0.
DO 2 I = 2, KP
CI = I - 1
NT = NT + NF(I)
XM1 = XM1 + CI*NF(I)
2 CONTINUE
TOT = NT
XM1 = XM1/TOT
C
C COMPUTE THE INITIAL ESTIMATE OF P IF NOT SPECIFIED.
C
IF(XP(1).GT.O.) GO TO 6
P = (XM1 - NF(2)/TOT)/(CN - NF(2)/TOT)
WRITE (6,10) NT, N
10 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6,', N =',I4)
C
C BEGIN ITERATION PROCEDURE
C
DO 4 IT = 1, 50
QN = (1. - P)**N
ZN = QN*TOT/(1. - QN)
PT = 0.
DO 3 I = 2, KP
CI = I - 1
PT = PT + CI*NF(I)/((TOT + ZN)*CN)
3 CONTINUE
IF(ABS(PT-P).LE.0.00001) GO TO 7
P = PT
4 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
7 CONTINUE
P = PT
6 CONTINUE
IF(XP(1).EQ.0.) WRITE (6,40) P
40 FORMAT('0',10X,'ESTIMATED VALUE OF P =',F6.4)
IF(XP(1).GT.0.) WRITE (6,45) P
45 FORMAT('0',10X,'SPECIFIED VALUE OF P =',F6.4)
XL = 0.
PT = P
QT = 1. - PT
ALP = ALOG(PT)
ALQ = ALOG(QT)
XDC = 2.
XNC = CN - 1.
CUM = 0.
QN = 1. - QT**N
TERM = ALOG(CN) + ALP + XNC*ALQ - ALOG(QN)
C
C COMPUTE EXPECTED COUNTS, ASYMPTOTIC VARIANCE
C
E(1) = 0.
VAR = -TOT*(CN*(CN - 1.)*QT**(N - 2)/QN + (QT**(N - 1)*CN)**2/QN)
DO 8 I = 2, M
CI = I - 1.
QI = QT**(I - 1)
VAR = VAR + CI*NF(I)/PT**2 + (CN - CI)*NF(I)/QT**2
E(I) = EXP(TERM)*TOT
CUM = CUM + E(I)
XL = XL + NF(I)*TERM
IF(I.NE.M) TERM = TERM + ALP - ALQ + ALOG(XNC) - ALOG(XDC)
IF(XNC.GT.1.) XNC = XNC - 1.
XDC = XDC + 1.
8 CONTINUE
VAR = l./VAR
IF(XP(1).EQ.0.) WRITE (6,50) VAR
50 FORMAT(' ',10X,'ASYMP. VAR. OF P =',E13.5)
C
C COMPUTE GOODNESS OF FIT TESTS.
C
CALL GDFIT
WRITE (6,20) XL
20 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) INT, XP(1)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/11X,I7,F11.5)
RETURN
END
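For reference, TBINOM fits the zero-truncated binomial
$$P(X = x) = \frac{\binom{N}{x}p^{x}q^{N-x}}{1-q^{N}}, \qquad q = 1-p,\quad x = 1,\ldots,N,$$
and the iteration above can be read as repeatedly estimating the count in the unobserved zero cell, $Z = n\,q^{N}/(1-q^{N})$, and then updating
$$p^{(t+1)} = \frac{\sum_x x\,n_x}{N\,(n+Z)} .$$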
C SUBROUTINE TO FIT A BETA BINOMIAL DISTRIBUTION REVISED 4-21-80
C PROGRAMMER: HASSELBLAD
C
SUBROUTINE BETBIN
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C M IS THE LENGTH OF THE E ARRAY
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C ALPHA AND BETA ARE THE PARAMETERS TO BE ESTIMATED (GIVEN AS A
C AND B IN THE FORMULA OF FORMAT 10)
C
DATA EPS/0.00001/
WRITE (6,30) TITLE
30 FORMAT('1',10X,20A4)
C
C CHECK FOR IMPROPER PARAMETER VALUES
C
IF(XP(1).LT.0..OR.XP(2).LT.0.) GO TO 900
IF = 1
N = MAX(INT,K)
CN = N
KP = K + 1
M = MIN(KP+1,N+1)
TOT = NTOT
NP = 2
IF(XP(1).GT.0.) NP = NP - 1
IF(XP(2).GT.0.) NP = NP - 1
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A BETA-BINOMIAL DISTRIBUTION')
WRITE (6,10)
10 FORMAT('0',10X,'P(X) = N!(A+X-1)!(B+N-X-1)!(A+B-1)!/'/18X,
1 '(X!(N-X)!(A-1)!(B-1)!(A+B+N-1)!)')
WRITE (6,15) NTOT, N
15 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6,', N =',I4)
C
C COMPUTE FACTORIAL MOMENTS FOR INITIAL ESTIMATES
C
R1 = 0.
R2 = 0.
DO 6 I = 1, KP
CI = I - 1
R1 = R1 + CI*NF(I)/TOT
R2 = R2 + CI*(CI - 1.)*NF(I)/TOT
6 CONTINUE
R2 = R2/R1
C
C ESTIMATE THE INITIAL VALUES FOR ALPHA AND BETA
C
ALPHA = (R1*R2 - (CN - 1.)*R1)/((CN - 1.)*R1 - CN*R2)
IF(XP(1).GT.0.) ALPHA = XP(1)
BETA = ALPHA*(CN/R1 - 1.)
IF(XP(2).GT.0.) BETA = XP(2)
IF(XP(1).GT.0..AND.XP(2).GT.0.) GO TO 16
C BEGIN GAUSS-NEWTON ITERATION SCHEME
C
DO 12 IT = 1, 100
ALBE = ALPHA + BETA
DELTA = 1.
C COMPUTE CURRENT LIKELIHOOD FUNCTION
C
C
C COMPUTE ELEMENTS OF FIRST PARTIAL DERIVATIVES AND
C SUM THEIR CROSS PRODUCTS
C
XLK1 = TOT*(ALGMA(CN + 1.) + ALGMA(ALBE) - ALGMA(ALPHA)
1 - ALGMA(BETA) - ALGMA(ALBE + CN))
T = PSI(ALBE) - PSI(ALBE + CN)
T1 = T - PSI(ALPHA)
T2 = T - PSI(BETA)
D1 = 0.
D2 = 0.
D11 = 0.
D12 = 0.
D22 = 0.
DO 2 I = 1, KP
CI = I - 1
XLK1 = XLK1 + NF(I)*(ALGMA(ALPHA + CI) + ALGMA(BETA + CN - CI)
1 - ALGMA(CI + 1.) - ALGMA(CN - CI + 1.))
E1 = T1 + PSI(ALPHA + CI)
E2 = T2 + PSI(BETA + CN - CI)
D1 = D1 + E1*NF(I)
D2 = D2 + E2*NF(I)
D11 = D11 + E1**2*NF(I)
D12 = D12 + E1*E2*NF(I)
D22 = D22 + E2**2*NF(I)
2 CONTINUE
C
C CHECK FOR SPECIFIED PARAMETERS
C
IF(XP(1).GT.0..OR.XP(2).GT.0.) D12 = 0.
IF(XP(1).GT.0.) D1 = 0.
IF(XP(1).GT.0.) D11 = 1.
IF(XP(2).GT.0.) D2 = 0.
IF(XP(2).GT.0.) D22 = 1.
DET = D22*D11 - D12*D12
TEMP = D11
D11 = D22/DET
D22 = TEMP/DET
D12 = -D12/DET
DELA = D11*D1 + D12*D2
DELB = D22*D2 + D12*D1
C
C COMPUTE CONSTRAINED NEW ESTIMATES AND THEIR LIKELIHOOD
C THE ESTIMATES ARE CONSTRAINED TO BE GREATER THAN EPS
C
ALPHP = AMAX1(ALPHA + DELA,EPS)
BETAP = AMAX1(BETA + DELB,EPS)
ALBEP = ALPHP + BETAP
XLK2 = TOT*(ALGMA(CN + 1.) + ALGMA(ALBEP) - ALGMA(ALPHP)
1 - ALGMA(BETAP) - ALGMA(ALBEP + CN))
DO 3 I = 1, KP
CI = I - 1
XLK2 = XLK2 + NF(I)*(ALGMA(ALPHP + CI) + ALGMA(BETAP+CN-CI)
1 - ALGMA(CI + 1.) - ALGMA(CN - CI + 1.))
3 CONTINUE
C
C FIND BEST ESTIMATES USING A CHOPPING METHOD
C
7 CONTINUE
DELTA = DELTA*0.5
XLK3 = XLK2
C
C IF CHOPPING GETS TOO SMALL, QUIT
C
IF(DELTA.LE.0.000001.AND.IT.GT.1) GO TO 16
IF(DELTA.LE.0.000001) GO TO 18
ALPHP = AMAX1(ALPHA + DELTA*DELA,EPS)
BETAP = AMAX1(BETA + DELTA*DELB,EPS)
ALBEP = ALPHP + BETAP
XLK2 = TOT*(ALGMA(CN + 1.) + ALGMA(ALBEP) - ALGMA(ALPHP)
1 - ALGMA(BETAP) - ALGMA(ALBEP + CN))
DO 4 I = 1, KP
CI = I - 1
XLK2 = XLK2 + NF(I)*(ALGMA(ALPHP + CI) + ALGMA(BETAP+CN-CI)
1 - ALGMA(CI + 1.) - ALGMA(CN - CI + 1.))
4 CONTINUE
IF(XLK2.LE.XLK1) GO TO 7
C
C OPTIMIZE STEP LENGTH
C
DOPT = (XLK3 - 4.*XLK2 + 3.*XLK1)/(2.*(XLK3 - 2.*XLK2 + XLK1))
IF(DOPT.GE.1..OR.DOPT.LE.0.) DOPT = 1.
DELTA = DELTA*DOPT
C
C SAVE NEW IMPROVED ESTIMATES
C
ALPHA = AMAX1(ALPHA + DELTA*DELA,EPS)
BETA = AMAX1(BETA + DELTA*DELB,EPS)
C
C CHECK FOR CONVERGENCE - A CHANGE IN PARAMETERS OF LESS
C THAN EPS
C
IF(DELTA*(ABS(DELA) + ABS(DELB)).LE.EPS) GO TO 16
12 CONTINUE
18 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
16 CONTINUE
C
C COMPUTE FINAL LIKELIHOOD, EXPECTED COUNTS
C
XLK = 0.
TERM = ALGMA(BETA + CN) - ALGMA(ALPHA + BETA + CN) +
1 ALGMA(ALPHA + BETA) - ALGMA(BETA)
DO 17 I = 1, M
CI = I - 1
E(I) = TOT*EXP(TERM)
XLK = XLK + NF(I)*TERM
TERM = TERM + ALOG(ALPHA + CI) - ALOG(CI + 1.)
IF(I.NE.M) TERM = TERM + ALOG(CN - CI) - ALOG(BETA+CN-CI-1.)
17 CONTINUE
IF(XP(2).GT.0.) D22 = 0.
IF(XP(1).GT.0.) D11 = 0.
C
C WRITE OUT ESTIMATES AND THEIR VARIANCES
C
IF(XP(1).GT.0.) WRITE (6,20) ALPHA
20 FORMAT('0',10X,'SPECIFIED VALUE OF ALPHA =',F12.5)
IF(XP(1).EQ.0.) WRITE (6,25) ALPHA
25 FORMAT('0',10X,'ESTIMATED VALUE OF ALPHA =',F12.5)
IF(XP(2).GT.0.) WRITE (6,35) BETA
35 FORMAT(' ',10X,'SPECIFIED VALUE OF BETA =',F12.5)
IF(XP(2).EQ.0.) WRITE (6,40) BETA
40 FORMAT(' ',10X,'ESTIMATED VALUE OF BETA =',F12.5)
IF(XP(1).GT.0..AND.XP(2).GT.0.) GO TO 19
WRITE (6,45)
45 FORMAT('0',10X,'ASYMPTOTIC COVARIANCE MATRIX OF ESTIMATES')
WRITE (6,50)
50 FORMAT(' ',21X,'ALPHA',8X,'BETA')
WRITE (6,55) D11, D12, D12, D22
55 FORMAT(' ',10X,'ALPHA',2E13.5/11X,'BETA ',2E13.5)
19 CONTINUE
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,65) XLK
65 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) INT, XP(1), XP(2)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/11X,I7,4F11.5)
RETURN
END
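For reference, the beta-binomial probability function fitted by BETBIN, written in terms of the log-gamma function ALGMA used above, is
$$P(X=x) = \binom{N}{x}\,\frac{B(\alpha+x,\ \beta+N-x)}{B(\alpha,\beta)} = \frac{\Gamma(N+1)\,\Gamma(\alpha+\beta)\,\Gamma(\alpha+x)\,\Gamma(\beta+N-x)}{\Gamma(x+1)\,\Gamma(N-x+1)\,\Gamma(\alpha)\,\Gamma(\beta)\,\Gamma(\alpha+\beta+N)},\qquad x = 0,1,\ldots,N.$$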
C CHISQ REVISED 5- 9-80
C PROGRAMMER: HASSELBLAD
C COMPUTES THE CHI-SQUARE INTEGRAL TO X FOR N DEGREES OF FREEDOM
C REFERENCE: HANDBOOK OF MATHEMATICAL FUNCTIONS
C
REAL FUNCTION CHISQ(X,N)
SUM = 0.
IF(X.GT.0.0.AND.N.GT.0) GO TO 2
CHISQ = 1.
RETURN
2 CONTINUE
IF(N.GT.50) GO TO 6
IF((N/2)*2.NE.N) GO TO 11
IF(N.LE.2) GO TO 3
M = N/2 - 1
TERM = 1.
DO 4 I = 1, M
CI = 2.*FLOAT(I)
TERM = TERM*X/CI
SUM = SUM + TERM
4 CONTINUE
3 CONTINUE
CHISQ = EXP(-X/2.0)*(1.0+SUM)
RETURN
11 CONTINUE
TERM = 1.0/X
IF(N.LE.1) GO TO 12
M = (N-1)/2
DO 13 I = 1, M
CI = 2.*FLOAT(I) - 1.
TERM = TERM*X/CI
SUM = SUM + TERM
13 CONTINUE
12 CONTINUE
XS = SQRT(X)
CHISQ = 2. - 2.*CNORM(XS) + XS*EXP(-X/2.)*.7978846*SUM
RETURN
6 CONTINUE
CHISQ = 1.-CNORM(((X/N)**.3333333-1.+2./(9.*N))/SQRT(2./(9.*N)))
RETURN
END
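For reference, the large-sample branch of CHISQ (more than 50 degrees of freedom) appears to use the Wilson-Hilferty normal approximation
$$Q(\chi^2 \mid n) \approx 1 - \Phi\!\left(\frac{(\chi^2/n)^{1/3} - 1 + 2/(9n)}{\sqrt{2/(9n)}}\right),$$
where $\Phi$ is the standard normal distribution function supplied by CNORM; the small-sample branches sum the usual finite series for even and odd degrees of freedom.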
C SUBROUTINE TO FIT A MIXTURE OF TWO BINOMIAL DISTRIBUTIONS
c PROGRAMMER: HASSELBLAD REVISED 9-25-80
c
SUBROUTINE BINOMX
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C PI IS THE ESTIMATED PARAMETER OF THE FIRST BINOMIAL DISTRIBUTION
C P2 IS THE ESTIMATED PARAMETER OF THE SECOND BINOMIAL DISTRIBUTION
C ALPHA IS THE ESTIMATED PROPORTION CORRESPONDING TO THE FIRST
C BINOMIAL DISTRIBUTION
C
IF = 1
WRITE (6,20) TITLE
20 FORMATC1' ,10X,20A4)
C
C CHECK FOR IMPROPER PARAMETER VALUES
C
IF(XP(1).LT.0..OR.XP(1).GT.1.) GO TO 900
IF(XP(2).LT.0..OR.XP(2).GT.1.) GO TO 900
IF(XP(3).LT.0..OR.XP(3).GT.1.) GO TO 900
N = MAX(INT,K)
CN = N
KP = K + 1
M = MIN(KP+1,N+1)
NP = 3
IF(XP(1).GT.0.) NP = NP - 1
IF(XP(2).GT.0.) NP = NP - 1
IF(XP(3).GT.0.) NP = NP - 1
P = 0.
C
C COMPUTE ESTIMATE OF AVERAGE P
C
DO 2 I = 1, KP
P = P + (I-1)*NF(I)
2 CONTINUE
TOT = NTOT
P = P/(TOT*CN)
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A MIXTURE OF TWO BINOMIAL DISTRIBUTIONS')
WRITE (6,15)
15 FORMAT('0',10X,'P(X) = N!(ALPHA(P1**X)(1-P1)**(N-X) +'/15X,
1 '(1-ALPHA)(P2**X)(1-P2)**(N-X))/(X!(N-X)!)')
WRITE (6,10) NTOT, N
10 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6,', N =',I4)
C
C COMPUTE INITIAL ESTIMATES
C
ALPHA = 0.5
IF(XP(3).GT.0.) ALPHA = XP(3)
P1 = 0.75*P
IF(P1.LE.0.05) P1 = 0.05
IF(XP(1).GT.0.) P1 = XP(1)
P2 = 1.333*P
IF(P2.GE.0.95) P2 = 0.95
IF(XP(2).GT.0.) P2 = XP(2)
IF(XP(1).GT.0..AND.XP(2).GT.0..AND.XP(3).GT.0.) GO TO 8
C
C BEGIN MIP-EM ITERATION SCHEME
C
DO 6 IT = 1, 1000
X1 = 0.
X2 = 0.
XS = 0.
Q1 = 1. - P1
Q2 = 1. - P2
ALP1 = ALOG(P1)
ALQ1 = ALOG(Q1)
ALP2 = ALOG(P2)
ALQ2 = ALOG(Q2)
XDC = 1.
XNC = CN
TERM1 = CN*ALQ1
TERM2 = CN*ALQ2
DO 7 I = 1, KP
CI = I - 1
D1 = EXP(TERM1)
D2 = EXP(TERM2)
D = ALPHA*D1 + (1. - ALPHA)*D2
X1 = X1 + D1*CI*NF(I)/D
X2 = X2 + D2*CI*NF(I)/D
XS = XS + ALPHA*D1*NF(I)/D
IF(I.NE.M) TERM1 = TERM1 + ALP1 - ALQ1 + ALOG(XNC) - ALOG(XDC)
IF(I.NE.M) TERM2 = TERM2 + ALP2 - ALQ2 + ALOG(XNC) - ALOG(XDC)
IF(XNC.GT.1.) XNC = XNC - 1.
XDC = XDC + 1.
7 CONTINUE
X1 = X1/(TOT*CN)
X2 = X2/(TOT*CN)
XS = XS/TOT
IF(XP(1).GT.0.) X1 = XP(1)
IF(XP(2).GT.0.) X2 = XP(2)
IF(XP(3).GT.0.) XS = XP(3)
C
C TEST FOR CONVERGENCE
C
IF(ABS(P1-X1) + ABS(P2-X2) + ABS(ALPHA-XS).LE.0.0001) GO TO 4
P1 = X1
P2 = X2
ALPHA = XS
6 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
4 CONTINUE
P1 = X1
P2 = X2
ALPHA = XS
8 CONTINUE
C
C COMPUTE FINAL LIKELIHOOD, EXPECTED FREQUENCIES,
C AND SECOND PARTIAL DERIVATIVES
C
XLK = 0.
Q1 = 1. - P1
Q2 = 1. - P2
ALP1 = ALOG(P1)
ALQ1 = ALOG(Q1)
ALP2 = ALOG(P2)
ALQ2 = ALOG(Q2)
XDC = 1.
XNC = CN
TERM1 = CN*ALQ1
TERM2 = CN*ALQ2
ESUM = 0.
D11 = 0.
D12 = 0.
D22 = 0.
D13 = 0.
D23 = 0.
D33 = 0.
DO 9 I = 1, KP
CI = I - 1
D1 = EXP(TERM1)
D2 = EXP(TERM2)
D = ALPHA*D1 + (1. - ALPHA)*D2
E(I) = TOT*D
ESUM = ESUM + E(I)
XLK = XLK + NF(I)*ALOG(D)
TERM1 = TERM1 + ALP1 - ALQ1 + ALOG(XNC) - ALOG(XDC)
TERM2 = TERM2 + ALP2 - ALQ2 + ALOG(XNC) - ALOG(XDC)
IF(XNC.GT.1.) XNC = XNC - 1.
XDC = XDC + 1.
T1 = NF(I)/D
T2 = -T1/D
T3 = D1*ALPHA
T4 = D2*(1. - ALPHA)
T5 = CI/P1 - (CN - CI)/(1. - P1)
T6 = CI/P2 - (CN - CI)/(1. - P2)
D11 = D11 + T2*T3**2*T5**2 + T1*T3*(T5**2 - CI/P1**2 -
1 (CN - CI)/(1. - P1)**2)
D22 = D22 + T2*T4**2*T6**2 + T1*T4*(T6**2 - CI/P2**2 -
1 (CN - CI)/(1. - P2)**2)
D12 = D12 + T2*T3*T4*T5*T6
D13 = D13 + T2*T3*T5*(D1 - D2)
D23 = D23 + T2*T4*T6*(D1 - D2)
D33 = D33 + T2*(D1 - D2)**2
9 CONTINUE
E(KP) = E(KP) + TOT - ESUM
C
C ALLOW FOR SPECIFIED PARAMETERS
C
IF(XP(1).GT.0.) D11 = 1.
IF(XP(2).GT.0.) D22 = 1.
IF(XP(3).GT.0.) D33 = 1.
IF(XP(1).GT.0..OR.XP(2).GT.0.) D12 = 0.
IF(XP(1).GT.0..OR.XP(3).GT.0.) D13 = 0.
IF(XP(2).GT.0..OR.XP(3).GT.0.) D23 = 0.
DET = D11*D22*D33 + 2.*D12*D23*D13 - D13**2*D22 - D23**2*D11 -
1 D12**2*D33
C
C WRITE OUT ESTIMATES AND THEIR VARIANCES
C
IF(XP(1).GT.0.) WRITE (6,25) P1
25 FORMAT('0',10X,'SPECIFIED VALUE OF P1 =',F11.5)
IF(XP(1).EQ.0.) WRITE (6,30) P1
30 FORMAT('0',10X,'ESTIMATED VALUE OF P1 =',F11.5)
IF(XP(2).GT.0.) WRITE (6,35) P2
35 FORMAT(' ',10X,'SPECIFIED VALUE OF P2 =',F11.5)
IF(XP(2).EQ.0.) WRITE (6,40) P2
40 FORMAT(' ',10X,'ESTIMATED VALUE OF P2 =',F11.5)
IF(XP(3).GT.0.) WRITE (6,45) ALPHA
45 FORMAT(' ',10X,'SPECIFIED VALUE OF ALPHA =',F8.5)
IF(XP(3).EQ.0.) WRITE (6,50) ALPHA
50 FORMAT(' ',10X,'ESTIMATED VALUE OF ALPHA =',F8.5)
IF(XP(1).GT.0..AND.XP(2).GT.0..AND.XP(3).GT.0.) GO TO 11
V11 = (D23**2 - D22*D33)/DET
V12 = (D12*D33 - D13*D23)/DET
V13 = (D13*D22 - D12*D23)/DET
V22 = (D13**2 - D11*D33)/DET
V23 = (D11*D23 - D13*D12)/DET
V33 = (D12**2 - D11*D22)/DET
IF(XP(1).GT.0.) V11 = 0.
IF(XP(2).GT.0.) V22 = 0.
IF(XP(3).GT.0.) V33 = 0.
WRITE (6,55)
55 FORMAT('0',10X,'ASYMPTOTIC COVARIANCE MATRIX OF ESTIMATES')
WRITE (6,60)
60 FORMAT(' ',21X,' P1  ',8X,' P2  ',8X,'ALPHA')
WRITE (6,65) V11, V12, V13, V12, V22, V23, V13, V23, V33
65 FORMAT(' ',10X,'P1   ',3E13.5/11X,'P2   ',3E13.5/11X,'ALPHA',
1 3E13.5)
11 CONTINUE
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,70) XLK
70 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) INT, XP(l), XP(2), XP(3)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/11X,I7,4F11.5)
RETURN
END
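For reference, BINOMX fits the two-component binomial mixture
$$f(x) = \binom{N}{x}\Big[\alpha\,p_1^{x}(1-p_1)^{N-x} + (1-\alpha)\,p_2^{x}(1-p_2)^{N-x}\Big],$$
and, as in the companion Poisson-mixture routine, one pass of the iteration above updates the parameters by
$$p_j' = \frac{1}{nN}\sum_x n_x\,x\,\frac{f_j(x)}{f(x)}, \qquad \alpha' = \frac{1}{n}\sum_x n_x\,\frac{\alpha\,f_1(x)}{f(x)},$$
where $f_j$ is the $j$-th component density and $n$ is the total sample size.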
C SUBROUTINE TO FIT A POISSON DISTRIBUTION REVISED 9-25-80
C PROGRAMMERS: HASSELBLAD, ANDERSON
SUBROUTINE POISSN
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C XLMB IS THE ESTIMATE OF THE PARAMETER LAMBDA
C
WRITE (6,30) TITLE
30 FORMAT ('1',10X,20A4)
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A POISSON DISTRIBUTION')
WRITE (6,40)
40 FORMAT('0',10X,'P(X) = EXP(-LAMBDA)LAMBDA**X/X!')
IF(XP(1).LT.0.) GO TO 900
IF = 1
KP = K + 1
M = K + 2
NP = 1
IF(XP(1).GT.0.) NP = 0
TOT = NTOT
XLMB = XP(1)
IF(XP(1).GT.O.) GO TO 4
C
C COMPUTE ESTIMATE OF LAMBDA
C
XM2 = 0.
DO 2 I = 1, KP
CI = I - 1
XLMB = XLMB + CI*NF(I)
XM2 = XM2 + CI**2*NF(I)
2 CONTINUE
XLMB = XLMB/TOT
4 CONTINUE
WRITE (6,10) NTOT
10 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6)
IF(XP(1).EQ.0.) WRITE (6,15) XLMB
15 FORMAT('0',10X,'ESTIMATED VALUE OF LAMBDA =',F9.4)
IF(XP(1).GT.0.) WRITE (6,20) XLMB
20 FORMAT('0',10X,'SPECIFIED VALUE OF LAMBDA =',F9.4)
VAR = XLMB/TOT
IF(XP(1).EQ.O.) WRITE (6,25) VAR
25 FORMATC ',10X,'VARIANCE OF LAMBDA =',E13.5)
XLK = 0.
ALP = ALOG(XLMB)
CUM = 0.
XDC = 1.
TERM = -XLMB
C
C COMPUTE EXPECTED COUNTS, LOG-LIKELIHOOD
C
DO 3 I = 1, KP
E(I) = EXP(TERM)*TOT
CUM = CUM + E(I)
XLK = XLK + NF(I)*TERM
TERM = TERM + ALP - ALOG(XDC)
XDC = XDC + 1.
3 CONTINUE
E(M) = TOT - CUM
C
C COMPUTE GOODNESS OF FIT TESTS
C
NDF = NTOT - 1
VART = XM2/XLMB - TOT*XLMB
PVAL = CHISQ(VART,NDF)
IF(XP(1).EQ.0.) WRITE (6,45) VART, PVAL
45 FORMAT('0',10X,'VARIANCE TEST =',F8.2,', P =',F6.4)
CALL GDFIT
WRITE (6,35) XLK
35 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) XP(1)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFIED'/11X,F11.5)
RETURN
END
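The sketch below is a minimal, hypothetical driver. It is not part of DISFIT and is not the control-card program described earlier in this guide; it only illustrates one way the COMMON variables documented above could be filled before calling POISSN. The observed counts are invented for illustration.
C A MINIMAL, HYPOTHETICAL DRIVER, NOT PART OF DISFIT.  IT ONLY SHOWS
C ONE WAY THE COMMON VARIABLES DOCUMENTED ABOVE MIGHT BE FILLED
C BEFORE CALLING POISSN.  THE COUNTS BELOW ARE INVENTED.
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C READ A TITLE CARD AND CLEAR THE OPTIONAL CONSTANTS AND COUNTS
READ (5,10) TITLE
10 FORMAT(20A4)
DO 1 I = 1, 200
NF(I) = 0
1 CONTINUE
DO 2 I = 1, 4
XP(I) = 0.
2 CONTINUE
C OBSERVED COUNTS FOR THE VALUES 0, 1, 2, AND 3 (INVENTED DATA)
NF(1) = 12
NF(2) = 18
NF(3) = 9
NF(4) = 3
C K IS THE LARGEST OBSERVED VALUE, NTOT THE TOTAL COUNT
K = 3
NTOT = 42
INT = 0
CALL POISSN
STOP
END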
C SUBROUTINE TO FIT A TRUNCATED POISSON DISTRIBUTION
c PROGRAMMERS: HASSELBLAD, ANDERSON REVISED 9-25-80
c
SUBROUTINE TPOISN
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C XLMB IS THE CURRENT ESTIMATE OF THE PARAMETER LAMBDA
C XLMBP IS THE NEW ESTIMATE OF THE PARAMETER LAMBDA
C ZN IS THE ESTIMATE OF THE UNOBSERVED CELL F(0)
C
WRITE (6,30) TITLE
30 FORMAT('1',10X,20A4)
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A TRUNCATED POISSON DISTRIBUTION')
WRITE (6,20)
20 FORMAT('0',10X,'P(X) = EXP(-LAMBDA)LAMBDA**X/(X!(1-EXP(-LAMBDA)))'
1)
KP = K + 1
M = K + 2
IF = 2
NT = 0
NP = 1
IF(XP(1).GT.O.) NP = 0
IF(XP(1).LT.O.) GO TO 900
XM1 = 0.
XM2 = 0.
C
C COMPUTE ESTIMATE OF LAMBDA AND FCO)
C
XLMB = XP(1)
DO 2 I = 2, KP
CI = I - 1
NT = NT + NF(I)
XM1 = XM1 + CIXNF(I)
XM2 = XM2 + CI*(CI - 1.)*NF(I)
2 CONTINUE
TOT = NT
ZN = XM1**2/XM2 - TOT
WRITE (6,10) NT
10 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6)
IF(XP(1).GT.O.) GO TO 6
XLMB = XM1/(ZN + TOT)
C
C BEGIN ITERATION PROCEDURE
C
DO 4 IT = 1, 50
EXLMB = EXP(-XLMB)
ZN = EXLMB*TOT/(1. - EXLMB)
XLMBP = XM1/(ZN + TOT)
IF(ABS(XLMBP - XLMB)/AMAX1(XLMB,1.).LE.0.00001) GO TO 7
XLMB = XLMBP
4 CONTINUE
WRITE (6,40)
40 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
7 CONTINUE
XLMB = XLMBP
6 CONTINUE
IF(XP(1).EQ.O.) WRITE (6,15) XLMB
15 FORMAT('0',10X,'ESTIMATED VALUE OF LAMBDA =',F8.4)
IF(XP(1).GT.O.) WRITE (6,25) XLMB
25 FORMAT('0',10X,'SPECIFIED VALUE OF LAMBDA =',F8.4)
C
C COMPUTE ESTIMATED VARIANCE OF ESTIMATE
C
VAR = XLMB**2/(XM1*(XLMB + 1. - XM1/TOT))
IF(XP(1).EQ.0.) WRITE (6,45) VAR
45 FORMAT(' ',10X,'ASYMP. VAR. OF LAMBDA =',E13.5)
XL = 0.
ALP = ALOG(XLMB)
CUM = 0.
XDC = 2.
TERM = ALP - XLMB - ALOG(1. - EXP(-XLMB))
C
C COMPUTE EXPECTED COUNTS
C
E(1) = 0.
DO 3 I = 2, KP
E(I) = EXP(TERM)*TOT
CUM = CUM + E(I)
XL = XL + NF(I)*TERM
TERM = TERM + ALP - ALOG(XDC)
XDC = XDC + 1.
3 CONTINUE
E(M) = TOT - CUM
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,60) XL
60 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) XP(1)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/11X,F11.5)
RETURN
END
C PSI REVISED 7-14-80
C PROGRAMMER: HASSELBLAD
C COMPUTES THE PSI FUNCTION (ALSO KNOWN AS THE DIGAMMA FUNCTION):
C PSI(Z) = D(LN(GAMMA(Z)))/DZ.
C
C THE PROGRAM USES AN ASYMPTOTIC FORMULA AS GIVEN ON P. 259 OF
C "HANDBOOK OF MATHEMATICAL FUNCTIONS".
C
REAL FUNCTION PSI(Z)
X = Z
S = 0.
2 CONTINUE
IF(X.GE.5.) GO TO 3
S = S + 1./X
X = X + 1.
GO TO 2
3 PSI = ALOG(X) - .5/X - (.08333333 - (.0083333333 -
1 .0031349208/(X*X))/(X*X))/(X*X) - S
RETURN
END
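For reference, PSI evaluates the digamma function
$$\psi(z) = \frac{d}{dz}\ln\Gamma(z),$$
raising the argument above 5 with the recurrence $\psi(z) = \psi(z+1) - 1/z$ (the sum S above) and then evaluating a truncated form of the asymptotic series
$$\psi(z) \sim \ln z - \frac{1}{2z} - \frac{1}{12z^2} + \frac{1}{120z^4} - \cdots$$
from p. 259 of the Handbook of Mathematical Functions; the numerical coefficients in the listing appear to be the authors' truncated fit of that series.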
C SUBROUTINE TO FIT A MIXTURE OF TWO POISSON DISTRIBUTIONS
c PROGRAMMER: HASSELBLAD REVISED 4-30-so
c
SUBROUTINE POISMX
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C XLMI IS THE ESTIMATE OF THE PARAMETER LAMBDA1
C XLM2 IS THE ESTIMATE OF THE PARAMETER LAMBDA2
C ALPHA IS THE ESTIMATED PROPORTION CORRESPONDING TO THE FIRST
C POISSON DISTRIBUTION
IF = 1
WRITE (6,20) TITLE
20 FORMAT('1',10X,20A4)
C
C CHECK FOR IMPROPER PARAMETER VALUES
C
IF(XP(1).LT.0.) GO TO 900
IF(XP(2).LT.0.) GO TO 900
IF(XP(3).LT.0..OR.XP(3).GT.1.) GO TO 900
KP = K + 1
M = K + 2
NP = 3
IF(XP(1).GT.0.) NP = NP - 1
IF(XP(2).GT.0.) NP = NP - 1
IF(XP(3).GT.0.) NP = NP - 1
XLM = 0.
C
C COMPUTE ESTIMATE OF AVERAGE P
C
DO 2 I = 1, KP
XLM = XLM + (I-1)*NF(I)
2 CONTINUE
TOT = NTOT
XLM = XLM/TOT
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A MIXTURE OF TWO POISSON DISTRIBUTIONS')
WRITE (6,15)
15 FORMAT('0',10X,'P(X) = ALPHA*EXP(-LAMBDA1)LAMBDA1**X/X! +'/15X,
1 '(1-ALPHA)EXP(-LAMBDA2)LAMBDA2**X/X!')
WRITE (6,10) NTOT
10 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6)
C
C COMPUTE INITIAL ESTIMATES
C
ALPHA = 0.5
IF(XP(3).GT.O.) ALPHA = XP(3)
XLM1 = 0.75*XLM
IF(XP(1).GT.0.) XLM1 = XP(1)
XLM2 = 1.3333*XLM
IF(XP(2).GT.O.) XLM2 = XP(2)
IF(XP(1).GT.O..AND.XP(2).GT.O..AND.XP(3).GT.O.) GO TO 8
C
C BEGIN MIP-EM ITERATION SCHEME
C
DO 6 IT = 1, 2000
X1 = 0.
X2 = 0.
XS = 0.
ALM1 = ALOG(XLM1)
ALM2 = ALOG(XLM2)
XDC = 1.
TERM1 = -XLM1
TERM2 = -XLM2
DO 7 I = 1, KP
CI = I - 1
D1 = EXP(TERM1)
D2 = EXP(TERM2)
D = ALPHA*D1 + (1. - ALPHA)*D2
X1 = X1 + D1*CI*NF(I)/D
X2 = X2 + D2*CI*NF(I)/D
XS = XS + ALPHA*D1*NF(I)/D
TERM1 = TERM1 + ALM1 - ALOG(XDC)
TERM2 = TERM2 + ALM2 - ALOG(XDC)
XDC = XDC + 1.
7 CONTINUE
X1 = X1/TOT
X2 = X2/TOT
XS = XS/TOT
IF(XP(1).GT.0.) X1 = XP(1)
IF(XP(2).GT.0.) X2 = XP(2)
IF(XP(3).GT.0.) XS = XP(3)
C
C TEST FOR CONVERGENCE
C
IF(ABS(XLM1-X1)+ABS(XLM2-X2)+ABS(ALPHA-XS).LE.0.0001) GO TO 4
XLM1 = X1
XLM2 = X2
ALPHA = XS
6 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
4 CONTINUE
XLM1 = X1
XLM2 = X2
ALPHA = XS
8 CONTINUE
C
C COMPUTE FINAL LIKELIHOOD, EXPECTED FREQUENCIES,
C AND SECOND PARTIAL DERIVATIVES
C
XLK = 0.
ALM1 = ALOG(XLM1)
ALM2 = ALOG(XLM2)
XDC = 1.
TERM1 = -XLM1
TERM2 = -XLM2
ESUM = 0.
D11 = 0.
D12 = 0.
D22 = 0.
D13 = 0.
D23 = 0.
D33 = 0.
DO 9 I = 1, KP
CI = I - 1
D1 = EXP(TERM1)
D2 = EXP(TERM2)
D = ALPHA*D1 + (1. - ALPHA)*D2
E(I) = TOT*D
ESUM = ESUM + E(I)
XLK = XLK + NF(I)*ALOG(D)
TERM1 = TERM1 + ALM1 - ALOG(XDC)
TERM2 = TERM2 + ALM2 - ALOG(XDC)
XDC = XDC + 1.
T1 = NF(I)/D
T2 = -T1/D
T3 = D1*ALPHA
T4 = D2*(1. - ALPHA)
T5 = (CI/XLM1 - 1.)
T6 = (CI/XLM2 - 1.)
D11 = D11 + T2*T3**2*T5**2 + T1*T3*(T5**2 - CI/XLM1**2)
D22 = D22 + T2*T4**2*T6**2 + T1*T4*(T6**2 - CI/XLM2**2)
D12 = D12 + T2*T3*T4*T5*T6
D13 = D13 + T2*T3*T5*(D1 - D2)
D23 = D23 + T2*T4*T6*(D1 - D2)
D33 = D33 + T2*(D1 - D2)**2
9 CONTINUE
E(M) = TOT - ESUM
C
C ALLOW FOR SPECIFIED PARAMETERS
C
IF(XP(1).GT.0.) D11 = 1.
IF(XP(2).GT.0.) D22 = 1.
IF(XP(3).GT.0.) D33 = 1.
IF(XP(1).GT.0..OR.XP(2).GT.0.) D12 = 0.
IF(XP(1).GT.0..OR.XP(3).GT.0.) D13 = 0.
IF(XP(2).GT.0..OR.XP(3).GT.0.) D23 = 0.
DET = D11*D22*D33 + 2.*D12*D23*D13 - D13**2*D22 - D23**2*D11 -
1 D12**2*D33
C
C WRITE OUT ESTIMATES AND THEIR VARIANCES
C
IF(XP(1).GT.0.) WRITE (6,25) XLM1
25 FORMAT('0',10X,'SPECIFIED VALUE OF LAMBDA1 =',F11.5)
IF(XP(1).EQ.0.) WRITE (6,30) XLM1
30 FORMAT('0',10X,'ESTIMATED VALUE OF LAMBDA1 =',F11.5)
IF(XP(2).GT.0.) WRITE (6,35) XLM2
35 FORMAT(' ',10X,'SPECIFIED VALUE OF LAMBDA2 =',F11.5)
IF(XP(2).EQ.0.) WRITE (6,40) XLM2
40 FORMAT(' ',10X,'ESTIMATED VALUE OF LAMBDA2 =',F11.5)
IF(XP(3).GT.0.) WRITE (6,45) ALPHA
45 FORMAT(' ',10X,'SPECIFIED VALUE OF ALPHA =',F13.5)
IF(XP(3).EQ.0.) WRITE (6,50) ALPHA
50 FORMAT(' ',10X,'ESTIMATED VALUE OF ALPHA =',F13.5)
IF(XP(1).GT.0..AND.XP(2).GT.0..AND.XP(3).GT.0.) GO TO 11
V11 = (D23**2 - D22*D33)/DET
V12 = (D12*D33 - D13*D23)/DET
V13 = (D13*D22 - D12*D23)/DET
V22 = (D13**2 - D11*D33)/DET
V23 = (D11*D23 - D13*D12)/DET
V33 = (D12**2 - D11*D22)/DET
IF(XP(1).GT.0.) V11 = 0.
IF(XP(2).GT.0.) V22 = 0.
IF(XP(3).GT.0.) V33 = 0.
WRITE (6,55)
55 FORMAT('0',10X,'ASYMPTOTIC COVARIANCE MATRIX OF ESTIMATES')
WRITE (6,60)
60 FORMAT(' ',22X,'LAMBDA1',6X,'LAMBDA2',6X,'ALPHA')
WRITE (6,65) V11, V12, V13, V12, V22, V23, V13, V23, V33
65 FORMAT(' ',10X,'LAMBDA1',3E13.5/11X,'LAMBDA2',3E13.5/11X,'ALPHA',
1 2X,3E13.5)
11 CONTINUE
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,70) XLK
70 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) INT, XP(1), XP(2), XP(3)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/11X,I7,4F11.5)
RETURN
END
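For reference, POISMX fits
$$f(x) = \alpha\,\frac{e^{-\lambda_1}\lambda_1^{x}}{x!} + (1-\alpha)\,\frac{e^{-\lambda_2}\lambda_2^{x}}{x!},$$
and each pass of the iteration above updates the parameters by
$$\lambda_j' = \frac{1}{n}\sum_x n_x\,x\,\frac{f_j(x)}{f(x)}, \qquad \alpha' = \frac{1}{n}\sum_x n_x\,\frac{\alpha\,f_1(x)}{f(x)},$$
where $f_j$ is the $j$-th Poisson component and $n$ is the total sample size.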
C SUBROUTINE TO FIT A NEGATIVE BINOMIAL DISTRIBUTION
C PROGRAMMER: HASSELBLAD                             REVISED 9-30-80
C
SUBROUTINE NEGBIN
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C P IS THE ESTIMATE OF THE PARAMETER P
C XN IS THE ESTIMATE OF THE PARAMETER N
C XNP IS THE IMPROVED ESTIMATE OF THE PARAMETER N
C Nl, N2, CN1, CN2, XLK1, XLK2, TERM1, TERM2, ALT1, ALT2 ARE USED
C TO FIND THE OPTIMAL INTEGER SOLUTION FOR N
C
WRITE (6,30) TITLE
30 FORMAT ( ' 1 ' ,10X,20A4)
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A NEGATIVE BINOMIAL DISTRIBUTION')
WRITE (6,10)
10 FORMAT('0',10X,'P(X) = (N+X-1)!P**X/(X!(N-1)!(1+P)**(N+X))')
WRITE (6,15) NTOT
15 FORMAT('0',10X, 'TOTAL SAMPLE SIZE =',I6)
IF(XP(1).LT.0..OR.XP(2).LT.0.) GO TO 900
IF = 1
KP = K + 1
M = K + 2
NP = 2
IF(XP(1).GT.0.) NP = NP - 1
IF(XP(2).GT.0.) NP = NP - 1
TOT = NTOT
C
C COMPUTE MOMENTS
C
XM1 = 0.
XM2 = 0.
DO 2 I = 1, KP
CI = I - 1
XM1 = XM1 + CI*NF(I)
XM2 = XM2 + CI**2*NF(I)
2 CONTINUE
XM1 = XM1/TOT
V = XM2/TOT - XM1**2
C
C COMPUTE INITIAL ESTIMATES
C
P = XP(1)
XN = XP(2)
IF(XP(1).GT.0..AND.XP(2).GT.0.) GO TO 3
IF(XP(1).EQ.0..AND.V.LE.XM1) GO TO 920
IF(XP(1).EQ.0.) P = V/XM1 - 1.
IF(XP(2).EQ.0.) XN = XM1/P
IF(XP(2).GT.0.) GO TO 3
C
C USE NEWTON-RAPHSON ITERATION SCHEME FOR N IF NOT SPECIFIED
C
DO 7 IT = 1, 100
SUM = 0.
SUM1 = 0.
S = 0.
S1 = 0.
TERM = XN - 1.
DO 6 I = 1, K
TERM = TERM + 1.
S = S + 1./TERM
S1 = S1 + 1./TERM**2
SUM = SUM + S*NF(I+1)
SUM1 = SUM1 + S1*NF(I+1)
6 CONTINUE
IF(XP(1).EQ.0.) F = TOT*ALOG(1. + XM1/XN) - SUM
IF(XP(1).GT.0.) F = TOT*ALOG(1. + P) - SUM
IF(XP(1).EQ.0.) F1 = -(TOT*XM1)/(XN**2 + XM1*XN) + SUM1
IF(XP(1).GT.0.) F1 = SUM1
XNP = XN - F/F1
IF(XNP.LE.0.) GO TO 9
IF(ABS((XNP - XN)/XN).LE.0.0001) GO TO 8
XN = XNP
7 CONTINUE
9 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
8 CONTINUE
XN = XNP
IF(XP(1).EQ.0.) P = XM1/XN
3 CONTINUE
N1 = XN
N2 = N1 + 1
IF(N1.LE.0) N1 = 1
CN1 = N1
CN2 = N2
P1 = XP(1)
P2 = XP(1)
IF(XP(1).EQ.0.) P1 = XM1/CN1
IF(XP(1).EQ.0.) P2 = XM1/CN2
C
C COMPUTE LIKELIHOOD, EXPECTED COUNTS
C
XLK = 0.
XLK1 = 0.
XLK2 = 0.
ESUM = 0.
XNC = XN
XNC1 = CN1
XNC2 = CN2
Q = 1. + P
Q1 = 1. + P1
Q2 = 1. + P2
ALT = ALOG(P) - ALOG(Q)
ALT1 = ALOG(P1) - ALOG(Q1)
ALT2 = ALOG(P2) - ALOG(Q2)
TERM = -XN*ALOG(Q)
TERM1 = -CN1*ALOG(Q1)
TERM2 = -CN2*ALOG(Q2)
DO 17 I = 1, KP
CI = I - 1
E(I) = TOT*EXP(TERM)
ESUM = ESUM + E(I)
XLK = XLK + NF(I)*TERM
XLK1 = XLK1 + NF(I)*TERM1
XLK2 = XLK2 + NF(I)*TERM2
TERM = TERM + ALT + ALOG(XNC) - ALOG(CI + 1.)
TERM1 = TERM1 + ALT1 + ALOG(XNC1) - ALOG(CI + 1.)
TERM2 = TERM2 + ALT2 + ALOG(XNC2) - ALOG(CI + 1.)
XNC = XNC + 1.
XNC1 = XNC1 + 1.
XNC2 = XNC2 + 1.
17 CONTINUE
E(M) = TOT - ESUM
D11 = -TOT*(XN*P**2 - 2.*P*XM1 - XM1)/(P*Q)**2
D12 = TOT/Q
D22 = SUM1
c
C WRITE OUT ESTIMATES AND THEIR VARIANCES
C
IF(XP(1).GT.0.) WRITE (6,20) P
20 FORMAT('0',10X,'SPECIFIED VALUE OF P =',F12.5)
IF(XP(1).EQ.0.) WRITE (6,25) P
25 FORMAT('0',10X,'ESTIMATED VALUE OF P =',F12.5)
IF(XP(2).GT.0.) WRITE (6,35) XN
35 FORMAT(' ',10X,'SPECIFIED VALUE OF N =',F12.5)
IF(XP(2).EQ.0.) WRITE (6,40) XN
40 FORMAT(' ',10X,'ESTIMATED VALUE OF N =',F12.5)
IF(XP(1).GT.O..AND.XP(2).GT.O.) GO TO 19
IF(XP(1).GT.O.) Dll = 1.
IF(XP(2).GT.O.) D22 = 1.
IF(XP(1).GT.O..OR.XP(2).GT.O.) D12 = 0.
DET = D11XD22 - D12XX2
TEMP = Dll
Dll = D22/DET
D22 = TEMP/DET
D12 = -D12/DET
IF(XP(1).GT.O.) Dll = 0.
IF(XP(2).GT.O.) D22 = 0.
WRITE (6,45)
45 FORMAT('0',10X,'ASYMPTOTIC COVARIANCE MATRIX OF ESTIMATES')
WRITE (6,50)
50 FORMAT('0',21X,'P',12X,'N')
WRITE (6,55) Dll, D12, D12, D22
55 FORMATC ' ,10X, 'P' ,2X,2E13.5/11X, 'N' ,2X,2E13.5)
19 CONTINUE
IF(XP(2).GT.O.) GO TO 22
C
C FIND INTEGER SOLUTION
C
IF(XLK2.LE.XLK1) GO TO 21
PI = P2
Nl = N2
XLK1 = XLK2
21 CONTINUE
WRITE (6,70) PI, Nl, XLK1
70 FORMAT('0',10X,'SOLUTION FOR INTEGER N (PASCAL DISTRIBUTION):'/
1 11X,'P =',F8.5,', N =',15,', LOG-LIKELIHOOD =',F12.4)
22 CONTINUE
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,65) XLK
65 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) XP(1), XP(2)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/18X,2F11.5)
RETURN
920 CONTINUE
WRITE (6,925) V, XM1
925 FORMAT('0',10X,'VARIANCE (',F10.5,') < MEAN (',F10.5,')'/
1 11X,'NO FIT ATTEMPTED')
RETURN
END
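For reference, NEGBIN uses the parameterization printed by format 10,
$$P(x) = \frac{(N+x-1)!}{x!\,(N-1)!}\,\frac{P^{x}}{(1+P)^{N+x}}, \qquad E[X] = NP, \quad \mathrm{Var}[X] = NP(1+P),$$
so the method-of-moments starting values are $P_0 = s^2/\bar{x} - 1$ and $N_0 = \bar{x}/P_0$, and a fit is only attempted when the sample variance exceeds the sample mean.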
C SUBROUTINE TO FIT A TRUNCATED NEGATIVE BINOMIAL DISTRIBUTION
C PROGRAMMERS: ANDERSON, HASSELBLAD                  REVISED 10-20-80
C
SUBROUTINE TNEGBN
DIMENSION R(200)
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C     AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C XN IS THE ESTIMATE OF THE PARAMETER N.
C W IS THE ESTIMATE OF THE PARAMETER W, WHERE P = (1 - W)/W
C P IS THE ESTIMATE OF THE PARAMETER P
C SW IS THE FIRST PARTIAL WITH RESPECT TO W
C SN IS THE FIRST PARTIAL WITH RESPECT TO N
C R IS THE ARRAY OF CUMULATIVE FREQUENCIES
C
WRITE (6,5) TITLE
5 FORMAT('1',10X,20A4)
WRITE (6,35)
35 FORMAT('0',10X,'FIT TO A TRUNCATED NEGATIVE BINOMIAL')
WRITE (6,15)
15 FORMAT('0',10X,'P(X) = (N+X-1)!P**X/(X!(N-1)!(1+P)**(N+X)(1-',
1 '(1+P)**(-N)))')
WRITE (6,25) NTOT
25 FORMAT('0',10X,'TOTAL SAMPLE SIZE =',I6)
IF(XP(1).LT.0..OR.XP(2).LT.0.) GO TO 900
IF = 2
NT = 0
KP = K + 1
M = K + 2
NP = 2
IF(XP(1).GT.0.) NP = NP - 1
IF(XP(2).GT.0.) NP = NP - 1
XM = 0.
DO 2 I = 2, KP
CI = I - 1
XM = XM + NF(I)*CI
NT = NT + NF(I)
2 CONTINUE
TOT = NT
XM = XM/TOT
Q = 0.
QS = 1. - XM
DO 3 I = 2, KP
Q = Q + NF(I)*QS**2
QS = QS + 1.
3 CONTINUE
S = NF(2)/TOT
C
C COMPUTE INITIAL ESTIMATES
C
W = XM*(1. - S)*(TOT - 1.)/Q
IF(W.LE.0.1) W = 0.1
IF(W.GE.0.9) W = 0.9
XN = (W*XM - S)/(1. - W)
IF(XN.LE.0.5) XN = 0.5
S = TOT*XM
P = (1. - W)/W
IF(XP(1).GT.0.) P = XP(1)
IF(XP(2).GT.0.) XN = XP(2)
IF(XP(1).GT.0.) W = 1./(1. + P)
C
C FORM THE CUMULATIVE DISTRIBUTION OF OBSERVATIONS
C
R(KP) = NF(KP)
DO 4 I = 1, K
J = KP - I
R(J) = R(J+1) + NF(J)
4 CONTINUE
IF(XP(1).GT.0..AND.XP(2).GT.0.) GO TO 6
C
C ITERATE USING A NEWTON-RAPHSON ITERATION SCHEME
C
DO 8 IT = 1, 100
D11 = 0.
D22 = 0.
Q = XN
DO 9 I = 2, KP
QS = R(I)/Q
D11 = D11 + QS
D22 = D22 + QS/Q
Q = Q + 1.
9 CONTINUE
XK = TOT*XN
WM = 1. - W
Q = W**XN
QS = 1. - Q
AL = ALOG(W)
SW = XK/(W*QS) - S/WM
SN = TOT*AL/QS + D11
IF(XP(1).GT.0.) SW = 0.
IF(XP(2).GT.0.) SN = 0.
QM = QS**2
C
C COMPUTE TERMS OF INFORMATION MATRIX
C
D12 = W*QM
D11 = XK*(1. - (XN + 1.)*Q)/(W*D12) + S/WM**2
D22 = D22 - TOT*AL**2*Q/QM
D12 = TOT*(1. - (1. - XN*AL)*Q)/D12
IF(XP(1).GT.0..OR.XP(2).GT.0.) D12 = 0.
DET = D11*D22 - D12**2
IF(DET.LE.0.) DET = D11*D22
VW = D22/DET
VN = D11/DET
COV = D12/DET
C
C CHECK FOR NEGATIVE VARIANCE ESTIMATES
C
IF(VW.LT.0.) GO TO 7
IF(VN.LT.0.) GO TO 7
C
C ADJUST N AND W FOR NEXT ITERATION
C
WP = W
XNP = XN
IF(XP(1).EQ.0.) W = W + SW*VW + SN*COV
IF(XP(2).EQ.0.) XN = XN + SW*COV + SN*VN
IF(W.GT.1.) W = (1. + WP)/2.
IF(W.LE.0.) W = WP/2.
IF(XN.LE.0.) XN = XNP/2.
C
C CHECK FOR CONVERGENCE
C
IF((ABS(SW) + ABS(SN)).LT.0.0001) GO TO 6
8 CONTINUE
7 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
6 CONTINUE
IF(XP(1).EQ.0.) P = (1. - W)/W
C COMPUTE LIKELIHOOD, EXPECTED COUNTS
C
XLK = 0.
ESUM = 0.
XNC = XN
TERM = XN*ALOG(W) + ALOG(XN) + ALOG(1. - W) - ALOG(1. - W**XN)
DO 12 I = 2, KP
CI = I - 1
E(I) = TOT*EXP(TERM)
ESUM = ESUM + E(I)
XLK = XLK + NF(I)*TERM
XNC = XNC + 1.
TERM = TERM + ALOG(1. - W) + ALOG(XNC) - ALOG(CI + 1.)
12 CONTINUE
E(M) = TOT - ESUM
C
C WRITE OUT ESTIMATES AND THEIR VARIANCES
C
IF(XP(1).GT.0.) WRITE (6,60) P
60 FORMAT('0',10X,'SPECIFIED VALUE OF P =',F12.5)
IF(XP(1).EQ.0.) WRITE (6,65) P
65 FORMAT('0',10X,'ESTIMATED VALUE OF P =',F12.5)
IF(XP(2).GT.0.) WRITE (6,70) XN
70 FORMAT(' ',10X,'SPECIFIED VALUE OF N =',F12.5)
IF(XP(2).EQ.0.) WRITE (6,75) XN
75 FORMAT(' ',10X,'ESTIMATED VALUE OF N =',F12.5)
IF(XP(1).GT.0..AND.XP(2).GT.0.) GO TO 13
D11 = D11*W**4 + 2.*SW*W**3
D12 = -D12*W**2
IF(XP(1).GT.0.) D11 = 1.
IF(XP(2).GT.0.) D22 = 1.
IF(XP(1).GT.0..OR.XP(2).GT.0.) D12 = 0.
DET = D11*D22 - D12**2
TEMP = D11
D11 = D22/DET
D22 = TEMP/DET
D12 = D12/DET
IF(XP(1).GT.0.) D11 = 0.
IF(XP(2).GT.0.) D22 = 0.
WRITE (6,80)
80 FORMAT('0',10X,'ASYMPTOTIC COVARIANCE MATRIX OF ESTIMATES')
WRITE (6,85)
85 FORMAT('0',21X,'P',12X,'N')
WRITE (6,90) D11, D12, D12, D22
90 FORMAT(' ',10X,'P',2X,2E13.5/11X,'N',2X,2E13.5)
13 CONTINUE
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,40) XLK
40 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
900 CONTINUE
WRITE (6,905) XP(1), XP(2)
905 FORMAT('0',10X,'INVALID PARAMETER SPECIFICATION'/18X,2F11.5)
RETURN
END
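For reference, TNEGBN works with $W = 1/(1+P)$, in terms of which the zero-truncated negative binomial named in format 15 above is
$$P(x) = \frac{(N+x-1)!}{x!\,(N-1)!}\,\frac{(1-W)^{x}\,W^{N}}{1-W^{N}}, \qquad x = 1, 2, \ldots$$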
C SUBROUTINE TO FIT A LOGARITHMIC DISTRIBUTION       REVISED 7-30-80
C PROGRAMMERS: ANDERSON, HASSELBLAD
SUBROUTINE LOGRTH
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C K IS THE VALUE CORRESPONDING TO THE LARGEST OBSERVED FREQUENCY
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C     AND IF = 2 FOR TRUNCATED AND LOGARITHMIC DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C
WRITE (6,30) TITLE
30 FORMAT('1',10X,20A4)
WRITE (6,5)
5 FORMAT('0',10X,'FIT TO A LOGARITHMIC DISTRIBUTION')
KP = K + 1
NP = 1
IF(XP(1).GT.0.) NP = 0
M = K + 2
IF = 2
NT = 0
NUM = 0
C
C COMPUTE SAMPLE SIZE AND MEAN
C
XM1 = 0.
DO 2 I = 2, KP
CI = I - 1
NT = NT + NF(I)
XM1 = XM1 + NF(I)*CI
2 CONTINUE
TOT = NT
XM1 = XM1/TOT
WRITE (6,10) NT
10 FORMAT('0',10X, 'TOTAL SAMPLE SIZE =',I6)
C
C CHECK FOR IMPROPER PARAMETER VALUE
C
THETA = XP(1)
IF(XP(1).GT.O.) GO TO 3
TEMP = ALOG(XMl)
C
C COMPUTE INITIAL ESTIMATE
C
THETA = 1. - 1./(1. + ((5./3. - TEMP/16.)*(XM1 - 1.) + 2.)*TEMP)
C
C USE A NEWTON-RAPHSON ITERATION SCHEME
C
DO 4 I = 1, 50
TEMP = 1. - THETA
THETP = THETA - (XM1 + THETA/(TEMP*ALOG(TEMP)))*(((TEMP*
1 ALOG(TEMP))**2)/(ALOG(TEMP) + THETA))
IF(ABS(THETP-THETA).LE.0.00001) GO TO 6
THETA = THETP
4 CONTINUE
WRITE (6,95)
95 FORMAT('0',10X,'PROCEDURE DID NOT CONVERGE')
6 CONTINUE
THETA = THETP
3 TEMP = 1. - THETA
IF(XP(1).EQ.0.) WRITE (6,15) THETA
15 FORMAT('0',10X,'ESTIMATED VALUE OF THETA =',F8.4)
IF(XP(1).GT.0.) WRITE (6,25) THETA
25 FORMAT('0',10X,'SPECIFIED VALUE OF THETA =',F8.4)
C COMPUTE ESTIMATED VARIANCE OF ESTIMATE
C
ALT = -ALOG(1. - THETA)
VAR = THETA*TEMP**2*ALOG(TEMP)/(ALT*(ALOG(TEMP) + THETA))
IF(XP(1).EQ.0.) WRITE (6,45) VAR
45 FORMAT(' ',10X,'ASYMP. VAR. OF THETA =',E13.5)
CUM = 0.
C
C COMPUTE EXPECTED COUNTS, LOG-LIKELIHOOD
C
E(1) = 0.
XL = 0.
DO 7 I = 2, KP
CI = I - 1
TERM = THETA**(I - 1)/(CI*ALT)
E(I) = TERM*TOT
XL = XL + NF(I)*ALOG(TERM)
CUM = CUM + E(I)
7 CONTINUE
E(M) = TOT - CUM
C
C COMPUTE GOODNESS OF FIT
C
CALL GDFIT
WRITE (6,20) XL
20 FORMAT('0',10X,'LOG-LIKELIHOOD =',F12.4)
RETURN
END
C SUBROUTINE TO COMPUTE A CHI-SQUARE GOODNESS OF FIT TEST
C PROGRAMMER: HASSELBLAD                             REVISED 10-20-80
SUBROUTINE GDFIT
COMMON TITLE(20),NF(200),E(200),XP(4),K,M,INT,NP,IF,NTOT
C THIS ROUTINE COMPUTES A CHI-SQUARE GOODNESS OF FIT TEST BASED ON
C THE OBSERVED AND EXPECTED FREQUENCIES.  THE CELLS ARE COLLAPSED
C FROM THE RIGHT AND LEFT TO GET A MINIMUM EXPECTED FREQUENCY OF
C EXMIN.
C
C NF IS AN ARRAY OF OBSERVED COUNTS
C E IS AN ARRAY OF EXPECTED COUNTS
C XP IS AN ARRAY OF OPTIONAL INPUT CONSTANTS
C M IS THE LENGTH OF THE E ARRAY
C INT IS AN OPTIONAL INPUT INTEGER.
C NP IS THE NUMBER OF PARAMETERS ESTIMATED
C IF IS THE INITIAL FREQUENCY USED AND IS 1 FOR MOST DISTRIBUTIONS;
C AND IF = 2 FOR TRUNCATED DISTRIBUTIONS.
C NTOT IS THE SUM OF THE COUNTS IN THE NF ARRAY
C
C
DATA EXMIN /5./
C
C EXMIN IS THE MINIMUM EXPECTED FREQUENCY ALLOWED AFTER COLLAPSING.
C
NDF = 1 - NP
KP = K + 1
TOT = 0.
DO 9 I = IF, M
TOT = TOT + E(I)
9 CONTINUE
ESUM = 0.
NSUM = 0
WRITE (6,5)
5 FORMAT('0',13X,'CHI-SQUARE GOODNESS OF FIT TEST'/'0',10X,
1 'VALUE  OBSERVED  EXPECTED  CHI-SQUARE')
C
C SUM INITIAL EXPECTED FREQUENCIES UNTIL THEY REACH EXMIN
C
DO 2 I = IF, M
J = I
IT = I - 1
ESUM = ESUM + Ed)
NSUM = NSUM + NFd)
IF(ESUM.GE.EXMIN) GO TO 3
WRITE (6,10) IT, NF(I), E(I)
10 FORMAT(' ',10X,I3,I9,F12.2,7X,'--')
2 CONTINUE
GO TO 900
3 CONTINUE
CHIS = (NSUM - ESUM)**2/ESUM
TCHS = CHIS
WRITE (6,15) IT, NF(J), E(J), CHIS
15 FORMAT(' ',10X,I3,I9,F12.2,F10.2)
L = J + 1
C WRITE OUT INTERMEDIATE COUNTS UNTIL REMAINING EXPECTED FREQUENCIES
C ARE LESS THAN EXMIN.
C
DO 4 I = L, M
J = I
IT = I - 1
ESUM = ESUM + E(I)
NSUM = NSUM + NF(I)
IF(TOT - ESUM.LT.EXMIN) GO TO 6
NDF = NDF + 1
CHIS = (NF(I) - E(I))**2/E(I)
TCHS = TCHS + CHIS
WRITE (6,15) IT, NF(J), E(J), CHIS
4 CONTINUE
6 CONTINUE
ESUM = TOT - ESUM + E(J)
CHIS = (NTOT - NSUM + NF(J) - ESUM)**2/ESUM
TCHS = TCHS + CHIS
IF(J.LT.M.OR.KP.EQ.M) WRITE (6,15) IT, NF(J), E(J), CHIS
IF(J.EQ.M.AND.KP.LT.M) WRITE (6,25) IT, NF(J), E(J), CHIS
25 FORMAT(' ',10X,I3,'+',I8,F12.2,F10.2)
IF(J.GE.M) GO TO 7
L = J + 1
IF(J.GE.M) GO TO 7
L = J + 1
C
C WRITE OUT REMAINING EXPECTED FREQUENCIES AND COUNTS
C
DO 8 I = L, M
J = I
IT = I - 1
IF(J.LT.M.OR.KP.EQ.M) WRITE (6,10) IT, NF(I), E(I)
IF(J.EQ.M.AND.KP.LT.M) WRITE (6,30) IT, NF(I), E(I)
30 FORMAT(' ',10X,I3,'+',I8,F12.2,7X,'--')
8 CONTINUE
7 CONTINUE
PVAL = CHISQ(TCHS,NDF)
WRITE (6,20) TCHS, NDF, PVAL
20 FORMAT('0',10X,'CHI-SQUARE =',F8.2,', D.F. =',I3,', P =',F6.4)
RETURN
900 CONTINUE
WRITE (6,905)
905 FORMAT('0',10X,'INSUFFICIENT EXPECTED FREQUENCIES')
RETURN
END
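For reference, after cells are pooled from each end until every pooled cell has expected frequency of at least EXMIN (set to 5 above), GDFIT reports
$$X^2 = \sum_j \frac{(O_j - E_j)^2}{E_j},$$
referred to a chi-square distribution whose degrees of freedom are intended to be the number of pooled cells minus one minus the number of estimated parameters (NP).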
TECHNICAL REPORT DATA
(Please read instructions on the reverse before completing)
1. REPORT NO.: EPA-600/2-81-010
3. RECIPIENT'S ACCESSION NO.:
4. TITLE AND SUBTITLE: Disfit: A Distribution Fitting System
   1. Discrete Distributions
5. REPORT DATE: January 1981 (issuing date)
6. PERFORMING ORGANIZATION CODE:
7. AUTHOR(S): Victor Hasselblad, Andy Stead, Helen Anderson
8. PERFORMING ORGANIZATION REPORT NO.:
9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Biometry Division
   Health Effects Research Laboratory
   U.S. Environmental Protection Agency
   Research Triangle Park, NC 27711
10. PROGRAM ELEMENT NO.: 1AA817
11. CONTRACT/GRANT NO.:
12. SPONSORING AGENCY NAME AND ADDRESS:
   Office of Research and Development
   Health Effects Research Laboratory
   U.S. Environmental Protection Agency
   Research Triangle Park, NC 27711
13. TYPE OF REPORT AND PERIOD COVERED: User's Guide
14. SPONSORING AGENCY CODE: EPA 600/11
15. SUPPLEMENTARY NOTES:
16. ABSTRACT:
   The DISFIT system is a series of programs and subroutines to fit distributions
   to data. This first volume describes the routines to fit discrete distributions.
   The distributions included are the binomial, truncated binomial, mixture of two
   binomials, beta binomial, Poisson, truncated Poisson, mixture of two Poissons,
   negative binomial, truncated negative binomial, and logarithmic. All parameters
   are estimated using maximum likelihood techniques. Any of the parameters may be
   specified instead of estimated. Variances or estimated asymptotic variances of
   the parameter estimates are also given. Some tests of hypotheses are possible
   using likelihood ratio tests. This guide contains the descriptions, calling
   sequences, documentation, and examples for each distribution. The program is
   written entirely in FORTRAN, and a listing of the program is in the appendix.
17. KEY WORDS AND DOCUMENT ANALYSIS
   a. DESCRIPTORS   b. IDENTIFIERS/OPEN ENDED TERMS   c. COSATI Field/Group
18. DISTRIBUTION STATEMENT: RELEASE TO PUBLIC
19. SECURITY CLASS (This Report): UNCLASSIFIED
20. SECURITY CLASS (This page): UNCLASSIFIED
21. NO. OF PAGES: 85
22. PRICE:
EPA Form 2220-1 (9-73)
* U.S. GOVERNMENT PRINTING OFFICE: 1981-757-064/0236