Small Sample Properties of Various Tests on 2 x 2 Tables


United States
Environmental Protection
Agency
Health Effects Research
Laboratory
Research Triangle Park NC 27711
EPA-600/9-80-029
May 1980
Research and Development
Small Sample
Properties of Various
Tests on 2x2 Tables

-------
                 RESEARCH REPORTING SERIES

 Research reports of the Office of Research and Development, U.S. Environmental
 Protection Agency, have been grouped into nine series. These nine broad cate-
 gories were established to facilitate further development and application of en-
 vironmental technology. Elimination  of traditional grouping was consciously
 planned to foster technology transfer and a maximum interface in related fields.
 The nine series are:
      1.   Environmental Health Effects Research
      2.   Environmental Protection Technology
      3.   Ecological Research
      4.   Environmental Monitoring
      5.   Socioeconomic Environmental  Studies
      6.   Scientific and Technical Assessment Reports (STAR)
      7.   Interagency Energy-Environment Research and Development
      8.   "Special"  Reports
      9.   Miscellaneous Reports
This document is available to the public through the National Technical Informa-
tion Service, Springfield, Virginia 22161.

-------
                                            EPA-600/9-80-029
                                            May 1980
     SMALL SAMPLE PROPERTIES OF VARIOUS
            TESTS ON 2 x 2 TABLES
                     by

             Victor Hasselblad
              Andrew G. Stead
    Statistics and Data Management Office
     Health Effects Research Laboratory
Research Triangle Park, North Carolina  27711
     Health Effects Research Laboratory
     Office of Research and Development
    U.S. Environmental Protection Agency
Research Triangle Park, North Carolina  27711

-------
                                DISCLAIMER
     This report has been reviewed by the Health Effects Research Laboratory,
U.S. Environmental Protection Agency, and approved for publication.   Mention
of trade names or commercial  products does not constitute endorsement or
recommendation for use.

-------
                                  FOREWORD
      The many benefits of our modern, developing,  industrial  society are
accompanied by certain hazards.   Careful  assessment of the relative risk of
existing and new man-made environmental  hazards is  necessary for the estab-
lishment of sound regulatory policy.   These regulations serve to enhance the
quality of our environment in order to promote the  public health and welfare
and the productive capacity of our nation's population.

      The Health Effects Research Laboratory, Research Triangle Park,
conducts a coordinated environmental  health research program in toxicology,
epidemiology, and clinical studies using human volunteer subjects.   These
studies address problems in air pollution, non-ionizing radiation,  environ-
mental carcinogenesis and the toxicology of pesticides as well as other
chemical pollutants.  The Laboratory participates in the development and
revision of air quality criteria documents on pollutants for which  national
ambient air quality standards exist or are proposed, provides the data for
registration of new pesticides or proposed suspension of those already in
use, conducts research on hazardous and toxic materials, and is primarily
responsible for providing the health basis for non-ionizing radiation stan-
dards.  Direct support to the regulatory function of the Agency is  provided
in the form of expert testimony and preparation of affidavits as well as
expert advice to the Administrator to assure the adequacy of health care and
surveillance of persons having suffered imminent and substantial endangerment
of their health.

       In this paper, the small sample properties of the analysis of
categorical data methods of Grizzle, Starmer, and Koch are explored.  Also,
the small sample properties of maximum likelihood techniques are examined.
Both analysis techniques have been used to analyze cross-sectional studies of
respiratory behavior, and an understanding of the behavior of these tests
when applied to small samples will enhance their applicability in future
respiratory studies.
                                      F. G. Hueter, Ph. D.
                                        Director
                              Health Effects Research Laboratory

-------
                                  ABSTRACT
     With particular interest in the likelihood ratio tests and tests
generated by the categorical analysis methods of Grizzle, Starmer, and Koch,
the small sample behavior of actual α levels is graphically illustrated for
several asymptotically chi-square statistics when testing is performed on
2x2 contingency table data at a nominal level of .05.  This behavior is
examined for hypotheses testing both homogeneity and independence.  In the
homogeneity case, the effects on these levels are examined for equal and
unequal sample sizes.  In the independence case, the effects of different
constraints concerning individual cell probabilities are demonstrated.
Visual comparisons of the power of the likelihood ratio and the Grizzle,
Starmer, Koch related statistics are also provided.

-------
                                  SECTION 1

                                 INTRODUCTION

     In several cross-sectional  studies of respiratory conditions conducted
by the U. S. Environmental Protection Agency [1-3], the resulting multiway
contingency tables were analyzed using the analysis of categorical data
methods of Grizzle, Starmer, and Koch (GSK).  As these authors indicate:
"the behavior of the tests in small samples  is unknown" [4].

     Additionally, since it is often possible to analyze the same data using
maximum likelihood techniques, we were quite interested in examining the
small sample properties of both procedures from the hypothesis testing
standpoint.

     Although simulation could be used to study the properties of these tests
in larger tables, we restricted our investigation to the 2x2 table.  In
this way, the exact probability levels of the tests could be calculated.

     Literature discussing various aspects of 2 x 2 tables abounds.  Several
papers, however, are particularly pertinent to our study.  In 1967, Grizzle
[5]  investigated the effect of the continuity correction when testing for
either homogeneity or independence.  Garside and Mack [6], in 1976, discussed
actual Type I error probabilities for various tests of the homogeneity case.
In 1977, Eberhardt and Fligner [7] compared actual α levels, as well as the
power, of the usual χ² test for homogeneity with a test equivalent to
Goodman's Y² in small to moderate samples.  More recently (1978), Larntz [8]
compared exact levels of the likelihood ratio, Freeman-Tukey, and Pearson
chi-squared goodness-of-fit statistics in small samples.

     Our paper will compare the following tests, when appropriate, under both
hypotheses described by Grizzle [5]:  1) the GSK, assuming either a linear or
log-linear model, 2) the likelihood ratio test, 3) the usual χ² test, 4) the
χ² test with Yates continuity correction, and 5) Fisher's "exact" test.  The
GSK test in the homogeneity case is the same as the Y² statistic proposed by
Goodman [9].

-------
                                  SECTION 2

                          THE ASSUMPTIONS AND TESTS


     A sample of n observations has been classified according to two
dichotomous characteristics, A and B.  Using the notation of Grizzle [5], we
have the following table:

                                  A1      A2

                         B1      X11     X12     X1.

                         B2      X21     X22     X2.

                                 X.1     X.2      n

     The two sampling methods of interest are:  1) X.1 and X.2 fixed, and
the hypothesis compares the proportions of two binomial populations,
i.e. H0:  p1 = p2; and 2) only n is fixed, the sample follows a
multinomial distribution, and H0:  pij = pi. p.j for i,j = 1,2.

     In both cases it is possible to generate all possible samples and their
associated probabilities for 1) fixed pi's or 2) fixed pij's.  In this way
actual α levels can be computed for small samples.  Although theoretically
possible, calculating such levels for larger tables would require prohibitive
amounts of computer time.
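
The enumeration described above can be sketched in a few lines of Python for the homogeneity case. This is an illustrative sketch, not the authors' program: the function names are hypothetical, the uncorrected chi-square of the next subsection is used as the example statistic, and a table with an empty row is assumed never to reject.

```python
from math import comb

CRIT_05 = 3.841458820694124  # upper 5% point of chi-square with 1 df


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def chi2_uncorrected(x11, x12, n1, n2):
    """Uncorrected chi-square; x11, x12 are the first-row counts, n1 = X.1, n2 = X.2."""
    x21, x22 = n1 - x11, n2 - x12
    r1, r2 = x11 + x12, x21 + x22
    if r1 == 0 or r2 == 0:
        return 0.0  # assumption: a table with an empty row never rejects
    n = n1 + n2
    return n * (x11 * x22 - x21 * x12) ** 2 / (r1 * r2 * n1 * n2)


def actual_alpha(stat, n1, n2, p, crit=CRIT_05):
    """Exact rejection probability under H0: p1 = p2 = p."""
    alpha = 0.0
    for x11 in range(n1 + 1):          # every possible sample from population 1
        for x12 in range(n2 + 1):      # every possible sample from population 2
            if stat(x11, x12, n1, n2) > crit:
                alpha += binom_pmf(x11, n1, p) * binom_pmf(x12, n2, p)
    return alpha


alpha_at_half = actual_alpha(chi2_uncorrected, 10, 10, 0.5)
print(f"actual alpha at p = .5, X.1 = X.2 = 10: {alpha_at_half:.4f}")
```

Only (X.1 + 1)(X.2 + 1) tables have to be visited, which is why exact levels are cheap for the 2x2 case; the printed value can be compared with the first row of Table 1.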

     Case I -  H0:  p1 = p2

     The following test statistics were considered:

     1)   The uncorrected chi-square

$$X^2 = \frac{n\,(X_{11}X_{22}-X_{21}X_{12})^2}{X_{1.}X_{2.}X_{.1}X_{.2}}, \tag{2.1}$$

     2)   The corrected chi-square suggested by Yates

$$X_c^2 = \frac{n\,(|X_{11}X_{22}-X_{21}X_{12}|-n/2)^2}{X_{1.}X_{2.}X_{.1}X_{.2}}, \tag{2.2}$$

     3)   The GSK method* for testing the linear difference in two proportions,

$$X^2_{GSK} = \frac{(X_{.2}X_{11}-X_{.1}X_{12})^2\,X_{.1}X_{.2}}{X_{.1}^3X_{12}X_{22}+X_{.2}^3X_{11}X_{21}}, \tag{2.3}$$

              and,

-------
     4)  The likelihood ratio test,

$$X^2_{LR} = -2\Big[X_{1.}\ln(X_{1.}/n)+X_{2.}\ln(X_{2.}/n)-X_{11}\ln(X_{11}/X_{.1})-X_{21}\ln(X_{21}/X_{.1})-X_{12}\ln(X_{12}/X_{.2})-X_{22}\ln(X_{22}/X_{.2})\Big]. \tag{2.4}$$

     Since these statistics are asymptotically chi-square, they were tested
against the appropriate χ² value with one degree of freedom with α = .05.
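
As a worked illustration of the four Case I statistics, the sketch below evaluates them for one table. It is a sketch under stated assumptions rather than the authors' code: the function names are hypothetical, zero cells in the GSK statistic are replaced by 1/2 (one over the two response categories, following the footnote), the Yates-corrected difference is truncated at zero (a common convention that the formula as printed does not state), 0 ln 0 is taken as 0, and a table with an empty row is given a chi-square of 0.

```python
from math import log


def xlogy(x, y):
    """x * ln(y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * log(y)


def case1_statistics(x11, x12, x21, x22):
    """Statistics (2.1)-(2.4) for a table whose column totals are both positive."""
    c1, c2 = x11 + x21, x12 + x22          # X.1, X.2: the binomial sample sizes
    r1, r2 = x11 + x12, x21 + x22          # X1., X2.
    n = c1 + c2
    cross = x11 * x22 - x21 * x12
    denom = r1 * r2 * c1 * c2

    if denom == 0:                          # assumption: an empty row gives 0
        chi2 = yates = 0.0
    else:
        chi2 = n * cross**2 / denom                                   # (2.1)
        # assumption: the corrected difference is not allowed to go negative
        yates = n * max(abs(cross) - n / 2, 0.0) ** 2 / denom         # (2.2)

    # (2.3): zero cells replaced by 1/2 before the statistic is formed
    y11, y12, y21, y22 = (x if x > 0 else 0.5 for x in (x11, x12, x21, x22))
    gsk = ((c2 * y11 - c1 * y12) ** 2 * c1 * c2
           / (c1**3 * y12 * y22 + c2**3 * y11 * y21))

    # (2.4): pooled (null) log likelihood minus the separate-binomials one
    null = xlogy(r1, r1 / n) + xlogy(r2, r2 / n)
    alt = (xlogy(x11, x11 / c1) + xlogy(x21, x21 / c1)
           + xlogy(x12, x12 / c2) + xlogy(x22, x22 / c2))
    lr = -2.0 * (null - alt)

    return {"chi2": chi2, "yates": yates, "gsk": gsk, "lr": lr}


stats = case1_statistics(7, 3, 3, 7)
for name, value in stats.items():
    print(f"{name:6s} {value:.4f}")
```

For this table (7 and 3 successes out of 10 in each sample) none of the four statistics exceeds 3.841, but the GSK value (about 3.81) comes much closer to the critical value than the corrected chi-square (1.8), which anticipates the ordering of the actual α levels reported in Section 3.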

     Case II - H0:  pij = pi. p.j, for i,j = 1,2

     For the multinomial or independence case, five test statistics were
examined:

     1)  The uncorrected χ² as in Case I,

     2)  The corrected χ² as in Case I,

     3)  The GSK method* for testing independence using a log-linear model

$$X^2_{GSK} = \big[\ln(X_{11}/n)-\ln(X_{12}/n)-\ln(X_{21}/n)+\ln(X_{22}/n)\big]^2\Big(\sum_{i}\sum_{j}\frac{1}{X_{ij}}\Big)^{-1}, \tag{2.5}$$

     4)  The likelihood ratio test

$$X^2_{LR} = 2\Big[\sum_{i}\sum_{j}X_{ij}\ln(X_{ij})-X_{1.}\ln(X_{1.})-X_{2.}\ln(X_{2.})-X_{.1}\ln(X_{.1})-X_{.2}\ln(X_{.2})+n\ln(n)\Big] \tag{2.6}$$

     5)  Fisher's "exact" test as defined in Kendall and Stuart [10],

$$P = \sum \binom{X_{1.}}{X_{11}}\binom{X_{2.}}{X_{21}} \Big/ \binom{n}{X_{.1}} \tag{2.7}$$



     As  in Case I, all tests, except Fisher's "exact," are asymptotically
chi-square with one degree of freedom.  As randomization was not used,
Fisher's "exact" test will necessarily be quite conservative.


     *If any of the Xij's were 0, they were replaced by one over the
number of response categories, as suggested by the authors.
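
The three statistics that are specific to Case II can be illustrated with the sketch below. Several choices in it are assumptions rather than statements from the report: the function names are hypothetical, zero cells in the GSK statistic are replaced by 1/4 (one over the four multinomial categories), 0 ln 0 is taken as 0, and the Fisher p-value is the two-sided version that sums every table, with the observed margins, whose hypergeometric probability does not exceed that of the observed table. The report does not say which tail convention was used.

```python
from math import comb, log


def gsk_loglinear(x11, x12, x21, x22):
    """Statistic (2.5); the n's inside the logarithms cancel in the contrast."""
    cells = [x if x > 0 else 0.25 for x in (x11, x12, x21, x22)]  # assumption
    y11, y12, y21, y22 = cells
    contrast = log(y11) - log(y12) - log(y21) + log(y22)
    return contrast**2 / sum(1.0 / y for y in cells)


def lr_independence(x11, x12, x21, x22):
    """Likelihood ratio statistic for independence."""
    def xlnx(x):
        return 0.0 if x == 0 else x * log(x)

    r1, r2 = x11 + x12, x21 + x22
    c1, c2 = x11 + x21, x12 + x22
    n = r1 + r2
    cells = xlnx(x11) + xlnx(x12) + xlnx(x21) + xlnx(x22)
    return 2.0 * (cells - xlnx(r1) - xlnx(r2) - xlnx(c1) - xlnx(c2) + xlnx(n))


def fisher_two_sided(x11, x12, x21, x22):
    """Two-sided Fisher p-value (assumed convention, see lead-in)."""
    r1, r2 = x11 + x12, x21 + x22
    c1 = x11 + x21
    n = r1 + r2

    def weight(a):
        # integer numerator of the hypergeometric probability of X11 = a
        return comb(r1, a) * comb(r2, c1 - a)

    lo, hi = max(0, c1 - r2), min(r1, c1)   # feasible values of X11
    w_obs = weight(x11)
    tail = sum(weight(a) for a in range(lo, hi + 1) if weight(a) <= w_obs)
    return tail / comb(n, c1)


table = (7, 3, 3, 7)
print(f"GSK log-linear  {gsk_loglinear(*table):.4f}")
print(f"likelihood ratio {lr_independence(*table):.4f}")
print(f"Fisher p-value   {fisher_two_sided(*table):.4f}")
```

Comparing integer numerators rather than floating-point probabilities avoids ties being broken by rounding. For this table the Fisher p-value is well above .05 while the two chi-square type statistics are near 3, which illustrates why the unrandomized exact test is the more conservative procedure.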

-------
SECTION 3

RESULTS

Figures 1 and 2 depict the actual α levels obtained for the four tests
considered in Case I. Behavior of the actual α levels for four of the five
tests of Case II is shown in Figures 3 and 4. All the tests are for a
nominal α level of .05. Figure 5 compares the α level performance of the
corrected chi-square with Fisher's "exact" test. Additionally, some power
curve comparisons between the GSK and likelihood ratio tests appear in
Figures 6A-6D.

Figures 1A-1D show how the actual α levels of the four tests change for
total sample sizes of 20, 40, 60 and 80 when testing the difference in two
proportions by sampling in equal numbers from two binomial populations.
Corresponding to Figures 1A-1D, Table 1 gives the maximum α level, over the
entire range of p, for each test statistic and sample size.
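
A maximum of this kind can be approximated by evaluating the exact α level on a grid of p values. The sketch below does this for the uncorrected chi-square; the grid spacing, the function names, and the treatment of tables with an empty row (never rejected) are assumptions made for illustration, and the report does not state how finely p was varied.

```python
from math import comb

CRIT_05 = 3.841458820694124  # upper 5% point of chi-square with 1 df


def rejection_region(n1, n2, crit=CRIT_05):
    """All (x11, x12) for which the uncorrected chi-square exceeds crit."""
    n = n1 + n2
    region = []
    for x11 in range(n1 + 1):
        for x12 in range(n2 + 1):
            r1 = x11 + x12
            r2 = n - r1
            if r1 == 0 or r2 == 0:
                continue                    # assumption: never reject
            cross = x11 * (n2 - x12) - (n1 - x11) * x12
            if n * cross**2 / (r1 * r2 * n1 * n2) > crit:
                region.append((x11, x12))
    return region


def alpha_curve(n1, n2, grid):
    """Exact alpha at each p in grid; the region is built once and reused."""
    n = n1 + n2
    # each rejected table contributes weight * p^s * (1-p)^(n-s), s = x11 + x12
    terms = [(comb(n1, a) * comb(n2, b), a + b) for a, b in rejection_region(n1, n2)]
    return [sum(w * p**s * (1 - p) ** (n - s) for w, s in terms) for p in grid]


grid = [i / 200 for i in range(1, 200)]     # p = .005, .010, ..., .995
curve = alpha_curve(10, 10, grid)
max_alpha = max(curve)
print(f"maximum alpha on the grid: {max_alpha:.4f}")
```

Because the rejection region does not depend on p, it is computed once and the α level becomes a polynomial in p that is cheap to evaluate; substituting another statistic only changes `rejection_region`.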

The uncorrected chi-square statistic (Fig. 1A) had α levels very close
to .05, and although it behaved better for smaller samples, actual α
levels never exceeded .0590. The corrected chi-square statistic (Fig. 1B)
produced very conservative α levels. Even with a total sample size of 80,
its maximum α level was only .0330.

For total samples of 20 and 40, the GSK statistic (Fig. 1C) had very
high levels. For a total sample of size 60, however, its maximum α level
dropped to .0602. At 80, it appeared to be as good as the uncorrected
chi-square statistic. The likelihood ratio test statistic (Fig. 1D) produced
high maximum α levels for all sample sizes. As the size of sample increased,
this problem occurred mainly for values of p1 (= p2) near 0 or 1.

In Figures 2A-2D, we examined the same test statistics and total sample
sizes as in Figures 1A-1D, but sampled in a 3 to 1 ratio from the two
populations. This unequal sampling had the following effects on the four test
statistics: 1) maximum α levels (Table 2) for the uncorrected chi-square were
generally closer to the nominal level, 2) the corrected chi-square was more
conservative, 3) the GSK and likelihood ratio generally produced higher α
levels, with the GSK uniformly high for .2
-------
[Figure 1. Actual α levels of four tests as a function of p1 (= p2), where N1 = N2. Panels: 1A chi-square, 1B corrected chi-square, 1C GSK, 1D likelihood ratio. Vertical axis: α (0.0 to .10); horizontal axis: p1 (0 to 1.0); one curve per sample size.]

[Figure 2. Actual α levels of four tests as a function of p1 (= p2), where 3N1 = N2. Panels: 2A chi-square, 2B corrected chi-square, 2C GSK, 2D likelihood ratio. Axes as in Figure 1.]
TABLE 1.

MAXIMUM α LEVELS FOR TESTS OF DIFFERENCES IN PROPORTIONS
OF TWO BINOMIALS OF EQUAL SIZE

 Sample Size    Chi-square     Chi-square                Likelihood
 X.1    X.2     Uncorrected    Corrected       GSK       Ratio
  10     10       .0422          .0128        .0894       .0876
  20     20       .0534          .0215        .0807       .0853
  30     30       .0554          .0274        .0602       .0891
  40     40       .0590          .0330        .0567       .0903

TABLE 2.

MAXIMUM α LEVELS FOR TESTS OF DIFFERENCES IN PROPORTIONS
OF TWO BINOMIALS OF UNEQUAL SIZE

 Sample Size    Chi-square     Chi-square                Likelihood
 X.1    X.2     Uncorrected    Corrected       GSK       Ratio
   5     15       .0521          .0106        .0930       .0856
  10     30       .0542          .0211        .0827       .0733
  15     45       .0546          .0235        .0768       .0687
  20     60       .0509          .0269        .0715       .0732
-------
[Figure 3. Actual α levels of four tests as a function of p11, where p12 = .5 - p11 and p21 = p11. Panels: 3A chi-square, 3B corrected chi-square, 3C GSK, 3D likelihood ratio. Vertical axis: α (0.0 to .10); horizontal axis: p11 (0 to .50); one curve per sample size.]

[Figure 4. Actual α levels of four tests as a function of p11, where p12 = .25 - p11 and p21 = 3p11. Panels: 4A chi-square, 4B corrected chi-square, 4C GSK, 4D likelihood ratio. Horizontal axis: p11 (0 to .25).]
-------
The plots in Figure 3 were constructed under the constraints that
p11 = p12, p21 = p22, and p11 + p21 = .5. Without loss of generality, we
then calculated actual levels for 0 [...] 3p21 = p22, and p11 + p21 = .25.
In this instance, we computed actual α levels for 0 < p11 < .25.
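
For the independence case the enumeration runs over all 2x2 tables with total n, weighted by multinomial probabilities. The sketch below does this for the uncorrected chi-square under the first set of constraints (p11 = p12, p21 = p22, p11 + p21 = .5). The function names are hypothetical, and tables with an empty row or column are assumed never to reject.

```python
from math import factorial

CRIT_05 = 3.841458820694124  # upper 5% point of chi-square with 1 df


def multinomial_pmf(counts, probs):
    """Probability of the cell counts under a multinomial with the given cell probabilities."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)              # exact integer arithmetic
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p**c
    return prob


def chi2_uncorrected(x11, x12, x21, x22):
    r1, r2 = x11 + x12, x21 + x22
    c1, c2 = x11 + x21, x12 + x22
    if 0 in (r1, r2, c1, c2):
        return 0.0                          # assumption: never reject
    n = r1 + r2
    return n * (x11 * x22 - x21 * x12) ** 2 / (r1 * r2 * c1 * c2)


def actual_alpha_case2(n, p11, crit=CRIT_05):
    """Return (alpha, total probability) under p11 = p12, p21 = p22, p11 + p21 = .5."""
    probs = (p11, p11, 0.5 - p11, 0.5 - p11)   # order: p11, p12, p21, p22
    alpha = total = 0.0
    for x11 in range(n + 1):
        for x12 in range(n - x11 + 1):
            for x21 in range(n - x11 - x12 + 1):
                x22 = n - x11 - x12 - x21
                pr = multinomial_pmf((x11, x12, x21, x22), probs)
                total += pr
                if chi2_uncorrected(x11, x12, x21, x22) > crit:
                    alpha += pr
    return alpha, total


alpha, total = actual_alpha_case2(20, 0.25)
print(f"alpha = {alpha:.4f}, probabilities sum to {total:.6f}")
```

With n = 20 there are 1,771 tables to visit, so the computation is still immediate; the returned total is a check that no table was missed.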

As in Case I, α levels of the uncorrected chi-square statistic
adhere very closely to the nominal .05 level of the test for all sample sizes.
By increasing the size of the sample, we observe (Figure 3A) that actual
Type I error probabilities are near the nominal level of the test for a wider
range of p11. The maximum α level that the uncorrected chi-square yields, for
the sample sizes used, is .0540 when n = 80 (Table 3).

TABLE 3.

MAXIMUM α LEVELS FOR FIVE TESTS AS A FUNCTION OF
p11, where p12 = .5 - p11 and p21 = p11

 Sample Size   Chi-square    Chi-square              Likelihood   Fisher's
      n        Uncorrected   Corrected      GSK      Ratio        "Exact"
     20          .0512         .0128       .0351      .0824        .0177
     40          .0530         .0211       .0433      .0760        .0221
     60          .0523         .0278       .0503      .0779        .0282
     80          .0540         .0312       .0512      .0767        .0312

In Case II, where none of the cell probabilities exceeds .5, adding the
continuity correction again yields very conservative actual α levels
(Figure 3B). Although these levels are uniformly higher, over the range of
p11, with an increase in sample size, the maximum α level of the corrected
chi-square is only .0312 for a sample of 80.

The GSK test statistic examined in Case II produces surprisingly
conservative actual α levels under the first set of constraints (Figure 3C).
The effect of increasing sample size is generally to increase these levels
over the range of p11. The maximum α level achieved, .0512, occurs at n = 80.

Under the same constraints, the likelihood ratio test ensures the least
protection from Type I errors. Increasing sample size does effectively lower
actual α levels; but, if p11 or p21 is less than .1, some of these levels can
exceed .075 (Figure 3D), even with n = 80. For the sample sizes investigated,
the likelihood ratio test reached a maximum α level of .0824.

-------
The result of imposing the second set of constraints on the calculations
can be seen in Figures 4A-4D and in Table 4 and can be summarized as follows:

1) maximum α levels for all tests for all sample sizes are lower
than under the first set (Table 3),

2) least affected is the uncorrected chi-square,

3) actual α levels of the corrected chi-square and GSK tests
generally become more conservative and

4) actual levels still show the likelihood ratio test to be the
least effective of the four in controlling Type I errors.

TABLE 4.

MAXIMUM α LEVELS FOR FIVE TESTS AS A FUNCTION OF
p11, where p12 = .25 - p11 and p21 = 3p11

 Sample Size   Chi-square    Chi-square              Likelihood   Fisher's
      n        Uncorrected   Corrected      GSK      Ratio        "Exact"
     20          .0482         .0107       .0130      .0800        .0103
     40          .0515         .0189       .0303      .0725        .0209
     60          .0517         .0242       .0415      .0731        .0246
     80          .0511         .0259       .0452      .0710        .0277

Figure 5 illustrates the general agreement between actual α levels of
the chi-square test with continuity correction and Fisher's "exact" test
defined in the classical sense described by Miettinen [11]. In Figure 5A,
we see that, for a total sample of 20 and with constraints p11 = p12, p21 =
p22 and p11 + p21 = .5, both tests are conservative with respect to the
nominal α level of .05. For p11 or p21 < .1, actual α levels are nearly
identical, but, for values of p11 near p21, the corrected chi-square is the
more conservative test. However, as n increases to 80 (Figure 5B), for the
same constraints, there is little difference in the actual α levels of the
two tests, although the corrected chi-square is still more conservative over
the entire range of p11.

Under the other set of constraints, i.e. 3p11 = p12, 3p21 = p22, and
p11 + p21 = .25, it is evident from Figures 5C-D that, while both tests
still produce extremely low actual α levels, the chi-square with continuity
-------
[Figure 5. Comparison of α levels of Fisher's "Exact" Test (FET) with the corrected Chi-square Test for Case II. Panels 5A (n = 20) and 5B (n = 80): p11 = p12, p21 = p22, p11 + p21 = .5; panels 5C and 5D: second set of constraints. Vertical axis: α (.01 to .05); horizontal axis: p11.]
correction is generally more conservative, particularly with p11 close to
p12. Note that, in Figure 5C, the effect of the continuity correction for
p11 < .05 or > .2 with a total sample of only 20 is quite pronounced.

As so little is known about the performance of the GSK and likelihood
ratio tests in small samples, we were naturally curious about the power of
these tests as well. Because it was not feasible to compare the power of
these tests under all sets of alternative hypotheses, for each of four
sample sizes considered in Case I with X.1 = X.2, the power was computed by
fixing p2 equal to the proportion for which the actual α levels of the two
tests were approximately equal and by letting p1 vary from 0 to 1.
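
The power calculation just described uses the same enumeration as the α levels, except that the two binomial samples are weighted with different proportions. The sketch below shows it for the likelihood ratio statistic with X.1 = X.2 = 10 and p2 = .29, the value shown for Figure 6A. The function names are hypothetical and 0 ln 0 is taken as 0; the GSK curve would be obtained by substituting that statistic.

```python
from math import comb, log

CRIT_05 = 3.841458820694124  # upper 5% point of chi-square with 1 df


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def xlogy(x, y):
    """x * ln(y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * log(y)


def lr_statistic(x11, x12, n1, n2):
    """Likelihood ratio statistic for H0: p1 = p2; n1 = X.1, n2 = X.2."""
    x21, x22 = n1 - x11, n2 - x12
    r1, r2 = x11 + x12, x21 + x22
    n = n1 + n2
    null = xlogy(r1, r1 / n) + xlogy(r2, r2 / n)
    alt = (xlogy(x11, x11 / n1) + xlogy(x21, x21 / n1)
           + xlogy(x12, x12 / n2) + xlogy(x22, x22 / n2))
    return -2.0 * (null - alt)


def power(stat, n1, n2, p1, p2, crit=CRIT_05):
    """Exact probability of rejecting H0 when the true proportions are p1 and p2."""
    total = 0.0
    for x11 in range(n1 + 1):
        for x12 in range(n2 + 1):
            if stat(x11, x12, n1, n2) > crit:
                total += binom_pmf(x11, n1, p1) * binom_pmf(x12, n2, p2)
    return total


p2 = 0.29
for p1 in (0.29, 0.5, 0.7, 0.9):
    print(f"p1 = {p1:.2f}  power = {power(lr_statistic, 10, 10, p1, p2):.3f}")
```

At p1 = p2 the function returns the actual α level, so the power curve and the α curve come from one routine; plotting `power` over a grid of p1 for each statistic gives curves of the kind compared in Figure 6.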

In Figures 6A-6D, we see that neither of these tests is clearly and
uniformly most powerful for any of the sample sizes examined. In fact,
with sample sizes of 60 and 80, the two power curves are virtually
indistinguishable (Figure 6C-D). For smaller samples, the GSK is more
powerful when considering alternate hypotheses where p1 > p2, but the
likelihood ratio is more powerful when p1
-------
[Figure 6. Comparison of the power of the GSK Test vs. the Likelihood Ratio Test for Case I, where p2 is chosen so that the α levels of the two tests are the same. Panel 6A: p2 = .29, X.1 = X.2 = 10. Vertical axis: power; horizontal axis: p1 (0 to 1.0).]
-------
SECTION 4

DISCUSSION

Comparison of the actual α levels of the GSK and likelihood ratio tests
indicates that, in general, the GSK test more reliably adheres to a nominal
value of .05. For Case I, neither test consistently yields α levels near or
below the nominal value. For sample sizes of 30 or 40 from each binomial
population, the GSK test appears to provide more acceptable α levels. With
unequal sampling, neither test yields levels near the nominal value. For
Case II, the GSK test is nearly always conservative while the likelihood
ratio test often produces extremely high actual levels. In this case, the
GSK test was based on a log-linear model. The log-linear model may be
responsible for the more conservative α levels. Although power was compared
for so few cases that results may not be generalizable, neither the GSK nor
the likelihood ratio test appears to be uniformly more powerful. For larger
sample sizes, they were nearly identical.

While this study was undertaken primarily to examine the behavior of
the GSK and likelihood ratio tests in small samples, some interesting results
occurred in our examination of the other tests. Of all test statistics
considered, the uncorrected chi-square was nearly always closest to the
nominal value. In fact, of all cases tested, the largest α level that the
uncorrected chi-square produced was .059 with actual levels rarely exceeding
.055.

Garside and Mack [6] reported exact α levels for the uncorrected
chi-square as high as .0607. However, our calculations for the same
parameter values produced an α of .0542. Moreover, in every instance where
comparisons were possible, our α levels differed substantially from those
tabulated by Garside and Mack. Particularly disturbing in these authors'
study is the lack of symmetry about p=.5 of these levels when the sample
sizes from the two binomial populations differed. Conversely, our results
are in complete agreement with those of Eberhardt and Fligner [7] when
conditions permitted comparison.
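
The symmetry about p = .5 that is expected in Case I, for equal or unequal sample sizes, follows from a short argument. The sketch below uses the notation of Section 2; the symbols a, b, R and α(p) are introduced only for this illustration, with (a, b) the observed values of (X11, X12) and R the set of tables rejected by a given test.

```latex
% Probability of the table (a, b) under H0 with common proportion p:
P_p(a,b) = \binom{X_{.1}}{a}\binom{X_{.2}}{b}\,p^{\,a+b}(1-p)^{\,n-a-b}.

% Interchanging the two rows sends (a, b) to (X_{.1}-a, X_{.2}-b), and
P_{1-p}(X_{.1}-a,\;X_{.2}-b)
  = \binom{X_{.1}}{X_{.1}-a}\binom{X_{.2}}{X_{.2}-b}\,(1-p)^{\,n-a-b}\,p^{\,a+b}
  = P_p(a,b).

% Each of the statistics (2.1)-(2.4) is unchanged when the rows are
% interchanged, so R is mapped onto itself and
\alpha(1-p) = \sum_{(a,b)\in R} P_{1-p}(a,b)
            = \sum_{(a,b)\in R} P_{1-p}(X_{.1}-a,\;X_{.2}-b)
            = \sum_{(a,b)\in R} P_p(a,b)
            = \alpha(p).
```

The argument never uses X.1 = X.2, which is why exact levels that are asymmetric about p = .5 under unequal sampling point to a computational discrepancy rather than to a property of the tests.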

As demonstrated in Figure 5, actual α levels of the corrected chi-square
and Fisher's "exact" test are always conservative, and their behavior is
quite similar over the entire range of parameters and sample sizes
considered. These results are consistent with the observations of others,
especially Mantel and Greenhouse [12]. Although only these two tests, of the
five studied, provide completely safe α levels, it is very difficult to
ignore the performance of the uncorrected chi-square when one is mainly
interested in ensuring that the actual level of the test is close to a
nominal level of .05.

-------
REFERENCES
[1] House, D. E., J. F. Finklea, C. M. Shy, D. C. Calafiore, W. B. Riggan,
J. W. Southwick, L. J. Olsen (1974), Prevalence of Chronic Respiratory
Disease Symptoms in Adults: 1970 Survey of Salt Lake Basin Communi-
ties. Health Consequences of Sulfur Oxides: A Report from CHESS,
1970-1971. EPA Report No. EPA-65/1-74-4, 2-41 - 2-54. Research
Triangle Park, NC, EPA.

[2] Goldberg, H. E., J. F. Finklea, C. J. Nelson, W. B. Steen, R. S.
Chapman, D. H. Swanson, A. A. Cohen (1974), Prevalence of Chronic
Respiratory Disease Symptoms in Adults: 1970 Survey of New York
Communities. Health Consequences of Sulfur Oxides: A Report from
CHESS, 1970-1971. EPA Report No. EPA-65/1-74-4, 5-33 - 5-48.
Research Triangle Park, NC, EPA.

[3] Finklea, J. F., J. Goldberg, V. Hasselblad, C. M. Shy, C. G. Hayes
(1974), Prevalence of Chronic Respiratory Disease Symptoms in Military
Recruits: Chicago Induction Center, 1969-1970. Health Consequences
of Sulfur Oxides: A Report from CHESS, 1970-1971. EPA Report No.
65/1-74-4, 4-23 - 4-36. Research Triangle Park, NC, EPA.

[4] Grizzle, J. E., C. E. Starmer, G. G. Koch (1969), "Analysis of
Categorical Data by Linear Models," Biometrics, 489-504.

[5] Grizzle, J. E. (1967), "Continuity Correction in the χ² Test for 2 x 2
Tables," The American Statistician, 21, No. 4, 28-32.

[6] Garside, G. R. and C. Mack (1976), "Actual Type 1 Error Probabilities
for Various Tests in the Homogeneity Case of the 2x2 Table,"
The American Statistician, 30, No. 1, 18-21.

[7] Eberhardt, K. R. and M. A. Fligner (1977), "A Comparison of Two Tests
for Equality of Proportions," The American Statistician, 31, No. 4,
151-155.

[8] Larntz, K. (1978), "Small-Sample Comparisons of Exact Levels for
Chi-Squared Goodness-of-Fit Statistics," Journal of the American
Statistical Association, 73, No. 362, 253-263.

[9] Goodman, L. (1964), "Simultaneous Confidence Intervals for Contrasts
Among Multinomial Populations," Annals of Mathematical Statistics,
35, 716-725.
-------
[10] Kendall, M. G. and A. Stuart (1973), The Advanced Theory of Statistics.
2, 570-572.

[11] Miettinen, O. S. (1974), "Comment on Yates Continuity Correction,"
JASA, 69, No. 345, 380-382.

[12] Mantel, N. and S. W. Greenhouse (1968), "What is the Continuity
Correction?," Amer. Statist., 22, 27-30.
-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)

1. REPORT NO.: EPA-600/9-80-029
2.
3. RECIPIENT'S ACCESSION NO.:
4. TITLE AND SUBTITLE: Small Sample Properties of Various Tests on 2x2 Tables
5. REPORT DATE: May 1980
6. PERFORMING ORGANIZATION CODE:
7. AUTHOR(S): Victor Hasselblad and Andrew G. Stead
8. PERFORMING ORGANIZATION REPORT NO.:
9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Biometry Division
   Health Effects Research Laboratory
   Research Triangle Park, NC 27711
10. PROGRAM ELEMENT NO.: 1AA816
11. CONTRACT/GRANT NO.:
12. SPONSORING AGENCY NAME AND ADDRESS:
   Health Effects Research Laboratory
   Office of Research and Development
   U.S. Environmental Protection Agency
   Research Triangle Park, NC 27711
13. TYPE OF REPORT AND PERIOD COVERED:
14. SPONSORING AGENCY CODE: 600/11
15. SUPPLEMENTARY NOTES:
16. ABSTRACT:
   With particular interest in the likelihood ratio tests and tests
   generated by the categorical analysis methods of Grizzle, Starmer, and
   Koch, the small sample behavior of actual α levels is graphically
   illustrated for several asymptotically chi-square statistics when testing
   is performed on 2 x 2 contingency table data at a nominal level of .05.
   This behavior is examined for hypotheses testing both homogeneity and
   independence. In the homogeneity case, the effects on these levels are
   examined for equal and unequal sample sizes. In the independence case,
   the effects of different constraints concerning individual cell
   probabilities are demonstrated. Visual comparisons of the power of the
   likelihood ratio and the Grizzle, Starmer, Koch related statistics are
   also provided.
17. KEY WORDS AND DOCUMENT ANALYSIS
   a. DESCRIPTORS: Statistical analysis; Chi-square test
   b. IDENTIFIERS/OPEN ENDED TERMS: Categorical analysis methods;
      Likelihood ratio tests; Small sample behavior
   c. COSATI Field/Group: 12A
18. DISTRIBUTION STATEMENT: RELEASE TO PUBLIC
19. SECURITY CLASS (This Report): UNCLASSIFIED
20. SECURITY CLASS (This page): UNCLASSIFIED
21. NO. OF PAGES: 16
22. PRICE:
EPA Form 2220-1 (Rev. 4-77) PREVIOUS EDITION IS OBSOLETE
-------