EPA
              United States
              Environmental Protection
              Agency
             Health Effects Research
             Laboratory
             Research Triangle Park NC 27711
EPA-600/1-78-043
June 1978
              Research and Development
Comparison  of
Methods for the
Analysis of  Panel
Studies

-------
                RESEARCH REPORTING SERIES

Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad cate-
gories were established to facilitate further development and application of en-
vironmental technology.  Elimination of traditional grouping  was  consciously
planned to foster technology transfer and a maximum interface in related fields.
The nine series are:

      1.  Environmental  Health Effects Research
     2.  Environmental  Protection Technology
     3.  Ecological Research
     4.  Environmental  Monitoring
     5.  Socioeconomic Environmental Studies
     6.  Scientific and Technical Assessment Reports (STAR)
     7.  Interagency Energy-Environment Research and Development
     8.  "Special" Reports
     9.  Miscellaneous Reports
This report has been assigned to the ENVIRONMENTAL HEALTH EFFECTS RE-
SEARCH series. This series describes projects and studies relating to the toler-
ances of man for unhealthful substances or conditions. This work is generally
assessed from a medical viewpoint, including physiological or psychological
studies. In addition to toxicology and other medical specialities, study areas in-
clude biomedical  instrumentation and health research techniques  utilizing ani-
mals — but always with  intended application to human health measures.
 This document is available to the public through the National Technical Informa-
 tion Service, Springfield, Virginia 22161.

-------
                                            EPA-600/1-78-043
                                            June 1978
COMPARISON OF METHODS FOR THE ANALYSIS OF PANEL STUDIES
                           by

                   Victor Hasselblad
         Statistics and Data Management Office
          Health Effects Research Laboratory
          Research Triangle Park, N.C.  27711
         U.S. ENVIRONMENTAL PROTECTION AGENCY
          OFFICE OF RESEARCH AND DEVELOPMENT
          HEALTH EFFECTS RESEARCH LABORATORY
          RESEARCH TRIANGLE PARK, N.C. 27711

-------
                                DISCLAIMER






     This report has been reviewed by the Health Effects  Research Laboratory,



U.S.  Environmental  Protection Agency, and approved for publication.   Approval



does  not signify that the contents necessarily reflect the views and policies



of the U.S. Environmental Protection Agency, nor does mention of trade names



or commercial products constitute endorsement or recommendation for use.

-------
                                FOREWORD
     The many benefits of our modern, developing, industrial  society
are accompanied by certain hazards.  Careful  assessment of the relative
risk of existing and new man-made environmental  hazards is necessary
for the establishment of sound regulatory policy.  These regulations
serve to enhance the quality of our environment  in order to promote the
public health and welfare and the productive  capacity of our  Nation's
population.

     The Health Effects Research Laboratory,  Research Triangle Park,
conducts a coordinated environmental  health research program  in toxicology,
epidemiology, and clinical studies using human volunteer subjects.
These studies address problems in air pollution, non-ionizing
radiation, environmental carcinogenesis and the  toxicology of pesticides
as well as other chemical pollutants.  The Laboratory participates in
the development and revision of air quality criteria documents on
pollutants for which national ambient air quality standards exist or
are proposed, provides the data for registration of new pesticides or
proposed suspension of those already in use,  conducts research on
hazardous and toxic materials, and is primarily  responsible for providing
the health basis for non-ionizing radiation standards.  Direct support
to the regulatory function of the Agency is provided in the form of
expert testimony and preparation of affidavits as well as expert advice
to the Administrator to assure the adequacy of health care and surveillance
of persons having suffered imminent and substantial endangerment of
their health.

     This report addresses the extreme difficulty of analyzing panel
data collected over time in an epidemiologic  study.  Although no
exact solution is found, some improvements are suggested for  the design
and analysis of future studies.
                                   F. G. Hueter, Ph. D.
                                     Acting Director,
                           Health Effects Research Laboratory

-------
                                 ABSTRACT





     Three different methods of analysis of panels were compared using



asthma panel  data from a 1970-1971  study done by EPA in Riverhead,  N.  Y.



The methods were 1) regression analysis using raw attack rates;  2)  regression



analysis using the ratio of observed attacks to expected attacks; and  3)



discriminant analysis where repeated attacks were ignored.   The  first  two



methods were found to have serious  serial correlation problems.   The third



method eliminated this problem, but reduced the effective sample size



considerably.



     A more appropriate method was  suggested for larger panels over shorter



periods of time.  The analyses of the Riverhead data showed that any sulfate



effect on asthmatics was confounded with seasonal trends.

-------
                                 CONTENTS
Foreword
Abstract
Figures
Tables
Acknowledgment
     1.  Introduction
     2.  The Basic Data
     3.  Analyses Using the Raw Attack Rate
     4.  Analyses Using Observed/Expected
     5.  A Modified Discriminant Analysis
     6.  Comparison of Methods
     7.  Discussion
References

-------
                                     FIGURES


Number

  1       Histogram of the Attack Rate for the 1970-1971 Riverhead
            Asthma Panel

  2       Histogram of Suspended Sulfate Values for the 1970-1971
            Riverhead Asthma Panel

  3       Histogram of Residuals from a Regression Analysis of Attack
            Rate for the 1970-1971 Riverhead Asthma Panel

  4       Histogram of the Observed Attacks Divided by the Expected
            Attacks for the 1970-1971 Riverhead Asthma Panel

  5       Histogram of Residuals from a Regression Analysis of
            Observed/Expected for the 1970-1971 Riverhead Asthma
            Panel


                                     TABLES


Number

  1       Pairwise Correlations of Pollutants in Riverhead, N.Y.,
            October 18, 1970 to May 29, 1971

  2       Multiple Regression Analyses of the Riverhead Asthma Attack
            Rate on SOx Using Various Covariates Separately

  3       Multiple Regression Analysis of the Riverhead Asthma Attack
            Rate on SOx Using All Covariates Simultaneously

  4       Multiple Regression Analyses of the Ratio of Observed to
            Expected on SOx Using Various Covariates Separately

  5       Multiple Regression Analysis of the Ratio of Observed to
            Expected on SOx Using All Covariates Simultaneously

  6       Discriminant Analyses of the Riverhead Asthma Attack Rate
            on SOx Using Various Covariates Separately

  7       Discriminant Analysis of the Riverhead Asthma Attack Rates
            on SOx Using All Covariates Simultaneously

-------
                              ACKNOWLEDGMENT





     I would like to thank the three  reviewers,  John  Creason,  Larry



Kupper, and Robert Chapman, who all made excellent  suggestions.   I would



also like to thank William C.  Nelson  for his continued  support and



suggestions, and Barbara  Crabtree  for the typing  of this manuscript.

-------
                                  SECTION 1



                                 INTRODUCTION





     Panel  studies have made an important contribution to epidemiology.



These include studies of asthmatics by Cohen, et al.1, Salvaggio,  et al.2,



and Girsh,  et al.3.   Lawther, et al.4 looked at panels of bronchitics.



Goldberg, et al.5 looked at cardiovascular symptoms  in elderly persons.



Hammer, et  al.6, studied irritation symptoms, including eye discomfort.



The effect  of meteorology and pollution on pulmonary function was  studied



by Lawther, et al.7,8.  The problem with panel  studies, as noted by Stebbings



and Hayes9, is that  the analysis of such studies is  statistically  difficult.



If the data are continuous, such as pulmonary function data, then  a repeated



measures analysis or multivariate analysis of variance is appropriate.   If



the data are discrete, such as yes-no data,  then no  available techniques are



fully satisfactory.



     In an  effort to evaluate a range of analysis techniques, this paper



presents a  comparison of three different methods of  analysis, using 1970-1971



Riverhead,  N. Y. asthma panel data.  An analysis of  these data, along with



similar analyses of  panels from Queens and Bronx was previously presented by



Finklea, et al.10.  This study was one of several conducted by the



Environmental Protection Agency's Community Health Environmental Surveillance



System (CHESS).

-------
                                  SECTION 2



                               THE BASIC DATA





     Panelists were selected from three New York communities.   Riverhead,



located near the center of Long Island, had generally lower exposures to



air pollution than did the communities  of Queens and Bronx.  The names of



panelists were obtained from hospital  clinic records and records of



practicing physicians.  Background information on each panelist, including



age and smoking history, was obtained  by an interview.  All panelists



lived within a 1.5-mile radius of a monitoring station.



     From October 4, 1970 to May 29, 1971 each panelist received a weekly



diary.  The panelist recorded the presence or absence of an asthma attack  for



each day of the week, and then returned the completed diary by mail.   During



this same time period, continuous 24-hour air monitoring was also conducted by



EPA.  The measurements included total  suspended particulates (TSP), sulfate



fraction of TSP (SOx), nitrate fraction of TSP (NOx), sulfur dioxide  (SO2),



and nitrogen dioxide  (NO2).  The measurement methods are described by



Finklea, et al.10.



     This paper will restrict itself to the analysis of the Riverhead



panel.  Although Riverhead had generally lower pollution values than  the



other two communities, the previous analysis of the Riverhead data showed



a significant relationship between asthma attack rate and SOx.  All of the



pollutants are correlated with one another, as shown in Table 1.  This



paper will consider the relationship of asthma with SOx only.



     The first 2 weeks of data were not used, to minimize the problem of



initial over-reporting, and to allow the panelists to become familiar with



the study.  Fifty panelists were initially enrolled  in Riverhead, but 11
were eliminated from the analyses because they either never returned a
diary, or never reported having an asthma attack at any time during the
study.

TABLE 1.   PAIRWISE CORRELATIONS  OF  POLLUTANTS
   IN RIVERHEAD,  N.  Y.  FOR THE TIME PERIOD
     OCTOBER 18,  1970  TO  MAY  29,  1971.*

              TSP            SOx            NOx            SO2

  TSP     1.000 (206)     .796 (205)     .487 (206)     .370 (188)

  SOx      .796 (205)    1.000 (205)     .380 (205)     .377 (187)

  NOx      .487 (206)     .380 (205)    1.000 (206)     .141 (188)

  SO2      .370 (188)     .377 (187)     .141 (188)    1.000 (194)

  * Sample Sizes Are Given in  Parentheses.
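
The sample sizes in Table 1 differ from cell to cell, which suggests that each correlation was computed over the days on which both pollutants had a valid reading. The following is a minimal sketch of such a pairwise calculation, assuming missing readings are coded as None. The function name and the readings are hypothetical and are not the Riverhead data.

```python
import math

def pairwise_corr(x, y):
    """Pearson correlation over days where both series are present.

    Missing readings are represented by None. Returns (r, n), where n is
    the number of days with both values, comparable to the counts shown
    in parentheses in Table 1.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical daily readings (illustration only).
tsp = [40.0, 55.0, None, 70.0, 62.0]
sox = [6.0, 9.0, 11.0, None, 10.0]
r, n = pairwise_corr(tsp, sox)
print(round(r, 3), n)
```

Only three of the five days contribute here, because each of the other two days is missing one of the readings.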

-------
                                  SECTION 3



                     ANALYSES USING THE RAW ATTACK RATE





     One obvious method of analysis is to use the raw attack rate as the



dependent variable.  This rate is defined for each day as the number of



panelists reporting an attack on that date divided by the number of



panelists giving a response on that same day.  Figure 1 shows a histogram



of this variable.  The analyses of this variable were restricted to the 205



days with valid SOx measurements.  The attack rate ranged from .057 to .500,



with a mean value of .181.  The SOx measurements ranged from .5 to 51.4,



with a mean of 9.81.  A histogram of the SOx values is shown in Figure 2.



Note that both the simple attack rate and SOx have distributions which are



skewed to the right.
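
The definition of the raw attack rate can be sketched as follows. This is an illustration only, assuming the diaries are held as one list per panelist with 1 for an attack, 0 for no attack, and None for a day with no response. The function name and the small diary are hypothetical.

```python
def daily_attack_rates(diary):
    """Raw attack rate for each day.

    diary[i][j] is 1 (attack), 0 (no attack) or None (no response) for
    panelist i on day j. The rate for a day is the number of panelists
    reporting an attack divided by the number responding that day; a day
    with no responses gets None.
    """
    n_days = len(diary[0])
    rates = []
    for j in range(n_days):
        responses = [row[j] for row in diary if row[j] is not None]
        rates.append(sum(responses) / len(responses) if responses else None)
    return rates

# Hypothetical three-person, four-day diary (illustration only).
diary = [
    [1, 0, None, 0],
    [0, 0, 1, None],
    [1, None, 1, None],
]
rates = daily_attack_rates(diary)
print(rates)
```

Note that the denominator changes from day to day, so the rate for the last day in this example rests on a single response.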



     There are a large number of possible covariates which could be used



in the analyses.  Several of these, such as pollen or emotional stress,



were not measured in this study.  The following factors were used in the



analyses shown in Table 2.



     1)  Season:  several authors have noted seasonal trends in asthma, with



higher rates in October and November in the northern hemisphere3,11-13



and higher rates in April and May in the southern hemisphere14.  This



effect was estimated using the first two terms of a Fourier series.15



     2)  Start-up effect:  there is often over-reporting during the first



few weeks of a study.  Although the first 2 weeks were dropped, the



reciprocal of the week number was still used as a covariate.



     3)  Temperature:  low temperatures have also been associated with



increased asthma attacks.12-14,16  The daily minimum temperature as reported



at the airport was  included as a covariate.




-------
[Figure 1.  Histogram of the Attack Rate for the 1970-1971 Riverhead Asthma Panel.]
-------
[Figure 2.  Histogram of Suspended Sulfate Values for the 1970-1971 Riverhead Asthma Panel.]

-------












[TABLE 2.  MULTIPLE REGRESSION ANALYSES OF THE RIVERHEAD ASTHMA ATTACK RATE
ON SOx USING VARIOUS COVARIATES SEPARATELY.  Columns: covariate; degrees of
freedom; F-to-enter for the covariate; SOx regression coefficient; F-to-remove
for SOx.  Rows: none, season, start-up, minimum temperature, day of week.]

-------
     4)  Day of week:   this variable was not known to be important, but
dummy variables were included to allow for any possible association of
attack with day of week.
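
The four covariates listed above can be sketched as a function of the calendar date. This is a minimal illustration under stated assumptions: the report says only that the first two terms of a Fourier series were used for season, so the single sine and cosine pair with a one-year period is an assumed form, and the choice of Monday as the reference day is arbitrary. Temperature is omitted because it is a measured value rather than a derived one.

```python
import math
from datetime import date

STUDY_START = date(1970, 10, 4)  # first diary day, from Section 2

def covariates(day):
    """Return derived covariates for one calendar day as a dict."""
    # Season: one sine/cosine pair with a one-year period (assumed form).
    angle = 2.0 * math.pi * day.timetuple().tm_yday / 365.0
    # Start-up effect: reciprocal of the study week number (first week = 1).
    week_number = (day - STUDY_START).days // 7 + 1
    # Day of week: six dummy variables, with Monday as the reference level.
    dow_dummies = [1 if day.weekday() == k else 0 for k in range(1, 7)]
    return {
        "season_sin": math.sin(angle),
        "season_cos": math.cos(angle),
        "startup": 1.0 / week_number,
        "dow": dow_dummies,
    }

# October 18, 1970 is the first day used in the analyses (week 3).
first_day = covariates(date(1970, 10, 18))
print(first_day["startup"], first_day["dow"])
```

The start-up covariate decays quickly, from 1/3 on the first analysis day toward zero by the end of the study, which is why it can mimic a seasonal decline.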
     Table 2 shows the F-to-enter for all  of these covariates.   This is the
F value associated with a simple linear regression using the particular
covariate as the independent variable.  The SOx regression coefficient is
given for each covariate added separately  to the model.   The SOx F-to-remove
value, which is a test of the effect of SOx in addition  to that of  a specified
covariate, is also shown.  Although both the seasonal  trend and the start-up
effect are highly significant, only the seasonal  trend  reduces  the
significance of SOx.
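
Both the F-to-enter and the F-to-remove are partial F statistics comparing a model with and without a set of terms. The sketch below, assuming ordinary least squares with an intercept, computes such a statistic from the two residual sums of squares. The function names and the data are hypothetical, and the normal-equations solver is only adequate for small, well-conditioned illustrations.

```python
def ols_rss(columns, y):
    """Residual sum of squares from least squares of y on an intercept
    plus the given predictor columns (normal equations, Gauss-Jordan)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))

def partial_f(reduced_cols, added_cols, y):
    """F statistic for adding added_cols to a model that already
    contains reduced_cols (both models include an intercept)."""
    rss_reduced = ols_rss(reduced_cols, y)
    rss_full = ols_rss(reduced_cols + added_cols, y)
    df_added = len(added_cols)
    df_error = len(y) - 1 - len(reduced_cols) - len(added_cols)
    return ((rss_reduced - rss_full) / df_added) / (rss_full / df_error)

# Hypothetical data (illustration only).
rate = [2.0, 4.0, 5.0, 4.0, 5.0]
sox = [1.0, 2.0, 3.0, 4.0, 5.0]
covariate = [1.0, 0.0, 1.0, 0.0, 1.0]
f_to_enter = partial_f([], [covariate], rate)      # covariate alone
f_to_remove = partial_f([covariate], [sox], rate)  # SOx given the covariate
print(f_to_enter, f_to_remove)
```

In this framing the F-to-enter for a covariate uses an empty reduced model, while the SOx F-to-remove uses the covariate as the reduced model.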
     Table 3 shows the result of including all covariates in the same
multiple regression analysis.  The analysis demonstrates the effect of too
many correlated covariates.  The overall sum of squares explained is almost
10 times the sum of the individual sums of squares explained.  The
regression coefficient for the start-up effect is negative, even though it
has a positive simple correlation of  .653.
     The residuals based on the regression analysis in Table 3 are shown
in Figure 3.  A chi-square goodness of fit test for normality of the
residuals gives a value of 13.28 for 9 degrees of freedom (p=.1503).  There
is some evidence of skewness to the right (p=.0100), but less evidence of
kurtosis (p=.0738), using Fisher's G statistics17.  The greatest problem
with the residuals is that the serial correlation is .3335, which is highly
significant (p<.0001).
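
A serial correlation of this kind can be estimated from the residuals taken in date order. The report does not state which estimator was used, so the lag-1 autocorrelation below is an assumed, commonly used form, and the function name is hypothetical.

```python
def lag1_autocorrelation(residuals):
    """Lag-1 serial correlation of a sequence of residuals in time order:
    the sum of products of successive deviations from the mean, divided
    by the sum of squared deviations."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    numerator = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    denominator = sum(d * d for d in dev)
    return numerator / denominator

# Hypothetical residuals (illustration only).
print(lag1_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0]))    # drifting series
print(lag1_autocorrelation([1.0, -1.0, 1.0, -1.0]))       # alternating series
```

A drifting series gives a positive value and an alternating series a negative one; independent residuals would give a value near zero.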
     There are at least three major problems with the analysis:
     1)  There is no adjustment for changing  panel composition due to
dropins and dropouts.
     2)  The information about  the sample size of  the  panel  is  lost.
     3)  The residuals show high serial  correlation.

-------










[TABLE 3.  MULTIPLE REGRESSION ANALYSIS OF THE RIVERHEAD ASTHMA ATTACK RATE
ON SOx USING ALL COVARIATES SIMULTANEOUSLY.  Columns: covariate; degrees of
freedom; regression coefficient; sum of squares; mean square; F-to-remove.
Rows: SOx, season, start-up, minimum temperature, day of week.]

-------
[Figure 3.  Histogram of Residuals from a Regression Analysis of Attack Rate for the 1970-1971 Riverhead Asthma Panel.]
                                                                     
-------
                                  SECTION 4



                      ANALYSES USING OBSERVED/EXPECTED





     One refinement of the analysis in the previous section is to compute an



expected attack rate for each day.  This is the sum of the mean daily attack



rates for each individual  who responded on a particular day.  By forming the



ratio of the observed to the expected for each day, the analysis can allow



for changing panel  composition.  This ratio ranged from .3378 to 2.3861.



A histogram of the  ratio is shown in Figure 4.
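
The observed/expected ratio can be sketched as follows, reading the expected value for a day as the expected number of attacks among that day's respondents. The diary layout (1, 0, or None per panelist per day), the function name, and the data are assumptions made for illustration.

```python
def observed_over_expected(diary):
    """Ratio of observed to expected attacks for each day.

    diary[i][j] is 1 (attack), 0 (no attack) or None (no response).
    Each panelist's mean daily attack rate is taken over the days on
    which that panelist responded. The expected value for a day is the
    sum of those individual rates over the day's respondents.
    """
    person_rate = []
    for row in diary:
        answered = [v for v in row if v is not None]
        person_rate.append(sum(answered) / len(answered) if answered else None)

    ratios = []
    for j in range(len(diary[0])):
        observed = 0
        expected = 0.0
        for i, row in enumerate(diary):
            if row[j] is not None:
                observed += row[j]
                expected += person_rate[i]
        ratios.append(observed / expected if expected > 0 else None)
    return ratios

# Hypothetical three-person, four-day diary (illustration only).
diary = [
    [1, 0, 1, 0],
    [0, 0, 1, None],
    [None, 1, 1, 1],
]
print(observed_over_expected(diary))
```

Because the expected value is built only from the panelists who responded, a day on which a frequently affected panelist drops out no longer appears as a low-attack day.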



     Table 4 was constructed in the same manner as Table 2, except that the



ratio of the observed to expected was used as a dependent variable instead



of the raw attack rate.   The results are quite similar to Table 2, with



season again eliminating any SOx effect.  The p values for SOx are slightly


larger than they were for the analysis in Table 2.



     If all covariates are added simultaneously, as was done in Table 3, the



result is as shown  in Table 5.  The covariates are still highly correlated,



and as a result, the individual sums of squares do not add up to one fifth



of the overall sum  of squares.  Again, the regression coefficient for the



start-up effect is  negative, even though it is positively correlated (r=.562)



with the ratio of observed to expected rates.



     The residuals  based on the regression analysis in Table 5 are shown



in Figure 5.  A chi-square goodness of fit test for normality of the



residuals gives a value of 24.96 for 21 degrees of freedom (p=.2488).



There is less evidence of skewness (p=.0916) and kurtosis (p=.94E2), than in



the previous analysis.  The residuals still have a significant serial



correlation of  .2257 (p=.0010).

-------
[Figure 4.  Histogram of the Observed Attacks Divided by the Expected Attacks for the 1970-1971 Riverhead Asthma Panel.]

-------
[Figure 5.  Histogram of Residuals from a Regression Analysis of Observed/Expected for the 1970-1971 Riverhead Asthma Panel.]

-------











[TABLE 4.  MULTIPLE REGRESSION ANALYSES OF THE RATIO OF OBSERVED TO EXPECTED
ON SOx USING VARIOUS COVARIATES SEPARATELY.  Columns: covariate; degrees of
freedom; F-to-enter for the covariate; SOx regression coefficient; F-to-remove
for SOx.  Rows: none, season, start-up, minimum temperature, day of week.]

-------











[TABLE 5.  MULTIPLE REGRESSION ANALYSIS OF THE RATIO OF OBSERVED TO EXPECTED
ON SOx USING ALL COVARIATES SIMULTANEOUSLY.  Columns: covariate; degrees of
freedom; regression coefficient; sum of squares; mean square; F-to-remove.
Rows: SOx, season, start-up, minimum temperature, day of week.]

-------
                                  SECTION  5
                      A MODIFIED DISCRIMINANT ANALYSIS

     Any sophisticated analysis of panel data should  consider  the  pattern
of individual  responses.  The following method is  a crude  attempt  at this.
Each member of a panel is assumed to respond either yes  or no  to a
particular question for each day of a particular panel study.   Without
loss of generality, we will  assign the value 1 to  yes, and 0 to no.   The
nonindependence of individual responses can be demonstrated easily.   For
each day, j, each person responds 0 or 1,  as he does  on  day j  + 1.  This
process is a natural Markov  chain:

                            Day j + 1 Response
                               0          1
     Day j           0       p_00       p_01
     Response        1       p_10       p_11

where p_00 + p_01 = 1 and p_10 + p_11 = 1.  The p's represent the probabilities
of a particular response on day j + 1, given the response on day j.  If each
person's response is independent of his response on the previous day, then
p_01 will equal p_11.  This can be tested using the standard 2x2 chi-square
test.
     The independence, or lack thereof, was studied using the 1970-1971
Riverhead asthma data.  The counts of responses were done for each
individual, and recorded in the 2x2 table previously described.  Let
n_00, n_01, n_10, n_11 be the observed counts corresponding to p_00, p_01, p_10,
p_11 respectively.  The chi-square test for independence of one day's outcome
from the previous day's is

    \chi^2 = \frac{N (n_{00} n_{11} - n_{10} n_{01})^2}{(n_{00}+n_{01})(n_{10}+n_{11})(n_{00}+n_{10})(n_{01}+n_{11})}

where N = n_00 + n_01 + n_10 + n_11.


These were computed for each panelist having marginal  totals greater than or


equal to 10.  Of the 21 panelists meeting this criterion, results for 17 were


significant at the .01 level, 1  was significant at a level  between .01  and


.05, and 3 were not significant at the .05 level.
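
The counting and testing steps for a single panelist can be sketched as follows. The sketch assumes the uncorrected Pearson chi-square for a 2x2 table and assumes that a pair of days is skipped when either response is missing. The function names and the response sequence are hypothetical.

```python
def transition_counts(responses):
    """Count day-to-day transitions for one panelist.

    responses is a list of 1 (attack), 0 (no attack) or None (missing)
    in date order. Returns a dict keyed by (day j, day j + 1) response.
    Pairs with a missing response on either day are skipped.
    """
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for today, tomorrow in zip(responses, responses[1:]):
        if today is not None and tomorrow is not None:
            counts[(today, tomorrow)] += 1
    return counts

def chi_square_2x2(counts):
    """Uncorrected chi-square statistic (1 degree of freedom) for
    independence of successive days' responses."""
    n00, n01 = counts[(0, 0)], counts[(0, 1)]
    n10, n11 = counts[(1, 0)], counts[(1, 1)]
    total = n00 + n01 + n10 + n11
    denominator = (n00 + n01) * (n10 + n11) * (n00 + n10) * (n01 + n11)
    return total * (n00 * n11 - n10 * n01) ** 2 / denominator

# Hypothetical twelve-day diary for one panelist (illustration only).
responses = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
counts = transition_counts(responses)
print(counts, chi_square_2x2(counts))
```

The statistic is undefined when any marginal total is zero, which is one reason to restrict the test to panelists with adequate marginal totals.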


     Another interesting statistic is the relative risk.18  This is the


probability of developing an attack on one day given an attack on the


previous day, divided by the probability of developing one given none on


the previous day.  The observed relative risks ranged from 1.07 to 37.55.


The median relative risk was 4.96.  Thus the "median" person was almost five


times as likely to have an attack if he had one the previous day.  The


assumption of independence is clearly not met.
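
In terms of the transition counts, the relative risk described above is the ratio of two conditional proportions. The following is a minimal sketch, with a hypothetical function name and hypothetical counts.

```python
def relative_risk(n00, n01, n10, n11):
    """Probability of an attack given an attack on the previous day,
    divided by the probability of an attack given none on the previous
    day. The first index of each count is the previous day's response."""
    risk_after_attack = n11 / (n10 + n11)
    risk_after_no_attack = n01 / (n00 + n01)
    return risk_after_attack / risk_after_no_attack

# Hypothetical counts (illustration only): 3 of 5 attack days were
# followed by an attack, against 2 of 6 attack-free days.
print(relative_risk(n00=4, n01=2, n10=2, n11=3))
```

A value of 1 corresponds to independence of successive days, so the reported median of 4.96 indicates strong day-to-day persistence.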


     One way to avoid the problem of lack of independence is to look only


at p_00 and p_01.  Let the analysis explain why a person remains in state 0


(no symptoms) on some days, and why he moves from state 0 to state 1


(develops symptoms) on other days.  The price of this restriction is to


discard two other types of information:  1) p_10:  the person moves from


state 1 to state 0 (loses symptoms) and 2) p_11:  the person stays in state


1 (retains symptoms).  It could be argued that these two events are more


dependent on a person's recovery mechanism than on the insult causing the


response.


     Using p00 and p01 only, we can define a variable y_ij as follows:

          y_ij = 0 if person i has no symptoms on either day j or day j-1
               = 1 if person i has symptoms on day j, but none on day j-1
               = undefined otherwise.
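
A minimal sketch of this definition for one person's diary follows.  The
function name is hypothetical, and None is used here to stand for
"undefined"; such days would be dropped before any further analysis.

```python
def first_attack_indicator(days):
    """Return y_j for j = 1 .. len(days) - 1 from a 0/1 symptom diary.

    y_j is 0 if there are no symptoms on day j-1 or day j, 1 if symptoms
    appear on day j after none on day j-1, and None (undefined) whenever
    the person already had symptoms on day j-1.
    """
    y = []
    for prev, cur in zip(days, days[1:]):
        y.append(None if prev == 1 else cur)
    return y


# Hypothetical diary: 0 = no symptoms, 1 = symptoms.
diary = [0, 0, 1, 1, 0, 1, 0, 0]
y = first_attack_indicator(diary)
print(y)  # [0, 1, None, None, 1, None, 0]

# Days that follow a symptom day carry no information under this definition.
usable = [value for value in y if value is not None]
print(len(usable), "of", len(y), "day pairs retained")
```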
y_ij can be thought of as a Bernoulli trial, with p_ij given by

          p_{ij} = h(\mu_i, x_{1j}, \ldots, x_{kj})

where \mu_i = overall symptom rate for person i and x_1j,...,x_kj = the k values
of the independent variables for day j.  These effects could include such
factors as pollution, meteorology, seasonal cycles, and day of week effects.
The logical functional form would be a general dose-response curve of the form

          h(\mu_i, x_{1j}, \ldots, x_{kj}) = \int_{-\infty}^{g(\mu_i, x_{1j}, \ldots, x_{kj})} f(t) \, dt

where f(t) is either a probit or logit density.  Even if g is a linear
function, the large number of variables makes maximum likelihood or least
squares estimation of the coefficients extremely difficult.
     If, however, the effects of the daily factors are relatively small,
then the dose-response curve may be approximately linear in those variables.
If we assume that p_ij is linear in the parameters to be estimated, we have
reduced the problem to a point that some analyses can be calculated.
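
The linearity argument can be checked numerically.  The sketch below assumes
a logit dose-response curve and a hypothetical baseline attack probability of
.05; neither value comes from the report.  It compares the exact curve with
its first-order (linear) expansion about the baseline for small and large
shifts in the linear predictor.

```python
import math


def logistic(z):
    """Logit dose-response curve: probability as a function of the predictor z."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_approx(p0, delta):
    """First-order Taylor expansion of the logistic curve about baseline p0.

    The slope of the logistic curve at the baseline is p0 * (1 - p0).
    """
    return p0 + p0 * (1.0 - p0) * delta


p0 = 0.05                          # hypothetical baseline daily attack probability
z0 = math.log(p0 / (1.0 - p0))     # predictor value that gives p0
for delta in (0.1, 0.5, 2.0):      # shift in the predictor due to daily factors
    exact = logistic(z0 + delta)
    approx = linear_approx(p0, delta)
    print(delta, round(exact, 4), round(approx, 4))
```

For small shifts the two values nearly agree, while for large shifts the
linear form falls well below the curve, which is why the approximation
depends on the daily effects being relatively small.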
     The solution is to run a discriminant analysis using the y_ij's as
dependent variables, and dummy variables for people along with the other
covariates as independent variables.  Approximate tests for significance
for the independent variables can be made using partial F's, as described
by Lachenbruch.19  The results of these analyses appear in Tables 6 and 7.
Table 6 is analogous to Tables 2 and 4, in that the covariates are added one
at a time.  SOx is no longer significant at the .05 level (p=.096) even
without any covariates added.  If a season effect is added, there again is
no evidence of any SOx effect.  Table 7 is analogous to Tables 3 and 5,
with all covariates added simultaneously.  The discriminant analysis shows
a stronger seasonal effect than did the other two analyses.

[Table 6.  Discriminant analysis with covariates added one at a time.  Table body not recoverable from the source scan.]

[Table 7.  Discriminant analysis with all covariates added simultaneously.  Table body not recoverable from the source scan.]
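
The report does not say how the discriminant analyses were computed.  As a
hedged sketch only, the partial F for one covariate can be obtained from two
least-squares fits of the 0/1 variable y_ij, using the standard result that
a two-group linear discriminant function is proportional to the regression
of the group indicator on the same variables.  All names and data below are
hypothetical, undefined y_ij's are assumed to have been dropped already, and
the F value is only approximate because y_ij is discrete.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def residual_ss(rows, y):
    """Residual sum of squares from least-squares regression of y on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def partial_f(person, covariates, y, test_col):
    """Partial F (1 numerator d.f.) for one covariate, given person dummies
    and all the other covariates."""
    people = sorted(set(person))
    dummies = [[1.0 if p == q else 0.0 for q in people] for p in person]
    full = [d + list(c) for d, c in zip(dummies, covariates)]
    reduced = [d + [v for i, v in enumerate(c) if i != test_col]
               for d, c in zip(dummies, covariates)]
    rss_full = residual_ss(full, y)
    rss_reduced = residual_ss(reduced, y)
    df_error = len(y) - len(full[0])
    return (rss_reduced - rss_full) / (rss_full / df_error)


# Hypothetical data: two people, four usable days each, one daily covariate.
person = ["A"] * 4 + ["B"] * 4
covariates = [[0.0], [1.0], [2.0], [3.0], [0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1, 0, 0, 1, 1]
print(round(partial_f(person, covariates, y, test_col=0), 2))
```

One dummy variable per person is used in place of an intercept, so each
person's overall symptom rate is fitted separately before the covariate is
tested.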

-------
                                 SECTION 6
                           COMPARISON OF METHODS

     The discriminant analysis method has eliminated the three major
problems of regression analysis on attack rates, namely (1)  changing panel
composition, (2) loss of sample size information, and (3) serial  correlation
of responses.  In their place are some new problems:
     1)  A large number of positive responses are ignored (approximately
         1/2 in this particular analysis).
     2)  The true distributions of the "F to remove" tests are not known
         when discrete variables are used in discriminant analyses.
     3)  The procedure appears to be very conservative as a  test  for the
         effect of air pollution on asthma attack rates.
     Several other methods of analysis are possible.  These
include the "multiple logistic function" of Truett, Cornfield, and Kannel20;
the "stimulus response" method of Lebowitz21; and the "binary multiple
regression analysis" of Elwood, Mackenzie, and Cran22.  These methods are
all appropriate for some data sets, but none of them solves the problems
of this particular kind of panel study.
     A potentially promising method of analysis is that given by Koch, et al.23
This technique provides for the analysis of multivariate categorical
data which are obtained from a repeated measures design.  Unfortunately, the
method requires that the number of days be much less than the number of
subjects.  For the asthma panels studied thus far by EPA, this limitation would
mean that no more than one month's data could be used at one time.  This
deletion would further reduce the number of eligible panelists, since many
have no attacks in any given month.  The reduced data set would be
insufficient for analysis.

-------
                                 SECTION 7



                                DISCUSSION





     Although this report does not find a fully satisfactory method of
analysis for asthma panel studies, a few conclusions can be drawn.  The first
is that any method of analysis must allow for the nonindependence of responses
for a particular individual.  Unfortunately, the few available methods which
accomplish this reduce the data set considerably, so that consistently
significant results are unlikely.  It is possible, of course, that future
research may provide a more appropriate technique.  Until better
methods are available, a possible improvement is to have larger panels for
shorter periods of time.  These data could be analyzed by the method of Koch,
et al.23, and the contribution of seasonality factors would also be greatly
reduced.



     The results of the various analyses on the Riverhead panel indicate
that the relation between SOx and the asthma rate is confounded with the
seasonal trends of both variables.  Any statement of a positive relationship
between the two variables must be made with much qualification.

-------
                                 REFERENCES
 1.  Cohen, A. A., S. Bromberg, R. W. Buechley, L. T. Heiderscheit, and
     C. M. Shy.  Asthma and Air Pollution from a Coal-fueled Power Plant.
     Am. J. of Public Health, 62: 1181-1188, 1972.

 2.  Salvaggio, John, Victor Hasselblad, and L. T. Heiderscheit.  New Orleans
     Asthma.  II.  Relationship of Climatologic and Seasonal Factors to
     Outbreaks.  J. of Allergy, 45: 257-265, 1970.

 3.  Girsch, L. S., E. Shubin, C. Dick, and F. A. Schulaner.  A Study on
     the Epidemiology of Asthma in Children in Philadelphia.  J. of
     Allergy, 39: 347-357, 1967.

 4.  Lawther, P. J., R. E. Waller, and M. Henderson.  Air Pollution and
     Exacerbations of Bronchitis.  Thorax, 25: 525-539, 1970.

 5.  Goldberg, H. E., A. A. Cohen, J. F. Finklea, J. H. Farmer, F. B.
     Benson, and G. J. Love.  Frequency and Severity of Cardiopulmonary
     Symptoms in Adult Panels:  1970-1971 New York Studies.  In:  Health
     Consequences of Sulfur Oxides:  A Report from CHESS, 1970-1971.
     EPA-650/1-74-004, U. S. Environmental Protection Agency, Research
     Triangle Park, N. C., 1974.

 6.  Hammer, D. I., V. Hasselblad, B. Portnoy, and P. F. Wehrle.  Los
     Angeles Student Nurse Study.  Daily Symptom Reporting and Photochemical
     Oxidants.  Archives of Environmental Health, 28: 255-260, 1974.

 7.  Lawther, P. J., A. G. F. Brooks, P. W. Lord, and R. E. Waller.
     Day-to-day Changes in Ventilatory Function in Relation to the
     Environment.  Part I.  Spirometric Values.  Environmental Research,
     7: 27-40, 1974.

 8.  Lawther, P. J., A. G. F. Brooks, P. W. Lord, and R. E. Waller.
     Day-to-day Changes in Ventilatory Function in Relation to the
     Environment.  Part II.  Peak Expiratory Flow Values.  Environmental
     Research, 7: 41-53, 1974.

 9.  Stebbings, James H., Jr., and Carl G. Hayes.  Panel Studies of Acute
     Health Effects of Air Pollution I.  Cardiopulmonary Symptoms in Adults,
     New York 1971-1972.  Environmental Research, 11: 89-111, 1976.

10.  Finklea, John F., John H. Farmer, Gory J. Love, Dorothy C. Calafiore,
     and Wayne G. Sovocool.  Aggravation of Asthma by Air Pollutants:
     1970-1971 New York Studies.  In:  Health Consequences of Sulfur Oxides:
     A Report from CHESS, 1970-1971, EPA-650/1-74-004, U. S. Environmental
     Protection Agency, Research Triangle Park, N. C., 1974.

11.  Greenberg, L., and F. Field.  Air Pollution and Asthma.  J. of Asthma
     Research, 2: 195, 1965.


-------
12.  Booth, S., L. Degroot, R. Markash, and R. J. M. Horton.  Detection
     of Asthma Epidemics in Seven Cities.  Archives of Environmental Health,
     10: 152-155, 1965.

13.  Shy, Carl M., Victor Hasselblad, Leo T. Heiderscheit, and Arlan A. Cohen.
     Environmental Factors in Bronchial Asthma.  In:  Environmental Factors
     in Respiratory Disease, Douglas H. K. Lee, ed.  Academic Press, 1972.

14.  Derrick, E. H.  The Annual Variation of Asthma in Brisbane:  Its
     Relation to the Weather.  International J. of Biometeorology,
     10: 91-99, 1966.

15.  Taylor, Angus E.  Advanced Calculus.  Ginn and Company, 1955.
     pp. 711-718.

16.  Tromp, S. W.  Biometeorological Analysis of the Frequency and Degree
     of Asthma Attacks in the Western Part of the Netherlands.  In:
     Proceedings of the Second International Bioclimatological Conference.
     Oxford:  Pergamon Press, 1962.  p. 477.

17.  Fisher, Ronald A.  Statistical Methods for Research Workers.  Hafner,
     1958.  pp. 52-54.

18.  Fleiss, Joseph L.  Statistical Methods for Rates and Proportions.
     John Wiley & Sons, New York, 1973.  pp. 39-51.

19.  Lachenbruch, Peter A.  Discriminant Analysis.  Hafner Press, 1975.
     pp. 27-29.

20.  Truett, Jeanne, Jerome Cornfield, and William Kannel.  A Multivariate
     Analysis of the Risk of Coronary Heart Disease in Framingham.
     J. of Chronic Disease, 20: 511-524, 1967.

21.  Lebowitz, Michael D.  A Stimulus Response Method for the Analysis of
     Environmental and Health Events Related in Time.  Environmental Letters,
     2: 23-34, 1971.

22.  Elwood, J. H., G. Mackenzie, and G. W. Cran.  The Measurement and
     Comparison of Infant Mortality Risks by Binary Multiple Regression
     Analysis.  J. of Chronic Disease, 24: 93-106, 1971.

23.  Koch, Gary G., J. Richard Landis, Jean L. Freeman, Daniel H. Freeman, Jr.,
     and Robert G. Lehnen.  A General Method for the Analysis of Experiments
     with Repeated Measurement of Categorical Data.  Biometrics, 33: 133-158,
     1977.

-------
                                   TECHNICAL REPORT DATA
                 (Please read Instructions on the reverse before completing)

 1. REPORT NO.:  EPA-600/1-78-043
 3. RECIPIENT'S ACCESSION NO.:
 4. TITLE AND SUBTITLE:  COMPARISON OF METHODS FOR THE ANALYSIS OF PANEL
    STUDIES
 5. REPORT DATE:  June 1978
 6. PERFORMING ORGANIZATION CODE:
 7. AUTHOR(S):  Victor Hasselblad
 8. PERFORMING ORGANIZATION REPORT NO.:
 9. PERFORMING ORGANIZATION NAME AND ADDRESS:
    Statistics and Data Management Office
    Health Effects Research Laboratory
    Research Triangle Park, NC 27711
10. PROGRAM ELEMENT NO.:  1AA601
11. CONTRACT/GRANT NO.:
12. SPONSORING AGENCY NAME AND ADDRESS:
    Health Effects Research Laboratory
    Office of Research and Development
    U.S. Environmental Protection Agency
    Research Triangle Park, NC 27711
13. TYPE OF REPORT AND PERIOD COVERED:  RTP, NC
14. SPONSORING AGENCY CODE:  EPA 600/11
15. SUPPLEMENTARY NOTES:
16. ABSTRACT

      Three different methods of analysis of panels were compared using asthma panel
 data from a 1970-1971 study done by EPA in Riverhead, New York.  The methods were
 (1) regression analysis using raw attack rates; (2) regression analysis using the
 ratio of observed attacks to expected attacks; and (3) discriminant analysis where
 repeated attacks were ignored.  The first two methods were found to have serious
 serial correlation problems.  The third method eliminated this problem, but
 reduced the effective sample size considerably.

      A more appropriate method was suggested for larger panels over shorter
 periods of time.  The analyses of the Riverhead data showed that any sulfate
 effect on asthmatics was confounded with seasonal trends.

17. KEY WORDS AND DOCUMENT ANALYSIS
    a. DESCRIPTORS:  statistical analysis; epidemiology; asthma
    b. IDENTIFIERS/OPEN ENDED TERMS:
    c. COSATI Field/Group:  06 F; 12 A
18. DISTRIBUTION STATEMENT:  RELEASE TO PUBLIC
19. SECURITY CLASS (This Report):  UNCLASSIFIED
20. SECURITY CLASS (This page):  UNCLASSIFIED
21. NO. OF PAGES:
22. PRICE:

EPA Form 2220-1 (9-73)

-------