EPA
United States
Environmental Protection
Agency
Health Effects Research
Laboratory
Research Triangle Park NC 27711
EPA-600/1-78-043
June 1978
Research and Development
Comparison of
Methods for the
Analysis of Panel
Studies
-------
RESEARCH REPORTING SERIES
Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad cate-
gories were established to facilitate further development and application of en-
vironmental technology. Elimination of traditional grouping was consciously
planned to foster technology transfer and a maximum interface in related fields.
The nine series are:
1. Environmental Health Effects Research
2. Environmental Protection Technology
3. Ecological Research
4. Environmental Monitoring
5. Socioeconomic Environmental Studies
6. Scientific and Technical Assessment Reports (STAR)
7. Interagency Energy-Environment Research and Development
8. "Special" Reports
9. Miscellaneous Reports
This report has been assigned to the ENVIRONMENTAL HEALTH EFFECTS RESEARCH
series. This series describes projects and studies relating to the tolerances
of man for unhealthful substances or conditions. This work is generally
assessed from a medical viewpoint, including physiological or psychological
studies. In addition to toxicology and other medical specialities, study areas
include biomedical instrumentation and health research techniques utilizing
animals, but always with intended application to human health measures.
This document is available to the public through the National Technical Informa-
tion Service, Springfield, Virginia 22161.
-------
EPA-600/1-78-043
June 1978
COMPARISON OF METHODS FOR THE ANALYSIS OF PANEL STUDIES
by
Victor Hasselblad
Statistics and Data Management Office
Health Effects Research Laboratory
Research Triangle Park, N.C. 27711
U.S. ENVIRONMENTAL PROTECTION AGENCY
OFFICE OF RESEARCH AND DEVELOPMENT
HEALTH EFFECTS RESEARCH LABORATORY
RESEARCH TRIANGLE PARK, N.C. 27711
-------
DISCLAIMER
This report has been reviewed by the Health Effects Research Laboratory,
U.S. Environmental Protection Agency, and approved for publication. Approval
does not signify that the contents necessarily reflect the views and policies
of the U.S. Environmental Protection Agency, nor does mention of trade names
or commercial products constitute endorsement or recommendation for use.
-------
FOREWORD
The many benefits of our modern, developing, industrial society
are accompanied by certain hazards. Careful assessment of the relative
risk of existing and new man-made environmental hazards is necessary
for the establishment of sound regulatory policy. These regulations
serve to enhance the quality of our environment in order to promote the
public health and welfare and the productive capacity of our Nation's
population.
The Health Effects Research Laboratory, Research Triangle Park,
conducts a coordinated environmental health research program in toxicology,
epidemiology, and clinical studies using human volunteer subjects.
These studies address problems in air pollution, non-ionizing
radiation, environmental carcinogenesis and the toxicology of pesticides
as well as other chemical pollutants. The Laboratory participates in
the development and revision of air quality criteria documents on
pollutants for which national ambient air quality standards exist or
are proposed, provides the data for registration of new pesticides or
proposed suspension of those already in use, conducts research on
hazardous and toxic materials, and is primarily responsible for providing
the health basis for non-ionizing radiation standards. Direct support
to the regulatory function of the Agency is provided in the form of
expert testimony and preparation of affidavits as well as expert advice
to the Administrator to assure the adequacy of health care and surveillance
of persons having suffered imminent and substantial endangerment of
their health.
This report addresses the extreme difficulty of analyzing panel
data collected over time in an epidemiologic study. Although no
exact solution is found, some improvements are suggested for the design
and analysis of future studies.
F. G. Hueter, Ph. D.
Acting Director,
Health Effects Research Laboratory
-------
ABSTRACT
Three different methods of analysis of panels were compared using
asthma panel data from a 1970-1971 study done by EPA in Riverhead, N. Y.
The methods were 1) regression analysis using raw attack rates; 2) regression
analysis using the ratio of observed attacks to expected attacks; and 3)
discriminant analysis where repeated attacks were ignored. The first two
methods were found to have serious serial correlation problems. The third
method eliminated this problem, but reduced the effective sample size
considerably.
A more appropriate method was suggested for larger panels over shorter
periods of time. The analyses of the Riverhead data showed that any sulfate
effect on asthmatics was confounded with seasonal trends.
-------
CONTENTS
Page
Foreword iii
Abstract iv
Figures vi
Tables vi
Acknowledgment vii
1. Introduction 1
2. The Basic Data 2
3. Analyses Using the Raw Attack Rate 5
4. Analyses Using Observed/Expected 13
5. A Modified Discriminant Analysis 18
6. Comparison of Methods 24
7. Discussion 26
References 27
-------
FIGURES
Number Page
1 Histogram of the Attack Rate for the 1970-1971 Riverhead
Asthma Panel 6
2 Histogram of Suspended Sulfate Values for the 1970-1971
Riverhead Asthma Panel 7
3 Histogram of Residuals from a Regression Analysis of Attack
Rate for the 1970-1971 Riverhead Asthma Panel 12
4 Histogram of the Observed Attacks Divided by the Expected
Attacks for the 1970-1971 Riverhead Asthma Panel 14
5 Histogram of Residuals from a Regression Analysis of
Observed/Expected for the 1970-1971 Riverhead Asthma
Panel 15
TABLES
Number Page
1 Pairwise Correlations of Pollutants in Riverhead, N.Y.,
October 18, 1970 to May 29, 1971 3
2 Multiple Regression Analyses of the Riverhead Asthma Attack
Rate on SOx Using Various Covariates Separately 8
3 Multiple Regression Analysis of the Riverhead Asthma Attack
Rate on SOx Using All Covariates Simultaneously 11
4 Multiple Regression Analyses of the Ratio of Observed to
Expected on SOx Using Various Covariates Separately 16
5 Multiple Regression Analysis of the Ratio of Observed to
Expected on SOx Using All Covariates Simultaneously 17
6 Discriminant Analyses of the Riverhead Asthma Attack Rate
on SOx Using Various Covariates Separately 21
7 Discriminant Analysis of the Riverhead Asthma Attack Rates
on SOx Using All Covariates Simultaneously 22
-------
ACKNOWLEDGMENT
I would like to thank the three reviewers, John Creason, Larry
Kupper, and Robert Chapman, who all made excellent suggestions. I would
also like to thank William C. Nelson for his continued support and
suggestions, and Barbara Crabtree for the typing of this manuscript.
-------
SECTION 1
INTRODUCTION
Panel studies have made an important contribution to epidemiology.
These include studies of asthmatics by Cohen, et al.1, Salvaggio, et al.2,
and Girsh, et al.3. Lawther, et al.4 looked at panels of bronchitics.
Goldberg, et al.5 looked at cardiovascular symptoms in elderly persons.
Hammer, et al.6, studied irritation symptoms, including eye discomfort.
The effect of meteorology and pollution on pulmonary function was studied
by Lawther, et al.7,8. The problem with panel studies, as noted by Stebbings
and Hayes9, is that the analysis of such studies is statistically difficult.
If the data are continuous, such as pulmonary function data, then a repeated
measures analysis or multivariate analysis of variance is appropriate. If
the data are discrete, such as yes-no data, then no available techniques are
fully satisfactory.
In an effort to evaluate a range of analysis techniques, this paper
presents a comparison of three different methods of analysis, using 1970-1971
Riverhead, N. Y. asthma panel data. An analysis of these data, along with
similar analyses of panels from Queens and Bronx, was previously presented by
Finklea, et al.10 This study was one of several conducted by the
Environmental Protection Agency's Community Health Environmental Surveillance
System (CHESS).
-------
SECTION 2
THE BASIC DATA
Panelists were selected from three New York communities. Riverhead,
located near the center of Long Island, had generally lower exposures to
air pollution than did the communities of Queens and Bronx. The names of
panelists were obtained from hospital clinic records and records of
practicing physicians. Background information on each panelist, including
age and smoking history, was obtained by an interview. All panelists
lived within a 1.5-mile radius of a monitoring station.
From October 4, 1970 to May 29, 1971 each panelist received a weekly
diary. The panelist recorded the presence or absence of an asthma attack for
each day of the week, and then returned the completed diary by mail. During
this same time period, continuous 24-hour air monitoring was also conducted by
EPA. The measurements included total suspended particulates (TSP), sulfate
fraction of TSP (SOx), nitrate fraction of TSP (NOx), sulfur dioxide (SO2),
and nitrogen dioxide (NO2). The measurement methods are described by
Finklea, et al.10.
This paper will restrict itself to the analysis of the Riverhead
panel. Although Riverhead had generally lower pollution values than the
other two communities, the previous analysis of the Riverhead data showed
a significant relationship between asthma attack rate and SOx. All of the
pollutants are correlated with one another, as shown in Table 1. This
paper will consider the relationship of asthma with SOx only.
The first 2 weeks of data were not used, to minimize the problem of
initial over-reporting, and to allow the panelists to become familiar with
the study. Fifty panelists were initially enrolled in Riverhead, but 11
were eliminated from the analyses because they either never returned a
diary, or never reported having an asthma attack at any time during the
study.

TABLE 1. PAIRWISE CORRELATIONS OF POLLUTANTS
IN RIVERHEAD, N. Y. FOR THE TIME PERIOD
OCTOBER 18, 1970 TO MAY 29, 1971.*

            TSP       SOx       NOx       SO2
TSP        1.000      .796      .487      .370
           (206)     (205)     (206)     (188)
SOx         .796     1.000      .380      .377
           (205)     (205)     (205)     (187)
NOx         .487      .380     1.000      .141
           (206)     (205)     (206)     (188)
SO2         .370      .377      .141     1.000
           (188)     (187)     (188)     (194)

* Sample Sizes Are Given in Parentheses.
-------
SECTION 3
ANALYSES USING THE RAW ATTACK RATE
One obvious method of analysis is to use the raw attack rate as the
dependent variable. This rate is defined for each day as the number of
panelists reporting an attack on that date divided by the number of
panelists giving a response on that same day. Figure 1 shows a histogram
of this variable. The analyses of this variable were restricted to the 205
days with valid SOx measurements. The attack rate ranged from .057 to .500,
with a mean value of .181. The SOx measurements ranged from .5 to 51.4,
with a mean of 9.81. A histogram of the SOx values is shown in Figure 2.
Note that both the simple attack rate and SOx have distributions which are
skewed to the right.
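
The definition of the raw attack rate given above can be sketched in a few lines. This is only an illustration, assuming the diaries are stored as a panelist-by-day matrix with 1 for an attack, 0 for no attack, and NaN for a day with no response; the function name and the small diary matrix are hypothetical and are not the Riverhead data.

```python
import numpy as np

def daily_attack_rate(responses):
    """Attacks reported divided by diaries returned, for each day.

    responses: 2-D array, rows = panelists, columns = days;
    1 = attack, 0 = no attack, NaN = no response that day (assumed coding).
    """
    responses = np.asarray(responses, dtype=float)
    responded = ~np.isnan(responses)
    n_responding = responded.sum(axis=0)
    n_attacks = np.nansum(responses, axis=0)
    # A day with no responses has an undefined rate, left as NaN.
    rate = np.full(responses.shape[1], np.nan)
    np.divide(n_attacks, n_responding, out=rate, where=n_responding > 0)
    return rate

# Hypothetical 3-panelist, 4-day diary matrix.
diary = np.array([
    [1, 0, np.nan, 0],
    [0, 0, 1, np.nan],
    [1, np.nan, 1, np.nan],
])
rates = daily_attack_rate(diary)
```

Note that the denominator changes from day to day, which is the changing panel composition that the later sections try to adjust for.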
There are a large number of possible covariates which could be used
in the analyses. Several of these, such as pollen or emotional stress,
were not measured in this study. The following factors were used in the
analyses shown in Table 2.
1) Season: several authors have noted seasonal trends in asthma, with
higher rates in October and November in the northern hemisphere3,11-13
and higher rates in April and May in the southern hemisphere14. This
effect was estimated using the first two terms of a Fourier series.15
2) Start-up effect: there is often over-reporting during the first
few weeks of a study. Although the first 2 weeks were dropped, the
reciprocal of the week number was still used as a covariate.
3) Temperature: low temperatures have also been associated with
increased asthma attacks.12-14,16 The daily minimum temperature as reported
at the airport was included as a covariate.
4) Day of week: this variable was not known to be important, but
dummy variables were included to allow for any possible association of
attack with day of week.

[Figure 1. Histogram of the Attack Rate for the 1970-1971 Riverhead
Asthma Panel.]

[Figure 2. Histogram of Suspended Sulfate Values for the 1970-1971
Riverhead Asthma Panel.]

TABLE 2. MULTIPLE REGRESSION ANALYSES OF THE RIVERHEAD ASTHMA ATTACK
RATE ON SOx USING VARIOUS COVARIATES SEPARATELY
[Table body not recoverable. Rows: none, season, start-up, minimum
temperature, day of week. Columns: degrees of freedom, F-to-enter for
covariate, SOx regression coefficient, SOx F-to-remove.]
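
The four covariates listed in this section could be assembled into a design matrix along the following lines. This is a sketch under stated assumptions, not the report's own program: "the first two terms of a Fourier series" is read here as the annual and semiannual harmonics, the seasonal period of 365.25 days is assumed, and weekday 0 is simply the weekday on which the study began. The function name is hypothetical.

```python
import numpy as np

def covariate_matrix(day_index, min_temp, period=365.25):
    """Season, start-up, temperature and day-of-week covariates.

    day_index: integer days since the start of the study (0 = first day).
    min_temp:  daily minimum temperature, same length as day_index.
    period:    assumed length of the seasonal cycle in days.
    """
    t = np.asarray(day_index, dtype=float)
    w = 2.0 * np.pi * t / period
    # Two Fourier harmonics (assumed reading of "first two terms").
    season = np.column_stack([np.sin(w), np.cos(w),
                              np.sin(2 * w), np.cos(2 * w)])
    # Start-up effect: reciprocal of the 1-based week number.
    week = np.floor(t / 7.0) + 1.0
    startup = 1.0 / week
    # Day of week: 6 dummy variables, weekday 0 is the reference level.
    weekday = t.astype(int) % 7
    dow = (weekday[:, None] == np.arange(1, 7)[None, :]).astype(float)
    return np.column_stack([season, startup,
                            np.asarray(min_temp, dtype=float), dow])

# Days 14-27, i.e. the first two weeks retained after dropping weeks 1-2.
X = covariate_matrix(np.arange(14, 28), np.zeros(14))
```

The columns are ordered as four seasonal terms, the start-up term, minimum temperature, and six day-of-week dummies, giving 12 columns in all.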
Table 2 shows the F-to-enter for all of these covariates. This is the
F value associated with a simple linear regression using the particular
covariate as the independent variable. The SOx regression coefficient is
given for each covariate added separately to the model. The SOx F-to-remove
value, which is a test of the effect of SOx in addition to that of a specified
covariate, is also shown. Although both the seasonal trend and the start-up
effect are highly significant, only the seasonal trend reduces the
significance of SOx.
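
Both the F-to-enter and the F-to-remove described above are partial F statistics comparing a reduced and a full least-squares model. The sketch below shows one way to compute them with ordinary least squares; the function names and the four-point data set are hypothetical, and the report does not say what software was used.

```python
import numpy as np

def residual_ss(X, y):
    """Residual sum of squares and parameter count, intercept included."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid), A.shape[1]

def partial_f(X_reduced, X_full, y):
    """F statistic for the columns of X_full that are absent from X_reduced."""
    rss_r, p_r = residual_ss(X_reduced, y)
    rss_f, p_f = residual_ss(X_full, y)
    df_num = p_f - p_r
    df_den = len(y) - p_f
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den)

# Hypothetical data: F-to-enter for a single covariate x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 1.0, 2.0])
no_covariates = np.empty((len(y), 0))
f_enter = partial_f(no_covariates, x.reshape(-1, 1), y)
```

An F-to-remove for SOx given a covariate would use the covariate columns alone as `X_reduced` and the covariate columns plus the SOx column as `X_full`.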
Table 3 shows the result of including all covariates in the same
multiple regression analysis. The analysis demonstrates the effect of too
many correlated covariates. The overall sum of squares explained is almost
10 times the sum of the individual sum of squares explained. The
regression coefficient for the start-up effect is negative, even though it
has a positive simple correlation of .653.
The residuals based on the regression analysis in Table 3 are shown
in Figure 3. A chi-square goodness of fit test for normality of the
residuals gives a value of 13.28 for 9 degrees of freedom (p=.1503). There
is some evidence of skewness to the right (p=.0100), but less evidence of
kurtosis (p=.0738), using Fisher's G statistics17. The greatest problem
with the residuals is that the serial correlation is .3335, which is highly
significant (p<.0001).
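
The serial correlation of the residuals can be illustrated with a lag-1 autocorrelation. The report does not state which estimator was used, so the common form below is an assumption, as is the treatment of the residuals as an unbroken sequence of consecutive days.

```python
import numpy as np

def lag1_serial_correlation(residuals):
    """Lag-1 autocorrelation of a residual series ordered in time."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    # Sum of products of neighbouring residuals over the total sum of squares.
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e * e))

# Hypothetical residuals that alternate in sign (strong negative correlation).
r1 = lag1_serial_correlation([1.0, -1.0, 1.0, -1.0])
```

A value near zero is what independent errors would give; a clearly positive value, such as the .3335 reported above, means that a high residual on one day tends to be followed by a high residual on the next.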
There are at least three major problems with the analysis:
1) There is no adjustment for changing panel composition due to
dropins and dropouts.
2) The information about the sample size of the panel is lost.
3) The residuals show high serial correlation.
-------
TABLE 3. MULTIPLE REGRESSION ANALYSIS OF THE RIVERHEAD ASTHMA ATTACK
RATE ON SOx USING ALL COVARIATES SIMULTANEOUSLY
[Table body not recoverable. Rows: SOx, season, start-up, minimum
temperature, day of week, error. Columns: degrees of freedom, regression
coefficient, sum of squares, mean square, F-to-remove.]
-------
[Figure 3. Histogram of Residuals from a Regression Analysis of Attack
Rate for the 1970-1971 Riverhead Asthma Panel.]
-------
SECTION 4
ANALYSES USING OBSERVED/EXPECTED
One refinement of the analysis in the previous section is to compute an
expected number of attacks for each day. This is the sum of the mean daily attack
rates for each individual who responded on a particular day. By forming the
ratio of the observed to the expected for each day, the analysis can allow
for changing panel composition. This ratio ranged from .3378 to 2.3861.
A histogram of the ratio is shown in Figure 4.
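
The observed/expected ratio described in this paragraph can be sketched as follows. This is an illustration only, assuming the same panelist-by-day coding (1 = attack, 0 = no attack, NaN = no response) and taking each panelist's mean daily attack rate over the days on which he or she responded; the function name and the diary matrix are hypothetical.

```python
import numpy as np

def observed_over_expected(responses):
    """Daily ratio of observed attacks to expected attacks."""
    r = np.asarray(responses, dtype=float)
    responded = ~np.isnan(r)
    # Mean daily attack rate for each panelist over the days responded.
    n_days = responded.sum(axis=1)
    person_rate = np.zeros(r.shape[0])
    np.divide(np.nansum(r, axis=1), n_days, out=person_rate, where=n_days > 0)
    observed = np.nansum(r, axis=0)
    # Expected attacks: sum of personal rates over that day's responders.
    expected = (responded * person_rate[:, None]).sum(axis=0)
    ratio = np.full(r.shape[1], np.nan)
    np.divide(observed, expected, out=ratio, where=expected > 0)
    return ratio

# Hypothetical 3-panelist, 4-day diary matrix.
diary = np.array([
    [1, 0, np.nan, 0],
    [0, 0, 1, np.nan],
    [1, np.nan, 1, np.nan],
])
ratio = observed_over_expected(diary)
```

Because the expected count uses only the panelists who responded that day, a day on which mostly high-rate panelists report is no longer mistaken for a day with an unusually high attack rate.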
Table 4 was constructed in the same manner as Table 2, except that the
ratio of the observed to expected was used as a dependent variable instead
of the raw attack rate. The results are quite similar to Table 2, with
season again eliminating any SOx effect. The p values for SOx are slightly
larger than they were for the analysis in Table 2.
If all covariates are added simultaneously, as was done in Table 3, the
result is as shown in Table 5. The covariates are still highly correlated,
and as a result, the individual sums of squares do not add up to one fifth
of the overall sum of squares. Again, the regression coefficient for the
start-up effect is negative, even though it is positively correlated (r=.562)
with the ratio of observed to expected rates.
The residuals based on the regression analysis in Table 5 are shown
in Figure 5. A chi-square goodness of fit test for normality of the
residuals gives a value of 24.96 for 21 degrees of freedom (p=.2488).
There is less evidence of skewness (p=.0916) and kurtosis (p=.94E2), than in
the previous analysis. The residuals still have a significant serial
correlation of .2257 (p=.0010).
-------
[Figure 4. Histogram of the Observed Attacks Divided by the Expected
Attacks for the 1970-1971 Riverhead Asthma Panel.]
-------
[Figure 5. Histogram of Residuals from a Regression Analysis of
Observed/Expected for the 1970-1971 Riverhead Asthma Panel.]
-------
TABLE 4. MULTIPLE REGRESSION ANALYSES OF THE RATIO OF OBSERVED TO
EXPECTED ON SOx USING VARIOUS COVARIATES SEPARATELY
[Table body not recoverable. Rows: none, season, start-up, minimum
temperature, day of week. Columns: degrees of freedom, F-to-enter for
covariate, SOx regression coefficient, SOx F-to-remove.]
-------
TABLE 5. MULTIPLE REGRESSION ANALYSIS OF THE RATIO OF OBSERVED TO
EXPECTED ON SOx USING ALL COVARIATES SIMULTANEOUSLY
[Table body not recoverable. Rows: SOx, season, start-up, minimum
temperature, day of week, error. Columns: degrees of freedom, regression
coefficient, sum of squares, mean square, F-to-remove.]
-------
SECTION 5
A MODIFIED DISCRIMINANT ANALYSIS
Any sophisticated analysis of panel data should consider the pattern
of individual responses. The following method is a crude attempt at this.
Each member of a panel is assumed to respond either yes or no to a
particular question for each day of a particular panel study. Without
loss of generality, we will assign the value 1 to yes, and 0 to no. The
nonindependence of individual responses can be demonstrated easily. For
each day, j, each person responds 0 or 1, as he does on day j + 1. This
process is a natural Markov chain:

                         Day j + 1 Response
                            0            1
Day j Response     0    $p_{00}$     $p_{01}$
                   1    $p_{10}$     $p_{11}$

where $p_{00} + p_{01} = 1$ and $p_{10} + p_{11} = 1$. The p's represent the
probabilities of having a particular response on day j + 1, given the
response on day j. If each person's response is independent of his response
on the previous day, then $p_{01}$ will equal $p_{11}$. This can be tested
using the standard 2x2 chi-square test.
The independence, or lack thereof, was studied using the 1970-1971
Riverhead asthma data. The counts of responses were done for each
individual, and recorded in the 2x2 table previously described. Let
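$n_{00}, n_{01}, n_{10}, n_{11}$ be the observed counts corresponding to
$p_{00}, p_{01}, p_{10}, p_{11}$, respectively. The chi-square test for
independence of one day's outcome from the previous day's is

$$\chi^2 = \frac{N (n_{00} n_{11} - n_{10} n_{01})^2}{(n_{00}+n_{01})(n_{10}+n_{11})(n_{00}+n_{10})(n_{01}+n_{11})}$$

where $N = n_{00}+n_{01}+n_{10}+n_{11}$.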
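
The per-panelist test can be sketched by counting day-to-day transitions in one panelist's diary and applying the 2x2 chi-square statistic. The version below is the uncorrected Pearson form; whether the report applied a continuity correction is not stated, so that choice is an assumption, and the function names and example series are hypothetical.

```python
import numpy as np

def transition_counts(series):
    """Counts n[a, b]: response a on day j followed by b on day j + 1.

    series: 1 = attack, 0 = no attack, NaN = no response (assumed coding).
    Pairs of days with a missing response are skipped.
    """
    s = np.asarray(series, dtype=float)
    n = np.zeros((2, 2), dtype=int)
    for today, tomorrow in zip(s[:-1], s[1:]):
        if not (np.isnan(today) or np.isnan(tomorrow)):
            n[int(today), int(tomorrow)] += 1
    return n

def chi_square_2x2(n):
    """Uncorrected chi-square statistic (1 degree of freedom) for a 2x2 table."""
    total = n.sum()
    num = total * (n[0, 0] * n[1, 1] - n[1, 0] * n[0, 1]) ** 2
    den = n[0].sum() * n[1].sum() * n[:, 0].sum() * n[:, 1].sum()
    return num / den

# Hypothetical diary for one panelist.
counts = transition_counts([0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0])
chi2 = chi_square_2x2(counts)
```

The statistic is undefined when any marginal total is zero, which is one reason to restrict the test to panelists with adequate marginal totals.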
These were computed for each panelist having marginal totals greater than or
equal to 10. Of the 21 panelists meeting this criterion, results for 17 were
significant at the .01 level, 1 was significant at a level between .01 and
.05, and 3 were not significant at a level less than .05.
Another interesting statistic is the relative risk.18 This is the
probability of developing an attack on one day given an attack on the
previous day, divided by the probability of developing one given none on
the previous day. The observed relative risks ranged from 1.07 to 37.55.
The median relative risk was 4.96. Thus the "median" person was almost five
times as likely to have an attack if he had one the previous day. The
assumption of independence is clearly not met.
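
The relative risk defined above follows directly from a panelist's 2x2 table of transition counts. The sketch below uses a hypothetical table, with rows indexed by the previous day's response and columns by the current day's response; the function name is also hypothetical.

```python
def relative_risk(n):
    """P(attack today | attack yesterday) / P(attack today | none yesterday).

    n[a][b] = number of day pairs with response a on day j and b on day j + 1.
    """
    p11 = n[1][1] / (n[1][0] + n[1][1])
    p01 = n[0][1] / (n[0][0] + n[0][1])
    return p11 / p01

# Hypothetical counts: 90 attack-free days and 20 attack days as "yesterday".
rr = relative_risk([[80, 10], [6, 14]])
```

In this made-up table an attack follows an attack 70 percent of the time but follows an attack-free day only about 11 percent of the time, so the relative risk is a little over 6.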
One way to avoid the problem of lack of independence is to look only
at $p_{00}$ and $p_{01}$. Let the analysis explain why a person remains in state 0
(no symptoms) on some days, and why he moves from state 0 to state 1
(develops symptoms) on other days. The price of this restriction is to
discard two other types of information: 1) $p_{10}$: the person moves from
state 1 to state 0 (loses symptoms) and 2) $p_{11}$: the person stays in state
1 (retains symptoms). It could be argued that these two events are more
dependent on a person's recovery mechanism than on the insult causing the
response.
Using $p_{00}$ and $p_{01}$ only, we can define a variable $y_{ij}$ as follows:

$$
y_{ij} =
\begin{cases}
0 & \text{if person } i \text{ has no symptoms on either day } j \text{ or day } j-1 \\
1 & \text{if person } i \text{ has symptoms on day } j \text{, but none on day } j-1 \\
\text{undefined} & \text{otherwise.}
\end{cases}
$$

$y_{ij}$ can be thought of as a Bernoulli trial, with $p_{ij}$ given by

$$p_{ij} = h(\mu_i, x_{1j}, \ldots, x_{kj})$$

where $\mu_i$ = overall symptom rate for person i and $x_{1j}, \ldots, x_{kj}$ = the k values
of the independent variables for day j. These effects could include such
factors as pollution, meteorology, seasonal cycles, and day of week effects.
The logical functional form would be a general dose-response curve of the form

$$p_{ij} = \int_{-\infty}^{g(\mu_i, x_{1j}, \ldots, x_{kj})} f(t)\, dt$$

where f(t) is either a probit or logit density. Even if g is a linear
function, the large number of variables makes maximum likelihood or least
squares estimation of the coefficients extremely difficult.
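
The coding of the onset variable defined above, for a single panelist, might look like the following sketch. It assumes a daily series coded 1 for symptoms, 0 for none, and NaN for no response, and it treats any day pair with a missing response as undefined; that handling of missing days, like the function name, is an assumption rather than something the report specifies.

```python
import numpy as np

def onset_indicator(series):
    """Onset variable for one panelist.

    0   = no symptoms on day j or day j-1
    1   = symptoms on day j, none on day j-1
    NaN = undefined (symptoms on day j-1, first day, or a missing response)
    """
    s = np.asarray(series, dtype=float)
    y = np.full(s.shape, np.nan)
    prev, curr = s[:-1], s[1:]
    out = y[1:]                  # view into y, so assignments update y
    free_yesterday = prev == 0   # NaN compares False, so gaps stay undefined
    out[free_yesterday & (curr == 0)] = 0.0
    out[free_yesterday & (curr == 1)] = 1.0
    return y

# Hypothetical diary for one panelist.
y = onset_indicator([0, 0, 1, 1, 0, np.nan, 0, 1])
```

Only the defined entries would enter the analysis, so days that follow an attack drop out of the data set entirely.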
If, however, the effects of the daily factors are relatively small,
then the dose-response curve may be approximately linear in those variables.
If we assume that $p_{ij}$ is linear in the parameters to be estimated, we have
reduced the problem to a point that some analyses can be calculated.
The solution is to run a discriminant analysis using the $y_{ij}$'s as
dependent variables, and dummy variables for people along with the other
covariates as independent variables. Approximate tests for significance
for the independent variables can be made using partial F's, as described
by Lachenbruch.19 The results of these analyses appear in Tables 6 and 7.
Table 6 is analogous to Tables 2 and 4, in that the covariates are added one
at a time. SOx is no longer significant at the .05 level (p=.096) even
without any covariates added. If a season effect is added, there again is
no evidence of any SOx effect. Table 7 is analogous to Tables 3 and 5,
with all covariates added simultaneously. The discriminant analysis shows
a stronger seasonal effect than did the other two analyses.

TABLE 6. DISCRIMINANT ANALYSES OF THE RIVERHEAD ASTHMA ATTACK RATE
ON SOx USING VARIOUS COVARIATES SEPARATELY
[Table body not recoverable. Rows: none, season, start-up, minimum
temperature, day of week.]

TABLE 7. DISCRIMINANT ANALYSIS OF THE RIVERHEAD ASTHMA ATTACK RATES
ON SOx USING ALL COVARIATES SIMULTANEOUSLY
[Table body not recoverable.]
-------
SECTION 6
COMPARISON OF METHODS
The discriminant analysis method has eliminated the three major
problems of regression analysis on attack rates, namely (1) changing panel
composition, (2) loss of sample size information, and (3) serial correlation
of responses. In their place are some new problems:
1) A large number of positive responses are ignored (approximately
1/2 in this particular analysis.)
2) The true distributions of the "F to remove" tests are not known
when discrete variables are used in discriminant analyses.
3) The procedure appears to be very conservative as a test for the
effect of air pollution on asthma attack rates.
There are obviously several other possible methods of analysis. These
include the "multiple logistic function" of Truett, Cornfield, and Kannel20;
the "stimulus response" method of Lebowitz21, and the "binary multiple
regression analysis" of Elwood, Mackenzie, and Cran22. These methods are
all appropriate for some data sets, but none of them solve the problems
of this particular kind of panel study.
Perhaps a promising method of analysis is that given by Koch, et al.23
This technique provides for the analysis of multivariate categorical
data which are obtained from a repeated measure design. Unfortunately, the
method requires that the number of days be much less than the number of sub-
jects. For the asthma panels studied thus far by EPA, this limitation would
mean that no more than one month's data could be used at one time. This
deletion would further reduce the number of eligible panelists, since many
have no attacks in any given month. The reduced data set would be
insufficient for analysis.
-------
SECTION 7
DISCUSSION
Although this report does not find a fully satisfactory method of
analysis for asthma panel studies, a few conclusions can be drawn. The first
is that any method of analysis must allow for the nonindependence of responses
for a particular individual. Unfortunately, the few available methods which
accomplish this reduce the data set considerably, so that consistent
significant results are unlikely. It is possible, of course, that future
research may provide a more appropriate technique. Until the time that better
methods are available, a possible improvement is to have larger panels for
shorter periods of time. These data could be analyzed by the method of Koch,
et al.23, and the contribution of seasonality factors would also be greatly
reduced.
The results of the various analyses on the Riverhead panel indicate
that the relation between SOx and the asthma rate is confounded with the
seasonal trends of both variables. Any statement of a positive relationship
between the two variables must be made with much qualification.
-------
REFERENCES
1. Cohen, A. A., S. Bromberg, R. W. Buechley, L. T. Heiderscheit, and
C. M. Shy. Asthma and Air Pollution from a Coal-fueled Power Plant.
Am. J. of Public Health, 62: 1181-1188, 1972.
2. Salvaggio, John, Victor Hasselblad, and L. T. Heiderscheit. New Orleans
Asthma. II. Relationship of Climatologic and Seasonal Factors to
Outbreaks. J. of Allergy, 45: 257-265, 1970.
3. Girsh, L. S., E. Shubin, C. Dick, and F. A. Schulaner. A Study on
the Epidemiology of Asthma in Children in Philadelphia. J. of
Allergy, 39: 347-357, 1967.
4. Lawther, P. J., R. E. Waller, and M. Henderson. Air Pollution and
Exacerbations of Bronchitis. Thorax, 25: 525-539, 1970.
5. Goldberg, H. E., A. A. Cohen, J. F. Finklea, J. H. Farmer, F. B.
Benson, and G. J. Love. Frequency and Severity of Cardiopulmonary
Symptoms in Adult Panels: 1970-1971 New York Studies. In: Health
Consequences of Sulfur Oxides: A Report from CHESS, 1970-1971.
EPA-650/1-74-004, U. S. Environmental Protection Agency, Research
Triangle Park, N. C., 1974.
6. Hammer, D. I., V. Hasselblad, B. Portnoy, and P. F. Wehrle. Los
Angeles Student Nurse Study. Daily Symptom Reporting and Photochemical
Oxidants. Archives of Environmental Health, 28: 255-260, 1974.
7. Lawther, P. J., A. G. F. Brooks, P. W. Lord, and R. E. Waller.
Day-to-day Changes in Ventilatory Function in Relation to the
Environment. Part I. Spirometric Values. Environmental Research,
7: 27-40, 1974.
8. Lawther, P. J., A. G. F. Brooks, P. W. Lord, and R. E. Waller.
Day-to-day Changes in Ventilatory Function in Relation to the
Environment. Part II. Peak Expiratory Flow Values. Environmental
Research, 7: 41-53, 1974.
9. Stebbings, James H., Jr., and Carl G. Hayes. Panel Studies of Acute
Health Effects of Air Pollution I. Cardiopulmonary Symptoms in Adults,
New York 1971-1972. Environmental Research, 11: 89-111, 1976.
10. Finklea, John F., John H. Farmer, Gory J. Love, Dorothy C. Calafiore,
and Wayne G. Sovocool. Aggravation of Asthma by Air Pollutants:
1970-1971 New York Studies. In: Health Consequences of Sulfur Oxides:
Report from CHESS, 1970-1971, EPA-650/1-74-004, U. S. Environmental
Protection Agency, Research Triangle Park, N. C., 1974.
11. Greenberg, L., and F. Field. Air Pollution and Asthma. J. of Asthma
Research, 2: 195, 1965.
-------
12. Booth, S., L. Degroot, R. Markash, and R. J. M. Horton. Detection
of Asthma Epidemics in Seven Cities. Archives of Environmental Health,
10: 152-155, 1965.
13. Shy, Carl M., Victor Hasselblad, Leo T. Heiderscheit, and Arlan A. Cohen.
Environmental Factors in Bronchial Asthma. In: Environmental Factors
in Respiratory Disease, Douglas H. K. Lee, ed. Academic Press, 1972.
14. Derrick, E. H. The Annual Variation of Asthma in Brisbane: Its
Relation to the Weather. International J. of Biometeorology,
10: 91-99, 1966.
15. Taylor, Angus E. Advanced Calculus. Ginn and Company, 1955.
pp. 711-718.
16. Tromp, S. W. Biometeorological Analysis of the Frequency and Degree
of Asthma Attacks in the Western Part of the Netherlands. In:
Proceedings of the Second International Bioclimatological Conference.
Oxford: Pergamon Press, 1962. p. 477.
17. Fisher, Ronald A. Statistical Methods for Research Workers. Hafner,
1958. pp. 52-54.
18. Fleiss, Joseph L. Statistical Methods for Rates and Proportions.
John Wiley & Sons, New York, 1973. pp. 39-51.
19. Lachenbruch, Peter A. Discriminant Analysis. Hafner Press, 1975.
pp. 27-29.
20. Truett, Jeanne, Jerome Cornfield, and William Kannel. A Multivariate
Analysis of the Risk of Coronary Heart Disease in Framingham.
J. of Chronic Disease, 20: 511-524, 1967.
21. Lebowitz, Michael D. A Stimulus Response Method for the Analysis of
Environmental and Health Events Related in Time. Environmental Letters,
2: 23-34, 1971.
22. Elwood, J. H., G. Mackenzie, and G. W. Cran. The Measurement and
Comparison of Infant Mortality Risks by Binary Multiple Regression
Analysis. J. of Chronic Disease, 24: 93-106, 1971.
23. Koch, Gary G., J. Richard Landis, Jean L. Freeman, Daniel H. Freeman, Jr.,
and Robert G. Lehnen. A General Method for the Analysis of Experiments
with Repeated Measurement of Categorical Data. Biometrics, 33: 133-158,
1977.
-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)
1. REPORT NO.
EPA-600/1-78-043
3. RECIPIENT'S ACCESSION NO.
4. TITLE AND SUBTITLE
COMPARISON OF METHODS FOR THE ANALYSIS OF PANEL STUDIES
5. REPORT DATE
June 1978
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
Victor Hasselblad
8. PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANIZATION NAME AND ADDRESS
Statistics and Data Management Office
Health Effects Research Laboratory
Research Triangle Park, NC 27711
10. PROGRAM ELEMENT NO.
1AA601
11. CONTRACT/GRANT NO.
12. SPONSORING AGENCY NAME AND ADDRESS
Health Effects Research Laboratory
Office of Research and Development
U.S. Environmental Protection Agency
Research Triangle Park, NC 27711
13. TYPE OF REPORT AND PERIOD COVERED
RTP, NC
14. SPONSORING AGENCY CODE
EPA 600/11
15. SUPPLEMENTARY NOTES
16. ABSTRACT
Three different methods of analysis of panels were compared using asthma panel
data from a 1970-1971 study done by EPA in Riverhead, New York. The methods were
(1) regression analysis using raw attack rates; (2) regression analysis using the
ratio of observed attacks to expected attacks; and (3) discriminant analysis where
repeated attacks were ignored. The first two methods were found to have serious
serial correlation problems. The third method eliminated this problem, but
reduced the effective sample size considerably.
A more appropriate method was suggested for larger panels over shorter
periods of time. The analyses of the Riverhead data showed that any sulfate
effect on asthmatics was confounded with seasonal trends.
17. KEY WORDS AND DOCUMENT ANALYSIS
a. DESCRIPTORS
statistical analysis
epidemiology
asthma
b. IDENTIFIERS/OPEN ENDED TERMS
c. COSATI Field/Group
06 F
12 A
18. DISTRIBUTION STATEMENT
RELEASE TO PUBLIC
19. SECURITY CLASS (This Report)
UNCLASSIFIED
20. SECURITY CLASS (This page)
UNCLASSIFIED
21. NO. OF PAGES
22. PRICE
EPA Form 2220-1 (9-73)
------- |