United States
Environmental Protection
Agency
Office of Monitoring Support and Quality Assurance
-------
or otherwise associated in
epidemiological studies.
2) Differences or similarities of corre-
lations between sexes or races may
indicate genetic or occupational
factors of importance in tracing dis-
ease etiology.
3) Negative correlations may indicate
competing causes of death.
The objective of the study was to evalu-
ate use of the UPGRADE system to calcu-
late all possible correlations, determine
the strongest correlations, record them
for future use by interested researchers,
and investigate the geographical varia-
tion of the strongly correlated diseases.
Experimental Procedure
The data base used in the study con-
sisted of county-level, age-adjusted
mortality rates averaged over the five-
year period 1968-1972. The rates were
calculated by Herb Sauer of the Univer-
sity of Missouri, using the detailed mor-
tality records provided by the National
Center for Health Statistics (NCHS). All
deaths were recorded between 1968
and 1971, but only every other death in
1972. Thus, some sampling error could
exist for the less common diseases.
Death rate calculations were based on
each county's 1970 population. About
50 causes of death were studied for
white males and females (Table 1).
Because the 3082 mortality rates for
almost any cause of death contained
some 10-30 extraordinarily high rates,
due often to confounding factors such
as the existence of a major institution
(Indian reservation, regional hospital,
prison) in the county, and because these
rates could exert undue influence on the
Pearson correlation coefficient, such
outliers were eliminated by use of a scat-
terplot screening technique. Visual in-
spection of the scatterplots suggested
reasonable upper and lower bounds for
county mortality rates to be included in
the correlation calculations. Varying
these limits provided an indication of the
sensitivity of the calculations to the
number of counties included: only about
a 10-15% variation in the most signifi-
cant correlations was observed.
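
The screening step above can be illustrated with a minimal sketch. It assumes simple upper and lower bounds on each rate; the function names, the eight county rates, and the bounds are hypothetical and are not taken from the UPGRADE system or the study data.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def screened_r(xs, ys, x_bounds, y_bounds):
    """Correlation using only counties whose two rates lie inside the bounds.

    Returns the coefficient and the number of counties kept.
    """
    kept = [(x, y) for x, y in zip(xs, ys)
            if x_bounds[0] <= x <= x_bounds[1]
            and y_bounds[0] <= y <= y_bounds[1]]
    kept_x = [x for x, _ in kept]
    kept_y = [y for _, y in kept]
    return pearson_r(kept_x, kept_y), len(kept)

# Hypothetical rates for eight counties; the last one mimics a county
# dominated by a major institution.
rates_a = [100, 110, 120, 130, 140, 150, 160, 900]
rates_b = [210, 190, 230, 220, 250, 240, 260, 50]

r_all = pearson_r(rates_a, rates_b)
r_screened, n_kept = screened_r(rates_a, rates_b, (0, 300), (0, 400))
print(f"all counties: r = {r_all:.3f}")
print(f"screened ({n_kept} counties): r = {r_screened:.3f}")
```

In this made-up example a single extreme county reverses the sign of the coefficient. Rerunning `screened_r` with wider or narrower bounds is one way to probe sensitivity to the number of counties included.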
A stringent significance criterion of
p < .0001 was chosen to lessen the
likelihood of error in identifying
significant correlations. Even so, of the
approximately 1200 possible correlations
for each sex, 152 correlations were
significant for white females and 136
correlations were significant for white
males at the p < .0001 level.
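
The arithmetic behind these figures can be checked with a short sketch. It assumes 49 causes of death (the number of UPGRADE codes listed in Table 1), all 3082 counties, and a normal approximation to the t distribution; none of these choices is stated in the summary as the procedure actually used.

```python
import math
from statistics import NormalDist

def pairs_among(n_causes):
    """Number of distinct cause-of-death pairs."""
    return math.comb(n_causes, 2)

def critical_r(n_counties, alpha):
    """Smallest |r| significant at two-sided level alpha.

    Uses a normal approximation to the t distribution, which is
    reasonable when the number of counties is in the thousands.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / math.sqrt(n_counties - 2 + z * z)

n_pairs = pairs_among(49)              # assumed count of causes
expected_by_chance = n_pairs * 0.0001  # pairs expected to pass by chance alone
r_min = critical_r(3082, 0.0001)       # assumes no counties were excluded

print(n_pairs, round(expected_by_chance, 2), round(r_min, 3))
```

Under these assumptions, far fewer than one correlation per sex would be expected to pass the criterion by chance, so the 152 and 136 significant correlations reported above cannot be attributed to multiple testing alone.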
Table 1. UPGRADE Variables Used in This Study with Corresponding ICDA Codes

UPGRADE                                             ICDA CODES
CODE      VARIABLE                                  (8th Revision)
071       Tuberculosis, All Forms                   010-019
072       Other Infective Disease                   000-009, 020-136
073       Ca Buccal Cavity, Pharynx                 140-149
074       Cancer of Esophagus                       150
075       Cancer of Stomach                         151
076       Cancer of Intestine                       152, 153
077       Cancer of Rectum                          154
078       Ca Liver, Gall B., Ducts                  155, 156
079       Cancer of Pancreas                        157
080       Other Digestive Cancer                    158, 159
081       Cancer of Resp. System                    160-163
082       Cancer of Breast                          174
083       Cancer of Cervix                          180
084       Cancer of Uterus                          181, 182
085       Ca Prost, Other Female Ca                 183, 184, 185
086       Cancer of Bladder                         188
087       Cancer of Kidney, Etc.                    189
088       Cancer of Central Nervous System (CNS)    191, 192
089       Residual Cancer                           170-3, 183, 186-7, 190, 194
090       Cancer, Ill-Def. & Sec.                   195-199
091       Lymphosarcoma, Etc.                       200
092       Hodgkin's Disease                         201
093       Multiple Myeloma                          203
094       Leukemia                                  204-207
095       Other Lymphatic                           202, 208, 209
096       Neoplasms, Benign & Unspecified           210-239
097       Diabetes                                  250
098       Alcoholism                                303
099       Rheumatic Heart Dis.                      390-398
100       Hypertension                              400-404
101       Acute Ischemic Heart Dis.                 410, 411
102       Chronic Ischemic Heart                    412, 413
103       Other Heart Disease                       420-429
104       Cerebrovascular Disease                   430-438
105       Arteriosclerosis                          440
106       Aortic Aneurysm                           441
107       Other Arteries, Etc.                      442-448
108       Veins, Etc.                               450-458
109       Influenza and Pneumonia                   470-486
(continued)
Table 1. (Continued)

UPGRADE                                             ICDA CODES
CODE      VARIABLE                                  (8th Revision)
110       Chronic Resp. Dis.                        490-493, 517-519
111       Cirrhosis of Liver                        571
112       Chronic Nephritis, Etc.                   582-584
113       Infections of Kidney                      590
114       Congenital Heart & Circ.                  746, 747
115       Other Congenital                          740-745, 748-759
116       Other Early Infancy                       760-778
117       Symptoms, Ill-Defined                     780-796
126       Major CV Diseases                         390-448
127       Cancer, All Sites and Forms               140-209

Table 2. Correlation Coefficients for the Top Twenty Correlations
White Females

                                                      Scatterplot   Exclusion/
Disease Title                                         Method        Filter Method
 1. Cancer of the Respiratory System - Cirrhosis       .224          .206
 2. Cancer of the Intestine - Cancer of the Breast     .211          .161
 3. Chronic Ischemic - Cancer, All Forms               .189          .238
 4. Chronic Ischemic - Cirrhosis                       .182          .198
 5. Cancer of the Intestine - Cancer of the Rectum     .180          .145
 6. Cancer of the Cervix - Major CV                    .180          .189
 7. Other Heart Disease - Symptoms, Ill-Defined        .179          .203
 8. Rheumatic Heart Disease - Chronic Ischemic         .175          .193
 9. Chronic Ischemic - Other Heart                    -.172         -.201
10. Acute Ischemic - Cerebrovascular                   .171          .173
11. Cancer of the Rectum - Rheumatic Heart             .170          .143
12. Aortic Aneurysm - Cirrhosis                        .170          .152
13. Cancer of the Esophagus - Cirrhosis                .169          .118
14. Cancer of the Rectum - Chronic Ischemic            .167          .151
15. Rheumatic Heart - Cirrhosis                        .166          .164
16. Diabetes - Major CV                                .166          .210
17. Rheumatic Heart - Aortic Aneurysm                  .161          .141
18. Cancer of the Rectum - Cancer of the Breast        .159          .155
19. Cirrhosis - Cancer, All Forms                      .150          .194
20. Major CV - Cancer, All Forms                       .146          .149
The top 30 of these correlations for
each sex were further examined. If
outliers were suspected, a new modified
regression was run using different
boundaries for excluding counties. (In
most cases, fewer than 1% of all counties
were excluded.) This procedure resulted
in some changes of order among the top
correlations, but few sharp changes in
the magnitudes of the correlation coeffi-
cients.
Results and Discussion
From the procedures discussed above,
a final list of the 20 strongest correla-
tions was obtained (Tables 2 and 3). No
fewer than eleven correlations appear in
both tables, and only two pairs of dis-
eases for each sex were not strongly cor-
related in the other sex (Table 4). Thus,
sex is not a strong factor in the co-varia-
tion of mortality rates for most diseases.
However, population density is very
clearly an important factor in the most
strongly correlated disease pairs, as can
be seen by comparing those causes of
death most strongly associated with
county population to those most strongly
correlated with each other. For white
females, six of 48 causes of death
investigated showed a strong (p < .0001)
increase in mortality rates in the more
populous counties (Table 6). Four of these
six appear most often in the strongest
20 correlations for females. Similarly,
nine of 46 causes of death investigated
for white males showed a strong
(p < .0001) increase with county
population (Table 5). Six of these nine
appear most often in the strongest 20
correlations for white males.
The strongest negative correlations
are dominated by the "miscellaneous"
categories of "Other Heart Disease"
and "Symptoms, Ill-defined" (Table 7).
These categories probably "compete"
with other causes of death in the sense
that inexperienced or untrained county
medical officers are more likely to classify
difficult cases in the miscellaneous cate-
gory. However, the frequent appearance
of rheumatic heart disease in this table
does not appear to be explainable in the
same way. Rheumatic heart disease ap-
pears only for white females and only in
association with diseases that have
higher mortality rates in rural regions.
This phenomenon seems worthy of fur-
ther study.
Table 3. Correlation Coefficients for the Top Twenty Correlations
White Males

                                                                 Scatterplot   Exclusion/
Disease Title                                                    Method        Filter Method
 1. Other Heart Disease - Symptoms, Ill-Defined                   .286          .256
 2. Cancer of the Respiratory System - Major Cardiovascular       .286          .302
 3. Chronic Ischemic Heart Disease - Aortic Aneurysm              .268          .201
 4. Chronic Ischemic - Cirrhosis of the Liver                     .263          .221
 5. Chronic Ischemic - Cancer, All Forms and Sites                .250          .217
 6. Cirrhosis - Aortic Aneurysm                                   .246          .146
 7. Cirrhosis - Cancer, All Forms                                 .243          .249
 8. Cancer of the Respiratory System - Chronic Ischemic           .242          .223
 9. Cancer of the Rectum - Cancer of the Intestine                .242          .168
10. Cancer of the Rectum - Chronic Ischemic                       .241          .205
11. Major CV - Cancer, All Forms                                  .239          .220
12. Acute Ischemic Heart Disease - Cerebrovascular                .235          .301
13. Cancer of the Buccal Cavity, Pharynx - Cancer of the
    Respiratory System                                            .233          .170
14. Cancer of the Esophagus - Cirrhosis                           .231          .186
15. Cancer of the Respiratory System - Chronic Respiratory        .231          .203
16. Cancer of the Respiratory System - Aortic Aneurysm            .226          .173
17. Aortic Aneurysm - Cancer, All Forms                           .214          .178
18. Cancer of the Rectum - Cirrhosis                              .205          .161
19. Cancer of the Respiratory System - Cancer Ill-Defined
    and Unspecified                                               .202          .175
20. Cancer of the Respiratory System - Cirrhosis                  .201          .183

Table 4. Correlations That Are Strong For One Sex But Not The Other

                                                              Correlation Coefficient
Rank (WF)  Rank (WM)  Disease Pair                                    WF      WM
11                    Cancer of the Rectum - Rheumatic Heart Disease  .170    .130
16                    Diabetes - Major CV Diseases                    .166    .098
            1         Cancer of the Resp. System - Major CV Diseases  .084    .286
           13         Cancer of the Resp. System - Ca. Buccal Cavity  .074    .233

The Pearson product-moment correlation
coefficient calculation assumes a normal
distribution. However, the distribution
of county mortality rates was calculated
for six causes of death for each of three
race-sex groups, and not one of the 18
data sets passed chi-square tests for
normality. In every case, the
distributions were more strongly
clustered toward the mean and
simultaneously more dispersed in the
tails than the normal distribution. Such
distributions are termed leptokurtic.
The 18 distributions were then plotted on
logarithmic probability paper but failed
to display log-normal behavior. (Figure 1
provides an example of the nonlinear
shape of the distribution.) When a more
homogeneous set of counties is selected,
the distribution of mortality rates may
more nearly approach log-normality. For
example, lung cancer death rates for
white males in 234 mostly urban counties
were much closer to a log-normal
distribution than the rates from all
3082 counties (Figure 2).
Thus we are uncertain of the interpre-
tation to be given to the absolute values
of the Pearson product-moment correla-
tions calculated in Tables 2 and 3, al-
though the relative values may be more
trustworthy. For this reason, we have
considered only correlations with
p < .0001. Nonparametric statistics would
have been preferable, but because of the
large number of counties involved, it
was not feasible to calculate Spearman
or Kendall correlation coefficients.
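
For readers unfamiliar with the nonparametric alternative named above, the following minimal sketch shows what a Spearman coefficient computes: the Pearson correlation of the ranks, with tied values sharing their average rank. It is an illustration only, not part of the study's procedure, and the five data points are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical rates: monotone but far from linear because of one extreme value.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 1000]
print(round(pearson_r(xs, ys), 3), round(spearman_rho(xs, ys), 3))
```

Because only the ordering matters, the rank-based coefficient is unaffected by the heavy tails that trouble the product-moment calculation.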
It should also be noted that the lack of
normality of the county mortality rate
distributions probably decreases the al-
lowed range of negative correlations.
(For example, two log-normally
distributed variables have a minimum r of
-0.369, although the positive limit
remains at +1.0.) Thus, a negative r is
probably indicative of a stronger
relationship than a positive one of the
same magnitude.
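
The asymmetry described above follows from the standard formula for the correlation of two jointly log-normal variables. The sketch below assumes both variables have a log-scale standard deviation of 1, which the summary does not state; under that assumption the lower limit is about -0.368, close to the value quoted. The function name and parameters are illustrative.

```python
import math

def lognormal_r(rho, sigma_x=1.0, sigma_y=1.0):
    """Pearson r between exp(X) and exp(Y).

    X and Y are jointly normal with standard deviations sigma_x and
    sigma_y and correlation rho on the log scale.
    """
    numerator = math.exp(rho * sigma_x * sigma_y) - 1.0
    denominator = math.sqrt((math.exp(sigma_x ** 2) - 1.0)
                            * (math.exp(sigma_y ** 2) - 1.0))
    return numerator / denominator

# Attainable limits when the log-scale correlation is -1 or +1
# (unit log-scale standard deviations are an assumption).
r_lower = lognormal_r(-1.0)
r_upper = lognormal_r(1.0)
print(round(r_lower, 3), round(r_upper, 3))
```

Larger log-scale standard deviations push the lower limit closer to zero, so the attainable negative range depends on how skewed the rate distributions are.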
Geographic variations were studied
using bivariate color maps created by the
Domestic Information Display System
(DIDS). Rates for each disease were cat-
egorized in quartiles, and colors assigned
to each of the 16 cells of the resulting
4x4 matrix. Geographic characteriza-
tions of six disease pairs showing high
correlations for both white males and
white females were prepared. An exam-
ple is given in Table 8.
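
The quartile classification behind such a bivariate map can be sketched as follows. This is an illustration of the 4x4 scheme only; it does not reproduce DIDS, and the function names, the quartile method, and the sample rates are assumptions.

```python
from bisect import bisect_right
from statistics import quantiles

def quartile_index(value, cut_points):
    """Return 0..3 for the quartile of value, given three cut points.

    A value exactly equal to a cut point is placed in the higher quartile.
    """
    return bisect_right(cut_points, value)

def bivariate_cells(rates_a, rates_b):
    """Assign each county a cell number 0..15 in the 4x4 quartile matrix."""
    cuts_a = quantiles(rates_a, n=4)  # three cut points for disease A
    cuts_b = quantiles(rates_b, n=4)  # three cut points for disease B
    return [4 * quartile_index(a, cuts_a) + quartile_index(b, cuts_b)
            for a, b in zip(rates_a, rates_b)]

# Hypothetical rates for eight counties, perfectly opposed in rank.
rates_a = [10, 20, 30, 40, 50, 60, 70, 80]
rates_b = [80, 70, 60, 50, 40, 30, 20, 10]
cells = bivariate_cells(rates_a, rates_b)
print(cells)
```

Each of the 16 cell numbers would then be mapped to one color, so that counties high in both diseases, low in both, or mixed can be told apart at a glance.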
Table 5. Variation in Mortality Rate with County Population
(Age-Adjusted Mortality Rate per Million at Risk (1968-72) - White Males)

                                  1970 White Male County Population (in thousands)
Cause of Death                        0-5   5-10  10-25 25-100   >100     P*
Tuberculosis, All Forms                25     30     32     31     35      -
Other Infective Disease                64     65     56     56     53      -
Ca Buccal Cavity, Pharynx              45     44     47     54     64    .02
Cancer of Esophagus                    26     29     33     38     46  .0001
Cancer of Stomach                      94     95     92     92    107      -
Cancer of Intestine                   152    157    162    175    208  .0007
Cancer of Rectum                       39     46     56     62     72  .0001
Cancer of Liver, Gall B., Ducts        31     28     28     29     35      -
Cancer of Pancreas                    110    105    108    108    111      -
Other Digestive Cancer                  7      8      8      7      8      -
Cancer of Resp. System                512    529    566    600    645  .0001
Cancer of Breast                        3      2      3      3      3      -
Cancer of Prostate                    202    195    200    200    198      -
Cancer of Bladder                      61     55     59     70     78      -
Cancer of Kidney, Etc.                 44     43     43     47     46      -
Cancer of CNS                          47     47     49     47     51      -
Residual Cancer                        77     77     72     70     68      -
Lymphosarcoma, Etc.                    38     36     40     41     43      -
Cancer Ill-Def. and Sec.              109    109    111    113    118      -
Hodgkin's Disease                      22     22     21     20     21      -
Multiple Myeloma                       24     24     24     25     24      -
Leukemia                              108     96     97     92     91      -
Other Lymphatic                        25     26     26     25     24      -
Neoplasms, Benign and Unspec.          23     20     22     25     26      -
Diabetes                              170    166    171    174    173      -
Alcoholism                             24     26     27     25     26      -
Rheumatic Heart Disease                55     61     65     72     83  .0001
Hypertension                          117    133    129    121    104      -
Acute Ischemic Heart Dis.           3,006  3,004  3,019  2,860  2,629  .0001
Chronic Ischemic Heart              1,223  1,300  1,445  1,682  1,930  .0001
Other Heart Disease                   376    321    293    247    190  .0001
Cerebrovascular Disease             1,207  1,229  1,252  1,166  1,056  .0003
Arteriosclerosis                      198    198    202    199    180      -
Aortic Aneurysm                        90     92    102    125    129  .0001
Influenza and Pneumonia               415    403    397    387    393      -
Chronic Resp. Disease                 403    376    395    423    397      -
Cirrhosis of Liver                    115    116    128    158    219  .0001
Chronic Nephritis                      44     45     43     47     35      -
Infections of Kidney                   50     46     49     44     39      -
Congenital Heart & Cir.                42     44     44     42     41      -
Other Congenital                       48     45     47     45     42      -
Other Early Infancy                   237    234    228    216    199    .02
Major Cardiovascular Diseases       6,355  6,416  6,587  6,549  6,311      -
Cancer, All Sites and Forms         1,777  1,774  1,848  1,926  2,071  .0001

*Probability that the increase (decrease) in rates is due to chance (Pearson product-moment correlations applied to all counties)

Two other studies have used similar
programs for investigating correlations
between diseases. Sauer (1) has grouped
the same basic mortality data (1968-72)
by state and by state economic area;
thus, the present study of county data
can be viewed as complementary to
Sauer's work. Wellington, MacDonald
and Wolf (2) considered cancer mortality
between 1950 and 1969 on a statewide
basis. Comparisons with results from
both works reveal considerable
agreement, although different choices
of disease groups make detailed
comparisons impossible.
References
1. Sauer, H.I., Geographic Patterns in
the Risk of Dying and Associated Fac-
tors: U.S. 1968-72, National Center
for Health Statistics, U.S. Dept. of
Health & Welfare, Wash. D.C. 1979.
2. Wellington, MacDonald, and Wolf,
Cancer Mortality: Environmental and
Ethnic Factors, Academic Press, New
York, 1979.
Table 6. Variation in Mortality Rate with County Population
(Age-Adjusted Mortality Rate per Million at Risk (1968-72) - White Females)

                                  1970 White Female County Population (in thousands)
Cause of Death                        0-5   5-10  10-25 25-100   >100     P*
Tuberculosis, All Forms                 9     10     10     10     10    .03
Other Infective Disease                56     52     46     42     40      -
Cancer of Buccal Cavity, Pharynx       16     15     16     17     19    .03
Cancer of Esophagus                     9      8      9      9     12      -
Cancer of Stomach                      50     46     48     45     52      -
Cancer of Intestine                   143    147    156    160    173   .002
Cancer of Rectum                       28     29     34     38     40   .002
Cancer of Liver, Gall B., Ducts        37     31     31     32     32      -
Cancer of Pancreas                     61     62     64     64     66      -
Other Digestive Cancer                  8      6      6      6      6      -
Cancer of Resp. System                 85     90     97    107    126  .0007
Cancer of Breast                      212    219    228    249    279  .0007
Cancer of Cervix                       52     59     61     61     50      -
Cancer of Uterus                       42     46     46     44     46      -
Other Female Cancer                    81     84     91     95     99   .007
Cancer of Bladder                      20     17     19     22     23      -
Cancer of Kidney, Etc.                 22     21     21     21     21      -
Cancer of CNS                          31     32     33     30     32      -
Residual Cancer                        46     44     43     40     39      -
Lymphosarcoma, Etc.                    23     24     26     28     28      -
Cancer Ill-Def. and Sec.               85     97     91     89     92      -
Hodgkin's Disease                      11     12     11     12     13      -
Multiple Myeloma                       17     15     16     17     17      -
Leukemia                               58     59     59     56     57      -
Other Lymphatic                        16     16     16     15     16      -
Neoplasms, Benign and Unspec.          20     21     18     20     21      -
Diabetes                              192    180    187    186    175      -
Alcoholism                              6      5      5      6      7      -
Rheumatic Heart Disease                51     50     55     67     82  .0007
Hypertension                          104    111    112     97     86    .05
Acute Ischemic Heart Dis.           1,197  1,212  1,202  1,161  1,114    .01
Chronic Ischemic Heart                900    938  1,026  1,179  1,263  .0001
Other Heart Disease                   214    195    175    148    137   .003
Cerebrovascular Disease             1,000    978    996    962    899   .009
Arteriosclerosis                      162    162    164    168    149      -
Aortic Aneurysm                        28     26     28     33     34      -
Influenza and Pneumonia               264    252    243    233    225    .04
Chronic Resp. Disease                  89     88     89     95     94      -
Cirrhosis of Liver                     51     50     57     74    101  .0001
Chronic Nephritis, Etc.                28     29     29     25     22      -
Infections of Kidney                   47     41     42     37     34      -
Congenital Heart & Circ.               33     35     35     34     33      -
Other Congenital                       43     41     45     43     40      -
Other Early Infancy                   172    161    164    154    143    .04
Major Cardiovascular Diseases       3,712  3,723  3,813  3,869  3,815      -
Cancer, All Sites and Forms         1,153  1,178  1,227  1,254  1,359  .0001

*See note to Table 5.
Table 7. Strongest Negative Correlations
WF
WM
Chronic Ischemic vs. Other Heart Disease
Ca. Rectum vs. Other Heart Disease
Aortic Aneurysm vs. Other Heart Disease
Ca. Breast vs. Other Heart Disease
Ca. Intestine vs. Other Heart Disease
Acute Ischemic vs. Other Heart Disease
Ca. Intestine vs. Other Heart Disease
Ca., All Forms vs. Other Heart Disease
Acute Ischemic vs. Symptoms, Ill-Defined
Chronic Ischemic vs. Symptoms, Ill-Defined
Major CV vs. Symptoms, Ill-Defined
Ca. Rectum vs. Symptoms, Ill-Defined
Ca. All Forms vs. Symptoms, Ill-Defined
Ca. Breast vs. Symptoms, Ill-Defined
Ca. Intestine vs. Symptoms, Ill-Defined
Other Heart Disease vs. Rheumatic Heart Disease
Infections of Breast vs. Rheumatic Heart Disease
Symptoms, Ill-Defined vs. Rheumatic Heart Disease
Acute Ischemic vs. Rheumatic Heart Disease
Cerebrovascular vs. Rheumatic Heart Disease
Chronic Isch. Heart Disease vs. Cerebrovascular
Ca. Rectum vs. Cerebrovascular
-.172
-.105
-.086
-.137
*
-.125
-.123
-.118
-.137
-.114
-.083
-.139
-.122
-.123
-.141
-.137
-.125
-.118
-.092
-.141
-.197
-.146
-.101
NA
-.130
-.097
-.196
-.155
-.123
-.125
-.093
-.082
* Blanks indicate correlations that were not significant at p < .0001 for the particular sex.
Figure 1. Cumulative frequency distribution of mortality rates: (1968-72).
[Logarithmic probability plot. Curve label: Major CV diseases. Vertical axis: mortality rate (logarithmic scale). Horizontal axis: cumulative percent, 0.01 to 99.99.]
Figure 2. Cumulative frequency distribution of lung cancer mortality rates - white males (1968-72).
[Logarithmic probability plot. Curve labels: 234 mostly urban U.S. counties; all 3,082 U.S. counties; 1313 U.S. counties with more than 10,000 white male population. Vertical axis: mortality rate (logarithmic scale). Horizontal axis: cumulative percent, 0.01 to 99.99.]
Table 8. Respiratory Cancer vs Cirrhosis of Liver

Respiratory  Cirrhosis of
Cancer       Liver         Sex  Geographic Location
HIGH         HIGH          WM   New England, California, Florida
                           WF   New England*, California, Florida, Nevada, Arizona,
                                New Mexico, Washington (Seattle-Tacoma-Everett),
                                Gulf Coast, Alaska
LOW          LOW           WM   Tennessee, Kentucky, Virginia
                           WF   West**, Southeast
HIGH         MIXED         WM   Georgia, South Carolina, Lower Mississippi River
LOW          HIGH          WM   West, Southwest***

  * Particularly CONNECTICUT, MASSACHUSETTS, SOUTHERN VERMONT AND
    NEW HAMPSHIRE, EASTERN NEW YORK STATE, COASTAL PARTS OF MAINE,
    MOST OF NEW JERSEY
 ** Particularly WESTERN PORTION OF NORTH AND SOUTH DAKOTA,
    NEBRASKA, KANSAS, SOUTHERN PORTION OF MONTANA, IDAHO, UTAH
*** Particularly NEW MEXICO, COLORADO, WYOMING

The EPA authors Lance Wallace and Valarie J. Gill (also the EPA Project
Officer, see below) are with the Office of Monitoring Support and Quality
Assurance, Washington, DC 20460.
The complete report, entitled "Correlations Between Age-Adjusted Mortality
Rates for White Males and Females in the United States, by County: 1968-
1972," (Order No. PB 82-224 114; Cost: $10.50, subject to change) will be
available only from:
National Technical Information Service
5285 Port Royal Road
Springfield, VA 22161
Telephone: 703-487-4650
The EPA Project Officer can be contacted at:
Office of Monitoring Support and Quality Assurance
U.S. Environmental Protection Agency (RD-680)
Washington, DC 20460