Pattern Recognition Analysis
of VA/EPA PCDD and PCDF Data
Final Report
by
Karin M. Bauer and John S. Stanley
Midwest Research Institute
425 Volker Boulevard
Kansas City, MO 64110
Janet Remmers, Work Assignment Manager
John Schwemberger, Work Assignment Manager
Dr. Joseph J. Breen, Project Officer
U.S. Environmental Protection Agency
Office of Toxic Substances
Exposure Evaluation Division
401 M Street, SW
Washington, D.C. 20460
EPA Contract No. 68-02-4252
MRI Project No. 8863-A
Work Assignment No. 25
September 17, 1990
-------
DISCLAIMER
This document has been reviewed and approved for publication by the Office of
Environmental Epidemiology, Department of Veterans Affairs (VA), and the
Office of Pesticides and Toxic Substances, U.S. Environmental Protection
Agency (EPA). The use of trade names or commercial products does not
constitute VA or EPA endorsement or recommendation for use.
-------
PREFACE
This report details the results from a multivariate statistical
analysis of the PCDD and PCDF data from the collaborative effort between the
Department of Veterans Affairs (VA), Office of Environmental Epidemiology, and
the U.S. Environmental Protection Agency, Office of Toxic Substances. The
data base was provided by Midwest Research Institute (MRI) through EPA
Contract No. 68-02-4252, Work Assignment No. 24. The chemical analyses of the
human adipose tissue samples were conducted by MRI under the VA and EPA
intergovernmental agency agreement (VA Reference No. V101(91)-P86009, EPA
Reference No. RW36930735-01-3).
The statistical analyses were performed by MRI through the above-
mentioned intergovernmental agency agreement under the direction of
Ms. Karin M. Bauer, Principal Statistician, for EPA's Office of Toxic
Substances, Exposure Evaluation Division (Contract No. 68-02-4252, Work
Assignment No. 25). Throughout the project, Dr. John S. Stanley, Head,
Analytical Chemistry Section, provided guidance and consulting on chemical
matters. Ms. Jean E. Pelkey, Associate Statistician, assisted in generating
figures and tables for this report.
The assistance provided by the EPA Work Assignment Managers,
Ms. Janet Remmers, Mr. Bradley Schultz, and Mr. John Schwemberger, as well as
the additional demographic and military/civilian status information on the
individuals provided by Dr. Han Kang and Mr. Kevin Watanabe of the VA, are
gratefully acknowledged.
MIDWEST RESEARCH INSTITUTE
Karin M. Bauer
Principal Statistician
Work Assignment Leader
Approved:
Paul C. Constant
Program Manager
September 17, 1990
-------
EXECUTIVE SUMMARY
The objective of this analysis was to evaluate, by pattern recogni-
tion techniques, relationships of chlorinated dibenzo-p-dioxin and
dibenzofuran levels in human adipose tissue among the analytes themselves and
with respect to demographic variables (military/civilian status, age,
collection year, race, geographic region). The conclusions of this study
provide a multivariate context for the data and strengthen the conclusions
obtained in the univariate study (on 2,3,7,8-TCDD) conducted by the Department
of Veterans Affairs (VA) and the U.S. Environmental Protection Agency (EPA).
The result of most interest, based on a set of matched NHATS specimens (36
Vietnam veterans, 79 non-Vietnam veterans, and 80 civilians), is that Vietnam
veterans in this study do not exhibit higher levels of 2,3,7,8-TCDD, PCDDs,
and PCDFs than the other two groups. Also, cluster analysis of the PCDDs and
PCDFs demonstrates that 2,3,7,8-TCDD is not strongly correlated with any of
the other compounds, indicating possible differences in exposure routes to
these chemicals.
The primary focus of the VA/EPA collaborative study was on 2,3,7,8-
tetrachlorodibenzo-p-dioxin (TCDD), a contaminant of the herbicide Agent
Orange used in Vietnam. However, the analytical protocol used in that study
provided data on all 17 of the 2,3,7,8-substituted chlorinated dioxin and
furan congeners. These chemical data, together with the available demographic
and military service status information, were the basis of the present
study. Pattern recognition techniques (cluster and principal component
analyses) were used to evaluate relationships and their significance within
the data. The intent of this study was not to duplicate the previous analysis
efforts (VA and USEPA, 1989), but rather to complement them by considering all
PCDD and PCDF residue levels and demographic information simultaneously.
The tissue samples considered in this study were obtained from a
large, archived set of specimens obtained through the National Human Adipose
Tissue Survey (NHATS). The availability of military service status on males
born between 1936 and 1954 was the basis for the selection of specimens in the
three groups (Vietnam veterans, non-Vietnam veterans, and civilians)
considered in this study. To the extent that the NHATS population is not
representative of Vietnam veterans, the results of this study are not
representative of the population of Vietnam veterans. The relatively small
sample size (36) of Vietnam veterans would limit the significance of results
relating to TCDD levels in the population of Vietnam veterans in any case.
However, within the restricted NHATS population, the comparative results are
valid.
The most important outcome from the analyses performed in the
present study is that there is no visible stratification of the NHATS speci-
mens in groups as defined by their military service or civilian status with
respect to either the PCDD and PCDF levels or the demographic variables.
From the univariate analysis results obtained during the current
study, which parallel those obtained by the VA, the following general remarks
can be highlighted.
-------
• The levels of 2,3,7,8-TCDD are, for all practical purposes, independent
of the levels of all other analytes quantified in this set of NHATS
specimens. As subsequently confirmed by the multivariate analysis
results, 2,3,7,8-TCDD behaves unlike any of the other analytes; that is,
its level in adipose tissue cannot be predicted accurately by the levels
of any other analyte or analyte combinations.
• The levels of 2,3,7,8-TCDD decrease with successive collection years, at
the rate of 0.63 pg/g per year of collection. Due to the very low
predictive power of that variable, however, the general downward trend of
2,3,7,8-TCDD levels cannot be predicted with any degree of certainty by
the collection year.
• Based on year of birth, the levels of 2,3,7,8-TCDD decrease with each
successive year, at the rate of 0.04 pg/g per year. Again, due to the
very low power of this variable in predicting the concentration levels in
adipose tissue, no strong conclusion can be drawn as to the effect of
this variable on 2,3,7,8-TCDD levels.
The most interesting chemical results were obtained through
multivariate statistical analyses, which were the focus of the present
study. The objective of cluster analysis, performed on the log-transformed
concentration levels of the analytes, was to investigate interrelationships of
the PCDD and PCDF analytes. The following points highlight the cluster
analysis results:
• 2,3,7,8-TCDD does not group with any of the other analytes; its behavior
is unlike the behavior of the others.
• Apart from 2,3,7,8-TCDD and those analytes with relatively high
frequencies of nondetects, the levels of the higher chlorinated PCDDs and
PCDFs are strongly interrelated. All correlations are high and positive,
that is, the levels of these analytes are proportional to each other.
Principal component analysis focuses on representing data in a space
with preferably no more than three dimensions. These dimensions are defined
by linear combinations (principal components) of the variables included in the
analysis. The objective of principal component analysis is to detect a
structure in the data base, if it exists, that could possibly distinguish
between meaningful groups of specimens. This is accomplished by using all
data related to the chemical (the concentration levels were log-transformed),
the demographic variables, and the military service/civilian status. After
applying this technique to all the data available in this data base, the
following conclusions can be drawn:
• None of the analytes, demographic variables, or military service/civilian
status information provided any means of classifying the NHATS specimens
into more than one large group. That is, no pattern distinguishing, for
example, between Vietnam veterans, non-Vietnam veterans, and civilians
was visible on a plot of the specimen scores along the most important
principal components.
-------
• A large proportion (77%) of the variability in the total data set, as
determined by the values of the NHATS specimens on all the variables, is
explained by the combination of two principal components. The first and
predominant principal component (explaining 64% of the total variability)
is a combination of the higher chlorinated PCDDs and PCDFs, each analyte
having about equal contribution to this principal component. This is
consistent with the results of the cluster analysis. The second, and far
less important, principal component (13% of the total variability
explained) consists mainly of age of the individual at time of death or
surgery and specimen collection year.
The results of these additional statistical analysis efforts provide
evidence of interrelationships of PCDDs and PCDFs that could be possible
indicators of sources of exposure. That is, the source of 2,3,7,8-TCDD may be
completely different from those of the other PCDDs and PCDFs. Further
statistical evaluations of body burden data may require more comprehensive
data bases that identify body burden half-life of the higher chlorinated PCDDs
and PCDFs vs. 2,3,7,8-TCDD, as well as the potential contributions arising
from commercial products and environmental matrices (e.g., incinerator
emissions, ambient air, water, fish).
In summary, the plots of the two most important principal
components, the "Higher Chlorinated PCDDs and PCDFs" and the "Age Plus
Collection Year" components, explain about 80% of the total variation in the
data set. However, these plots do not reveal any clear stratification of the
specimens into groups, whether defined by military service/civilian status of
the individual, geographic region, race, or age. This study indicates, based
on the set of matched NHATS specimens, that the group of 36 Vietnam veterans
in this study does not exhibit higher levels of PCDDs and PCDFs, including
2,3,7,8-TCDD, than either the group of 79 non-Vietnam veterans or the control
group of 80 civilians.
-------
TABLE OF CONTENTS
Preface
Executive Summary
List of Figures
List of Tables
I. Introduction
    A. Project Background
    B. Study Objectives
II. Data Base Description
    A. Chemical Data Base
    B. Demographic Data Base
    C. Military/Civilian Status Data Base
III. Descriptive Statistics
    A. Descriptive Statistics of the Chemical Data
    B. Descriptive Statistics of the Demographic Data
       by Military/Civilian Status
    C. Relationships Between 2,3,7,8-TCDD and
       Selected Demographic Variables
IV. Multivariate Statistical Methods
    A. Cluster Analysis of Variables
    B. Principal Component Analysis
V. Multivariate Analysis Results and Discussion
    A. Cluster Analysis of Analytes
-------
LIST OF FIGURES
Number  Title
1   Plot of 2,3,7,8-TCDD vs. age of individual at time of
    collection
2   Plot of 2,3,7,8-TCDD concentrations vs. collection year
3   Plot of 2,3,7,8-TCDD concentrations vs. birth year of
    individual
4   Plot of 2,3,7,8-TCDD concentrations vs. body mass index
5   Tree diagram showing clustering process of all 16 analytes
    based on the correlations of the log-transformed
    concentrations
6   Cluster analysis results of all 16 analytes based on the
    correlations of the log-transformed concentrations
7   Plot of principal component 2 vs. 1, all 195 specimens
    coded by military/civilian status
8   Plot of principal component 3 vs. 1, all 195 specimens
    coded by military/civilian status
9   Plot of principal component 3 vs. 2, all 195 specimens
    coded by military/civilian status
10  Plot of principal component 2 vs. 1, all 195 specimens
    coded by geographic region
11  Plot of principal component 3 vs. 1, all 195 specimens
    coded by geographic region
12  Plot of principal component 3 vs. 2, all 195 specimens
    coded by geographic region
13  Plot of principal component 2 vs. 1, all 195 specimens
    coded by race
14  Plot of principal component 3 vs. 1, all 195 specimens
    coded by race
15  Plot of principal component 3 vs. 2, all 195 specimens
    coded by race
-------
LIST OF TABLES
Number  Title
1   Number and Percent of Nondetects, Traces, and Positive
    Quantifiable Levels per Analyte in 197 NHATS Specimens
2   Descriptive Concentration (pg/g) Statistics per Analyte
    Based on 197 NHATS Specimens
3   Correlation Coefficients Between Pairs of Analyte
    Concentrations Based on 197 NHATS Specimens
4   Distribution of 195 NHATS Specimens Across Military/
    Civilian Status, Geographic Regions, and Race Groups
5   Descriptive Statistics of Selected Demographic Variables
6   Simple Regression Results of 2,3,7,8-TCDD Concentrations
    vs. Selected Demographic Variables
7   Summary Table of Analyte Clusters Formed and Their
    Similarity Values
8   Principal Component Analysis Results
-------
I. INTRODUCTION
This report presents the results and interpretation of a series of
multivariate statistical analyses performed on a data base of levels of
polychlorinated dibenzo-p-dioxins (PCDDs) and dibenzofurans (PCDFs) in human
adipose tissue. The objective of this work assignment effort is to provide a
means of evaluating the relationships between, and significance of, chemical
and demographic variables and military/civilian status of the individuals
represented in this data base. This work is to provide a complementary
examination of the previous univariate analyses performed by the VA and USEPA
(1989).
A. Project Background
Under a joint Department of Veterans Affairs/U.S. Environmental
Protection Agency agreement, Midwest Research Institute (MRI) has analyzed
approximately 200 human adipose tissue specimens for PCDDs and PCDFs. These
tissue samples were obtained from a large, archived set of specimens obtained
through the National Human Adipose Tissue Survey (NHATS). The availability of
military service status on males born between 1936 and 1954 was the basis for
the selection of specimens in the three groups—Vietnam veterans, non-Vietnam
veterans, and civilians—considered in this study (refer to the VA and USEPA
1989 report for selection procedure details). To the extent that the NHATS
population is not representative of Vietnam veterans, the results are not
representative of TCDD levels in the population of Vietnam veterans. The
relatively small sample size (36) of Vietnam veterans would limit the
significance of results relating to TCDD levels in the population of Vietnam
veterans in any case. However, within the restricted NHATS population, the
comparative results are valid.
The analytical results, expressed in parts per trillion (ppt), have
been reported on a total of 17 selected compounds. These are the 2,3,7,8-
substituted tetra- through octachloro-PCDD and PCDF congeners, which are
considered to be the most toxic. Because two analytes, 1,2,3,4,7,8- and
1,2,3,6,7,8-HxCDD, coeluted, data were available on a total of 16 analytes.
The complete list of analytes follows.
1. 2,3,7,8-TCDF
2. 2,3,7,8-TCDD
3. 1,2,3,7,8-PeCDF
4. 2,3,4,7,8-PeCDF
5. 1,2,3,7,8-PeCDD
6. 1,2,3,4,7,8-HxCDF
7. 1,2,3,6,7,8-HxCDF
8. 2,3,4,6,7,8-HxCDF
9. 1,2,3,7,8,9-HxCDF
10. 1,2,3,4,7,8/1,2,3,6,7,8-HxCDD, referred to as the pair-HxCDD
11. 1,2,3,7,8,9-HxCDD
12. 1,2,3,4,6,7,8-HpCDF
13. 1,2,3,4,7,8,9-HpCDF
14. 1,2,3,4,6,7,8-HpCDD
15. OCDF
16. OCDD
-------
The data base was generated by MRI under Work Assignment 24 (EPA
Contract No. 68-02-4252) using an analytical method (USEPA, 1986) that had
been previously evaluated specifically for the VA/EPA program. All data were
reported to VA/EPA in the form of 20 batch reports. The VA was responsible
for analyzing the 2,3,7,8-TCDD data, the key compound with respect to Vietnam
veteran exposure (VA and USEPA, 1989).
Since the data had been reported for all compounds, some effort was
needed for the assessment of relationships between compounds and with respect
to demographic data. Pattern recognition (multivariate statistical analysis)
was proposed as an exploratory analysis tool. This approach, investigation of
potential multidimensional relationships among the analyte results, was to be
followed by classification of the subjects into groups, if groups could be
identified. Thus the VA provided demographic information on these subjects.
After all statistical analyses were completed, the military/civilian status
(whether Vietnam veterans, non-Vietnam veterans, or civilians) information on
the donors was made available so that possible groupings of the results could
be identified that would parallel the three groups of donors.
B. Study Objectives
The objectives of this study were to examine the chemical and demo-
graphic data obtained on approximately 200 subjects using multivariate sta-
tistical techniques or pattern recognition. The relationships so elucidated
among chemical and demographic variables, or among chemical variables alone,
may be useful in answering pertinent questions such as, "Is there a difference
among the exposure groups in adipose tissue levels of 2,3,7,8-TCDD?" and "If
there is a difference, is there evidence that it is related to exposure other
than Agent Orange?" The intent of this study was not to duplicate the VA's
efforts, but rather to complement them by looking at all chemical data (PCDDs
and PCDFs) and demographic variables simultaneously.
Since military information, as well as the classification of the roughly
200 subjects into the three groups (Vietnam veterans, non-Vietnam veterans,
and civilians), was kept confidential by the EPA and the VA until the end of
the statistical analysis, MRI proposed first performing a "blind" statistical
analysis of the available data and providing the EPA and VA with a possible
classification of the subjects determined solely on the basis of chemical and
demographic information. Groups identified in the pattern recognition
approach would be identified by listing specimen identification numbers. The
VA/EPA would then decode the results and determine whether any grouping of
subjects would match exposure groups or any other groups defined by military-
related data.
The approach taken to meet these study objectives was as follows.
First, to elucidate relationships among the 16 analytes based on the sample
concentrations, the approach focused on reducing the number of analytes by
analyzing the correlation among them. Next, the chemical data, possibly
considering a reduced set of analytes, and the demographic data were analyzed
to identify groups among the individuals in the data set. Cluster analysis
and principal component analysis were the two most appropriate techniques for
this approach. Finally, the military/civilian status information was used in
-------
the coding of the multivariate analysis results with respect to the three
groups.
This report presents first a description of the chemical,
demographic, and military/civilian status data bases (Section II). Next,
basic descriptive statistics for each measured variable are presented and
discussed (Section III). Section IV is a synopsis of each statistical proce-
dure used for the analyses. In Section V, the results of each analysis are
presented and discussed. The Appendix contains histograms of the concentra-
tion levels of each of the 16 analytes for which data were available.
II. DATA BASE DESCRIPTION
Three data bases were made available for this work assignment
effort. The first and largest data base contained chemical information (PCDD
and PCDF data) only, the second data base contained demographic information on
the subjects, and the third, which was provided after all other analyses were
completed, provided the military/civilian status information on each
individual.
A. Chemical Data Base
The data base used had been generated previously by MRI and was
transferred on a floppy diskette for subsequent statistical analysis. Thus no
manual data entry was required. Accuracy of the data base was checked by
means of various frequency checks, random record comparisons with the batch
reports provided to the EPA and VA, and other logical checks to determine
consistency throughout the data base.
The data base contained information on a total of 274 samples. The
total consisted of 197 NHATS samples, 20 method blanks, and 57 quality assur-
ance samples. For each sample, the following information was available:
1. WID, an encoded sample identifier;
2. MRI lab number;
3. Sample type;
4. Batch number;
5. Low resolution mass spec run date (PCDDs and PCDFs);
6. High resolution mass spec run date (2,3,7,8-TCDD and 2,3,7,8-
TCDF);
7. Specimen wet weight (g); and
8. Percent lipid of specimen.
-------
For the 16 analytes, the following three variables were listed:
9. Data qualifier (ND = not detected, TR = trace, PQ = positive
quantifiable);
10. Limit of detection (pg/g) if the data qualifier was either "ND"
or "TR"; and
11. Concentration if the data qualifier was either "TR" or "PQ."
All concentrations were percent lipid based and expressed in
picograms per gram (pg/g) or equivalently in parts per trillion
(ppt).
B. Demographic Data Base
The data base containing selected demographic information on the
subjects was provided by the VA. These data were transferred to MRI in the
form of an ASCII file on a floppy diskette. For each subject, the following
information was given:
1. WID, to match the records across data bases;
2. Geographic region (North Central, North East, South, or West)
determined by the location of the individual's hospital;
3. Race of individual (Caucasian, Black, Oriental, American
Indian, or Other);
4. Age of individual (years) at time of specimen collection (i.e.,
at time of death or surgery);
5. Specimen collection year;
6. Year of birth; and
7. Body mass index (kg/m²).
C. Military/Civilian Status Data Base
After all statistical analysis activities were concluded, a data
base including the military/civilian status of each individual in the study
was provided by the VA and EPA. This data base, which was provided to MRI in
the form of an ASCII file on floppy diskette, included two variables:
1. WID to match the three data bases; and
2. Military/civilian status code (Vietnam veteran, non-Vietnam
veteran, civilian).
The data base contained information on 195 of the 197 NHATS specimens
(36 Vietnam veterans, 79 non-Vietnam veterans, and 80 civilians). In some
-------
cases, demographic and military/civilian status information was not available
for all specimens analyzed. These differences in sample sizes are reflected
in subsequent tables.
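The matching of records across the three data bases by WID can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used: the WID values, field names, and the helper `merge_by_wid` are all hypothetical, and the concentrations shown are invented.

```python
# Hypothetical sketch: join chemical, demographic, and status records on WID.
# All WIDs and field values below are invented for illustration.

def merge_by_wid(chemical, demographic, status):
    """Return one merged record per chemical WID; missing data stays None."""
    merged = {}
    for wid, chem in chemical.items():
        merged[wid] = {
            "chemical": chem,
            "demographic": demographic.get(wid),  # None if not provided
            "status": status.get(wid),            # None if not provided
        }
    return merged


chemical = {
    "W001": {"TCDD": 11.5},
    "W002": {"TCDD": 7.8},
    "W003": {"TCDD": 16.5},
}
demographic = {"W001": {"region": "South"}, "W002": {"region": "West"}}
status = {"W001": "Vietnam veteran", "W002": "civilian"}

merged = merge_by_wid(chemical, demographic, status)

# Specimens usable in analyses that need all three kinds of information.
complete = [
    wid for wid, rec in merged.items()
    if rec["demographic"] is not None and rec["status"] is not None
]
print(len(merged), len(complete))
```

Keeping specimens with missing status in the merged set, rather than dropping them, mirrors the situation described above, where chemical results exist for 197 specimens but status information for only 195.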
III. DESCRIPTIVE STATISTICS
The analysis results presented and discussed in this report are
based solely on the 197 NHATS specimens. Analysis of the quality assurance
sample results (method blanks, duplicates, performance audit samples, split
samples, and spiked and unspiked QC samples) is not reported. A statistical
analysis of these sample results could be undertaken in a future task and
incorporated with the results reported herein as a means of: (a) validating
relationships, if they exist, among the 197 samples and the chemical vari-
ables; and (b) analyzing the QA/QC results in a multivariate context. Since
the data analysis techniques used here are variance based, it would be useful
to explicitly examine the fraction of the total variance that may be attri-
buted to the analytical/sampling procedures. Knowledge of these variance
sources would add considerable strength to the inferences drawn about
distinctions or lack thereof among the sample and control populations.
The following subsections present basic descriptive, i.e., uni-
variate and bivariate, statistics for the chemical and demographic data.
A. Descriptive Statistics of the Chemical Data
Chemical information on a total of 197 samples was available for
analysis. For each sample, the lipid-based concentration (pg/g) of each of
the 16 selected analytes was given if the analyte was detected. When the
analyte was not detected, the limit of detection was reported (also on a
lipid basis). Throughout all analyses, half the limit of detection
value was used for those samples with levels below detection limits. This
approach was taken to conform with the approach chosen by VA statisticians.
When the analyte was detected at a trace level, that is, between the limit of
detection and the limit of quantitation, the reported concentration was used
in the analyses.
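The substitution rule described above can be sketched in a few lines. The function name `analysis_value` and the example values are hypothetical; only the rule itself (half the limit of detection for nondetects, the reported concentration for trace and positive quantifiable results) comes from the text.

```python
# Sketch of the substitution rule described above (names are hypothetical).

def analysis_value(qualifier, lod=None, concentration=None):
    """Return the value used in the analyses for one analyte measurement.

    ND    -> half the limit of detection
    TR/PQ -> the reported concentration
    """
    if qualifier == "ND":
        if lod is None:
            raise ValueError("ND result requires a limit of detection")
        return lod / 2.0
    if qualifier in ("TR", "PQ"):
        if concentration is None:
            raise ValueError(f"{qualifier} result requires a concentration")
        return concentration
    raise ValueError(f"unknown data qualifier: {qualifier!r}")


# Invented example values in pg/g (lipid basis).
print(analysis_value("ND", lod=4.0))
print(analysis_value("TR", lod=4.0, concentration=5.2))
print(analysis_value("PQ", concentration=11.5))
```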
1. Univariate Statistics
As a first step, percentages of nondetects (ND), traces (TR), and
positive quantifiable (PQ) levels were summarized for each analyte. The
results, both in frequency counts and in percentages, are shown in Table 1.
An upper limit of 25% for the total number of specimens with levels below
detection limit for a given analyte (i.e., at least 75% PQ+TR) was selected as
the cutoff point for that analyte to be included in subsequent multivariate
data analyses. By that rule, a total of five compounds were excluded from
analyses. They were: 2,3,7,8-TCDF (35.0% NDs); 1,2,3,7,8-PeCDF (79.2% NDs);
1,2,3,7,8,9-HxCDF (95.4% NDs); 1,2,3,4,7,8,9-HpCDF (53.8% NDs); and OCDF
(27.9% NDs).
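The 25% cutoff can be checked against the nondetect counts of Table 1 with a short sketch. The variable names are hypothetical, and the zero counts for the four analytes with 197 positive quantifiable results are an assumption inferred from the row totals.

```python
# Nondetect counts per analyte, taken from Table 1 (197 NHATS specimens).
# Zero counts are inferred for analytes reported as 100% positive quantifiable.
N_SPECIMENS = 197
ND_COUNTS = {
    "2,3,7,8-TCDF": 69,
    "2,3,7,8-TCDD": 1,
    "1,2,3,7,8-PeCDF": 156,
    "2,3,4,7,8-PeCDF": 2,
    "1,2,3,7,8-PeCDD": 1,
    "1,2,3,4,7,8-HxCDF": 2,
    "1,2,3,6,7,8-HxCDF": 5,
    "2,3,4,6,7,8-HxCDF": 19,
    "1,2,3,7,8,9-HxCDF": 188,
    "pair-HxCDD": 0,
    "1,2,3,7,8,9-HxCDD": 2,
    "1,2,3,4,6,7,8-HpCDF": 0,
    "1,2,3,4,7,8,9-HpCDF": 106,
    "1,2,3,4,6,7,8-HpCDD": 0,
    "OCDF": 55,
    "OCDD": 0,
}
MAX_ND_PERCENT = 25.0  # upper limit on nondetects for inclusion


def nd_percent(count, total=N_SPECIMENS):
    """Percentage of specimens below the detection limit."""
    return 100.0 * count / total


excluded = [
    name for name, count in ND_COUNTS.items()
    if nd_percent(count) > MAX_ND_PERCENT
]
retained = [name for name in ND_COUNTS if name not in excluded]

for name in excluded:
    print(f"{name}: {nd_percent(ND_COUNTS[name]):.1f}% ND")
print(f"{len(retained)} analytes retained")
```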
-------
Table 1. Number and Percent of Nondetects, Traces,
and Positive Quantifiable Levels per Analyte
in 197 NHATS Specimens

                                    Number              Percent
Analyte Name            No.     ND    TR    PQ      ND     TR     PQ
2378-TCDF                1*     69    20   108    35.0   10.2   54.8
2378-TCDD                2       1     2   194     0.5    1.0   98.5
12378-PeCDF              3*    156    23    18    79.2   11.7    9.1
23478-PeCDF              4       2     0   195     1.0    0.0   99.0
12378-PeCDD              5       1     1   195     0.5    0.5   99.0
123478-HxCDF             6       2     1   194     1.0    0.5   98.5
123678-HxCDF             7       5     1   191     2.5    0.5   97.0
234678-HxCDF             8      19    19   159     9.6    9.6   80.7
123789-HxCDF             9*    188     7     2    95.4    3.6    1.0
123478/123678-HxCDD     10       0     0   197     0.0    0.0  100
123789-HxCDD            11       2     2   193     1.0    1.0   98.0
1234678-HpCDF           12       0     0   197     0.0    0.0  100
1234789-HpCDF           13*    106    49    42    53.8   24.9   21.3
1234678-HpCDD           14       0     0   197     0.0    0.0  100
OCDF                    15*     55    64    78    27.9   32.5   39.6
OCDD                    16       0     0   197     0.0    0.0  100

Note: Analytes marked by an * were excluded from some subsequent
statistical analyses due to high occurrences of nondetects.
ND = Nondetect: level below detection limit
TR = Trace: level between detection and quantitation limits
PQ = Positive Quantifiable: level above quantitation limit
-------
The analyte with the largest percentage of nondetects (9.6%)
included in the analyses is 2,3,4,6,7,8-HxCDF. All other analytes retained
for subsequent analyses have percentages of nondetects below 3%.
The concentration data are summarized in Figures 1 through 16 in the
Appendix. These plots show histograms of the concentration levels found in
the 197 NHATS specimens for each of the 16 analytes.
The histograms show evidence of relatively skewed distributions of
analyte levels in the adipose tissue specimens of the study subjects. For
each analyte, most of the levels are near the lower end of the scale, with a
few relatively large values in each case. These distributions could well be
represented by log-normal distributions, and this fact is considered in the
discussion of cluster analysis in Section V.A. This approach has also been
discussed by Sielken (1987) for TCDDs in human adipose tissue in particular.
Basic descriptive statistics of the analytical results are sum-
marized in Table 2. For each analyte, mean, standard deviation, minimum, 25th
percentile, median or 50th percentile, 75th percentile, and maximum of the
analyte levels across the 197 specimens have been computed. All statistics
are in picograms per gram (or ppt). For levels below detection limits, half
the detection limit was used for the computations.
For each analyte, the median is lower than the mean, which confirms
the positive skewness of the distributions mentioned earlier. This can also
be observed in the values of the 25th and 75th percentiles for each analyte.
These values are such that 50% of the specimens have concentration levels
between these two numbers; 25% of the specimens have concentrations above the
75th percentile; and 25% of the specimens have concentrations below the 25th
percentile.
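The relationship between mean, median, and quartiles in a positively skewed sample can be illustrated with a small sketch. The concentrations below are invented for illustration and are not data from this study; the choice of the "inclusive" quantile method is likewise an assumption, since the report does not say how percentiles were computed.

```python
import statistics

# Invented right-skewed concentrations (pg/g); not data from this study.
levels = [5.0, 6.1, 7.8, 9.0, 10.2, 11.5, 12.9, 14.0, 16.5, 25.0, 55.8]

mean = statistics.mean(levels)
median = statistics.median(levels)

# Quartile cut points: 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(levels, n=4, method="inclusive")

# Values lying between the 25th and 75th percentiles (the middle half).
middle_half = [x for x in levels if q1 <= x <= q3]

print(f"mean = {mean:.1f}, median = {median:.1f}")
print(f"25th percentile = {q1:.2f}, 75th percentile = {q3:.2f}")
print(f"{len(middle_half)} of {len(levels)} values fall between the quartiles")
```

A few large values pull the mean above the median while leaving the quartiles almost unchanged, which is the pattern described above for each analyte.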
For example, 50% of the specimens have 2,3,7,8-TCDD levels between
7.8 and 16.5 pg/g. The median level is 11.5 pg/g. The average concentration
of 14.1 pg/g, however, is influenced by one high value of 106 pg/g. The
second highest 2,3,7,8-TCDD value is 55.8 pg/g and is from a control
specimen. (The extreme 2,3,7,8-TCDD level of 106 pg/g is also from a control
subject. Most of this subject's analyte levels are above the 75th
percentile.) At a more refined level, 10% of the samples have
2,3,7,8-TCDD levels above 25 pg/g, and 5% of the sample concentrations exceed
33.4 pg/g (not shown in Table 2). On the low side of the spectrum, 10% of the
specimens have levels below 6.1 pg/g, and 5% of the sample concentrations fall
below 5 pg/g (not shown in Table 2). The histograms in the Appendix depict
this situation best for each analyte.
2. Bivariate Statistics
Correlation coefficients between each pair of analytes were
computed. All correlations are positive, and with a few exceptions, most are
statistically significantly different from zero at the 99% confidence level.
Due to the large number of specimens (197), even moderately small correlations
will be statistically significant. For example, a correlation coefficient of
0.18 is significant at the 99% level for a sample size of 200. Table 3 lists
-------
Table 2. Descriptive Concentration (pg/g) Statistics per Analyte
Based on 197 NHATS Specimens
[Table contents not legible in this copy.]
-------
Table 3. Correlation Coefficients Between Pairs of Analyte
Concentrations Based on 197 NHATS Specimens
[Table contents not legible in this copy.]
-------
all correlation coefficients between pairs of analyte concentrations that are
greater than or equal to 0.50. The cutoff value of 0.50 for the correlation
coefficients was selected for the following reason. The correlation coeffi-
cient, R, between two analytes, X and Y, is a measure of the linear dependency
between X and Y. In the context of a linear regression of Y on X, R2, the
square of the correlation coefficient, will be the proportion of total varia-
tion about the mean of Y explained by the regression of Y on X. Thus, setting
the cutoff for R at 0.50 is equivalent to setting the cutoff for R2 at 25%.
For all practical purposes, anything below that figure can be considered
negligible in terms of linear association and predictive power.
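The sample-size remark and the 0.50 cutoff can be checked numerically. The sketch below is illustrative only: it assumes the Fisher z approximation (the report does not state which significance test was used), and the function name is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_correlation(n, confidence=0.99):
    """Smallest |r| significantly different from zero (two-sided),
    using the Fisher z approximation: atanh(r) ~ Normal(0, 1/(n - 3))."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return tanh(z / sqrt(n - 3))


r_crit = critical_correlation(200)   # close to the 0.18 quoted in the text
r_cutoff = 0.50
explained = r_cutoff ** 2            # proportion of variation explained: 0.25
print(round(r_crit, 2), explained)
```

Under these assumptions, a correlation near 0.18 passes the significance test with 200 specimens yet corresponds to only about 3% of variation explained, which is why the report uses the stricter 0.50 cutoff.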
Of the 120 possible pairs of analyte concentrations, slightly more
than half (66) yield correlation coefficients above 0.50. Of these, about
half (29) are above 0.80, with 1,2,3,4,7,8-HxCDF and 1,2,3,6,7,8-HxCDF being
the most highly correlated pair of analytes with a coefficient of 0.95.
Of the correlations above 0.50, 2,3,7,8-TCDD shows overall the
lowest correlation with most other analytes. Using 0.80 as a cutoff value,
one could say that 2,3,7,8-TCDD is not strongly correlated with any of the
other analytes.
B. Descriptive Statistics of the Demographic Data by Military/Civilian
Status
A set of demographic variables was made available by the VA to be
incorporated into the analysis. The variables were geographic region, race,
age, collection year, year of birth, and body mass index.
Demographic data were not available for all 197 NHATS samples. Most
data were available for 196 subjects; age, race, and geographic region were
missing for two of the 197 subjects; and body mass index was available for
only 169 subjects. Military/civilian status was reported on 195 of the
197 specimens in the study.
Table 4 provides a cross-tabulation of the 195 specimens across
military/civilian status, geographic regions, and race groups. Eighty percent
of the individuals were Caucasian, and the remaining 20% were distributed
across four race groups. Consequently, the classification of individuals as
to race was simplified in subsequent analyses to two groups—Caucasian and
non-Caucasian.
Table 5 summarizes basic descriptive statistics for the demographic
variables—number of specimens, mean, standard deviation, minimum and maximum
values, and coefficient of variation. The statistics in Table 5 are shown
across all specimens (varying sample sizes) as well as within each category
determined by the military/civilian status of the individual. The careful
choice of matched NHATS specimens across the three groups can be clearly seen
in this table by noting the small differences in these statistics.
[Table 4. Cross-tabulation of the 195 specimens by military/civilian status,
geographic region, and race group; contents not recoverable from the source
scan.]
[Table 5. Descriptive statistics of the demographic variables (number of
specimens, mean, standard deviation, minimum, maximum, and coefficient of
variation), overall and by military/civilian status; contents not recoverable
from the source scan.]
C. Relationships Between 2,3,7,8-TCDD and Selected Demographic
Variables
The levels of 2,3,7,8-TCDD, the most toxic dioxin and the pre-
dominant contaminant of Agent Orange, were examined more closely than other
congeners in relationship to a few demographic variables. Relationships were
analyzed by means of simple regression analysis, that is, with one independent
variable at a time. Both the concentration levels and the natural logarithm
of the concentrations were regressed against age, collection year, birth year,
and body mass index. These regression analyses were also performed after
excluding the one specimen with the highest 2,3,7,8-TCDD level (106 pg/g).
The second highest level was 55.8 pg/g. All regression results, with and
without log-transformation and with and without the 2,3,7,8-TCDD extreme
value, are shown in Table 6.
No strong linear relationships of 2,3,7,8-TCDD levels emerged with
any of these variables. Some regression models were significant at the 5%
significance level, that is, the slopes were significantly different from 0 at
the 5% level; however, the extremely low values of R-squared, a measure of the
percent variance in the data explained by the model, do not warrant drawing
any conclusions as to the ability of these variables to predict levels of
2,3,7,8-TCDD. These results are in general agreement with those obtained from
a stepwise regression analysis performed by the VA (VA and USEPA 1989) of
2,3,7,8-TCDD level vs. demographic variables. The regression results are
summarized in Table 6 below. Scatterplots of 2,3,7,8-TCDD levels vs.
demographic variables are displayed in Figures 1 through 4 to illustrate the
random distribution of 2,3,7,8-TCDD levels across these variables.
Figures 1 and 4 depict 2,3,7,8-TCDD levels vs. the individuals' age
(at time of death or surgery) and body mass index, respectively. These plots
underline the nonsignificant regression results summarized in Table 6.
Figure 2 shows 2,3,7,8-TCDD levels against specimen collection
year. Due to the significant regression results mentioned above, the
regression line with a slope of -0.63 has been drawn here. There is a slight
decrease of 2,3,7,8-TCDD levels with successive collection year, but the
predictive power of that variable in estimating the analyte level is very low
(R² of 3%). The significance of the small, negative slope is mostly due to
the large sample size in this specific situation.
Figure 3 shows 2,3,7,8-TCDD levels vs. the individuals' year of
birth. The significant regression line with a slope of -0.60 has been drawn
to show the slight decrease of 2,3,7,8-TCDD levels with successive birth
year. It should be noted, however, that although this slope is significantly
different from 0 at the 5% level, the predictive power of birth year with
respect to that analyte is very low (R² of 5%). Again, the significance of
the small, negative slope is mostly due to the large sample size.
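The interplay between a low R-squared and a significant slope can be made concrete. In a simple regression, the t statistic of the slope depends only on R-squared and the number of points. The sketch below is a rough check under stated assumptions: the R-squared values are the rounded figures quoted above, the sample size of 196 is taken from the figure titles, the normal approximation stands in for the exact t distribution, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def slope_t_statistic(r_squared, n):
    """t statistic for the slope in a simple regression with n points;
    it depends only on R-squared and n: t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return sqrt(r_squared * (n - 2) / (1 - r_squared))


# Normal approximation to the two-sided 5% critical value (about 1.96);
# with roughly 194 degrees of freedom the exact t value is only slightly larger.
critical = NormalDist().inv_cdf(0.975)

t_collection = slope_t_statistic(0.03, 196)   # R-squared of 3%, 196 specimens
t_birth = slope_t_statistic(0.05, 196)        # R-squared of 5%, 196 specimens
t_small_sample = slope_t_statistic(0.03, 30)  # same R-squared, hypothetical n = 30

print(t_collection > critical, t_birth > critical, t_small_sample > critical)
```

With about 196 specimens both slopes exceed the critical value, whereas the same R-squared of 3% in a hypothetical sample of 30 would not, consistent with the statement that the significance is mostly a sample-size effect.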
[Table 6. Regression results for 2,3,7,8-TCDD levels vs. age, collection year,
birth year, and body mass index (concentrations and log concentrations, with
and without the extreme value of 106 pg/g); contents not recoverable from the
source scan.]
[Scatterplot: 2378-TCDD vs. AGE (based on 195 specimens); x-axis: Age in Years]
Figure 1: Plot of 2,3,7,8-TCDD vs. age of individual at time of collection.
[Scatterplot: 2378-TCDD vs. COLLECTION YEAR (based on 196 specimens); x-axis:
Collection Year]
Figure 2: Plot of 2,3,7,8-TCDD vs. collection year.
[Scatterplot: 2378-TCDD vs. BIRTHYEAR (based on 196 specimens); x-axis: Birth
Year]
Figure 3: Plot of 2,3,7,8-TCDD vs. birth year of individual.
[Scatterplot: 2378-TCDD vs. BODY MASS INDEX (data available on 169 specimens
only); x-axis: Body Mass Index (kg/m**2)]
Figure 4: Plot of 2,3,7,8-TCDD vs. body mass index.
It should be noted that the three variables, age, year of birth, and
collection year, are interrelated as follows: age = collection year - year of
birth. Therefore, considering any two of the three variables is sufficient in
subsequent analyses, since the third variable provides no additional
information.
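The redundancy can be illustrated with a small check: when all three variables enter a model together, the matrix of sums of cross-products is singular. The sketch below uses invented records and hypothetical helper names; it is not drawn from the study data.

```python
# Hypothetical records: (collection year, birth year), two-digit years
# as on the figure axes. These values are invented for illustration.
records = [(72, 40), (75, 47), (78, 39), (81, 51), (83, 44)]

# Each row holds the three interrelated variables.
rows = [(c, b, c - b) for c, b in records]   # age = collection year - birth year


def gram(rows):
    """Return the 3 x 3 matrix X'X of sums of cross-products."""
    return [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Integer arithmetic is exact, so the determinant is exactly zero:
# the three columns are linearly dependent and X'X cannot be inverted.
singular = det3(gram(rows)) == 0

# Dropping any one of the three variables removes the dependency.
two_var = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
invertible = two_var[0][0] * two_var[1][1] - two_var[0][1] ** 2 != 0
print(singular, invertible)
```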
IV. MULTIVARIATE STATISTICAL METHODS
This section presents a brief overview of the multivariate tech-
niques described in Section V. To this point, simple statistics such as
means, medians, standard deviations, and frequencies of all the available
variables have been discussed. These statistics provide an efficient summary
of the data space, but only on an individual variable basis. To show rela-
tionships between any pair of variables, if they exist, sample correlation
coefficients are needed. The limitations of these two-way relationships are
immediately obvious, however. If one were to look at all two-way relation-
ships between the 16 analytes on the one hand and the 6 demographic variables
on the other, plotting and analyzing would require 96 bivariate graphs. And
these relationships would not include the 120 two-way plots between pairs of
analytes. Not only would it be a large analytical task, but most important,
relationships between more than two variables would never be considered. Thus
the objective is to be able to consider sets of variables simultaneously. The
problem then becomes one of dimensionality; namely, with more than three vari-
ables, we are practically unable to visualize any relationship in space.
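The two counts quoted above follow from elementary combinatorics, as this short check shows (variable names are illustrative):

```python
from math import comb

n_analytes = 16
n_demographic = 6

analyte_vs_demographic = n_analytes * n_demographic   # 16 x 6 = 96
analyte_pairs = comb(n_analytes, 2)                   # 16 x 15 / 2 = 120
print(analyte_vs_demographic, analyte_pairs)
```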
Multivariate techniques, or pattern recognition techniques, have
been developed and provide rules for combining variables in an optimal way
(the criterion of optimality varies from one technique to another) so as to
reduce the dimensionality of the set of variables with as little loss of
information as possible. In this context, two statistical techniques have
been used to complement each other: cluster analysis and principal component
analysis.
There are three major objectives in pattern recognition: data
reduction, feature extraction, and classification. In cluster analysis, in
which little a priori knowledge of possible grouping of variables is required,
natural groupings and structures among variables are studied without prej-
udice. The same technique can be applied to the observations (specimens) in
the study. In principal component analysis, the goal is to define linear
combinations of the original variables and thus to reduce the dimension of the
data space.
The Biomedical Programs (BMDP) and the Statistical Analysis System
(SAS) software packages have been used exclusively in this study.
A. Cluster Analysis of Variables
Given a large set of variables, multivariate relationships are
sought instead of the simple bivariate relationships found with correlation
analysis techniques. Cluster analysis is applied to identify similarities
between analytes, not between observations (specimens in this case). Analytes
contributing nearly the same information to the data structure are clustered;
one can then select one or more analytes from a cluster, discard others if
desired, and proceed. Cluster analysis thus shows whether there are natural
groups of analytes with similar behavior.
The first step in cluster analysis, then, is to define a measure of
association for clustering variables. In this study, the correlation
coefficient is chosen. The single linkage option in the BMDP P1M procedure is
selected for the clustering: at each step, the two most similar variables or
clusters are joined, and the process continues until all variables have been
included in a single large cluster. The clustering results are then printed
in a dendrogram or tree diagram. The determination
of the final number of clusters is, however, left to the user and depends on a
chosen minimum similarity value (the correlation coefficient in this case).
If the value chosen is too high (i.e., close to 1), then many clusters will be
formed, and most of them might be single-point clusters. Since outputs pro-
vided by the classical statistical software packages show results after com-
plete clustering, a decision as to an optimum similarity value can be made
a posteriori. This is the cluster analysis technique that was applied
exclusively to the 16 analytes.
B. Principal Component Analysis
Principal component analysis (PCA) is an effective tool for data
reduction; and at the same time, it provides a means for data representation
in two- or three-dimensional space. In PCA, the goal is to assess the struc-
ture of the variables within a given set of variables, independently of any
relationship they may have to variables outside this set.
The first step in PCA is to choose the chemical and demographic
variables to be included in the set and a measure of association between
them. In order to treat all variables as equally important, we have chosen
the correlation matrix as the measure of association (i.e., the data were
autoscaled). Also, all concentration data have been log-transformed for PCA
to reduce the skewness of the distributions of analyte results.
The next step is to combine the original variables into linear com-
binations in a manner that accounts for the maximum amount of variance present
in the data. If the starting set of variables contains m variables, then m
principal components will be generated. The first principal component chosen
is that which represents the single best linear combination of variables, in
other words, accounts for the highest percentage of the total variance in the
data. The second principal component is chosen to account for as much of the
remaining variance as possible, subject to being uncorrelated with the
first. Hence, in PCA, the principal components are orthogonal to each other,
which means they are uncorrelated. This process continues until m linear
combinations are found.
In general, a few principal components are sufficient to explain a
large amount of the variance in the data. The importance of a principal
component is mostly judged by its associated eigenvalue in the correlation
matrix. Eigenvalues decrease with decreasing importance of the principal
components. Past experience indicates that principal components with
eigenvalues below 1.0 explain little correlated behavior among the measured
variables and that they often reflect random variance contributions due to
measurement and sampling error. Therefore, principal components with
eigenvalues below 1.0 are generally discarded. A second criterion in the
selection of the final number of principal components is the final communality
estimate associated with each variable. Final communality estimates are a
measure of how well the variables are accounted for by the number of principal
components selected. Thus, an additional component, although having a low
eigenvalue, might be included in the final set.
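The eigenvalue criterion can be sketched as follows. The eigenvalues are invented for illustration and are not results from this study; the only property relied upon is that the eigenvalues of a correlation matrix sum to the number of variables.

```python
# Hypothetical eigenvalues of a 5-variable correlation matrix, sorted in
# decreasing order. They are invented for illustration, not taken from
# the report. For a correlation matrix they sum to the number of variables.
eigenvalues = [2.9, 1.1, 0.5, 0.3, 0.2]
m = len(eigenvalues)

# Share of the total variance carried by each principal component.
proportion = [ev / m for ev in eigenvalues]

# Eigenvalue-one rule: discard components with eigenvalues below 1.0.
retained = [k + 1 for k, ev in enumerate(eigenvalues) if ev >= 1.0]
explained = sum(proportion[k - 1] for k in retained)

print(retained, round(explained, 2))
```

In this invented case two components are retained and together carry about 80% of the total variance; the communality check described above could still argue for keeping a third.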
The coefficients of the variables (or transformed variables) in the
linear combinations are called loadings. They are a measure of the relative
contribution of a variable within a principal component to the variance
explained. In fact, the principal component loadings are the correlation
coefficients between the original (or transformed) variables and the principal
components, and thus provide the key information about the nature of the
principal component. These loadings can be used to order the variables within
a given principal component and help affix a "label" to this principal
component. This step in the PCA is the most difficult one because it is
largely a matter of judgment. This is where the interpretation of the results
begins.
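The quantities discussed above can be summarized in one common formulation. The notation is an assumption introduced here, not the report's: z_1, ..., z_m are the autoscaled variables, R is their correlation matrix, v_k is the unit-length eigenvector for eigenvalue lambda_k, and p is the number of components retained. Under this convention the loadings are the eigenvector coefficients rescaled by the square root of the eigenvalue, which is what makes them correlations.

```latex
% z_1, ..., z_m: autoscaled variables; R: their m x m correlation matrix
% PC_k = \sum_j v_{jk} z_j, with v_k of unit length
\begin{align*}
R\,v_k &= \lambda_k v_k, \qquad
  \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m \ge 0, \qquad
  \sum_{k=1}^{m} \lambda_k = m, \\
a_{jk} &= v_{jk}\sqrt{\lambda_k} = \operatorname{corr}(z_j, \mathrm{PC}_k), \qquad
  \sum_{j=1}^{m} a_{jk}^2 = \lambda_k, \\
h_j^2 &= \sum_{k=1}^{p} a_{jk}^2 \le 1 .
\end{align*}
```

Here a_jk is the loading of variable j on component k and h_j^2 is the final communality of variable j; a communality well below 1 indicates a variable poorly represented by the p retained components.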
Once the principal components have been derived and a final selec-
tion of significant principal components made, principal component scores for
each data point (specimen) are calculated from the standardized data. This is
done by using the linear combinations of variables with their loadings on each
principal component. Thus, if three principal components are retained, three
scores will be computed for each specimen, and three two-dimensional plots of
the scores can be generated: principal component 2 vs. principal component 1,
principal component 3 vs. principal component 1, and principal component 3 vs.
principal component 2, or one three-dimensional plot. Since the principal
components are linear combinations of the original (transformed) variables, it
is expected that the plot of principal component 1 vs. principal component 2
should contain more information about the data structure than any simple two-
variable plot. The second most difficult step is then to give practical
meaning to any pattern seen on the principal component plots. On the one
hand, care should be taken not to oversimplify the relationships between
variables; on the other hand, one should not read too much into the patterns
found.
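A two-variable case shows how scores are obtained from standardized data. The sketch below is illustrative only: the concentrations are invented, the scores use unit-length eigenvector coefficients (statistical packages may rescale scores, for example to unit variance), and the helper names are hypothetical. For two autoscaled variables with positive correlation r, the principal axes are known in closed form, so no eigenvalue routine is needed.

```python
from math import log, sqrt
from statistics import mean, stdev

# Hypothetical concentrations (pg/g) of two analytes in six specimens;
# the numbers are invented for illustration only.
x = [4.1, 9.8, 6.3, 15.2, 7.7, 11.0]
y = [30.0, 71.0, 52.0, 140.0, 48.0, 95.0]


def autoscale(values):
    """Log-transform, then centre to mean 0 and scale to unit standard deviation."""
    logs = [log(v) for v in values]
    mu, sd = mean(logs), stdev(logs)
    return [(v - mu) / sd for v in logs]


def covariance(a, b):
    """Sample covariance (n - 1 denominator) of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


z1, z2 = autoscale(x), autoscale(y)
r = covariance(z1, z2)            # correlation, since z1 and z2 have unit variance

# For two autoscaled variables with r > 0 the principal axes are fixed:
# (1, 1)/sqrt(2) with eigenvalue 1 + r and (1, -1)/sqrt(2) with eigenvalue 1 - r.
pc1 = [(a + b) / sqrt(2) for a, b in zip(z1, z2)]
pc2 = [(a - b) / sqrt(2) for a, b in zip(z1, z2)]

print(round(covariance(pc1, pc1), 3), round(1 + r, 3))   # variance of PC1 equals 1 + r
print(abs(covariance(pc1, pc2)) < 1e-9)                  # scores are uncorrelated
```

Each specimen thus receives one score per retained component, and plotting pc2 against pc1 gives the kind of two-dimensional display described above.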
When plotting the two- or three-dimensional principal component
plots, the data points can be labeled using available codes. These codes
could be those representing selected categorical demographic variables, for
example, race, geographic region, or a combination of both, or military/
civilian status. An additional approach is to plot only those scores lying
between -3 and +3, thus including all but a few large values. This achieves a
higher resolution of the plots so that possible patterns in the figures can be
made more visible.
V. MULTIVARIATE ANALYSIS RESULTS AND DISCUSSION
Following the initial univariate and bivariate (descriptive
statistics, regressions and correlations) analyses of the data, a series of
multivariate procedures, as described, were applied. The same techniques were
used to analyze various subsets of the data, the results of one technique
leading to the use of another. Cluster analysis results for the chemical
information are discussed first, followed by a discussion of the results of
principal component analysis.
A. Cluster Analysis of Analytes
1. Approach
The objective of the cluster analysis, the first analysis step, was
to uncover possible significant interrelationships among analytes or groups of
analytes. The input for that analysis was the data matrix of all analytical
results from the 197 NHATS subjects. Levels below detection limits were
replaced by one-half the limit of detection. Since the detection limit for
most chemicals is well below typical values, the effect of this procedure for
handling nondetects is minimal.
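The substitution rule for nondetects is simple to express in code. The record layout and function name below are assumptions made for illustration; the report does not describe how the data matrix was stored.

```python
def substitute_nondetects(results):
    """Replace each nondetect with one-half of its limit of detection (LOD).

    `results` is a list of (value, lod) pairs, with value None for a
    nondetect. This record layout is an assumption made for illustration.
    """
    return [lod / 2 if value is None else value for value, lod in results]


# Hypothetical results in pg/g: two detects and one nondetect with LOD 1.0.
sample = [(5.4, 1.0), (None, 1.0), (12.1, 2.0)]
print(substitute_nondetects(sample))   # [5.4, 0.5, 12.1]
```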
Four sets of cluster analyses were performed. They differed with
respect to the number of analytes included and whether the concentrations or
the natural logarithms of the concentrations were used as input. The log-
transform approach was selected to resolve two potential problems: (1) non-
normality of the PCDD and PCDF concentration levels and (2) sensitivity of the
cluster analysis to extreme concentration values. The cluster analyses were
performed both with and without the five PCDFs with more than 25% nondetects.
Absolute correlations (in this case, simple correlations since all
correlations are positive) were used as the measure of similarity between any
two analytes. When a cluster contained more than one analyte, the similarity
between any two clusters was computed as the maximum similarity over all pos-
sible pairings of analytes between the two clusters. At each step in the
clustering process, the two clusters with the maximum similarity were
combined. The process continued until all 16 analytes were included in a
single cluster.
The BMDP P1M procedure, cluster analysis of variables, was used to
perform these analyses.
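The linkage rule described above can be sketched in a few lines. This is a minimal illustration of single-linkage clustering of variables, not a reproduction of the BMDP P1M procedure; the function name and the four-variable similarity matrix are hypothetical.

```python
def single_linkage(names, sim):
    """Agglomerative clustering of variables with single linkage.

    names: list of variable names.
    sim:   symmetric matrix (list of lists) of similarities, e.g. absolute
           correlations between variables.
    Returns the merge history as (cluster_a, cluster_b, similarity) tuples.
    """
    clusters = [(i,) for i in range(len(names))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Similarity of two clusters: maximum over all pairings.
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        history.append(([names[i] for i in clusters[a]],
                        [names[i] for i in clusters[b]], s))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history


# Hypothetical correlations among four variables (not taken from Table 3).
names = ["A", "B", "C", "D"]
sim = [[1.00, 0.95, 0.85, 0.40],
       [0.95, 1.00, 0.80, 0.30],
       [0.85, 0.80, 1.00, 0.45],
       [0.40, 0.30, 0.45, 1.00]]

history = single_linkage(names, sim)
for left, right, s in history:
    print(left, right, s)
```

In this invented example A and B join first, C joins them next, and D joins last at a much lower similarity, so a cutoff of 0.80 would leave D as a single-variable cluster.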
2. Results
Of the cluster analyses mentioned above, only those results obtained
when using the log-transformed concentrations for all 16 analytes are
presented here. The results of the other three cluster analyses were very
similar. Excluding those five analytes with high occurrences of nondetects
from the set of analytes did not influence the grouping of the remaining
analytes. As shown below, these five analytes remained "individuals" in the
clustering process, that is, one-variable clusters.
The log-transformation had some effect on the correlations shown
between pairs or groups of analytes. Overall, the correlation coefficients
between pairs of log-transformed analyte concentrations dropped slightly as
compared to the correlation coefficients presented in Table 3 (based on
untransformed concentrations). However, as noted earlier, due to the skewness
of the concentration distributions (see histograms in the Appendix), emphasis
was placed on the results obtained using the log concentrations. This
approach is also in line with that taken by the VA statisticians in their
analysis of 2,3,7,8-TCDD data (VA and USEPA 1989).
The clustering results are shown in Figure 5 in the form of a tree
diagram. A companion table (Table 7) has been developed to show the simi-
larity between two clusters at the time they are joined in the final cluster-
ing process. The analyte names and numbers are listed on the left side of the
diagram in Figure 5. The tree diagram is printed over the similarity matrix
and shows the clusters formed during the linkage steps. Solid lines indicate
boundaries of clusters such that the similarity between the last two clusters
joined exceeds 0.80. Broken lines are used in all other cases.
The first two columns in Table 7 list the two analytes that form the
cluster boundaries when a given cluster is considered. The third column shows
the number of analytes in that specific cluster. The last column indicates
the value of the similarity at the time when the cluster is formed. For
example, take analyte 10. A two-analyte cluster consisting of analyte 10
(first column) and analyte 5 (second column) is formed early in the process
with a similarity of 0.92 (last column). Since this cluster contains only two
analytes, the value of 0.92 is simply the correlation between these two ana-
lytes. A similar situation occurs with analyte 7 and analyte 6 (first and
second column in Table 7) with a correlation of 0.94.
Next, take analyte 12 (first column in Table 7). A four-analyte
cluster (third column) bound by analytes 12 and 4 (second column) is formed
with an overall similarity of 0.85 (last column). This cluster also contains
the two-analyte cluster formed by analytes 6 and 7 (see Figure 5).
An example of a larger cluster would be the cluster delineated by
analyte 4 (first column in Table 7) and analyte 8 (second column) and con-
taining 10 (third column) of the 16 analytes with an overall similarity of
0.82. The 10 analytes contained in that cluster can be read from Figure 5
where it is shown by a solid line as the largest cluster.
The last column in Table 7 shows sharp changes in similarity values between
the 3rd and the 4th rows (0.59 and 0.82, respectively) and between the 12th
and 13th rows (0.83 and 0.72, respectively). These relatively sharp changes in
similarity values were the basis for the cutoff point selection. Also, with a
sample size of 197, using a correlation of 0.80 as a cutoff provides a
conservative approach to the interpretation of the clustering process.
[Figure 5 tree diagram: graphic not recoverable from the source scan. Analytes
are listed on the left side of the diagram in this order: 2378-TCDF*,
12378-PeCDF*, 2378-TCDD, 23478-PeCDF, 123478-HxCDF, 123678-HxCDF,
1234678-HpCDF, 12378-PeCDD, 123478/123678-HxCDD, 123789-HxCDD, 1234678-HpCDD,
OCDD, 234678-HxCDF, 1234789-HpCDF*, OCDF*, 123789-HxCDF*.]
Each horizontal or diagonal line starts as one analyte and ends where it
intersects with a line from another analyte or cluster. The analytes included
between any such pair of lines form a cluster. Solid and dotted lines
delineate, respectively, clusters with similarities above and below 0.80.
* Analytes with high occurrences of non-detects (above 25%)
Figure 5: Tree diagram showing clustering process of all 16 analytes based
on the correlations of the log-transformed concentrations.
Table 7. Summary Table of Analyte Clusters Formed
and their Similarity Values

                            Other Boundary   Number of     Similarity
                            of Cluster       Analytes      Value When
Analyte Name/No.            (Analyte No.)    in Cluster    Cluster Formed
2378-TCDF 1*                      9              16            0.37
12378-PeCDF 3*                    1               2            0.45
2378-TCDD 2                      15          [illegible]       0.59
----------------------------------------------------------------------
23478-PeCDF 4                     8              10            0.82
123478-HxCDF 6                   12               3            0.86
123678-HxCDF 7                    6               2            0.94
1234678-HpCDF 12                  4               4            0.85
12378-PeCDD 5                    11               3            0.88
123478/123678-HxCDD 10            5               2            0.92
123789-HxCDD 11                   4               7            0.84
1234678-HpCDD 14                 16               2            0.86
OCDD 16                           4               9            0.83
----------------------------------------------------------------------
234678-HxCDF 8                    2              11            0.72
1234789-HpCDF 13*                 2              12            0.70
OCDF 15*                          1              15            0.44
123789-HxCDF 9*                   1              16            0.37

Notes: Similarity values above 0.80 are between the two solid lines
Analytes marked by an * had levels below detection limit in over 25% of the specimens
(see Table 1)
Results are based on correlations between log-transformed concentration levels
(Results in Table 7 correspond to those shown in Figure 5)
A joint interpretation of the tree diagram (Figure 5) and the sim-
ilarity table (Table 7) reveals the following.
• Three two-analyte clusters with high correlations were formed. These
  are:
    1,2,3,4,7,8- and 1,2,3,6,7,8-HxCDF;
    1,2,3,7,8-PeCDD and 1,2,3,4,7,8-/1,2,3,6,7,8-HxCDD pair; and
    1,2,3,4,6,7,8-HpCDD and OCDD.
  These strong groupings were also observed from the bivariate statistical
  analysis (see Section III) based on simple correlations. These high
  positive correlations between analytes within a pair indicate that the
  two analytes occur simultaneously and are either both high or both low in
  concentration levels. Thus, tracking the levels of one analyte of a pair
  is practically equivalent to tracking the other.
  Further clusters that are noteworthy will consist of at least three
  analytes.
• Two three-analyte clusters with a high similarity value each emerged from
  the process. These are:
    1,2,3,7,8-PeCDD, 1,2,3,4,7,8-/1,2,3,6,7,8-HxCDD pair, and
    1,2,3,7,8,9-HxCDD (similarity value of 0.88); and
    1,2,3,4,7,8- and 1,2,3,6,7,8-HxCDF and 1,2,3,4,6,7,8-HpCDF (similarity
    value of 0.86).
• Next, one cluster consisting of four analytes is formed when
  2,3,4,7,8-PeCDF joins the three-analyte cluster of 1,2,3,4,7,8- and
  1,2,3,6,7,8-HxCDF and 1,2,3,4,6,7,8-HpCDF with a similarity value of 0.85.
• The four-analyte cluster then is joined with the other three-analyte
  cluster, with a similarity value of 0.84 between them.
• The clustering process continues until all but the five PCDFs with high
  occurrences of nondetects and 2,3,7,8-TCDD are joined. That cluster
  (denoted by the largest triangle of solid lines in Figure 5) contains 10
  analytes with a similarity value of 0.82 when it was formed.
• 2,3,7,8-TCDD and the remaining five PCDFs are kept as single analyte
  clusters since they do not correlate highly with each other (see also
  Table 3) or with any other cluster formed earlier in the process. The
  presence of the five PCDFs in this category can be explained by the large
  number of specimens with levels below the detection limit. In this case,
  only small variations are seen in concentration levels across specimens
  which contribute little information to the data base. 2,3,7,8-TCDD,
  detected in 98.5% of the specimens, exhibits a behavior that is
  altogether unlike that of the other analytes.
If one were to select an a priori number of clusters by drawing a
vertical line through the upper right-hand corner of the large triangle (solid
line) in Figure 5, then one would obtain six disjoint clusters. The final
results of this cluster analysis are depicted in Figure 6. This figure is
another way of presenting the information contained in Table 7 and Figure 5.
3. Interpretation
The results of the multivariate cluster analysis clearly demonstrate
that although 2,3,7,8-TCDD was measured at quantifiable levels in practically
all samples of the study design, there is no definitive correlation between
its incidence of detection and concentration and those of the other dioxin
and furan congeners that were predominant in the human adipose tissue
specimens. The
lack of correlation of 2,3,7,8-TCDD with other PCDDs and PCDFs may result from
several factors which include the sources of exposure and the half-life of the
specific dioxin and furan congeners in the human body. Data to support either
or both of these factors are limited. However, it is recognized that major
sources of exposure include commercial products (such as Agent Orange, tri-
chlorophenol, pentachlorophenol, etc.) and incineration emissions. The levels
of specific PCDDs and PCDFs vary within specific chlorinated commercial prod-
ucts. For example, 2,3,7,8-TCDD was the major dioxin contaminant in Agent
Orange, whereas the hexa- through octachlorodioxins and furans are the pre-
dominant contaminants for a product such as pentachlorophenol.
Incineration emissions containing the 2,3,7,8-substituted compounds
have been implicated as another potential source of exposure. However, before
correlations between emission and body burden levels can be either determined
or disproved, further work will be necessary. As a minimum, such efforts
would consist of compiling a number of discrete but detailed data bases from
the available literature and investigating relationships among them.
Half-life estimates in the human body for dioxins and furans
have been limited solely to 2,3,7,8-TCDD. The Centers for Disease Control
Ranch Hand study (Pirkle et al. 1989) has resulted in an estimate of half-life
for this compound of approximately 5 to 10 years. Since no half-life
information is available for the other compounds, correlations with
the observed body burden data are not possible.
B. Principal Component Analysis
Principal component analysis (PCA) was performed on the global data
set consisting of analytical and demographic data. The following subsections
present a brief description of the approach taken, the results, and the
interpretation of these results.
1. Approach
In PCA, as in cluster analysis, one concentrates on relationships
within a set of variables. Whereas cluster analysis is a tool for determining
whether groups of analytes with similar behavior exist, PCA is a technique
used to reduce the dimensionality of the original set of variables in a
different manner. The objective of PCA is to derive a few new components as
linear combinations of the original variables which will provide a description
of the structure of the data with as little loss of information as possible.
25
-------
[Figure 6. Final results of the cluster analysis of the analytes. Graphic not
reproduced.]
26
-------
Various subsets of the data were subjected to PCA. All analyses
were performed using the principal component analysis option of the FACTOR
procedure followed by the SCORE procedure from the Statistical Analysis System
(SAS) package. Also, a subroutine was added to enhance the output by ranking
the variables by their percent contributed variance within each principal
component. The analyses were performed on the correlation matrix; i.e., the
variables were autoscaled (standardized) so as to be treated with equal
importance. All concentration data were log-transformed prior to PCA.
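Why running PCA on the correlation matrix is the same as autoscaling the variables first can be illustrated with a minimal sketch. The concentrations below are hypothetical, the helper names are assumptions introduced here, and base-10 logarithms are assumed (the report does not state the base; after autoscaling the base makes no difference).

```python
import math


def autoscale(column):
    """Center a column to mean 0 and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def covariance(u, v):
    """Sample covariance of two equally long columns."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


# Hypothetical concentrations (pg/g) of two analytes in five specimens.
analyte_a = [4.0, 9.5, 12.0, 20.0, 33.0]
analyte_b = [150.0, 420.0, 500.0, 900.0, 1600.0]

# Log-transform, then autoscale.
log_a = [math.log10(x) for x in analyte_a]
log_b = [math.log10(x) for x in analyte_b]
z_a, z_b = autoscale(log_a), autoscale(log_b)

# Pearson correlation of the log-transformed data ...
r = covariance(log_a, log_b) / math.sqrt(
    covariance(log_a, log_a) * covariance(log_b, log_b))
# ... equals the covariance of the autoscaled columns.
c = covariance(z_a, z_b)
print(round(r, 6), round(c, 6))
```

Because every autoscaled variable has unit variance, the total variance to be explained equals the number of variables, which is why the eigenvalues of a 13-variable analysis are compared against a total of 13.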
2. Results
Of the 16 analytes in the data base, those with less than 25% of
nondetects were excluded; a total of 11 analytes remained. The list of the
five PCDFs excluded is given in Section III.A. The demographic variables con-
sidered at first were individual's age at sampling, specimen collection year,
and body mass index. The inclusion of body mass index reduced the number of
specimens to 169, since body mass index was available for only that many.
The results, in terms of significant principal components, showed
that body mass index, for all practical purposes, did not contribute to
explaining any of the variation in the data set. Thus, the decision was made
to eliminate that variable from subsequent analyses. A total of 11 analytes
(log-transformed concentrations) and 2 demographic variables, age at sampling
and specimen collection year, and a total of 195 samples were kept for the
final analyses described below.
Three principal components were kept for further examination. The
first and most predominant principal component explains 64.4% of the total
variance in the data set. The second principal component explains 12.7% of
the total variance and the third principal component contributes 6.39% of the
variance. Thus, the three principal components account for 83.5% of the total
variance in the data, with the first two alone accounting for a total of
77.1%.
The profiles of the three principal components are presented in
Table 8 along with the variable loadings and their individual contributions
(percent variance) to the variance explained by the given principal com-
ponent. Within each principal component, the variables are listed in order of
decreasing absolute loadings; a minus sign preceding a loading denotes a
negative correlation of that variable with the principal component. The
fourth column lists cumulative percentages, which reach 100% within
each principal component. The last part of Table 8, entitled "Final
Communality Estimates," is explained next.
27
-------
Table 8. Principal Component Analysis Results (cont.)

Principal Component 1 (64.4% of total variance explained)
"Higher Chlorinated Furans and Dioxins"

  Variable (Analyte No.)*        Variable   Variance        Cumulative
  (ordered by % variance)        Loading    Explained (%)   Variance (%)
# 123478-HxCDF (6)                 0.94       10.55           10.55
# 123789-HxCDD (11)                0.92       10.02           20.58
# 123678-HxCDF (7)                 0.91        9.99           30.57
# 123478/123678-HxCDD (10)         0.90        9.71           40.28
# OCDD (16)                        0.88        9.20           49.48
# 1234678-HpCDF (12)               0.87        8.98           58.46
# 1234678-HpCDD (14)               0.86        8.83           67.29
# 12378-PeCDD (5)                  0.85        8.71           76.00
# 23478-PeCDF (4)                  0.83        8.28           84.28
  234678-HxCDF (8)                 0.78        7.23           91.50
  2378-TCDD (2)                    0.78        7.20           98.70
  Collection Year                 -0.33        1.29          100.00
  Age at Sampling                 -0.02        0.00          100.00

Principal Component 2 (12.7% of total variance explained)
"Age and Collection Year"

  Variable (Analyte No.)*        Variable   Variance        Cumulative
  (ordered by % variance)        Loading    Explained (%)   Variance (%)
# Age at Sampling                  0.92       51.68           51.68
# Collection Year                  0.84       42.47           94.16
  1234678-HpCDF (12)              -0.16        1.57           95.72
  123478/123678-HxCDD (10)         0.14        1.19           96.91
  23478-PeCDF (4)                  0.14        1.10           98.01
  123678-HxCDF (7)                 0.12        0.93           98.95
  12378-PeCDD (5)                  0.10        0.65           99.60
  123789-HxCDD (11)                0.06        0.19           99.79
  1234678-HpCDD (14)              -0.04        0.10           99.89
  2378-TCDD (2)                   -0.03        0.06           99.95
  OCDD (16)                        0.02        0.04           99.98
  123478-HxCDF (6)                -0.01        0.01           99.99
  234678-HxCDF (8)                -0.01        0.01          100.00

Note 1 *: All concentrations were log-transformed
Note 2: "#" marks the shaded area within each set, which highlights variables
with a variance contribution above average (1/13 or 7.7%)
28
-------
Table 8. Principal Component Analysis Results (concl.)

Principal Component 3 (6.39% of total variance explained)
"Lower Chlorinated Furans and Dioxins"

  Variable (Analyte No.)*        Variable   Variance        Cumulative
  (ordered by % variance)        Loading    Explained (%)   Variance (%)
# 234678-HxCDF (8)                 0.43       22.09           22.09
# 1234678-HpCDD (14)               0.37       16.85           38.94
# 12378-PeCDD (5)                 -0.37       16.74           55.68
# 2378-TCDD (2)                   -0.37       16.62           72.30
  1234678-HpCDF (12)               0.22        5.62           77.92
  OCDD (16)                        0.21        5.56           83.48
  Collection Year                  0.21        5.51           89.00
  23478-PeCDF (4)                 -0.21        5.32           94.32
  123478/123678-HxCDD (10)        -0.21        5.07           99.39
  Age at Sampling                 -0.05        0.33           99.73
  123678-HxCDF (7)                 0.04        0.16           99.89
  123789-HxCDD (11)               -0.03        0.11          100.00
  123478-HxCDF (6)                 0.00        0.00          100.00

Final Communality Estimates

  Variable (Analyte No.)*          PC1       PC1+PC2    PC1+PC2+PC3
  2378-TCDD (2)                    0.60       0.60        0.74 *
  23478-PeCDF (4)                  0.69       0.71        0.76
  12378-PeCDD (5)                  0.73       0.74        0.88 *
  123478-HxCDF (6)                 0.88       0.88        0.88
  123678-HxCDF (7)                 0.84       0.85        0.85
  234678-HxCDF (8)                 0.60       0.60        0.79 *
  123478/123678-HxCDD (10)         0.81       0.83        0.87
  123789-HxCDD (11)                0.84       0.84        0.84
  1234678-HpCDF (12)               0.75       0.78        0.82
  1234678-HpCDD (14)               0.74       0.74        0.88 *
  OCDD (16)                        0.77       0.77        0.82
  Age at Sampling                  0.0004     0.86        0.86
  Collection Year                  0.11       0.81        0.86
  Total Communality                8.37      10.02       10.85
  Percent Variance Explained      64.4       77.1        83.5

* next to last column marks those variables showing largest improvement in
final communality estimates from 2 to 3 principal components
Note 1 *: All concentrations were log-transformed
Note 2: "#" marks the shaded area within each set, which highlights variables
with a variance contribution above average (1/13 or 7.7%)
29
-------
The eigenvalues associated with these three principal components are
8.37, 1.65, and 0.83, respectively. Of a possible total of 13 (number of
variables included in the PCA), these correspond to the proportions of 64.4%,
12.7%, and 6.39%, respectively. Generally, principal components with
eigenvalues less than 1.0 are not considered in the interpretation of PCA
results. Past experience indicates that principal components with eigenvalues
below 1.0 explain little correlated behavior among the measured variables and
that they often reflect random variance contributions due to measurement and
sampling error.
This reasoning should prompt us to discard the third principal
component with a low eigenvalue of 0.83. However, other information in the
PCA results was considered in the selection of the final number of principal
components.
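The arithmetic linking eigenvalues to percent variance, and the eigenvalue-of-1.0 rule of thumb discussed above, can be checked with a short sketch. The eigenvalues and the number of variables are taken from the text; the variable names are assumptions introduced for illustration.

```python
# Eigenvalues of the first three principal components (from the text).
eigenvalues = [8.37, 1.65, 0.83]
n_variables = 13  # 11 analytes + 2 demographic variables

# With autoscaled variables the total variance equals the number of variables,
# so each eigenvalue divided by 13 is that component's share of the variance.
proportions = [100 * ev / n_variables for ev in eigenvalues]
for k, (ev, pct) in enumerate(zip(eigenvalues, proportions), start=1):
    print(f"PC{k}: eigenvalue {ev:.2f} -> {pct:.1f}% of total variance")

# Rule of thumb: keep only components with eigenvalue >= 1.0.
kept_by_rule = [ev for ev in eigenvalues if ev >= 1.0]
print("components kept by the eigenvalue rule:", len(kept_by_rule))
```

The third share computed from the two-decimal eigenvalue comes out slightly below the reported 6.39%; the difference is rounding in the published eigenvalue.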
Each selection of a number of principal components has associated
with it a set of final communalities for the 13 variables considered. They
are calculated as the sum of the squares of the loadings (correlations) of a
given variable on the principal components. Final communality estimates are
the squared multiple correlations for predicting the variables from the
estimated principal components and therefore cannot exceed 1. They show how well
the variables considered in the PCA are accounted for by the number of com-
ponents selected. Thus, if one were to consider all 13 principal components
generated by the PCA on the 13 variables, then the final communality estimates
would each be 1 and therefore add up to 13, explaining 100% of the total
variance in the data.
The final communality estimates of each variable were carefully
examined in the successive consideration of one, two, and three principal
components. These results are shown in the last part of Table 8. As shown in
the column entitled "PC1+PC2", it is clear that 2,3,7,8-TCDD and 2,3,4,6,7,8-
HxCDF with final communalities of 0.60 each, are "poorly" accounted for by a
two-principal component model. Adding the third principal component to the
model considerably increases the final communalities (compare column 3 with
column 4 entitled "PC1+PC2+PC3") for these two analytes, as well as those of
1,2,3,7,8-PeCDD and 1,2,3,4,6,7,8-HpCDD. Therefore, it was found that the
third principal component is important to the interpretation of the data in
spite of its low eigenvalue (0.83) and that a three-principal component model
should be considered.
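The communality calculation (sum of squared loadings over the components retained) can be sketched for the two analytes singled out above. The loadings are the two-decimal values read from Table 8, so the results agree with the tabulated communalities only to within rounding; the function name is an assumption introduced here.

```python
from itertools import accumulate

# Rounded loadings on PC1, PC2, PC3 as read from Table 8.
loadings = {
    "2378-TCDD": (0.78, -0.03, -0.37),
    "234678-HxCDF": (0.78, -0.01, 0.43),
}


def communalities(row):
    """Running sum of squared loadings: values for 1, 2, and 3 components."""
    return list(accumulate(l * l for l in row))


for name, row in loadings.items():
    one, two, three = communalities(row)
    print(f"{name}: PC1 {one:.2f}, PC1+PC2 {two:.2f}, PC1+PC2+PC3 {three:.2f}")
```

For both analytes the second component adds almost nothing, while the third raises the communality by roughly 0.14 to 0.19, which is the improvement cited as the reason for retaining it.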
3. Interpretation
Overall, three principal components, explaining approximately 84% of
the total variance, were judged to be important in the interpretation of the
data. The selection of three rather than two principal components was made in
spite of the low eigenvalue of the third component which contributes little
(6.4%) to the variance in the data. The reason for its inclusion was
discussed above.
Since the PCA was performed with 13 variables, the average relative
contribution of a variable within a given principal component, were all
variables equally important, would be 7.7% (=1/13). Variables with
contributions above this average were considered significant. This cutoff is
indicated by the shaded area in each set in Table 8.
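The 1/13 cutoff can be applied mechanically, as in the sketch below for principal component 1. The loadings are the two-decimal values read from Table 8 (they are consistent with the PC1 communalities, which are the squared loadings); the dictionary and variable names are assumptions introduced for illustration.

```python
# Rounded PC1 loadings as read from Table 8.
pc1_loadings = {
    "123478-HxCDF": 0.94, "123789-HxCDD": 0.92, "123678-HxCDF": 0.91,
    "123478/123678-HxCDD": 0.90, "OCDD": 0.88, "1234678-HpCDF": 0.87,
    "1234678-HpCDD": 0.86, "12378-PeCDD": 0.85, "23478-PeCDF": 0.83,
    "234678-HxCDF": 0.78, "2378-TCDD": 0.78,
    "Collection Year": -0.33, "Age at Sampling": -0.02,
}

# The eigenvalue is the sum of squared loadings (close to the reported 8.37).
eigenvalue = sum(l * l for l in pc1_loadings.values())

# Percent of the component's variance contributed by each variable.
percent = {name: 100 * l * l / eigenvalue for name, l in pc1_loadings.items()}

cutoff = 100 / len(pc1_loadings)  # 1/13, about 7.7%
above = [name for name, p in percent.items() if p > cutoff]
share_of_above = sum(percent[name] for name in above)

print(f"eigenvalue from rounded loadings: {eigenvalue:.2f}")
print(f"{len(above)} variables above {cutoff:.1f}%, "
      f"together {share_of_above:.0f}% of the component")
```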
30
-------
Principal component 1, which explains 64.4% of the total variance in
the data set, consists mainly of the higher chlorinated furans and dioxins.
This principal component is purely a chemical principal component—the first 9
of the 13 variables considered account for about 84% of the variance explained
by this principal component. These nine analytes each contribute an above-
average (1/13 or 7.7%) share of the variation that can be explained by
principal component 1; they have about equal loadings ranging from 0.94 to
0.83. Principal component 1 was thus labeled the "Higher Chlorinated PCDDs
and PCDFs" component. This labeling is used on all PCA plots. Because the
nine loadings are nearly equal, this principal component is approximately
proportional to the sum of the autoscaled log concentrations of these nine
analytes.
An interesting fact should be noted in relation to the outcome
of the cluster analysis (see Figures 5 and 6). In these figures, the large
cluster of nine analytes (similarity of 0.83) is identical to the list of
the first nine analytes of the first principal component in Table 8. This
fact is not surprising. Principal components are orthogonal to each other;
that is, variance captured by one principal component is not captured again
by another. One would therefore expect highly correlated analytes, such as
those seen in the large cluster in Figures 5 and 6, to contribute to the same
principal component. Were this not the case, then some of these analytes would
contribute significantly to the next principal component, meaning that some of
the remaining variance could be explained by them. This would contradict
the fact that these analytes are highly correlated.
The second principal component, which explains considerably less
(12.7%) of the total variance than does the first component, is interesting in
that it is a purely "non-chemical" component. The two variables, age
of subject at time of death or surgery and specimen collection year, account
for 94% of that principal component. This principal component was naturally
labeled the "Age and Collection Year" component. It was shown earlier that
these variables, taken one at a time, contributed very little to the
variability in 2,3,7,8-TCDD levels. It is shown here, however, that
considering all chemical and demographic variables simultaneously can improve
the predictive power of a combination of variables with respect to the whole
data set. It should be noted, though, that an overall contribution of 12.7%
leaves considerable room for unexplained variance, were one to look at
principal component 2 only.
The third principal component, which explains only a small proportion
(6.4%) of the overall variance in the data, consists mainly of four
analytes. Two of these, 2,3,4,6,7,8-HxCDF (positive loading) and 2,3,7,8-TCDD
(negative loading), did not contribute above average (1/13 or 7.7% of
variance) to the first principal component. The other two,
1,2,3,4,6,7,8-HpCDD (positive loading) and 1,2,3,7,8-PeCDD (negative loading),
contributed to the first component as well. It is for these four analytes
that the addition of the third principal component resulted in the largest
increase in final communalities (see last part of Table 8). This principal
component was labeled the "Lower Chlorinated PCDDs and PCDFs" component.
31
-------
It should be noted that the third principal component is the only
component to which 2,3,7,8-TCDD contributes a considerable amount of variation
(16.62%). However, the third principal component contributes only a low 6.39%
to the total variance in the data. Overall, when considering the three
principal components described above, 2,3,7,8-TCDD contributes approximately
5.7% to the variation in the whole data set determined by the 195 specimen
results. The figure of 5.7% is obtained as the sum of the percent variances
explained by the three principal components, each weighted by the percent
variance attributed to 2,3,7,8-TCDD within that principal component.
Thus 5.7% is obtained as 7.20% x 64.4% + 0.06% x 12.7% + 16.62% x 6.39%. Of
this 5.7%, the largest share (4.6% = 7.20% x 64.4%) attributed to
2,3,7,8-TCDD is contributed by the first principal component. It should not
be expected that any pattern in the specimens will be visible along the axis
determined by this principal component.
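The weighted sum behind the 5.7% figure can be reproduced directly. All numbers below come from the text and Table 8; only the variable names are introduced here for illustration.

```python
# Percent of total variance explained by PC1, PC2, PC3.
pc_share = [64.4, 12.7, 6.39]
# Percent of each component's variance attributed to 2,3,7,8-TCDD (Table 8).
tcdd_share_within_pc = [7.20, 0.06, 16.62]

# Each term is (TCDD share within the component) x (component share of total),
# expressed as a percentage of the total variance.
terms = [w * s / 100 for w, s in zip(tcdd_share_within_pc, pc_share)]
total = sum(terms)

for k, term in enumerate(terms, start=1):
    print(f"PC{k}: {term:.2f}% of total variance")
print(f"2,3,7,8-TCDD overall: {total:.1f}%")
```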
4. Plotting of Principal Component Scores
The three selected principal components were used to compute factor
scores for each of the 195 specimens. These standardized scores were cal-
culated by using the respective loadings as the coefficients in the linear
combinations of the variables in each principal component. Plotting these
scores in the space spanned by the three principal components will provide a
means of detecting visible patterns or grouping in the specimens, should they
exist. Due to the disproportion of variance explained by the three principal
components, the specimen scores were plotted on two principal components at a
time, rather than in a three-dimensional figure. The latter approach, in
which the third dimension would add only about 6% to the 77% contributed by
the first two dimensions, would tend to obscure rather than enhance possible
patterns.
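The score calculation described above can be sketched for a single specimen. One assumption is made explicit here: for unrotated principal components, dividing the loading-weighted sum of the autoscaled variables by the eigenvalue gives scores with unit variance, which is taken to be what "standardized" means in the text. The specimen is hypothetical, and the function name is introduced for illustration.

```python
# Rounded PC1 loadings in the order of Table 8 (nine leading analytes first,
# then 234678-HxCDF, 2378-TCDD, Collection Year, Age at Sampling).
pc1_loadings = [0.94, 0.92, 0.91, 0.90, 0.88, 0.87, 0.86, 0.85, 0.83,
                0.78, 0.78, -0.33, -0.02]
pc1_eigenvalue = 8.37


def standardized_score(z_values, loadings, eigenvalue):
    """Loading-weighted sum of autoscaled values, divided by the eigenvalue
    (assumed standardization; see the lead-in)."""
    return sum(z * l for z, l in zip(z_values, loadings)) / eigenvalue


# Hypothetical specimen: one standard deviation above the mean on the nine
# leading analytes, exactly average on the remaining four variables.
specimen = [1.0] * 9 + [0.0] * 4
score = standardized_score(specimen, pc1_loadings, pc1_eigenvalue)
print(f"PC1 score: {score:.2f}")
```

Under this assumption, a specimen that is uniformly elevated on the higher chlorinated analytes moves to the right along the first axis by about the same number of standard deviations.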
Figures 7 through 15 show nine selected principal component plots.
These are two-dimensional plots of principal component 2 vs. 1, 3 vs. 1, and
3 vs. 2, each using a different label for encoding the data points. The
principal component explaining the highest proportion of variance in a given
pair is always plotted on the X-axis. Plots of principal component 2 vs. 1
will display 77.1% of the variation in the data; plots of principal com-
ponent 3 vs. 1 will display 70.8% of the variation with most of the variation
along the first component; and plots of principal component 3 vs. 2 a total of
19.1%.
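The share of variation displayed by each pair of axes is simply the sum of the two components' shares, as the following sketch confirms. The percentages are from the text; the dictionary names are assumptions introduced here.

```python
from itertools import combinations

# Percent of total variance explained by each principal component.
share = {1: 64.4, 2: 12.7, 3: 6.39}

# Key each plot as (Y-axis component, X-axis component); the component
# explaining more variance is on the X-axis.
coverage = {}
for x_pc, y_pc in combinations(sorted(share), 2):
    coverage[(y_pc, x_pc)] = share[x_pc] + share[y_pc]

for (y_pc, x_pc), pct in coverage.items():
    print(f"PC{y_pc} vs. PC{x_pc}: {pct:.1f}% of the variation")
```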
The axes representing principal components 1, 2, and 3 have been
labeled "Higher Chlorinated PCDDs and PCDFs", "Age + Collection Year", and
"Lower Chlorinated PCDDs and PCDFs", respectively, on all nine plots. The nine
principal component plots are shown in three sets of three as follows:
• Figures 7 through 9: military/civilian status was used as the labeling
code.
• Figures 10 through 12: geographic region was used as the labeling code.
• Figures 13 through 15: race (Caucasian vs. non-Caucasian) was used as
the labeling code.
32
-------
[Figure 7. Plot of principal component 2 ("Age + Collection Year") vs.
principal component 1 ("Higher Chlorinated PCDDs and PCDFs"); specimens coded
by military/civilian status. Graphic not reproduced.]
33
-------
[Figure 8. Plot of principal component 3 ("Lower Chlorinated PCDDs and PCDFs")
vs. principal component 1 ("Higher Chlorinated PCDDs and PCDFs"); specimens
coded by military/civilian status. Graphic not reproduced.]
34
-------
[Figure 9. Plot of principal component 3 ("Lower Chlorinated PCDDs and PCDFs")
vs. principal component 2 ("Age + Collection Year"); specimens coded by
military/civilian status. Graphic not reproduced.]
35
-------
[Figure 10. Plot of principal component 2 ("Age + Collection Year") vs.
principal component 1 ("Higher Chlorinated PCDDs and PCDFs"); specimens coded
by geographic region. Graphic not reproduced.]
36
-------
[Figure 11. Plot of principal component 3 ("Lower Chlorinated PCDDs and
PCDFs") vs. principal component 1 ("Higher Chlorinated PCDDs and PCDFs");
specimens coded by geographic region. Graphic not reproduced.]
37
-------
[Figure 12. Plot of principal component 3 ("Lower Chlorinated PCDDs and
PCDFs") vs. principal component 2 ("Age + Collection Year"); specimens coded
by geographic region. Graphic not reproduced.]
38
-------
[Figure 13. Plot of principal component 2 ("Age + Collection Year") vs.
principal component 1 ("Higher Chlorinated PCDDs and PCDFs"); specimens coded
by race (Caucasian vs. non-Caucasian). Graphic not reproduced.]
39
-------
[Figure 14. Plot of principal component 3 ("Lower Chlorinated PCDDs and
PCDFs") vs. principal component 1 ("Higher Chlorinated PCDDs and PCDFs");
specimens coded by race (Caucasian vs. non-Caucasian). Graphic not reproduced.]
40
-------
[Figure 15. Plot of principal component 3 ("Lower Chlorinated PCDDs and
PCDFs") vs. principal component 2 ("Age + Collection Year"); specimens coded
by race (Caucasian vs. non-Caucasian). Graphic not reproduced.]
-------
Legends and sample sizes within each group as defined by the labeling
variables are given in a box on each plot.
The standardized scores of all 195 specimens vary between -2.11 and
4.02 along the first principal component, between -2.09 and 2.86 along the
second principal component, and between -3.33 and +3.04 along the third
principal component. Very few data points fall outside the -3 to +3
boundaries (none along principal component 2), an indication that the
plots are very "tight." Of the 195 specimens, only 5 have scores falling
outside the ±3 range. These five subjects consist of four non-Vietnam
veterans and one civilian.
These plots do not provide any clear stratification of the
specimens. In particular, Figures 7 through 9, which use military/civilian
status as the coding label, demonstrate that the group of Vietnam veterans in
this study exhibits no difference from the two other groups of non-Vietnam
veterans and civilians.
In summary, the plots of the two most important principal components,
the "Higher Chlorinated PCDDs and PCDFs" and the "Age Plus Collection
Year" components, explain 77.1% of the total variation in the data set.
However, these plots do not reveal any stratification of the specimens into
groups, whether defined by military/civilian status of the individual,
geographic region, or race.
The result of most interest, based on this set of matched NHATS
specimens, is that the group of Vietnam veterans does not exhibit higher
levels of PCDDs and PCDFs, including 2,3,7,8-TCDD, than either the group of
non-Vietnam veterans or the control group of civilians. However, to the
extent that the NHATS population is not representative of Vietnam veterans,
the results are not representative of 2,3,7,8-TCDD levels in the general
population of Vietnam veterans. The relatively small sample size (36) of
Vietnam veterans would limit the significance of results relating to
2,3,7,8-TCDD levels in the population of Vietnam veterans in any case.
However, within the restricted NHATS population, the comparative results are
valid.
42
-------
VI. REFERENCES
Dunn JW, De Vault D, Rappe C, Wiberg K, Bergqvist PA. 1988. The application
of SIMCA pattern recognition to dioxin and dibenzofuran residue data in Great
Lakes fish. The 8th International Symposium on Chlorinated Dioxins and
Related Compounds, Umeå, Sweden, August 21-26, 1988.
Lindstrom G, Rappe C. 1988. Multivariate data analysis applied in studying
the distribution of PCDDs and PCDFs in human milk. The 8th International
Symposium on Chlorinated Dioxins and Related Compounds, Umeå, Sweden,
August 21-26, 1988.
Pirkle JL, Wolfe WH, Patterson DG, Needham LL, Michalek JE, Miner JC,
Peterson MR, Phillips DL. 1989. Estimates of the half-life of 2,3,7,8-
tetrachlorodibenzo-p-dioxin in Vietnam veterans of Operation Ranch Hand.
Journal of Toxicology and Environmental Health 27:165-171.
Pitea D, Bonati L, Lasagni M, Moro G, Todeschini R, Chiesa G. 1988. The
combustion of municipal solid wastes: PCDD and PCDF in MSW and in
emissions. A chemometric approach. The 8th International Symposium on
Chlorinated Dioxins and Related Compounds, Umeå, Sweden, August 21-26, 1988.
Sielken, Jr. RL. 1987. Statistical evaluations reflecting the skewness in
the distribution of TCDD levels in human adipose tissue. Chemosphere
16:2135-2140, No. 8/9.
Sjostrom M, Nygren M, Hansson M, Rappe C. 1988. A multivariate comparison of
PCDDs and PCDFs levels in Vietnam veterans and controls. The 8th
International Symposium on Chlorinated Dioxins and Related Compounds, Umeå, Sweden,
August 21-26, 1988.
Tysklind M, Rappe C. 1988. Multivariate data analysis of operation and
emission variables from scrap metal melting at a steel mill. The 8th
International Symposium on Chlorinated Dioxins and Related Compounds, Umeå,
Sweden, August 21-26, 1988.
USEPA. 1986. U.S. Environmental Protection Agency, Office of Toxic
Substances. Analysis for polychlorinated dibenzo-p-dioxins and dibenzofurans
in human adipose tissue: method evaluation study. Washington, DC. USEPA.
EPA 560/5-85-022.
VA and USEPA. 1989. The Department of Veterans Affairs, Office of Environ-
mental Epidemiology, and the U.S. Environmental Protection Agency, Office of
Toxic Substances. Dioxins and dibenzofurans in adipose tissue of U.S. Vietnam
Veterans and controls. Washington, DC. USEPA. EPA 560/5-89-002.
Wold S. 1988. An overview of multivariate data analysis. The 8th
International Symposium on Chlorinated Dioxins and Related Compounds, Umeå,
Sweden, August 21-26, 1988.
43
-------
BMDP Statistical Software. 1983. 1985 Printing. University of California
Press, Berkeley.
SAS: Statistical Analysis System, SAS Institute, Inc.
SAS® User's Guide: Basics, Version 5 Edition, 1985.
SAS® User's Guide: Statistics, Version 5 Edition, 1985.
Lotus 1-2-3 Release 2.01. 1986. Lotus Development Corporation, 55 Cambridge
Parkway, Cambridge, Massachusetts 02142.
44
-------
APPENDIX
HISTOGRAMS OF CONCENTRATION LEVELS OF ALL 16 ANALYTES
A-1
-------
HISTOGRAMS OF CONCENTRATION LEVELS OF ALL 16 ANALYTES
Each vertical bar on a histogram represents the percentage of the
total number of specimens whose concentrations fall within a given concentra-
tion range. The first bar in each histogram indicates the percentage of
specimens with concentrations below detection limit, if applicable. The
ranges are represented by their midpoints on the graphs. The width of each
range is constant within a given histogram. However, the width changes from
histogram to histogram, depending on the concentration range of the given
analyte.
To determine the width of a range, simply subtract two consecutive
midpoints from each other. The ranges are defined in such a way that the
lower end concentration is included while the upper end is excluded. Using
2,3,7,8-TCDD as an example, approximately 43% of the subjects have
concentrations above the detection limit but below 10 pg/g; about 42% of the
subjects have concentrations equal to or greater than 10 pg/g but less than
20 pg/g, etc.
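The binning convention described above (equal-width ranges that include the lower end and exclude the upper end, labeled by their midpoints, with nondetects in a separate first bar) can be sketched as follows. The function name, the "ND" label, and the sample concentrations are hypothetical and serve only to illustrate the convention.

```python
def histogram_percentages(detected, width, n_nondetects=0):
    """Return (label, percent of all specimens) pairs.

    Each range is [lower, lower + width) and is labeled by its midpoint;
    nondetects, if any, form a separate first bar labeled "ND".
    """
    total = len(detected) + n_nondetects
    counts = {}
    for concentration in detected:
        lower = int(concentration // width) * width
        counts[lower] = counts.get(lower, 0) + 1

    bars = []
    if n_nondetects:
        bars.append(("ND", 100 * n_nondetects / total))
    for lower in sorted(counts):
        bars.append((lower + width / 2, 100 * counts[lower] / total))
    return bars


# Hypothetical concentrations (pg/g) for six detects plus two nondetects.
sample = [3.2, 9.9, 10.0, 14.5, 19.99, 20.0]
bars = histogram_percentages(sample, width=10, n_nondetects=2)
print(bars)
```

Note that 10.0 pg/g falls in the range with midpoint 15 and 20.0 pg/g in the range with midpoint 25, in line with the rule that the lower end is included and the upper end excluded. Subtracting two consecutive midpoints recovers the width of 10 pg/g.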
A-2
-------
[Histograms of concentration levels for each of the 16 analytes, pages A-3
through A-18. X-axis: concentration (pg/g); Y-axis: percent of subjects.
Graphics not reproduced.]
-------
50272-101
REPORT DOCUMENTATION PAGE
1. Report No.: EPA-560/5-90-006
3. Recipient's Accession No.:
4. Title and Subtitle: Pattern Recognition Analysis of VA/EPA PCDD and PCDF Data
5. Report Date: September 1990
7. Author(s): Karin M. Bauer and John S. Stanley
8. Performing Organization Rept. No.: 8863-A
9. Performing Organization Name and Address: Midwest Research Institute,
   425 Volker Boulevard, Kansas City, MO 64110
10. Project/Task/Work Unit No.: Work Assignment 25
11. Contract(C) or Grant(G) No.:
-------