EPA-560/1-77-001
MODELS FOR BIOCHEMICAL TOXICITY
PREPARED FOR
ENVIRONMENTAL PROTECTION AGENCY
OFFICE OF TOXIC SUBSTANCES
WASHINGTON, D.C. 20460
MARCH 1977
-------
EPA-560/1-77-001
MODELS FOR BIOCHEMICAL TOXICITY
Subcontract Report
by
Kurt Enslein
Contract No. 68-01-2657
Environmental Protection Agency
Office of Toxic Substances
Washington, D.C. 20460
-------
NOTICE
This report has been reviewed by the Office of Toxic Substances,
EPA, and approved for publication. Approval does not signify
that the contents necessarily reflect the views and policies
of the Environmental Protection Agency, nor does mention of
trade names or commercial products constitute endorsement or
recommendation for use.
-------
TABLE OF CONTENTS
1. Purpose of This Project
2. Data Base
3. Chronology of Data Analysis and Model Building
   Steps Leading to Final Results
4. Discussion and Conclusions
5. Suggestions for the Future
Appendix
-------
1. Purpose of This Project
The work that Genesee was to carry out under this project had as
its overall goal the investigation of uses of multivariate techniques
applied to a data base to predict toxicity of chemical compounds. For
this purpose various multivariate techniques such as clustering, multiple
regression, multiple discriminant analysis and allied methods were to be
used.
The overall goal was divided into two subgoals:
a) to derive a model for the prediction of toxicity on a continuous
scale, i.e., via regression equations, and
b) to classify compounds into upper and lower quartiles
of toxicity via discriminant equations.
2. Data Base
The data base consisted of 686 compounds. The data for each compound
consisted of the LD50 (C) for rat and mouse, the log of the partition
coefficient (log P), 421 fragment keys and the chemical formula of the
compound. Only 549 compounds had all the items required for
our analysis: rat toxicity, partition coefficient,
fragment keys, and the chemical formula. These 549 compounds were
the ones used in the work described below. Toxicity was first
converted to log 1/C. Later, log (1000M/C) (M = molecular weight) was
used for toxicity. Appendix A shows various listings of these data.
3. Chronology of Data Analysis and Model Building
Rather than simply presenting the results of the analyses and model
building as though they had been picked out of thin air, in this section
we describe the logical steps through which we proceeded to arrive
at the results shown at the end of this section.
a) We started by calculating simple statistics for all 686
compounds on log P and log 1/C in order to determine
upper and lower quartiles. The results are shown in Table 1.
-------
Table 1
Simple Statistics on 2 Parameters
Variable      Nr of          Mean     Standard   Min      Max    Coeff. of    Skew     Kurtosis
              Observations            Error                      variation
log P         543            1.562    .087       -10.0    7.3    1.295        -.865    6.6
log10(1/C)    592            -2.819   .026       -3.964   0.0    -.228        1.111    4.4
It can be seen that log P was available for only 543 compounds and
that it was not well distributed. (It was later discovered that, after
a format error was corrected, log P was in fact available for 549 compounds.)
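The simple statistics of step (a) can be sketched as follows. This is a minimal illustration, not the program used in the study: the function name is hypothetical, and the moment-based definitions of skew and kurtosis and the quartile interpolation method are assumptions, since the report does not state which definitions were used.

```python
import math
import statistics


def simple_statistics(values):
    """Summary statistics of the kind listed in Table 1 (assumed definitions)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    # Central moments, used here for moment-based skew and kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    lower_q, _median, upper_q = statistics.quantiles(values, n=4)
    return {
        "n": n,
        "mean": mean,
        "standard_error": sd / math.sqrt(n),
        "coeff_of_variation": sd / mean,
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "lower_quartile": lower_q,
        "upper_quartile": upper_q,
    }
```

The lower and upper quartile values returned here play the role of the cut-offs used to form the LO and HI groups in step (b).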
b) Based on the data in (a) two groups called HI and LO were formed
as follows:
HI: 118 compounds with log10 (1/C) >= -2.45
LO: 145 compounds with log10 (1/C) <= -3.3
c) A stepwise discriminant analysis was then performed on these two
groups, with the following results:
                     Number of compounds classified as
                          LO        HI
Actual      LO           123        22
Classes     HI            35        83
For this discriminant analysis only keys with frequency of
occurrence >10 were candidates for entry; molecular weight, log P and
(log P)^2 were not. Only 61 variables, including some
cross-product terms, were allowed to enter the discriminant function,
of which 18 actually entered.
d) A discriminant analysis was then performed on the two groups
without using cross-terms with the following results:
-------
                     Number of compounds classified as
                          LO        HI
Actual      LO           124        21
Classes     HI            35        83
Note that there is very little difference in the accuracy of
classification when the cross-product terms are not allowed to
enter the equation.
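The comparison of steps (c) and (d) can be made concrete by computing the fraction of correctly classified compounds from each table. The sketch below uses the counts reported above; the function and variable names are illustrative only.

```python
def accuracy(table):
    """Fraction of compounds on the diagonal of an actual-by-classified table."""
    correct = sum(table[actual][actual] for actual in table)
    total = sum(sum(row.values()) for row in table.values())
    return correct / total


# Outer key: actual class. Inner key: class assigned by the discriminant.
with_cross_terms = {"LO": {"LO": 123, "HI": 22}, "HI": {"LO": 35, "HI": 83}}
without_cross_terms = {"LO": {"LO": 124, "HI": 21}, "HI": {"LO": 35, "HI": 83}}

print(round(accuracy(with_cross_terms), 3))
print(round(accuracy(without_cross_terms), 3))
```

Both tables classify 206 or 207 of the 263 compounds correctly, i.e. a little under 80 percent in either case, which is the "very little difference" noted in the text.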
e) Since the discriminant analysis approach did not seem to result
in what we considered an acceptable classification, we reverted
to a continuous-variable approach and performed a
stepwise regression (2) on the 543 compounds for which log P and
LD50 in rats existed, with 101 keys having frequency >= 10. Results
are shown below:
R = .5941
R^2 = .3529
S.E. = .536 (Standard error of estimate)
S.D. = .518 (Standard deviation of residuals)
NV = 36 (Number of variables in equation)
An examination of the residuals suggested that
reasonable regressions should be possible, and therefore also
discriminant analyses.
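The report does not say how S.E. and S.D. are related. Assuming the usual degrees-of-freedom conventions (residual sum of squares divided by n - NV - 1 for the standard error of estimate, and by n - 1 for the standard deviation of residuals), the reported figures can be checked against each other. The sketch below is such a consistency check under that assumption; the function name is hypothetical.

```python
import math


def standard_error_from_sd(sd_residuals, n_compounds, n_variables):
    """Standard error of estimate implied by the S.D. of residuals (assumed convention)."""
    residual_sum_of_squares = sd_residuals ** 2 * (n_compounds - 1)
    return math.sqrt(residual_sum_of_squares / (n_compounds - n_variables - 1))


# Values reported for step (e): 543 compounds, 36 variables in the equation.
r = 0.5941
r_squared = r ** 2
se = standard_error_from_sd(0.518, 543, 36)

print(round(r_squared, 4), round(se, 3))
```

Under this assumption the computed values agree with the reported R^2 and S.E. to the printed precision.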
f) The fragment set was now increased from 101 to 134 keys by including
those with a frequency >= 7 instead of >= 10. The
HI and LO groups shown below were used:
HI = 126 Compounds
LO = 145 Compounds
The additional 8 compounds in the HI group were due to the resolution
of some technical problems in reading the data files.
-------
g) Each of the two groups was clustered independently via ISOGEN (3)
with the results shown below:
Group   Cluster #   # of Compounds   Discarded Compounds
                                     Army #    Formula
LO         1            105          132124
HI         1             98
           2             19
           3              7
                                     124200
LO         3                         121454    C10H13N2Na3O8
Note that three compounds were discarded as being too far removed
from the clusters. One also wonders about cluster 3 in the LO group.
h) A stepwise discriminant analysis was then performed on the LO and HI
classes after removal of the three outliers. The variable set
included log P and (log P)^2, with the following result:
                     Number of compounds classified as
                          LO        HI
Actual      LO           132        12
Classes     HI            33        91
The classification table still showed no substantial improvement.
This indicated that the removal of the outliers did not materially
influence the discriminant functions.
j) In view of the fact that no progress seemed to be possible
by dealing with the upper and lower quartiles separately, we now
clustered the HI and LO groups together via ISOGEN with the
following result:
-------
Cluster #   # of Compounds   Army #    Formula
    1            193
    2             13
    3             60
    4              2         121454    C10H13N2Na3O8
                             121471    C10H14N2Na2O8
The objects in the largest cluster (cluster 1) were divided into
HI and LO groups. A stepwise discriminant analysis was then performed. The
results from this analysis did not show improvement over the
previous analyses. Therefore this path of investigation was
abandoned as a dead end.
k) Up to this point we had not used molecular weight in our analysis.
Molecular weight was now calculated and the toxicity values
adjusted by the following formula:
TOXN = log10 (1000 M/C)
where M = molecular weight
C = LD50 dose in mg.
The following atomic weights were used for the elements:
Table 2
H     1.008
C     12.011
N     14.0067
O     15.9994
F     18.9984
Na    22.9898
P     30.9738
S     32.06
Cl    35.453
As    74.9216
Br    79.904
I     126.9045
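The calculation in step (k) can be sketched as follows, using the atomic weights of Table 2. This is an illustrative sketch only: the function names are hypothetical, and the parser assumes a simple formula string with element symbols followed by optional counts (no parentheses or charges).

```python
import math
import re

# Atomic weights as listed in Table 2.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.0067, "O": 15.9994, "F": 18.9984,
    "Na": 22.9898, "P": 30.9738, "S": 32.06, "Cl": 35.453,
    "As": 74.9216, "Br": 79.904, "I": 126.9045,
}


def molecular_weight(formula):
    """Sum atomic weights over a formula such as 'C2H5N'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


def toxn(formula, ld50_mg):
    """Normalized toxicity TOXN = log10(1000 M / C)."""
    return math.log10(1000.0 * molecular_weight(formula) / ld50_mg)


# Hypothetical dose value, chosen only to illustrate the call.
print(round(toxn("C2H5N", 15.0), 2))
```

An element not listed in Table 2 raises a KeyError, which is a deliberate choice here: it flags formulas the table cannot handle rather than silently ignoring atoms.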
l) We now decided to deal with the entire set of compounds rather than
with the upper and lower quartiles. Because of the number of data
elements it was not possible to use all compounds conveniently at once.
Thus, 142 compounds were randomly selected and clustered
with 137 variables, after scaling all variables to lie
in the range of approximately 0 to 1. The variables
were 134 keys, molecular weight, log P and (log P)^2. The following
clusters resulted:
Cluster #   # of Compounds
    1            129
    2              6
    3              2
    4              4
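The scaling mentioned in step (l) is not specified further in the report. One common way to bring variables such as molecular weight, log P and the fragment keys onto a comparable 0 to 1 range is min-max scaling, sketched below as an assumption rather than as the method actually used; the function name is hypothetical.

```python
def scale_to_unit_range(column):
    """Min-max scale one variable so that its values lie between 0 and 1."""
    low, high = min(column), max(column)
    if high == low:
        # A constant variable carries no information for clustering.
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


# Hypothetical molecular weights for three compounds.
print(scale_to_unit_range([100.0, 150.0, 300.0]))
```

Without some such scaling, a variable with a large numeric range (molecular weight) would dominate the distance calculations over the fragment keys.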
It turned out that 15 keys did not discriminate among the 4 clusters.
These keys are shown in Table 3.
Table 3
15 keys that were not used
EC4=0
FG181
FG223R
HR12E
HR3R
SCN103
SCN71
GCN2=3
GCN2=6,6
GCN2=6,6,7
GCN3=C3N1S1
GCN4=C7N1S1
GCN4=C9O1
GCN4=C10
GCN5=5
m) We now could use a larger number of compounds due to the removal
of the 15 keys. This time we clustered a randomly selected set
of 305 compounds with the following results:
-------
Cluster #   # of Compounds   Discarded Compounds
                             Army #    Formula
    1            297
    2              7
Army #    Formula
          C3H5ClO
          C2H5N
Cluster #   # of Compounds   Army #
    2              7         100086
                             100162
                             107821
                             110721
                             117842
                             119356
                             126407
n) A stepwise regression was now performed on cluster 1 using the
normalized toxicity as the dependent variable. 122 candidate
variables were used with the following results:
R = .7935
R^2 = .629
S.E. = .539
S.D. = .500
NV = 41
o) From an examination of the residuals, it appeared that the high
toxicity compounds were introducing undue amounts of residual and
were influencing the fit disproportionately. As a result, all
compounds with TOXN >= 4 were set aside and a regression on 285
compounds was performed with the following results:
R = .7040
R^2 = .496
S.E. = .462
S.D. = .437
NV = 30
-------
p) Examination of the residuals again revealed that the TOXN selection
threshold should be lowered further. At this step it was lowered
to 3.5. A regression was again performed, this time on 267
compounds, with the following results:
R = .7473
R^2 = .559
S.E. = .363
S.D. = .336
NV = 38
The regression equation is shown in Table 4.
Table 4
Regression Equation and Statistics for Step (p)
Variable     Coefficient   Std. Error   F
MW           .266E-02      .001         38.87
FG120        .693E+00      .161         18.41
GCN5=0       .795E+00      .203         15.45
FG80         -.305E+00     .088         12.0
HR12R        -.934E+00     .293         10.40
GCN5=6       .807E+00      .254         10.03
FG51R        .448E+00      .147         9.29
FG96         -.302E+00     .106         8.11
DACN=2       .186E+00      .066         7.98
EC1=0        .605E+00      .222         7.41
GCN1=3       -.685E+00     .270         6.28
FG94         -.247E+00     .098         6.22
FG117        .308E+00      .125         6.20
GCN5=2       .650E+00      .266         5.89
FG35R        -.415E+00     .179         5.45
HR2ER        -.293E+00     .127         5.
FG86R        -.443E+00     .194         5.
FG112        .257E+00      .113         5.
FG85         -.345E+00     .156         4.
SCN49        -.699E+00     .311         4.
GCN3=C2O1    -.137E+01     .664         4.
GCN4=C4N2    -.730E+00     .368         3.
FG96R        -.313E+00     .162         3.
NCN=3        -.570E+00     .291         3.
GCN4=C5O1    .804E+00      .413         3.
LP           -.502E-02     .003         3.17
DACN=3       .156E+00      .088         3.13
FG24         .308E+00      .174         3.13
EC1=1        .392E+00      .233         2.84
A-C=3        .287E+00      .173         2.75
FG122        .392E+00      .238         2.72
FG82R        .349E+00      .223         2.44
FG85R        -.290E+00     .187         2.40
DACN=0       -.300E+00     .196         2.34
HR4E         -.356E+00     .241         2.18
FG143        .181E+00      .126         2.05
HR1E         .904E-01      .066         1.86
GCN1=2       .145E+00      .118         1.52
Constant     .112E+01
r) In order to determine whether cross terms would improve the
regression, such a regression was now performed on 297 compounds,
without using a threshold for removal of the more toxic compounds,
with the following results:
R = .800
R^2 = .640
S.E. = .521
S.D. = .492
NV = 32
s) From the previous few regressions we observed that some keys
occurred only rarely and contributed unduly to the regression.
We thought that separating the problem into pieces might be
beneficial. As a result the set of compounds was split into
three classes:
Class I: those compounds in which no rare key
occurs (a rare key being one with frequency < 7).
Class II: those compounds with at least one rare key.
Class III: the outliers, i.e. mostly those compounds
with high toxicity.
The thought then was to separate Class III from Classes I and II
by discriminant analysis and then to have separate regression equations
for each of the other two classes.
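The three-way split of step (s) can be sketched as below. The frequency cut-off of 7 comes from the text; the TOXN cut-off of 3.5 for Class III is an assumption borrowed from step (p), since the report describes Class III only as "the outliers". All names are hypothetical.

```python
from collections import Counter

RARE_FREQUENCY = 7  # from the text: a key is rare if its frequency is < 7
HIGH_TOXN = 3.5     # assumption: the selection threshold reached in step (p)


def key_frequencies(key_sets):
    """Count in how many compounds each fragment key occurs."""
    counts = Counter()
    for keys in key_sets:
        counts.update(keys)
    return counts


def assign_class(keys, toxn, frequencies):
    """Assign a training compound to Class I, II or III."""
    if toxn >= HIGH_TOXN:
        return "III"
    if any(frequencies[key] < RARE_FREQUENCY for key in keys):
        return "II"
    return "I"
```

Note that this assignment uses the measured toxicity and so applies only to compounds in the training sample; for a new compound, membership in Class III has to be predicted, which is the purpose of the discriminant analysis in step (v).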
t) A series of Class I regressions was now performed in order to
arrive at as robust a structure as possible. The best compromise
result from regression is shown below with the coefficients in
Table 5.
R = .6277
R^2 = .394
S.E. = .442
S.D. = .432
NV = 14
Table 5
"Best" Compromise Class I Regression Equation
Variable     Coefficient   Std. Error   F
FG112        .783E+00      .116         45.9
MW           .302E-02      .000         45.4
FG120        .318E+00      .093         11.7
HR1R         .336E+00      .102         10.9
NCN=0        .143E+01      .483         8.73
FG83         .212E+00      .087         5.95
GCN5=3       -.257E+00     .107         5.80
FG268R       .198E+00      .084         5.53
FG96         -.214E+00     .094         5.16
A-C=0        .109E+01      .501         4.77
GCN4=C2N     -.742E+00     .455         2.66
(Log P)^2    -.878E-02     .055         2.60
FG96R        -.209E+00     .149         1.96
FG144        .162E+00      .122         1.76
Constant     .188E+01
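Applying a regression equation such as that of Table 5 amounts to adding the constant to the sum of coefficient times variable value. The sketch below uses the Table 5 coefficients; it assumes that each fragment key enters as a numeric value (for example 0 or 1 for absence or presence), which the report does not spell out, and the example compound is hypothetical.

```python
# Coefficients and constant as listed in Table 5 (Class I equation).
CLASS_I_CONSTANT = 1.88
CLASS_I_COEFFICIENTS = {
    "FG112": 0.783, "MW": 0.00302, "FG120": 0.318, "HR1R": 0.336,
    "NCN=0": 1.43, "FG83": 0.212, "GCN5=3": -0.257, "FG268R": 0.198,
    "FG96": -0.214, "A-C=0": 1.09, "GCN4=C2N": -0.742,
    "(Log P)^2": -0.00878, "FG96R": -0.209, "FG144": 0.162,
}


def predict_toxn(variables, coefficients=CLASS_I_COEFFICIENTS,
                 constant=CLASS_I_CONSTANT):
    """Predicted TOXN; variables absent from the mapping count as zero."""
    return constant + sum(
        coefficients[name] * value for name, value in variables.items()
    )


# Hypothetical compound: molecular weight 100, key FG112 present, log P = 2.
example = {"MW": 100.0, "FG112": 1, "(Log P)^2": 4.0}
print(round(predict_toxn(example), 3))
```

A variable name that is not in the equation raises a KeyError, so that a key belonging to a different equation (for example the Class II equation) is not silently dropped.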
-------
u) Class II regressions. Similarly, a series of regressions was
performed for Class II compounds with the best compromise results
shown below and in Table 6.
R = .6922
R^2 = .479
S.E. = .490
S.D. = .466
NV = 21
Table 6
"Best" Compromise Class II Regression Equation
Variable     Coefficient   Std. Error   F
MW           .217E-02      .001         22.5
FG80         .617E+00      .151         16.8
Log P        -.782E-01     .019         16.
FG154R       .618E+00      .185         11.
FG51R        .634E+00      .190         11.
FG120        .430E+00      .135         10.2
GCN2=3       .524E+00      .198         7.00
GCN4=C5N1    .647E+00      .247         6.85
GCN5=6       .415E+00      .173         5.77
HR1E         .219E+00      .092         5.68
FG34R        -.545E+00     .230         5.64
DACN=6       .471E+00      .202         4.27
GCN5=5       .598E+00      .289         4.27
FG120R       -.341E+00     .189         3.24
GCN3=C5      .463E+00      .258         3.22
FG24R        .346E+00      .226         2.35
HR2ER        -.216E+00     .156         1.92
GCN2=5       -.236E+00     .171         1.91
GCN4=C5O1    -.401E+00     .291         1.91
FG178R       -.209E+00     .157         1.89
Constant     .205E+01
-------
v) Finally a stepwise discriminant analysis was performed to
separate Class III from Classes I and II combined. The results
are shown below with the equations in Table 7.
                     Compounds classified into Classes
                          I + II     III
Actual      I + II         510        10
Classes     III              9        14
Table 7
Discriminant Equations for Classes (I+II) vs. III
Variable     Class (I+II)   Class III   F
FG205        1.24           43.3        82.1
GCN3=C5      .983           18.9        65.6
FG29R        1.24           43.3        41.2
SCN107       .095           38.4        33.6
SCN84        .095           38.4        33.6
SCN45        .727           13.1        30.8
GCN4=C3      1.15           22.3        20.5
HR1R         1.24           4.90        15.9
FG118        .852           13.5        11.0
FG34         .852           5.30        10.8
FG112        1.16           5.30        10.8
FG231R       -.488          14.7        10.2
Constants    -.140
From the classification table it is clear that this scheme would produce
a substantial number of false positives (10) and, more importantly, an
almost equal number of false negatives (9). It would be
possible in future work to adjust the discriminant equations so as to
reduce the false negatives at the expense, of course, of an
increase in the false positives.
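The false-positive and false-negative behavior described above can be quantified from the classification table. In the sketch below, "positive" means "classified as Class III"; the counts are those reported in step (v), and the variable names are illustrative.

```python
# Counts from the classification table of step (v).
true_negative, false_positive = 510, 10   # actual Class I + II
false_negative, true_positive = 9, 14     # actual Class III

# Share of actual Class III compounds that the scheme detects.
sensitivity = true_positive / (true_positive + false_negative)
# Share of actual Class I + II compounds correctly left out of Class III.
specificity = true_negative / (true_negative + false_positive)

print(round(sensitivity, 3), round(specificity, 3))
```

On these counts about 61 percent of the Class III compounds are detected while about 98 percent of the remaining compounds are correctly passed, which is why the false negatives are the more serious concern.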
4. Discussion and Conclusions
From the work that has been described it is clear that it is possible to devise
a three-step system for classification and prediction of toxicity of
chemical compounds, at least for compounds like those included
in the sample with which we worked. The system would consist of first
separating off the potentially highly toxic compounds (Class III) via the
discriminant equations, then separating the remaining compounds into two
classes depending upon the presence or absence of rare keys, and finally
predicting toxicity via the application of the appropriate regression
equation. The sample with which we worked had relatively few highly
toxic compounds, and this is one of the reasons why the discrimination
between Class III and Classes I + II was not as satisfactory as
one would wish. However, it is clear that the great bulk of
compounds can be separated out in this fashion.
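The three-step system described above can be outlined as a small dispatch routine. This is a structural sketch only: the discriminant and the two regression equations are passed in as functions, and every name here is hypothetical rather than taken from the report.

```python
def predict(compound_keys, is_class_iii, rare_keys,
            regress_class_i, regress_class_ii):
    """Return (class label, predicted toxicity or None) for one compound."""
    # Step 1: separate off the potentially highly toxic compounds.
    if is_class_iii(compound_keys):
        return ("III", None)
    # Step 2: split the remainder on presence or absence of rare keys.
    if any(key in rare_keys for key in compound_keys):
        # Step 3: apply the regression equation for that class.
        return ("II", regress_class_ii(compound_keys))
    return ("I", regress_class_i(compound_keys))
```

No toxicity value is returned for Class III because the report derives regression equations only for Classes I and II.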
It has also become quite evident in this work that many explanatory
variables were not included among the set of parameters that were
available for analysis. The major reason why the correlation coefficients
in regression were not as large as desired must be attributed to this
fact. It is of course difficult to stipulate just what these features
should be. At the very least, however, one should consider some further
physical constants and some further steric constants. This view is
reinforced by the fact that molecular weight was the most important
variable in most of the regressions in which it was a candidate for
inclusion. This probably means that molecular weight is a summary variable
for many other variables and might provide a lead as to what other
features to include in the future.
Our conclusions from this analysis are that statistical processes can be
effectively used to predict toxicity of chemical compounds.
We found it interesting that clustering did not contribute materially to
the derivation of a solution to the classification problem. This may
have been a reflection of the relative homogeneity of the sample.
5. Suggestions for the Future
We believe that further work could usefully be performed applying
the same techniques of clustering, regression and discriminant
analysis to a larger set of data, particularly one including a larger
number of relatively toxic compounds; that other features should be
included in any further data sets; and that a relatively simple on-line
system could be designed to implement the classification and
prediction system which we have outlined. It would also be possible
to extend these concepts to more specific types of toxicity such
as carcinogenicity, mutagenicity, etc.
-------
REFERENCES
1. R. Jennrich, "Stepwise Discriminant Analysis", in Statistical
Methods for Digital Computers, K. Enslein, A. Ralston, H. Wilf,
eds., Wiley (in press).
2. R. Jennrich, "Stepwise Regression", as in Ref. 1.
3. Statistical Computer Programs, Genesee Computer Center, Inc.,
Rochester, NY.
-------
APPENDIX
-------
List of Fragment Keys
ICY rCRMULA FRIC Cr ZZ~ .
T
-1
C.
7
W
4
J
5
7
3
Q
10
11
12
13
14
15
15
17
13
13
20
21
22
23
24
25
.25
27
23
23
30
31
32
33
34
35
35
37
38
33
40
41
42
43
44
45
45
47
43
45
50
51
52
53
54
55
55
57
53
1
2
3
4
5
5
7
3
3
10
11
-. 12
13
14
15
IS
17
13
IS
20
21
22-
23
24
25
25
27
23 .
23
30
31
32
33
34
35
' 35
37
33
39
40
41
42
43
44
45
45
47
43
43
50
51
. 52
53
54
55
56
57
' 53
A-Crc
A-C = 1
A-CZ2
A-C-3
A-C-4
A-CrS
A-C = 3
A-C = 3
DACN=0
OACN=1
CACN=1
DAC,\ = 1
DACN=2
CACN=3
CACN=4
DACN=5
OACN=S
CACN=7
DACN=8
3AC;-J=9
ZC1 = Q
EC1=1
EC1=2
iC2ro
cC2 = l
EC3 = 0
EC3=1
EC3=2
EC4 = 0
EC4rl
NCN = 0
NCN = I
NCN=2
NC.N=3
' NCN=5
FG101
FG101R
FG103
FG109
FG112
FG112R
FG113
FG115
FG115R
FG117
FG117R
FG113
FG119
FG112R
FG120
FG120R
FG121
FG122
FG123
FG125R
FG12SR
FG130
FG130R
27C
-23
o c
11
i 3
103
1
4
^
137
ac
77
27
23
o
5
2
G13
5S
6
579
7
554
27
C
w
534
2
212
35S
105
17
^
^
1
1
1
5
48
2
j
10
2
21
3
5
7
3
55
15
1
10
2
1
1
1
-------
so
51
- 1
S3
54
55
55
57
. S3
S3
70
71
72
73
74
75
75
77
73
73 .
30
31
32
33
84
35
35
37
33
33
' 30
91
92
93
94
95
95
97
-93
39
100
101
102
103
104
105
10S
107
103
109
110
111
112
113
114
115
115
117
118
119
sc
51
S2
G 3
54
55
65
57
S3
S3
70
71
72
73
74
75
75
77
73
73
30
81
32
33
34
35
35
37
33
39
30
91
92
93
.34
35
95
37
. 98
99
' 100
101
102
103 '
104
105
105
107
103
109
110
111
112
113
114
115.
115
117
113
113
FG 1 31 R
-G123
FG133
FG1Z5.R
FGiZo
FG143
FG143R
FG144
FG144R
FG145
FG145R
FG14S
FG147
FG147R
FG150R
FG151
FG154
FG154R
FG155
FG157R
FG153R
FG1S7
FG157R
FG1SS
FG172R
FG174R
FG173
FG173R
FG181
FG187
FG139
FG205
FG207
FG207R
FG203R
FG217
FG213
FG220
FG223
FG223R
FG231
FG231R
FG222
FG24
FG24R
FG245
FG245R
FG245
FG24SR
FG248
FG248R
FG251
FG2E8
FG2S3R
FG23R
FG3R
FG32
FG32R'
FG34
FG34R
27
13
7
4
3
i
7
w
3
43
+
.4
8
3
1
2
1
13
84
10
2
^
J.
5
1
1
X
2
2
2
1
1
7
3
2
3
3
5
2
4
4
4
1
1
4
3C
241
2
1
1
1
3
-------
123 120 FG35 7
121 121 FC-35R 13
11
>
122
123
124
125
125
127
123
129
130
131
132
133
134
135
135
137
133
133
140
141
142
143
144
145.
145
147
143
143
1.50
1:51
152
1.53
154
155
156
157
153
159
ISO .
151
152
153
154
155
165
157
153
159
170
171
172
173 .
174
175
176
177
173
179
122
122
124
125
125
127
123
123
130
131
132
133
134
135
135
'137
133
133
140
141
142
143
144.
145
145
147
148
149
.150
151
152
153
154
155
156
157
158.
159
160
161
162
163
154
165
156
157
153
159
170
171
172
173
174
175.
176
177
173
173
^3-:
F336R
FG37
FG37R
FG40R
FG41
FG44
FG47
FG51
FG51R
FG54
FG35
FG56R
FG57
FG51
FGSSR
FGS7R
FG5S
FG74
FG75
FG75R
FG75
FG3C
FG30R
FG31
FG31R
FG32
. FG32R
FG33
FG33R
FG34
FG35
FG35R-
FG3S
FG8SR
F337
FGSSR
FG33
FG3
FG92
FG92R
FG94
FG94R
FG95
FG96
FG9SR
FG93
FG99
HRIE
HR1R
HRlOc
HR10R
HRUE:
H R 1 1 R
' HR12E
HR12R
HR13R
HR14E
2
4
1
1
5
4
a
2
47
5
2
IS
2
3
38
2
1
13
7
9
13
95
1
1
1
3
1
61
19
1
44
25
' 5
1
265
113
7
. 2
31
7
10
3
4
9
-------
130
1 21
132
133
134
135
1 So
137
133
133
130
131
192
133
134
125
195
197
133
133
2QG
201
202
203
204
205
205
207
.203
209
210
211
212
213
2m
215
215
217
213
219
220
221
222
223
224
??5
225
227
223
223
230
231
232
233
234
235
235
237
233
239
180
131
1S2
133
134
135
13S
137
133
139
130
191
132
193-
194
135
135
137
133
133
200
201
2C2
203
204
205
205
207
203
209
210
211
212
213
214
215
215
217
213
219
220
221
222
223
224
225
225
227
223
223
230
231
232
233
234
235
235
-237
233
233
HR14ER
HRisrr
HR151R
KR1SR?
HR17IE
HR17ER1
HR13ICI
HR2ir
HR2ER .
HR2RR
. HR20~
.HR21E:
HR21R
HR22I
HR22R
HR23;
HR23R
HR24EE
HR25E
HR25R
HR25£
HR25R
HR3o
HR3R
HR31Z
HR31R
HR34E!£
HR3SE1
HR4=:
HR4R
HR41R
HR47E
HR53E
HRSEI
HRSER
HR7EE
HR7ER
HR8E:!
HRG23E
HRG23R
HRG33E
HRG33R
HRG42E
HRG42R
HRG43E
HRG54E
ND22
N030
N030R
NC31
WD33
SCN1
SCN102
SCN103
SCN105
SCN107
SCN103
SCN109
SCN111
SCN112
7
j
7
28
o
2 3
1
i
1
1
1
2
19
2
7
3
3
1
5
4
.05
e
3
5
2
3
5
2
1
3
1
3
-------
24C
2m
242
243
244
245
245
247
24E
243
250
251
252
253
254
255
25S
257
258
259
2SO
2S1
2S2
253-
254
2S5
255
257
258
259
27G
271
272
273
27!;' -
27b
275
277
273
273
230
231
282
233
2S4
235
285
237
233
239
290
291
292
293
294
295
295
297
293
299
240
241
242
243
244
245
245
247
243
242
250
251
252
233
254
255
255
257
253
259
250
251
252
. 253
254
255
255
257
253
259
270
271
272
273
274
275
275
277
273
279
230
231
2S2
233
234
235
236
237
233
239
290
291
292
293
294
295
295
297
293
299
3CK119
SCN125
SCK127
SCN130
SCN125
SCN15
SCN17
SCM2
SCN24
SCN2S
SCN27
SCN23
SC.N29
SCN3
SCN34
SCN35
SCN33
SCN40
SCN42
SCN44
SCN45
SCN47
SCN48
SCN49
SCNSO
SCN52
SCN53
SCN5S
^ f* \t ^ ^
.i C iN o o
SCNS9
SCN71
SCN72
SCN73
SCN75
3CN78
3CN79
SCN34
SCN87
SC7432
SCN99
GCN2=3
GCN2=3» 5
3CN2 = 3»5t 5»5 »5
GCN2=3»S
GCN2=5
GCN2=5»5
GCN2=5«5t 5
GCN2=5t 5t 5»5
GCN2=5»5t 7
GCN2=5»S
SCN2=5f S»S
GCN2-5tStS»S
5CN2-5»S»S»5 »S
GCN2=S
GCN2=Sf 5
GCN2=Sf StS
GCN2-6»5f7
GCN2-5.7
SCN2=St7.7
GCM2=7
11
3
2
2
4
4
7
11
3
393
13
c
4
3
1
1
2
7
1
1
3
2
' 3
2
2
24
7
4
2
1
29
o
4
3
452
22
11
13
7
1
3
-------
30G 3GC 3CN'2 = 5 1
301 201 GC,\'3 = C2 SI 2
302 302 GCNZ=C2 Cl 17
303 303 . GCN3=C2 Nl 2
304 ' 304 GCN3=C2 N2 SI 1
305 305 GCN3=C2 N3 SI 4
305 305 GCN3-C2 N4 S2 2
307 307 GCN3=C3 3
303 303 GCN'3rC3 S2 2
30S 302 GCN3=C5 C2 Z
310 310 GCN3=C3 Nl SI 2
311 311 GCN3-C3 Nl Cl - 3
312 . 312 GCN2=C3 N2 13
313 313 GCN3=C3 N2 SI 1
314 314 GCN3-C3 N3 1
315 315 GCN3=C4 SI 2
315 315 GCN3 = C4 Cl .20
317 317 GCN2=C4 C2 SI 1
313 313 GCN33C4 Nl 9
319 319 GCN3=C4 Nl SU 1
320 320 GCN3=C4 Ml Cl 2
321 321 " GCN3=C4 N2 19
322 322 GCN3=C4 N4 1
323 . 323 GCN3=C5 33
324 324 GCN3=C5 Cl 31
325 325 GCN3-C5 Nl 33
325 325 GCN3=C5 Nl SI 2
327 327 GCN3=C5 Nl S2 1
323 323 GCN33C5 N2 3
329 329 GCN3=C6 579
330 330 GCN3=CS 01 . 4
331 331 GCN3rcS Nl 9
332 332 GCN3zC7 13
333 333 GCN3=C7 Cl 3
334 334 GCN3=C7 Nl 1
335 335 GCN3=C3 3
335 335 GCN4=C2 SI 2
337 337 GCN4=C2 Cl . 11
538 333 GCN4 = C2 Nl -2
333 339 GCN4=C3 2
340 340 GCN4=C3 02 2
341 341 GCN4=C3 Nl SI 1
342 342 GCN4=C3 N2 .5
343 343 GCN4=C3 N3 1
344 344 GCN4=C4 SI 2
345 345 GCN4=C4 Cl 12
345 345 GCN4=C4 Nl 1
347 347 GCN4=C4 Nl SI 1
348 343 GCN4=C4 Nl Cl 2
349 349 GCN43C4 N2 15
350 350 GCN4rC4 N4 S2 1
351 351 GCN4=C5 Cl 7
352 . 352 GCN4-C5 Nl 19
353 353 GCN4rC5 Nl S2 1
354 354 GCN4=C5 N4 3
355 355 ' GCN4=CS ' 420
35S 355 GCN4=CS Cl 1
357 357 GCN4=CS Nl ' 3
353 353 GCN4=CS N2 SI 1
359 359 GCN4rC7 2
-------
25G
351
352
353
354
355
3S£
367
3G£
35 =
370
371
372
373
374
375
375
377
373
373
330
331
332
333
334
335
335
337
333
333
390
391
392
393
394
395
396
397
393
399
400
401
402
403
404
405
405
407
408
409
410
411
412
413
414
415
415
417
418
419
350
3S1
352
353
354
' 3£5
3oo
3c7
35S
3£9
370
371
372
373
374
375
375
377
373
373
330
331
382
383
384
385
335 '
337
333 .
339
390
391
392
393
394
335
395
337
398
399
400
401
402
403
404
405
405
407
403
409
410
411
412
413
414
415
415
417
413
419
GCN4=C7 22
GCN4=C7 Nl
GCN4TC7 Nl SI
GCN4rC7 Nl Gl
GCN4=C7 N2
GCN4=C7 N2 SI
GCN4=C8 SI
GCN4ZC8 Cl
GCN4=C8 Nl
GCN4=C9 Cl
GCN4=C9 C2 SI
GCN4rC2 Nl
GCN4=C9 Nl SI
GCN4rC3 N2
GCN«i=C9 N2 S2
GCN4ZC1G
GCN4rCll 02
GCN4=C12
GCN4rC12 01
GCN4ZC12 02
GCN4=C12 Nl
GCN4rC!3
GCN4=C13 01 :
GCN4TC13 Nl
GCN4=C13 Nl SI
GCN4TC13 N2
GCN4=C14
GCM4-C14 Cl
GCU4-C14 Nl
GCN4=C15
. GCN4-C15
GCN4rC17
GCN4-C13 03
GCN4 = C18. N2 01
GCN4=C19 N2
GCNS=1,2
GCNSrl,3
GCN'S = 1 » 3t 5
GCNS=1»4
GCM1=1
GCN1=2
GCN1=3
GC^a = 4
GCN1=5
GCN5=0
GCN5=1
GCN5=2
GCN5=3
GCN5-4
GCN5-5
GCNS^S
3CN5=7
GCN5=3
LIG=C3 Hll N2 03
LIG=C11 H 17 N2 02
LIG=C12 H 15 N2 03
'LIG = C12 H 17 N2 23
ALK
AN
CHAL
3
7
3
4
2
1
c
1
8
2
2
2
1
4
1
o
3
2
3
1
. 1
1
3
13
1
S
507
SS
'41
S
5
74
29
23
422
37
10
22
o
1
1
1
1
1
4
7
4
-------
1?
<421
RICCRSS R
I4D ART
-------
DF c^
KCLECULAP r
KCL.WT
LCG .F ,
I
%
£
5
7
S
s :
10
11
'12
13
14
15
16
17
18
15
20
21
22
23
24
25
2S
2^
...25
2*3
'30
' 31'
32
33
34
35
35
' 37
38
29
<(0
, J» *
' tl ?
? -
''.<» 4
'
1 ,E9
3,20
2 .1C
2.33
Z .OS
2.72
2.31.
1.3E
1 .25
1.53
2.53
3.C5
.1 .£3
2 .60
2.75
3.03
2.33
3.01
2.62
2.73
1 1.-69
.1.51
3.56
1.49
1.51
1.47
2.77
2.47
1.74
1.47
3.04
3.50
3.67
.1.45
1.37
1.51
1.S2
1.47
1 .48
1.33
1.53
2.13
2.17
1 .95
2.25
2.0-8
1.54
2.14
. 2. 92
1.35
2.4S
2.28
2.50
-------
A f? M y T c
MOLECULAR
MCL.WT
LCP.P.
NC3M.T
f
i
£
5
c
7
5
Q
10
.11
12
1 T
14
15
15 .
17
IS
15 .
20
21
22
23
102373
105733
105047
111011
114372
11C073
116420
.118035
122123
122605
126775
128110
128404
123504
128813
123314
122290
131571
132115
135211
142223
1435G1
C10H11N105P1S1
CUH1SCL1C233P1
C17H12Q7
C20H18CL1N1CS
C5H12CL3M1
C2H4F1N1G1
C12H18N2C2
C14H1705P121
C8H1532P1S3
C15K23N1C4.
C20H24CL1N1C3
C4H4C1
C10K17N3C2
C13H19N1C2S1
C2K2trlNAlC2
C12H3CLG
C12H8CLSG1
C15H12N2C3
C14H1SCL106P1
C41H54C13
C4KSF2C2
C12H15N1C3
C1SH22CL1N1C3
7 91
342
328
403
192
77
222
323
274
281
361
S3
211
253
ICO
354
380
268
345
754
124
221
347
.25
.35
.28
.32
.52
.OS
.22
.32
.39
.35
.87
.08
.26
.36
.02
.21
.21
.27
.71
.26
.CS
.2£
.84
2
3
1
1
-1
2
2
1
-
-
3
3
-4
4
4
2
2
1
-
2
-1
* A 3
4 S2
.73
.44
.77
,03
.43
.16
.93
.33
.30
.82
.CC
.30
.00
.50
.37
.3C
.11
.73
.75
.11
.00
5
4
5
4
4
4
4
4
4
5
4
4
4
S
3
4
5
c
4
5
5
4
4
« J. C
.54
.31
.31
.2;
1 T
4 *. -
.2:
.24
.44
.IE
.22
.0]
.2:
.1C
.ES
.7:
.1C
. 8 :
.5!
,8£
OS
.44
.54
-------
TECHNICAL REPORT DATA
1. REPORT NO.: EPA-560/1-77-001
2.
3. RECIPIENT'S ACCESSION NO.:
4. TITLE AND SUBTITLE: Models for Biochemical Toxicity
5. REPORT DATE: February 1976
6. PERFORMING ORGANIZATION CODE:
7. AUTHOR(S): Kurt Enslein
8. PERFORMING ORGANIZATION REPORT NO.:
9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Genesee Computer Center, Inc., Rochester, NY 14
   for: The Franklin Institute Research Labs
   Philadelphia, PA 19103
10. PROGRAM ELEMENT NO.: ^605
11. CONTRACT/GRANT NO.: 68-01-2657
12. SPONSORING AGENCY NAME AND ADDRESS:
   Office of Toxic Substances
   Environmental Protection Agency
13. TYPE OF REPORT AND PERIOD COVERED: Subcontract report
14. SPONSORING AGENCY CODE:
15. SUPPLEMENTARY NOTES:
16. ABSTRACT:
   Multivariate techniques of data analysis were applied to a data base of
   549 chemical compounds. The techniques used included multiple regression
   and multiple discriminant analysis.
17. KEY WORDS AND DOCUMENT ANALYSIS
   a. DESCRIPTORS: multiple regression; stepwise discriminant analysis
   b. IDENTIFIERS/OPEN ENDED TERMS:
   c. COSATI Field/Group: 06/20, 06/04, 12/01
18. DISTRIBUTION STATEMENT: Release Unlimited
19. SECURITY CLASS (This Report):
20. SECURITY CLASS (This page):
21. NO. OF PAGES:
22. PRICE:
EPA Form 2220-1 (9-73)
-------