EPA-560/1-77-001
               MODELS FOR BIOCHEMICAL TOXICITY
                        PREPARED FOR


               ENVIRONMENTAL PROTECTION AGENCY
                 OFFICE OF TOXIC SUBSTANCES
                   WASHINGTON, D.C.  20460
                         MARCH 1977

-------
EPA-560/1-77-001
                   MODELS FOR BIOCHEMICAL TOXICITY
                          Subcontract Report


                                  by


                             Kurt Enslein



                        Contract No. 68-01-2657
                    Environmental Protection Agency
                      Office of Toxic Substances
                        Washington, D.C.  20460

-------
                          NOTICE
The report has been reviewed by the Office of Toxic Substances,
EPA, and approved for publication.  Approval does not signify
that the contents necessarily reflect the views and policies
of the Environmental Protection Agency, nor does mention of
trade names or commercial products constitute endorsement or
recommendation for use.

-------
                      TABLE OF CONTENTS

1.    Purpose of This Project
2.    Data Base
3.    Chronology of Data Analysis and Model Building
      Steps Leading to Final Results
4.    Discussion and Conclusions
5.    Suggestions for the Future
      Appendix

-------
1.    Purpose of This Project

     The work that Genesee was to carry out under this project had as
its overall goal the investigation of uses of multivariate techniques
applied to a data base to predict toxicity of chemical compounds.   For
this purpose various multivariate techniques such as clustering,  multiple
regression, multiple discriminant analysis and allied methods were to  be
used.

The overall goal was divided into two subgoals:
a)    to derive a model for the prediction of toxicity on a continuous
     scale, i.e. via regression equations and,
b)    to be able to classify compounds into upper and lower quartiles
     of toxicity via discriminant equations.

2.    Data Base
     The data base consisted of 686 compounds.  The data for each compound
     consisted of the LD50 (C) for rat and mouse, the log of the partition
     coefficient (log P), 421 fragment keys and the chemical formula of the
     compound.  In fact, only 549 compounds had all the items required for
     our analysis: rat toxicity, partition coefficient, fragment keys and
     the chemical formula.  It was these 549 compounds which were used in
     the description to follow.  Toxicity was first converted to log 1/C.
     Later, log (1000 M/C), where M is the molecular weight, was used for
     toxicity.  Appendix A shows various listings of these data.

3.    Chronology of Data Analysis and Model Building
     Rather than simply describing the results of the analyses and model
     building as though they had been picked out of thin air, in this section
     we describe the logical steps through which we proceeded to arrive
     at the results shown at the end of this section.

     a)   We started off by calculating simple statistics for all 686
          compounds on log P and log 1/C in order to be able to determine
          upper and lower quartiles.  The results are shown in Table 1.

-------
                                       Table 1
                          Simple Statistics on 2 Parameters

Variable       Nr of          Mean      Standard   Min       Max    Coeff. of    Skew     Kurtosis
               Observations             Error                       Variation
log P          543            1.562     .087      -10.0      7.3     1.295       -.865     6.6
log10 (1/C)    592           -2.819     .026       -3.964    0.0     -.228       1.111     4.4

                 It can be seen that log P was available for only 543 compounds, and
            that it was not too well distributed.  (Later it was discovered that, by
            correcting a format error, we in fact had log P for 549 compounds.)
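
            The quantities in Table 1 can be sketched as follows.  This is a minimal
            illustration, not the program actually used: the report does not state
            which estimators were applied, so the sample (n - 1) standard deviation
            and moment-based skewness and kurtosis (normal distribution = 3) are
            assumptions, and the function name is hypothetical.

```python
import math

def simple_statistics(values):
    """Summary statistics of the kind listed in Table 1 for one variable."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 denominator): an assumption.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # Central moments (n denominator) for skewness and kurtosis: an assumption.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "n": n,
        "mean": mean,
        "standard_error": sd / math.sqrt(n),
        "min": min(values),
        "max": max(values),
        "coeff_of_variation": sd / mean,
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }
```

            Under these assumptions the coefficient of variation equals the standard
            error times the square root of the number of observations, divided by the
            mean, which gives a quick consistency check on any row of the table.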

            b)   Based on the data in (a), two groups called HI and LO were formed
                 as follows:
                      HI:  118 compounds with log10 (1/C) >= -2.45
                      LO:  145 compounds with log10 (1/C) <= -3.3
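
                 The grouping rule in (b) can be sketched as below.  The two
                 thresholds are the ones stated in the text; the function names are
                 hypothetical, and compounds falling between the thresholds are
                 assumed to belong to neither group.

```python
import math

HI_THRESHOLD = -2.45  # log10(1/C) at or above this value: HI group
LO_THRESHOLD = -3.3   # log10(1/C) at or below this value: LO group

def log_inverse_dose(ld50):
    """Convert an LD50 value C to log10(1/C)."""
    return -math.log10(ld50)

def toxicity_group(log_inv_c):
    """Return 'HI', 'LO', or None for compounds between the two thresholds."""
    if log_inv_c >= HI_THRESHOLD:
        return "HI"
    if log_inv_c <= LO_THRESHOLD:
        return "LO"
    return None
```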
            c)   A stepwise discriminant analysis (1) was then performed on these two
                 groups, with the following results:

                                     Number of compounds classified as
                                               LO        HI
                 Actual   LO                  123        22
                 Classes  HI                   35        83

                 For this discriminant analysis only keys with frequency of
                 occurrence > 10 were given a chance; molecular weight, log P and
                 (log P)^2 were not given a chance.  Only 61 variables, including some
                 cross-product terms, were allowed to enter the discriminant function,
                 of which 18 actually entered the function.
            d)   A discriminant analysis was then performed on the two groups
                 without using cross-terms with the following results:

-------
                         Number of compounds classified as
                                   LO        HI
     Actual   LO                  124        21
     Classes  HI                   35        83

     Note that there is very little difference in the accuracy of
     classification when the cross-product terms are not allowed to
     enter the equation.
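
     The comparison between steps (c) and (d) can be made explicit by computing
     the fraction of correctly classified compounds from each table.  The counts
     are those of the two classification tables; the helper name and the
     nested-dictionary layout are illustrative assumptions.

```python
def accuracy(table):
    """Fraction correctly classified; table[actual][classified] is a count."""
    correct = sum(table[label][label] for label in table)
    total = sum(sum(row.values()) for row in table.values())
    return correct / total

# Step (c): cross-product terms allowed.
step_c = {"LO": {"LO": 123, "HI": 22}, "HI": {"LO": 35, "HI": 83}}
# Step (d): no cross-product terms.
step_d = {"LO": {"LO": 124, "HI": 21}, "HI": {"LO": 35, "HI": 83}}

print(round(accuracy(step_c), 3))  # 206 of 263 compounds
print(round(accuracy(step_d), 3))  # 207 of 263 compounds
```

     The two analyses differ by a single compound out of 263.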

e)   Since the discriminant analysis approach did not seem to result
     in what we considered an acceptable classification, we reverted
     to a continuous variable approach and therefore performed a
     stepwise regression (2) on the 543 compounds for which log P and
     LD50 in rats existed, with 101 keys having frequency >= 10.  Results
     are shown below:

          R    =     .5941
          R^2  =     .3529
          S.E. =     .536      (Standard error of estimate)
          S.D. =     .518      (Standard deviation of residuals)
          NV   =     36        (Number of variables in equation)

     An examination of the residuals from this result indicated that
     reasonable regressions, and therefore also discriminant analyses,
     should be possible.
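
     The S.E. and S.D. figures in (e) appear to differ only in their degrees
     of freedom.  The sketch below assumes that S.D. uses an n - 1 denominator
     and that S.E. uses n - NV - 1 (one degree of freedom for the constant);
     the report does not state this, but the assumption reproduces the
     printed S.E. from the printed S.D.  The function name is hypothetical.

```python
import math

def standard_error_of_estimate(sd_residuals, n_compounds, n_variables):
    """Recover S.E. from S.D. of residuals under the assumed denominators."""
    # Residual sum of squares implied by S.D. with an (n - 1) denominator.
    sse = sd_residuals ** 2 * (n_compounds - 1)
    # Standard error of estimate with (n - NV - 1) degrees of freedom.
    return math.sqrt(sse / (n_compounds - n_variables - 1))

# Step (e): S.D. = .518, 543 compounds, 36 variables in the equation.
print(round(standard_error_of_estimate(0.518, 543, 36), 3))
```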

f)   The fragment set was now increased from 101 to 134 keys by including
     those with a frequency >= 7 instead of >= 10.  The HI and LO groups
     shown below were used:
                     HI   =    126 Compounds
                     LO   =    145 Compounds
     The additional 8 compounds in the HI group were due to the resolution
     of some technical problems in reading the data files.
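
     Selecting keys by frequency, as in (f), might look like the sketch
     below.  It assumes that the frequency of a key is the number of
     compounds in which it occurs and that keys are stored as 0/1 indicator
     columns; both the data layout and the function name are hypothetical.

```python
def select_keys(key_columns, min_frequency=7):
    """Return the names of keys occurring in at least min_frequency compounds.

    key_columns maps a key name to a list of 0/1 indicators, one per compound.
    """
    return sorted(
        name
        for name, column in key_columns.items()
        if sum(1 for value in column if value) >= min_frequency
    )
```

     Lowering min_frequency from 10 to 7 admits more keys, which is the
     change described in this step.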

-------
g)   Each of the two groups was clustered independently via ISOGEN (3)
     with the results shown below:

Group          Cluster #      # of Compounds      Discarded Compounds
                                                  Army #    Formula
LO             1              105                 132124
HI             1               98
               2               19
               3                7
                                                  124200

Group          Cluster #      Army #              Formula
LO             3              121454              C10H13N2Na3O8

     Note that three compounds were discarded as being too far removed
     from the clusters.  One also wonders about cluster 3 in the LO group.

h)   A stepwise discriminant analysis was then performed on the LO and HI
     classes after removal of the three outliers.  The variable set
     included log P and (log P)^2, with the following result:

                         Number of compounds classified as
                                   LO        HI
     Actual   LO                  132        12
     Classes  HI                   33        91

     The classification table still showed no substantial improvement.
     This indicated that the removal of the outliers did not materially
     influence the discriminant functions.

j)   In view of the fact that no progress seemed to be possible
     by dealing with the upper and lower quartiles separately, we now
     clustered the HI and LO groups together via ISOGEN with the
     following result:

-------
Cluster #      # of Compounds      Army #         Formula

     1              193
     2               13
     3               60
     4                2            121454         C10H13N2Na3O8
                                   121471         C10H14N2Na2O8

     The objects in the largest cluster (cluster 1) were divided into
     HI and LO groups, and a stepwise discriminant analysis was then
     performed.  The results did not show improvement over the previous
     analyses.  Therefore this path of investigation was discarded as a
     dead end.


k)   Up to this point we had not used molecular weight in our analysis.
     Molecular weight was now calculated and the toxicity values
     adjusted by the following formula:

          TOXN      =    log10 (1000 M/C)

          where  M  =    molecular weight
                 C  =    LD50 dose in mg.

     The following atomic weights were used for the elements:

                         Table 2

                    H    1.008
                    C    12.011
                    N    14.0067
                    O    15.9994
                    F    18.9984
                    Na   22.9898
                    P    30.9738
                    S    32.06
                    Cl   35.453
                    As   74.9216
                    Br   79.904
                    I    126.9045
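
     As an illustration of step (k), the sketch below computes a molecular
     weight from a chemical formula using the Table 2 atomic weights and then
     applies the TOXN formula.  The formula parser is an assumption (simple
     formulas without parentheses or charges), and the function names are
     hypothetical; this is not the program used in the study.

```python
import math
import re

# Atomic weights from Table 2.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.0067, "O": 15.9994,
    "F": 18.9984, "Na": 22.9898, "P": 30.9738, "S": 32.06,
    "Cl": 35.453, "As": 74.9216, "Br": 79.904, "I": 126.9045,
}

def molecular_weight(formula):
    """Sum atomic weights for a flat formula such as 'C10H14N2Na2O8'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"element not in Table 2: {symbol}")
        # A missing count means one atom of that element.
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total

def toxn(ld50, mol_weight):
    """TOXN = log10(1000 M / C), with C the LD50 and M the molecular weight."""
    return math.log10(1000.0 * mol_weight / ld50)
```

     For example, toxn(ld50, molecular_weight("C2H5N")) gives the normalized
     toxicity of a compound with formula C2H5N once its LD50 is supplied.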

l)   We now decided to deal with the entire set of compounds rather than
     with the upper and lower quartiles.  Due to the number of data
     elements it was not possible to conveniently use all compounds
     simultaneously.  Thus, 142 compounds were randomly selected and
     clustered with 137 variables, after scaling all variables to lie
     in the range of approximately 0 to 1.  Included in the variables
     were 134 keys, molecular weight, log P and (log P)^2.  The following
     clusters resulted:


          Clusters #               # of Compounds

               1                        129
               2                          6
               3                          2
               4                          4
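
The scaling mentioned in this step is not specified further in the report.
One common way to bring a variable such as molecular weight into the range
0 to 1, so that it does not dominate the 0/1 fragment keys in a distance
calculation, is min-max scaling.  The sketch below shows that technique as
an assumption, not as the method actually used; the function name is
hypothetical.

```python
def scale_to_unit_range(column):
    """Min-max scale a list of numbers to the range 0 to 1."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        # A constant variable carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(value - lowest) / (highest - lowest) for value in column]
```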

It turned out that 15 keys did not discriminate among the 4 clusters.

These keys are shown in Table 3.

                         Table 3

               15 keys that were not used

                    EC4=0
                    FG181
                    FG223R
                    HR12E
                    HR3R
                    SCN103
                    SCN71
                    GCN2=3
                    GCN2=6,6
                    GCN2=6,6,7
                    GCN3=C3N1S1
                    GCN4=C7N1S1
                    GCN4=C901
                    GCN4=C10
                    GCN5=5

m)   We now could use a larger number of compounds due to the removal

     of the 15 keys.  This time we clustered a randomly selected set

     of 305 compounds with the following results:

-------
Cluster #      # of Compounds      Discarded Compounds
                                   Army #    Formula
     1              297
     2                7

Cluster #      # of Compounds      Army #    Formula
                                             C3H5ClO
                                             C2H5N
     2                7            100086
                                   100162
                                   107821
                                   110721
                                   117842
                                   119356
                                   126407

n)   A stepwise regression was now performed on cluster 1 using the
     normalized toxicity as the dependent variable.  122 candidate
     variables were used with the following results:

          R    =     .7935
          R^2  =     .629
          S.E. =     .539
          S.D. =     .500
          NV   =     41
o)   From an examination of the residuals, it appeared that the high
     toxicity compounds were introducing undue amounts of residual and
     were influencing the fit disproportionately.  As a result, all
     compounds with TOXN >= 4 were set aside and a regression on 285
     compounds performed with the following results:

          R    =     .7040
          R^2  =     .496
          S.E. =     .462
          S.D. =     .437
          NV   =     30

-------
p)   Examination of the residuals again revealed that the TOXN selection
     threshold should be lowered further.  At this step it was lowered
     to 3.5.  A regression was again performed, this time on 267
     compounds with the following results:

          R    =     .7473
          R^2  =     .559
          S.E. =     .363
          S.D. =     .336
          NV   =     38

     The regression equation is shown in Table 4.

                           Table 4

       Regression Equation and Statistics for Step (p)

Variable        Coefficient     Std. Error      F

MW               .266E-02        .001          38.87
FG120            .693E+00        .161          18.41
GCN5=0           .795E+00        .203          15.45
FG80            -.305E+00        .088          12.0
HR12R           -.934E+00        .293          10.40
GCN5=6           .807E+00        .254          10.03
FG51R            .448E+00        .147           9.29
FG96            -.302E+00        .106           8.11
DACN=2           .186E+00        .066           7.98
EC1=0            .605E+00        .222           7.41
GCN1=3          -.685E+00        .270           6.
FG94            -.247E+00        .098           6.
FG117            .308E+00        .125           6.
GCN5=2           .650E+00        .266           5.
FG35R           -.415E+00        .179           5.
HR2ER           -.293E+00        .127           5.28
FG86R           -.443E+00        .194           5.22
FG112            .257E+00        .113           5.20
FG85            -.345E+00        .156           4.89
SCN49           -.699E+00        .311           4.45
GCN3=C201       -.137E+01        .664           4.26
GCN4=C4N2       -.730E+00        .368           3.93
FG96R           -.313E+00        .162           3.86
NCN=3           -.570E+00        .291           3.85
GCN4=C501        .804E+00        .413           3.79
LP              -.502E-02        .003           3.17
DACN=3           .156E+00        .088           3.13
FG24             .308E+00        .174           3.13
EC1=1            .392E+00        .233           2.84
A-C=3            .287E+00        .173           2.75
FG122           -.392E+00        .238           2.72
FG82R            .349E+00        .223           2.44
FG85R           -.290E+00        .187           2.40
DACN=0          -.300E+00        .196           2.34
HR4E            -.356E+00        .241           2.18
FG143            .181E+00        .126           2.05
HR1E             .904E-01        .066           1.86
GCN1=2           .145E+00        .118           1.52

Constant         .112E+01

r)   In order to determine whether cross terms would improve the
     regression, such a regression was now performed on 297 compounds,
     without using a threshold for removal of the more toxic compounds,
     with the following results:

          R    =     .800
          R^2  =     .640
          S.E. =     .521
          S.D. =     .492
          NV   =     32
     s)   From the previous few regressions we observed that some keys
          occurred only rarely and contributed unduly to the regression.
          We thought that separating the problem into pieces might be
          beneficial.  As a result the set of compounds was split into
          three classes:

               Class I:    those compounds in which no rare key occurs
                           (rare being defined as frequency < 7).

               Class II:   those compounds with at least one rare key.

               Class III:  the outliers, i.e. mostly those compounds
                           with high toxicity.

-------
     The thought then was to separate Class III from Classes I and II
     by discriminant analysis and then have separate regression equations
     for each of the other two classes.
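
     The three-way split can be sketched as below.  The rare-key rule
     (frequency < 7) is the one given in the text.  How a compound is flagged
     as an outlier is left to the caller here, since in the report that
     separation is made by discriminant analysis; the function name and data
     layout are hypothetical.

```python
RARE_FREQUENCY = 7  # a key is "rare" if it occurs in fewer than 7 compounds

def assign_class(compound_keys, key_frequency, is_outlier):
    """Return 'I', 'II' or 'III' for one compound.

    compound_keys: names of the keys present in the compound.
    key_frequency: mapping from key name to its frequency in the data base.
    is_outlier:    True if the compound was set aside as an outlier.
    """
    if is_outlier:
        return "III"
    has_rare_key = any(key_frequency[key] < RARE_FREQUENCY for key in compound_keys)
    return "II" if has_rare_key else "I"
```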

t)   A series of Class I regressions was now performed in order to
     arrive at as robust a structure as possible.  The best compromise
     result from regression is shown below, with the coefficients in
     Table 5.

          R    =     .6277
          R^2  =     .394
          S.E. =     .442
          S.D. =     .432
          NV   =     14
                           Table 5

        "Best" Compromise Class I Regression Equation

Variable        Coefficient     Std. Error      F

FG112            .783E+00        .116          45.9
MW               .302E-02        .000          45.4
FG120            .318E+00        .093          11.7
HR1R             .336E+00        .102          10.9
NCN=0            .143E+01        .483           8.73
FG83             .212E+00        .087           5.95
GCN5=3          -.257E+00        .107           5.80
FG268R           .198E+00        .084           5.53
FG96            -.214E+00        .094           5.16
A-C=0            .109E+01        .501           4.77
GCN4=C2N        -.742E+00        .455           2.66
(Log P)2        -.878E-02        .055           2.60
FG96R           -.209E+00        .149           1.96
FG144            .162E+00        .122           1.76

Constant         .188E+01
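
Applying the Class I equation amounts to adding the constant to the sum of
coefficient times variable value.  The sketch below uses the Table 5
coefficients, paired with the variables in the order listed.  It assumes
that fragment keys enter the equation as 0/1 indicators (absent keys
contribute nothing), which the report does not state explicitly; the
function names are hypothetical.

```python
CLASS_I_CONSTANT = 1.88
CLASS_I_COEFFICIENTS = {
    "FG112": 0.783, "MW": 0.00302, "FG120": 0.318, "HR1R": 0.336,
    "NCN=0": 1.43, "FG83": 0.212, "GCN5=3": -0.257, "FG268R": 0.198,
    "FG96": -0.214, "A-C=0": 1.09, "GCN4=C2N": -0.742,
    "(Log P)2": -0.00878, "FG96R": -0.209, "FG144": 0.162,
}

def predict_toxn(variables):
    """Predicted TOXN for a Class I compound; missing variables count as 0."""
    return CLASS_I_CONSTANT + sum(
        coefficient * variables.get(name, 0.0)
        for name, coefficient in CLASS_I_COEFFICIENTS.items()
    )

def toxn_to_ld50(toxn, mol_weight):
    """Invert TOXN = log10(1000 M / C) to recover the LD50 value C."""
    return 1000.0 * mol_weight / 10.0 ** toxn
```

A compound of molecular weight 100 with no other variable present would
receive a predicted TOXN of 1.88 + 0.302 = 2.182 under these assumptions.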

-------
u)   Class II regressions.  Similarly, a series of regressions was
     performed for Class II compounds, with the best compromise results
     shown below and in Table 6.

          R    =     .6922
          R^2  =     .479
          S.E. =     .490
          S.D. =     .466
          NV   =     21
                           Table 6

       "Best" Compromise Class II Regression Equation

Variable        Coefficient     Std. Error      F

MW               .217E-02        .001          22.5
FG80             .617E+00        .151          16.8
Log P           -.782E-01        .019          16.
FG154R           .618E+00        .185          11.
FG51R            .634E+00        .190          11.
FG120            .430E+00        .135          10.2
GCN2=3           .524E+00        .198           7.00
GCN4=C5N1        .647E+00        .247           6.85
GCN5=6           .415E+00        .173           5.77
HR1E             .219E+00        .092           5.68
FG34R           -.545E+00        .230           5.64
DACN=6           .471E+00        .202           4.27
GCN5=5           .598E+00        .289           4.27
FG120R          -.341E+00        .189           3.24
GCN3=C5          .463E+00        .258           3.22
FG24R            .346E+00        .226           2.35
HR2ER           -.216E+00        .156           1.92
GCN2=5          -.236E+00        .171           1.91
GCN4=C501       -.401E+00        .291           1.91
FG178R          -.209E+00        .157           1.89

Constant         .205E+01

-------
v)   Finally a stepwise discriminant analysis was performed to
     separate Class III from Classes I and II combined.  The results
     are shown below with the equation in Table 7.

                                   Compounds classified into Classes
                                             I + II        III
          Actual    I + II                    510           10
          Classes   III                         9           14

                           Table 7

      Discriminant Equations for Classes (I+II) vs. III

Variable            Class (I+II)        Class III           F

FG205               1.24                43.3                 82.1
GCN3=C5              .983               18.9                 65.6
FG29R               1.24                43.3                 41.2
SCN107               .095               38.4                 33.6
SCN84                .095               38.4                 33.6
SCN45                .727               13.1                 30.8
GCN4=C3             1.15                22.3                 20.5
HR1R                1.24                 4.90               15.9
FG118                .852               13.5                 11.0
FG34                 .852                5.30               10.8
FG112               1.16                 5.30               10.8
FG231R              -.488               14.7                 10.2

Constants           -.140

From the classification table it is clear that this scheme would produce
a substantial number of false positives (10) and, more importantly, an
almost equal number of false negatives (9).  It would be possible in
future work to adjust the discriminant equations so as to minimize the
false negative problem at the expense, of course, of an increase in the
false positives.
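
The error rates implied by the classification table in (v) can be worked
out directly.  In the sketch below a "positive" is taken to mean a compound
classified into Class III; that reading, and the variable names, are
assumptions made for illustration.

```python
# Counts from the classification table in step (v).
true_negatives = 510   # actual I + II, classified I + II
false_positives = 10   # actual I + II, classified III
false_negatives = 9    # actual III, classified I + II
true_positives = 14    # actual III, classified III

# Share of Class III compounds that the scheme misses.
false_negative_rate = false_negatives / (false_negatives + true_positives)
# Share of Class I + II compounds wrongly flagged as Class III.
false_positive_rate = false_positives / (false_positives + true_negatives)

print(round(false_negative_rate, 3))
print(round(false_positive_rate, 3))
```

Although the two error counts are nearly equal, the false negative rate is
far larger because only 23 compounds in the sample belong to Class III.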

4.   Discussion and Conclusions
     From the work that has been described it is clear that it is possible
     to devise a three-step system for classification and prediction of
     toxicity of chemical compounds, at least based on those compounds that
     were included in the sample with which we worked.  The system would
     consist of first separating off the potentially highly toxic compounds
     (Class III) via the discriminant equations, then separating the
     remaining compounds into two classes depending upon the presence or
     absence of rare keys, and finally predicting toxicity via the
     application of the appropriate regression equation.  The sample with
     which we worked had relatively few highly toxic compounds, and this is
     one of the reasons why the discrimination between Class III and
     Classes I + II was not as satisfactory as one would wish.  However, it
     is clear that the great bulk of compounds can be separated out in this
     fashion.

It has also become quite evident in this work that many explanatory
variables were not included among the set of parameters that were
available for analysis.  The major reason why the correlation coefficients
in regression were not as large as desired must be attributed to this
fact.  It is of course difficult to stipulate just what these features
should be.  At the very least, however, one should consider some further
physical constants and some further steric constants.  This view is
reinforced by the fact that molecular weight was the most important
variable in most of the regressions in which it was given a chance to be
included.  This probably means that molecular weight is a summary variable
for many other variables and might provide a lead as to what other
features to include in the future.

Our conclusions from this analysis are that statistical processes can be
effectively used to predict toxicity of chemical compounds.

We found it interesting that clustering did not contribute materially to
the derivation of a solution to the classification problem.  This may
have been a reflection of the relative homogeneity of the sample.

5.   Suggestions for the Future
     We believe that further work could usefully be performed applying
     the same techniques of clustering, regression and discriminant
     analysis to a larger set of data, particularly one including a larger
     number of relatively toxic compounds; that other features should be
     included in any further data sets; and that a relatively simple
     on-line system could be designed to implement the classification and
     prediction system which we have outlined.  It would also be possible
     to extend these concepts to more specific types of toxicity such
     as carcinogenicity, mutagenicity, etc.

-------
                         REFERENCES

1.   R. Jennrich, "Stepwise Discriminant Analysis" in Statistical
     Methods for Digital Computers, K.  Enslein, A. Ralston,  H.  Wilf,
     eds., Wiley (in Press).

2.   R. Jennrich, "Stepwise Regression", as in Ref.  1.

3.   Statistical Computer Programs, Genesee Computer Center, Inc.,
     Rochester, NY.

-------
APPENDIX

-------
                  List of Fragment Keys
ICY  rCRMULA   FRIC Cr ZZ~ .
T
-1
C.
7
W
4
J
5
7
3
Q
10
11 •
12
13
14
15
15
17
13
13
20
21
22
23
24
25
.25
27
23
23
30
31
32
33
34
35
35
37
38
33
40
41
42
43
44
45
45
47
43
45
50
51
52
53
54
55
55
57
53
1
2
3
4
5
5
7
3
3
10
11
-. 12
13
14
15
IS
17
13
IS
20
21
22-
23
24
25
25
27
23 .
23
30
31
32
33
34
35
' 35
37
33
39
40
41
42
43
44
45
45
47
43
43
50
51
. 52
53
54
55
56
57
' 53
A-Crc
A-C = 1
A-CZ2
A-C-3
A-C-4
A-CrS
A-C = 3
A-C = 3
DACN=0
OACN=1
CACN=1
DAC,\ = 1
DACN=2
CACN=3
CACN=4
DACN=5
OACN=S
CACN=7
DACN=8
3AC;-J=9
ZC1 = Q
EC1=1
EC1=2
iC2ro
cC2 = l
EC3 = 0
EC3=1
EC3=2
EC4 = 0
EC4rl
NCN = 0
NCN = I
NCN=2
NC.N=3
' NCN=5
FG101
FG101R
FG103
FG109
FG112
FG112R
FG113
FG115
FG115R
FG117
FG117R
FG113
FG119
FG112R
FG120
FG120R
FG121
FG122
FG123
FG125R
FG12SR
FG130
FG130R
                                              27C
                                              -23
                                               o c
                                               11
                                               i 3
                                              103
                                                1
                                                •4
                                                ^
                                              137
                                               ac
                                               77
                                               27
                                               23
                                                o
                                                5
                                                2
                                             G13
                                               5S
                                               6
                                             579
                                               7
                                             554
                                              27
                                               C
                                               w
                                             534
                                               2
                                             212
                                             35S
                                             105
                                              17
                                               ^
                                               ^
                                               1
                                               1
                                               1
                                               5
                                              48
                                               2
                                               j
                                              10
                                               2
                                              21
                                               3
                                               5
                                               7
                                               3
                                             55
                                             15
                                              1
                                             10
                                              2
                                             • 1
                                              1
                                              1

-------
so
• 51
- 1
S3
54
55
55
57
. S3
S3
70
71
72
73
74
75
75
77
73
73 .
30
31
32
33
84
35
35
37
33
33
' 30
91
92
93
94
95
95
97
-93
39
100
101
102
103
104
105
10S
107
103
109
110
111
112
113
114
115
115
117
118
119
• sc
51
S2
G 3
54
55
65
57
S3
S3
70
71
72
73
74
75
75
77
73
73
30
81
32
33
34
35
35
37
33
39
30
91
92
93
.34
35
95
37
. 98
99
' 100
101
102
103 '
104
105
105
107
103
109
110
111
112
113
114
115.
115
117
113
113
FG 1 31 R
-G123
FG133
FG1Z5.R
FGiZo
FG143
•FG143R
FG144
FG144R
FG145
FG145R
FG14S
FG147
FG147R
FG150R
FG151
FG154
FG154R
FG155
FG157R
FG153R
FG1S7
FG157R
FG1SS
FG172R
•FG174R
FG173
FG173R
FG181
FG187
FG139
FG205
FG207
FG207R
FG203R
FG217
FG213
FG220
FG223
FG223R
FG231
FG231R
FG222
FG24
FG24R
FG245
FG245R
FG245
FG24SR
FG248
FG248R
FG251
FG2E8
FG2S3R
FG23R
FG3R
FG32
FG32R'
FG34
FG34R
27
13
  7
  4
  3
  i
  7
  w
  3
 43
  +
  .4
  8
  3
  1
  2
  1
 13
 84
 10
  2
  ^
  J.
  5
  1
  1
  X
  2
  2
  2
  1
  1
  7
  3
  2
  3
  3
  5
  2
  4
  4
  4
  1
  1
  4
 3C
241
   2
   1
   1
   1
   3

-------
120        120     FG35                                      7
121        121     FG35R                                     13
                                                           11
                                                            •>
122
123
124
125
125
127
123
129
130
131
132
133
134
135
135
137
133
133
140
141
142
143
144
145.
145
147
143
143
1.50
1:51
152
1.53
154
155
156
157
153
159
ISO .
151
152
153
154
155
165
157
153
159
170
171
172
173 .
174
175
176
177
173
179
122
122
124
125
125
127
123
123
130
131
132
133 •
134
135
135
'137
133
133
140
141
142
143
144.
145
145
147
148
149
.150
151
152
153
154
155
156
157
158.
159
160
161
162
163
154
165
156
157
153
159
170
171
172
173
174
175.
176
177
173
173
^3-:
F336R
FG37
FG37R
FG40R
FG41
FG44
FG47
FG51
FG51R
FG54
FG35
FG56R
FG57
FG51
FGSSR
FGS7R
FG5S
FG74
• FG75
FG75R
FG75
FG3C
FG30R
FG31
FG31R
FG32
. FG32R
FG33
FG33R
• FG34
FG35
FG35R-
FG3S
FG8SR
F337
FGSSR
FG33
FG3
FG92
FG92R
FG94
• FG94R
FG95
FG96
FG9SR
FG93
FG99
HRIE
HR1R
HRlOc
HR10R
HRUE:
H R 1 1 R
' HR12E
HR12R
HR13R
HR14E
                                                            2
                                                            4
                                                            1
                                                            1
                                                            5
                                                            4
                                                            a
                                                            2
                                                           47
                                                            5
                                                            2
                                                           IS
                                                            2
                                                            3
                                                           38
                                                            2
                                                            1
                                                           13
                                                            7
                                                            9
                                                           13
                                                           95
                                                            1
                                                            1
                                                            1
                                                            3
                                                            1
                                                           61
                                                           19
                                                            1
                                                           44
                                                           25
                                                           ' 5
                                                            1
                                                           265
                                                           113
                                                            7
                                                            . 2
                                                            31
                                                            7
                                                           10
                                                            3
                                                            4
                                                            9

-------
180       180    HR14ER
181       181    HRisrr
182       182    HR151R
183       183    HR1SR?
184       184    HR17IE
185       185    HR17ER1
186       186    HR13ICI
187       187    HR2ir
188       188    HR2ER
189       189    HR2RR
190       190    HR20~
191       191    HR21E
192       192    HR21R
193       193    HR22I
194       194    HR22R
195       195    HR23;
196       196    HR23R
197       197    HR24EE
198       198    HR25E
199       199    HR25R
200       200    HR25£
201       201    HR25R
202       202    HR3o
203       203    HR3R
204       204    HR31Z
205       205    HR31R
206       206    HR34E!£
207       207    HR3SE1
208       208    HR4=:
209       209    HR4R
210       210    HR41R
211       211    HR47E
212       212    HR53E
213       213    HRSEI
214       214    HRSER
215       215    HR7EE
216       216    HR7ER
217       217    HR8E:!
218       218    HRG23E
219       219    HRG23R
220       220    HRG33E
221       221    HRG33R
222       222    HRG42E
223       223    HRG42R
224       224    HRG43E
225       225    HRG54E
226       226    ND22
227       227    N030
228       228    N030R
229       229    NC31
230       230    WD33
231       231    SCN1
232       232    SCN102
233       233    SCN103
234       234    SCN105
235       235    SCN107
236       236    SCN103
237       237    SCN109
238       238    SCN111
239       239    SCN112
  7
  •j
  7
 28
  o
 2 3
  1
  i
  1
  1
  1
  2
 19
  2
  7
  3
  3
  1
  •5
  4
.05
  e
  3
  5
  2
  3
  5
  2
  1
  3
  1
  •3
  
-------
240       240    SCN119
241       241    SCN125
242       242    SCN127
243       243    SCN130
244       244    SCN125
245       245    SCN15
246       246    SCN17
247       247    SCN2
248       248    SCN24
249       249    SCN2S
250       250    SCN27
251       251    SCN23
252       252    SCN29
253       253    SCN3
254       254    SCN34
255       255    SCN35
256       256    SCN33
257       257    SCN40
258       258    SCN42
259       259    SCN44
260       260    SCN45
261       261    SCN47
262       262    SCN48
263       263    SCN49
264       264    SCNSO
265       265    SCN52
266       266    SCN53
267       267    SCN5S
268       268    (illegible)
269       269    SCNS9
270       270    SCN71
271       271    SCN72
272       272    SCN73
273       273    SCN75
274       274    SCN78
275       275    SCN79
276       276    SCN34
277       277    SCN87
278       278    SC7432
279       279    SCN99
280       280    GCN2=3
281       281    GCN2=3,5
282       282    GCN2=3,5,5,5,5
283       283    GCN2=3,S
284       284    GCN2=5
285       285    GCN2=5,5
286       286    GCN2=5,5,5
287       287    GCN2=5,5,5,5
288       288    GCN2=5,5,7
289       289    GCN2=5,S
290       290    GCN2=5,S,S
291       291    GCN2=5,S,S,S
292       292    GCN2=5,S,S,5,S
293       293    GCN2=S
294       294    GCN2=S,5
295       295    GCN2=S,S,S
296       296    GCN2=6,5,7
297       297    GCN2=5,7
298       298    GCN2=S,7,7
299       299    GCN2=7
11
  3
  2
  2
  4
  4
  7
 11
  3
393
 13
  c
  4
  3
  1
  1
  2
  7
  1
  1
  3
  2
  ' 3
   2
   2
  24
   7
   4
   2
   1
  29
   o
   4
   3
 452
  22
  11
  13
   7
   1
   3

-------
300       300    GCN2=5                         1
301       301    GCN3=C2 S1                     2
302       302    GCN3=C2 O1                    17
303       303    GCN3=C2 N1                     2
304       304    GCN3=C2 N2 S1                  1
305       305    GCN3=C2 N3 S1                  4
306       306    GCN3=C2 N4 S2                  2
307       307    GCN3=C3                        3
308       308    GCN3=C3 S2                     2
309       309    GCN3=C5 O2                     2
310       310    GCN3=C3 N1 S1                  2
311       311    GCN3=C3 N1 O1                  3
312       312    GCN3=C3 N2                    13
313       313    GCN3=C3 N2 S1                  1
314       314    GCN3=C3 N3                     1
315       315    GCN3=C4 S1                     2
316       316    GCN3=C4 O1                    20
317       317    GCN3=C4 O2 S1                  1
318       318    GCN3=C4 N1                     9
319       319    GCN3=C4 N1 S1                  1
320       320    GCN3=C4 N1 O1                  2
321       321    GCN3=C4 N2                    19
322       322    GCN3=C4 N4                     1
323       323    GCN3=C5                       33
324       324    GCN3=C5 O1                    31
325       325    GCN3=C5 N1                    33
326       326    GCN3=C5 N1 S1                  2
327       327    GCN3=C5 N1 S2                  1
328       328    GCN3=C5 N2                     3
329       329    GCN3=C6                      579
330       330    GCN3=C6 O1                     4
331       331    GCN3=C6 N1                     9
332       332    GCN3=C7                       13
333       333    GCN3=C7 O1                     3
334       334    GCN3=C7 N1                     1
335       335    GCN3=C3                        3
336       336    GCN4=C2 S1                     2
337       337    GCN4=C2 O1                    11
338       338    GCN4=C2 N1                     2
339       339    GCN4=C3                        2
340       340    GCN4=C3 O2                     2
341       341    GCN4=C3 N1 S1                  1
342       342    GCN4=C3 N2                     5
343       343    GCN4=C3 N3                     1
344       344    GCN4=C4 S1                     2
345       345    GCN4=C4 O1                    12
346       346    GCN4=C4 N1                     1
347       347    GCN4=C4 N1 S1                  1
348       348    GCN4=C4 N1 O1                  2
349       349    GCN4=C4 N2                    15
350       350    GCN4=C4 N4 S2                  1
351       351    GCN4=C5 O1                     7
352       352    GCN4=C5 N1                    19
353       353    GCN4=C5 N1 S2                  1
354       354    GCN4=C5 N4                     3
355       355    GCN4=C6                      420
356       356    GCN4=C6 O1                     1
357       357    GCN4=C6 N1                     3
358       358    GCN4=C6 N2 S1                  1
359       359    GCN4=C7                        2

-------
360       360    GCN4=C7 22
361       361    GCN4=C7 N1
362       362    GCN4=C7 N1 S1
363       363    GCN4=C7 N1 O1
364       364    GCN4=C7 N2
365       365    GCN4=C7 N2 S1
366       366    GCN4=C8 S1
367       367    GCN4=C8 O1
368       368    GCN4=C8 N1
369       369    GCN4=C9 O1
370       370    GCN4=C9 O2 S1
371       371    GCN4=C2 N1
372       372    GCN4=C9 N1 S1
373       373    GCN4=C3 N2
374       374    GCN4=C9 N2 S2
375       375    GCN4=C10
376       376    GCN4=C11 O2
377       377    GCN4=C12
378       378    GCN4=C12 O1
379       379    GCN4=C12 O2
380       380    GCN4=C12 N1
381       381    GCN4=C13
382       382    GCN4=C13 O1
383       383    GCN4=C13 N1
384       384    GCN4=C13 N1 S1
385       385    GCN4=C13 N2
386       386    GCN4=C14
387       387    GCN4=C14 O1
388       388    GCN4=C14 N1
389       389    GCN4=C15
390       390    GCN4=C15
391       391    GCN4=C17
392       392    GCN4=C13 O3
393       393    GCN4=C18 N2 O1
394       394    GCN4=C19 N2
395       395    GCNS=1,2
396       396    GCNS=1,3
397       397    GCNS=1,3,5
398       398    GCNS=1,4
399       399    GCN1=1
400       400    GCN1=2
401       401    GCN1=3
402       402    GCN1=4
403       403    GCN1=5
404       404    GCN5=0
405       405    GCN5=1
406       406    GCN5=2
407       407    GCN5=3
408       408    GCN5=4
409       409    GCN5=5
410       410    GCN5=6
411       411    GCN5=7
412       412    GCN5=3
413       413    LIG=C3 H11 N2 O3
414       414    LIG=C11 H17 N2 O2
415       415    LIG=C12 H15 N2 O3
416       416    LIG=C12 H17 N2 23
417       417    ALK
418       418    AN
419       419    CHAL
  3
  7
  3
  4
  2
  1
  c
  1
  8
  2
  2
  2
  1
  4
  1
  o
  3
  2
  3
  1
 . 1
  1
  3
 13
  1
  S
507
 SS
 '41
  S
  5
 74
 29
 23
422
 37
 10
 22
  o
  1
  1
  1
  1
  1
  4
  7
  4

-------
                                                    1?
 <421
RICCRSS  R

I4D ART

-------
DF c^

                 KCLECULAP r
KCL.WT
LCG .F ,
I
%
£
5
7
S
s :
10
11
'12
13
14
15
16
17
18
15
20
21
22
23
24
25
2S
2^
...25
2*3
'30
' 31'
32
33
34
35
35
'• 37
38
29
<(0
, J» *
' tl ?
? -
'•'.<» 4
' 
1 ,E9
3,20
2 .1C
2.33
Z .OS
2.72
2.31.
1.3E
1 .25
1.53
2.53
3.C5
.1 .£3
2 .60
2.75
3.03
2.33
3.01
2.62
•2.73
1 1.-69
.1.51
3.56
1.49
1.51
1.47
2.77
2.47
1.74
1.47
3.04
3.50
3.67
.1.45
1.37
1.51
1.S2
1.47
1 .48
1.33
1.53
2.13
2.17
1 .95
2.25
2.0-8
1.54
2.14
. 2. 92
1.35
2.4S
2.28
2.50

-------

A f? M y T c
MOLECULAR
MOL.WT
LCP.P.
NC3M.T
f
i
£
5
c
7
5
Q
10
.11
12
1 T
14
15
15 .
17
IS
15 .
20
21
22
23
102373
105733
105047
111011
114372
11C073
116420
.118035
122123
122605
126775
128110
128404
123504
128813
123314
122290
131571
132115
135211
142223
1435G1
           C10H11N105P1S1
           CUH1SCL1C233P1
           C17H12Q7
           C20H18CL1N1CS
           C5H12CL3M1
           C2H4F1N1G1
           C12H18N2C2
           C14H1705P121
           C8H1532P1S3
           C15K23N1C4.
           C20H24CL1N1C3
           C4H4C1
           C10K17N3C2
           C13H19N1C2S1
           C2K2trlNAlC2
           C12H3CLG
           C12H8CLSG1
           C15H12N2C3
           C14H1SCL106P1
           C41H54C13
           C4KSF2C2
           C12H15N1C3
           C1SH22CL1N1C3
7 91
342
328
403
192
77
222
323
274
281
361
S3
211
253
ICO
354
380
268
345
754
124
221
347
.25
.35
.28
.32
.52
.OS
.22
.32
.39
.35
.87
.08
.26
.36
.02
.21
.21
.27
.71
.26
.CS
.2£
.84
2
3
1
1

-1
2
2
1

-
-
3
3
-4
4
4
2
2
1
-
2
-1
* A 3
4 S2
.73
.44
.77
,03
.43
.16
.93
.33
.30
.82
.CC
.30
.00
.50
.37
.3C
.11
.73
.75
.11
.00
5
4
5
4
4
4
4
4
4
5
4
4
4
S
3
4
5
c
4
5
5
4
4
« J. C
.54
.31
.31
.2;
1 T
4 *. -
.2:
.24
.44
.IE
.22
.0]
.2:
.1C
.ES
.7:
.1C
. 8 :
.5!
,8£
• OS
.44
.54
                              - A-30 -

-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)

1.  REPORT NO.
    EPA-560/1-77-001
2.
3.  RECIPIENT'S ACCESSION NO.
4.  TITLE AND SUBTITLE
    Models for Biochemical Toxicity
5.  REPORT DATE
    February 1976
6.  PERFORMING ORGANIZATION CODE
7.  AUTHOR(S)
    Kurt Enslein
8.  PERFORMING ORGANIZATION REPORT NO.
9.  PERFORMING ORGANIZATION NAME AND ADDRESS
    Genesee Computer Center, Inc., Rochester, NY 14
    for: The Franklin Institute Research Labs
    Philadelphia, PA 19103
10. PROGRAM ELEMENT NO.
    ^605
11. CONTRACT/GRANT NO.
    68-01-2657
12. SPONSORING AGENCY NAME AND ADDRESS
    Office of Toxic Substances
    Environmental Protection Agency
13. TYPE OF REPORT AND PERIOD COVERED
    Subcontract report
14. SPONSORING AGENCY CODE
15. SUPPLEMENTARY NOTES
16. ABSTRACT
    Multivariate techniques of data analysis were applied to a data base of
    549 chemical compounds. The techniques used included multiple regression
    and multiple discriminant analysis.
17. KEY WORDS AND DOCUMENT ANALYSIS
    a. DESCRIPTORS
       multiple regression
       stepwise discriminant analysis
    b. IDENTIFIERS/OPEN ENDED TERMS
    c. COSATI Field/Group
       06/20
       06/04
       12/01
18. DISTRIBUTION STATEMENT
    Release Unlimited
19. SECURITY CLASS (This Report)
20. SECURITY CLASS (This page)
21. NO. OF PAGES
22. PRICE


EPA Form 2220-1 (9-73)

-------
                                                     INSTRUCTIONS

1.   REPORT NUMBER
     Insert the EPA report number as it appears on the cover of the publication.

2.   LEAVE BLANK

3.   RECIPIENTS ACCESSION NUMBER
     Reserved for use by each report recipient.

4.   TITLE AND SUBTITLE
     Title should indicate clearly and briefly the subject coverage of the report, and be displayed prominently.  Set subtitle, if used, in smaller
     type or otherwise subordinate it to main title. When a report is prepared in more than one volume, repeat the primary title, add volume
     number and include subtitle for the specific title.

5.   REPORT DATE
     Each report shall carry a date indicating at least month and year.  Indicate the basis on which it was selected (e.g., date of issue, date of
     approval, date of preparation, etc.).

6.   PERFORMING ORGANIZATION CODE
     Leave blank.

7.   AUTHOR(S)
     Give name(s) in conventional order (John R. Doe, J. Robert Doe, etc.).  List author's affiliation if it differs from the performing
     organization.

8.   PERFORMING ORGANIZATION REPORT  NUMBER
     Insert if performing organization wishes to assign this number.

9.   PERFORMING ORGANIZATION NAME AND ADDRESS
     Give name, street, city, state, and ZIP code. List no more than two levels of an organizational hierarchy.

10.  PROGRAM ELEMENT  NUMBER
     Use the program element number under which the report was prepared.  Subordinate numbers may be included in parentheses.

11.  CONTRACT/GRANT NUMBER
     Insert contract or grant number under which  report was prepared.

12.  SPONSORING AGENCY NAME AND ADDRESS
     Include ZIP code.

13.  TYPE OF  REPORT AND PERIOD COVERED
     Indicate interim, final, etc., and if applicable, dates covered.

14.  SPONSORING AGENCY CODE
     Leave blank.

15.  SUPPLEMENTARY NOTES
     Enter information not included elsewhere but useful, such as: Prepared  in cooperation with, Translation of, Presented at conference of,
     To be published in, Supersedes, Supplements, etc.

16.  ABSTRACT
     Include a brief (200 words or less)  factual summary of the most significant information contained in the report. If the report contains a
     significant bibliography  or literature survey, mention  it here.

17.  KEY WORDS AND DOCUMENT ANALYSIS
     (a) DESCRIPTORS - Select from the Thesaurus of Engineering and Scientific Terms the proper authorized terms that  identify the major
     concept of the research and are sufficiently specific and precise to be used as index entries for cataloging.

     (b) IDENTIFIERS AND OPEN-ENDED TERMS - Use identifiers for project names, code names, equipment designators, etc. Use open-
     ended terms written in descriptor form for those subjects for which no descriptor exists.

     (c) COSATI FIELD GROUP - Field and group assignments are to be taken from the 1965 COSATI Subject Category List. Since the
     majority of documents are multidisciplinary in nature, the Primary Field/Group assignment(s) will be specific discipline, area of human
     endeavor, or type of physical object.  The application(s) will be cross-referenced with secondary Field/Group assignments that will follow
     the primary posting(s).

18.  DISTRIBUTION STATEMENT
     Denote releasability to the public or limitation for reasons other than security, for example "Release Unlimited."  Cite any availability to
     the public, with address and price.

19. & 20. SECURITY CLASSIFICATION
     DO NOT submit classified reports to the National Technical Information Service.

21.  NUMBER  OF PAGES
     Insert the total number of pages, including this one and unnumbered pages, but exclude distribution list, if any.

22.  PRICE
     Insert the price set by the National Technical Information Service or the Government Printing Office, if known.
   EPA Form 2220-1 (9-73) (Reverse)

-------