FINAL REPORT
                    on
EVALUATION OF A COMPUTER METHOD TO PREDICT
   OCTANOL WATER PARTITION COEFFICIENTS
          TECHNICAL DIRECTIVE 12


                    by

               ALBERT J. LEO
      SUBCONTRACT NO.  T6415(7197)-029
          CONTRACT NO. 68-01-5043
       R.G.  Wilhelm, Project Officer
         R.  Lipnick, Task Officer


  Office of  Pesticides and Toxic Substances
    U.S. Environmental Protection Agency
             September 1982

-------
This document has been reviewed and approved for publication
by the Office of Toxic Substances, Office of Pesticides and
Toxic Substances, U. S. Environmental Protection Agency.
Approval does not signify that the contents necessarily reflect
the views and policies of the Environmental Protection Agency,
nor does the mention of trade names or commercial products
constitute endorsement or recommendation for use.

-------
                             TABLE of CONTENTS
Executive Summary                                   1
I. Shake Flask Measurements                         3
II. HPLC as Alternative Measurement Method          5
III. Improvements in Manual Calculation
   Rule Change                                     10
   Rule Simplifications                            11
   Electronic Effects                              14
   Alkyl-Aryl Effects                              15
   Intramolecular H-Bonding                        16
   Negative Ortho Effect                           17
IV. Use of CLOGP-I                                 19
V. Analysis of CLOGP-I Test Results                23
Bibliography                                       29
                                  TABLES
 1. Compounds Measured by Shake Flask
 2. Sigma-Rho Constants for Electronic Effect
 3. Negative Ortho Corrections
 4. Example Calculations Assisting CLOGP-I
 5. Solutes with Highest Deviations - Reasons
 6. WLN Test Structures Ordered by Formula
 7. WLN Test Structures Ordered by FCON Number

                                 FIGURES
 1. Halogen-H-Polar Interaction in Aliphatic Systems
 2. Statistics for CLOGP-I, Mode (1)
 3. Statistics for CLOGP-I, Mode (2)
 4. Statistics for CLOGP-I, Mode (3)
 5. Plot of Residuals vs. CLOGP, Mode (3)

                                SUPPLEMENTS
 1. Output at 2N Level of Test Structures
 2. Magnetic Tape of CLOGP With Inconsistencies Corrected
 3. Complete Paper on Calculating Hydrophobicity in Aromatic Rings

-------
                             EXECUTIVE SUMMARY
   Octanol-water partition coefficients  of over 160 solutes  were measured

by shake-flask  procedure.   These  were used to  confirm old  Fragment and

Factor values and to establish new ones.   A number of important pesticides

were included as 'benchmarks'.


   A minimum amount of experimental effort  was expended on HPLC procedures

with the objective of  developing one which would provide a  more rapid but

just as  reliable method for  determining the hydrophobic  parameter.   The

difficulties of using water as the mobile  phase led us to undertake a more

thorough literature search, which in turn led us to work in progress at the

U.   of  Illinois  (Champaign-Urbana),   and to  the  conclusion  that  any

successful solution  to the problem  was likely to be  found in the  use of

very expensive, computer-driven,  multi-pump HPLC units,  which were beyond

the scope of this contract.  (See body of report for evaluation.)


   Several very  important improvements were  made in  the rules for  log F

calculation  as a  result  of  this contract  effort.    A  new method  for

calculating the  proximity effect  of polar  fragments embedded in aromatic

rings was devised and tested.   A new  simpler system of accounting for the
electronic  effect  in  aliphatic  systems  between  halogens  and  H-polar

fragments was implemented.   Finally, a comprehensive method of calculating

the interaction of  multiple-substituents on aromatic rings  was perfected.

It will  be essential  to include all  of these  improvements in  the CLOGP

algorithm if it is to perform the tasks EPA expects of it.

-------
                                                                      PAGE 2




    The task  of installing a working  version of CLOGP (initial  version by




 Chou & Jurs,  Pennsylvania State Univ.)   at EPA in Washington,  D.C.   for




 practical application work was,  frankly,  disappointing.   The programming




 strategy  initially  chosen for  speed  on  a Modcomp  16-bit  minicomputer




 baffled our attempts to convert it from a research tool into a working tool




 to be handed over to someone with  minimal experience in log P calculation.




 In spite of considerable effort to  eliminate the problem during the period




 of this contract,   the output of CLOGP  depended  on the ORDER in which the




 structural information  was input.   Although this  is rarely a  problem if




 only a  dozen structures are  input at a  'sitting',  it often  occurs when




 computing from a file of hundreds  or thousands.    Post-contract effort has




 (apparently)  eliminated this problem.   However,   the original programming




 strategy also set  a practical upper limit of 300  fragments and factors,  and




 thus new ones cannot be added as they are determined.  Finally,  our efforts to




 install  CLOGP on the DEC 20 at EPA in Washington,   D.C.   were unsuccessful,




 but  it  is operational at ERL-Duluth.






   For  these reasons,   and  after consulting  other  programmers familiar with




 CLOGP who also were unable  to circumvent  these  problems,   some of the later




 programming  effort   by  Pomona  personnel  under  this contract  was directed




 towards  devising a   table-driven second version of  CLOGP.     But until such




 time  as   a  'stand-alone'  program  becomes  available,    the  instructions




 provided  in  this report  will  enable the operator to use  CLOGP-I in a manner




which equals  or  exceeds  the original  objectives set forth  for the contract.

-------
                                                                      PAGE 3




                         I. Shake-Flask Measurements
    Since any  measurements used to establish  Fragments or Factors  used in




 computer  calculations tend  to be  used  without frequent  review,   it   is




 incumbent to  set the highest standards  on the procedures  for determining




 the  octanol/water partition  coefficients.   Experimental  details  can   be




 found in Ref.  1,  p.  587.   Lots of MC/B octanol (m.p.-16 to -15) were tested




 for absence of  absorption down to 220 nm.   A single lot was  used  for  the




 entire set of  determinations without need for further purification.    Stock




 solutions of octanol saturated with distilled  water (or buffer)  and water




 (or buffer) saturated with octanol were maintained  in an air-conditioned




 laboratory.   Temperature records were kept,  and a variance of five  degrees




 (C) necessitated  a re-saturation of the solvent phases followed by a 12  hr.




 settling period.






     A minimum of   three  determinations  over  at   least  a  three-fold




 concentration  range  was   required to establish a  partition coefficient  for




 this work.    A  ten-fold  concentration  range  was often  employed,    and




 concentration  dependence  was  looked  for  as  an indication  of possible




 association in either  phase (usually the water).   It was not unusual for 12




 or  more  runs   to  be  made  to establish  one log P.    (The  current charge by




 Pomona Medchem for  each   partition coefficient  measurement meeting  these




 criteria  is  $250.00.)
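
    The acceptance criteria described above can be restated as a short
 check.  This is only an illustrative sketch, not the procedure used in
 the laboratory: the function name, the data layout, the example values
 and the 0.05 log-unit tolerance for flagging concentration dependence
 are assumptions introduced here.  Only the three-run minimum and the
 three-fold concentration range come from the text.

```python
import math

MIN_RUNS = 3            # from the text: at least three determinations
MIN_CONC_RATIO = 3.0    # from the text: at least a three-fold concentration range
DRIFT_TOLERANCE = 0.05  # ASSUMED tolerance (log units); not given in the report


def evaluate_runs(runs):
    """Summarize shake-flask runs given as (concentration, measured P) pairs."""
    concs = [c for c, _ in runs]
    log_ps = [math.log10(p) for _, p in runs]
    enough_runs = len(runs) >= MIN_RUNS
    wide_range = max(concs) / min(concs) >= MIN_CONC_RATIO
    # Crude concentration-dependence check: compare log P at the lowest
    # and highest concentrations.
    ordered = sorted(zip(concs, log_ps))
    drift = ordered[-1][1] - ordered[0][1]
    return {
        "log_p": sum(log_ps) / len(log_ps),
        "accepted": enough_runs and wide_range,
        "concentration_dependent": abs(drift) > DRIFT_TOLERANCE,
    }


# Hypothetical runs over a ten-fold concentration range (not measured data).
result = evaluate_runs([(1e-4, 132.0), (3e-4, 135.0), (1e-3, 134.0)])
too_few = evaluate_runs([(1e-4, 132.0), (1e-3, 134.0)])
```

 A set flagged as concentration dependent would call for further runs
 rather than a simple average.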






     Unless  there was a poor material  balance,   it was the usual  practice to




 analyse  only one  phase (usually  the water).   Standard analytical procedures




were followed.      Samples were weighed  on  a  Cahn 25  Electrobalance.    U.v.




 determinations   were  made  on   a  Cary   15  spectrophotometer.      Gas

-------
                                                                      PAGE 4




 chromatography was   performed  on  a Varian  2700  using  a Vidar  6300 digital




 integrator.   HPLC   (used for  analysis,  not   partitioning)  was  done on a




 Waters  unit with LKB Uvicord 2138 detector operating at 208 nm.






     It  is readily apparent from the  compounds  listed  in Table 1  that the




 choice  of structures for measurement was not  always based  on the simplest




 one  with the feature in question.  Considerations of cost, availability and




 purity  were also important.  Furthermore, because of the association  of the




 Pomona  Medchem Project with other research  efforts,   it was often possible




 to use  a set  of chemicals  on which  other important  determinations were




 being made.   For example,  there were X-ray coordinates determined for the




 set  of  hippurates  in  Table 1,    and  conformational  analyses  made  to




 rationalize other unusual properties of these  substituents were expected to



 help predict  the hydrophobic  Factors needed  for  these groups   in similar




 circumstances.   A   similar opportunity arose  for the  N-methoxy-ureas,  a




 functional group fairly common to compounds of environmental concern.   The




 availability of a solute set expected to show  some conformational anomalies




was most fortunate.






    Table 1 lists the compounds  whose partition coefficients were measured




under this contract. They are  sorted by structure via WLN.

-------
                                                                      PAGE 5



              II. HPLC Procedures as an Alternate to Shake-Flask






     There are numerous reports of work  that attempt to relate the relative



 retention time  in high-performance  liquid chromatography  columns to  the



 shake-flask partition coefficient between  octanol and water.2,3,4,5,6  From



 a theoretical viewpoint,  it would seem that the closest relationship ought



 to exist  where the  stationary phase  was 'bound'   octanol and  the mobile


 phase  was water  saturated  with octanol.2,7   Given  conditions of  column



 length and  pumping rate so that  an equilibrium was attained,   log (T-To)



 should be proportional to log P.   Although a few laboratories still report



 notable  success with  this method,   mechanical details  have plagued  the



 procedure,  and  even frequent  runs of reference  standards do  not always



 eliminate  errors.    The  columns  are not  available  pre-packed from  the



 manufacturer,  and while column preparation procedures have been simplified,



 one  can expect  considerable variation between laboratories,  especially with



 the  shorter  columns where channeling can introduce  large errors.   Most HPLC



 pumps do not operate  well on a mobile phase as viscous  as  water,   and  the



 check valves are much more prone to malfunction with it.





     Most of  the  mechanical problems can be overcome by  using a commercially



 available C-18  column and a less viscous mobile phase than water.  The long



 alkyl  chains are thought to  mimic  the C-8   chain of  octanol,   while  the



 residual  polar  silanol   sites  take  the  place  of  the  polar -OH  group.



 Mixtures of  water and an organic solvent  can  be used  as  the  mobile phase.



 While  acetonitrile  has been proposed as  the organic component,  we  can  see



 no reason for not employing as many   -OH groups as  possible,   and  therefore



we think methanol   is to  be  preferred,   except  for   very lipophilic  solutes



where  i-propanol may  be advantageous.

-------
                                                                     PAGE 6


    While   our work was   still  in   the   preliminary   stage  of  determining


whether  there  was  a linear  relationship   between  % methanol and log(T-To)


for various structures,   we  became  aware  of   the work  of  Dr.  John Garst at


the University of Illinois (Champaign).   He had available a  Hewlett-Packard


model  1084B HPLC which  could  be programmed   to run and plot (T-To)  for a


series  of   solvent percentages.     In    return   for  providing  him  with


'preferred'  octanol/water shake-flask values  from  our  data base,  he kindly


kept us  abreast of his  research,    a report of  which   has recently  been

submitted for publication.8,9   His  investigations  followed closely  what


we had planned,  but his  advanced  equipment  allowed him  to  accomplish much


more than we would  have  in the same time.   In addition,   he developed some


techniques   to   prolong  column  life and reproducibility which  could  be


crucial  to any  procedure  which  EPA may depend upon   for a  regulatory


function.



   Both  Veith6   and Garst8  have appreciated  that  their HPLC procedures can


extend a measured hydrophobicity beyond the practical  limits for the shake-


flask  method.  It may well be  that  the model  will  not  prove  to be identical


to  the  equilibrium of   a  solute  between two  liquid   phases,  but  with


compounds whose  log P values  would  be above 6,   the  concentrations being


dealt with in the aqueous  phase are  so low that adequacy  is  all that should


be hoped for in  a model.   We  think  it very  likely that  HPLC parameters in


this high range  will be found  to correlate certain biological data as soon


as there are enough appropriate  ones available.



   There are a few  aspects  of Garst's work with solutes   in the 'ordinary'


log P  range  which should  raise a note of  caution  and  indicate that further


work should  be undertaken before his method can be considered as a reliable

-------
                                                                      PAGE 7




 replacement for the shake-flask in hydrophobic parameter measurement.   The




 first objection is not  a serious one for EPA's purposes:    Because of the




 small  difference (T-To),   log P  values  below 0.4  cannot be  accurately




 determined.   The  second is  more important:   Sometimes the  relationship




 between % methanol and log(T-To) is curvilinear.  For example,   in the case




 of benzene,  the curvature is concave  downward,  and the intercept at 100%




 water  is  lower  than  a tangential  projection  at  appreciable  methanol




 percentages.   The log P calculated from the actual intercept is low by over




 0.5 log unit.    In the range of  60% methanol down to roughly 35% methanol,




 the curvature  is  slight and can be  treated as linear within  a confidence




 limit of  96.9%.   Extrapolation of this  linear portion gives   a log(T-To)




 intercept which results in a calculated log P of 2.00.    This compares more




 favorably with the accepted shake-flask determination of 2.13.
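
    The extrapolation described above can be sketched as a least-squares
 fit restricted to the nearly linear window (roughly 35% to 60% methanol)
 and read off at 0% methanol.  The sketch below is illustrative only: the
 function names and the data points are invented for the example, and the
 further step of converting the log(T-To) intercept into a log P (which
 needs a calibration against reference solutes) is not shown because the
 report does not give it.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def intercept_at_pure_water(points, low=35.0, high=60.0):
    """Fit log(T-To) vs. % methanol inside [low, high]; return the value at 0%."""
    window = [(pct, y) for pct, y in points if low <= pct <= high]
    if len(window) < 2:
        raise ValueError("need at least two points in the linear window")
    xs, ys = zip(*window)
    _, intercept = fit_line(xs, ys)
    return intercept


# SYNTHETIC (% methanol, log(T-To)) points: linear between 35% and 60%,
# bending downward at low methanol as described for benzene.
synthetic_points = [
    (10.0, 1.80), (20.0, 1.70),
    (35.0, 1.45), (40.0, 1.30), (50.0, 1.00), (60.0, 0.70),
]
linear_intercept = intercept_at_pure_water(synthetic_points)
```

 With these invented points the windowed fit gives a higher intercept
 than the curved low-methanol data would suggest, which is the direction
 of the discrepancy discussed for benzene.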






    Garst  provides only one example of a plot of log(T-To) vs.    methanol %




 where the  curvature  is concave  upward—that for methyl-i-butyl xanthine.




 Although  the partition coefficient for this  solute has  not been measured  by




 shake-flask,    it can  be calculated  from  the dimethyl analog with  some




 confidence.    It appears that  an  extrapolation from high methanol content,




 where the curvature is least,   gives an intercept which yields  a calculated




 log P which is   much higher  than what is  expected by shake-flask.   The




 intercept   from  low   or  zero  % methanol   would make the discrepancy  even




 greater.






   Garst  suggests  that  the  curvature of these plots  results  from failure  to




 reach equilibrium with  the  columns  and  flow  rates investigated  so far,  but




 that  extrapolation   from   the  'linear'  portions   can  circumvent   this




difficulty.   But he  also suggests  that  many  of  the  shake-flask values are

-------
                                                                      PAGE 8




 not correct for the  given structures because of dimer  and other multiple-




 component formation,  and the values from HPLC may be the 'true'  parameters.




 In this regard,  his   argument seems to us to have little  merit in view  of




 the fact that,  in the case of benzene,  there is no concentration dependence




 over a  10+ concentration range,   and  there is agreement between  HPLC and




 shake-flask for naphthalene  and  anthracene.   The fact  that  there IS good




 agreement for  such a   wide variety of structures is important,    and  if the




 classes of structures where disagreement is  great are carefully studied and




 defined,   it may well be  that HPLC can  lead us  to a  new  parameter with




 significance for biological QSAR.






    Caffeine, like the xanthine derivative discussed above,  is found by HPLC




 to  have a much  higher log P  value than determined by shake-flask (0.63 vs.




 -0.07).    Garst  attributes this  to  the possibility of   hydrate formation.




 Hydrate   formation has   been shown to  be  a   factor   in  the  partition




 coefficient  in   solvent  pairs where  the organic   phase  is  as non-polar  as




 benzene.10   Evidence  for   its influence  in  alcohol/water  partitioning   is




 lacking.   As the HPLC/shake-flask differences become more clearly defined it




may be found  that  the former are  better parameters   for certain biological




 activity.    If not,   then  the differences must   be   analyzed for  their




 predictability  if HPLC is  to  serve as   a 'plug-in' replacement for  the more




 costly shake-flask method.

-------
                                                                      PAGE 9



                  III. Improvements in Calculation Methods





    The earliest  method to  be proposed for  calculating log  P (oct/water)



 from structure was based on the 'pi'  system of replacing the hydrogen on an



 aromatic ring with a specified substituent group.11   This system was  widely



 used for over a decade,  but was not well suited for reduction to a computer


 algorithm.    Rekker12   proposed an additive (as opposed  to a replacement)



 system in which  a hydrogen was also  assigned a  hydrophobic constant.    In



 accounting  for constitutive Factors,  Rekker made use of a 'Magic Constant'



 which  was  multiplied  by  an  integer  to  yield the  desired  interaction



 correction.     To date   no method  has  been published  which allows   these



 integers to be predicted from structure,   and so it has appeared impossible



 to  reduce Rekker's method  to a  computerized form.     (Another difficulty



 arose from the fact that Rekker never  defined a Fragment;   if one was found



 in  his table  of values,  you knew it qualified as such.)




    The log P  calculation method of Hansch  and Leo13 also uses the system of



 adding Fragments  and interaction  Factors,   but  it was designed  from  the



 outset with a  computer  algorithm in mind.     It  was apparent in  the early



 stages   of this   effort  that   the  partition  coefficient was an  extremely



 complex   parameter,   influenced  by   apparently    subtle   changes  in   the

 arrangement of  the  component   parts of  the  solute  structure.    The most



 effective  computational   strategy,    therefore,    depended   on  the best



 compromise between  two conflicting demands:    If  the Fragments were defined



as very  small units  (i.e.  the  constituent atoms),  the Factors necessary  to



allow for the many possibilities  of interconnection would be unmanageable;



if  the Fragments  were  defined too   large (thus   containing many of   the



troublesome Factors within a measured  value)  many more measurements would

-------
                                                                     PAGE 10
 be  required  to  begin  a  workable system,    and  in  fact   it would  be
 indistinguishable from the 'pi'  system.    It   was  hoped that  a satisfactory
 compromise  would  result if  Fragments were defined by means   of 'Isolating
 Carbon Atoms'  (ICs):    An Isolating Carbon atom is one not multiply-bonded
 to a hetero atom; all  atoms  or groups of  atoms whose remaining bonds are to
 ICs are  fundamental Fragments.   This definition was used quite successfully
 for a great number of  manual calculations,  but the only way it was  possible
 to clearly   appreciate any improvements   which might  be called for  was to
 greatly  extend  this range  through calculation via computer—a task  made
 possible largely through the contract which supported this work.
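
    The first half of the Fragment definition, deciding which carbons
 are isolating, can be illustrated on a small hand-built molecular graph.
 This is a minimal sketch and not the CLOGP logic: the data layout and
 function name are assumptions, a 'hetero atom' is taken to mean any atom
 other than carbon or hydrogen, and the grouping of the remaining atoms
 into Fragments is not attempted.

```python
def isolating_carbons(atoms, bonds):
    """Return indices of carbons not multiply-bonded to a hetero atom.

    atoms: dict mapping atom index -> element symbol
    bonds: list of (i, j, order) tuples; order 2 or 3 is a multiple bond
    """
    blocked = set()
    for i, j, order in bonds:
        if order > 1:
            # Check both ends of the multiple bond.
            for carbon, other in ((i, j), (j, i)):
                if atoms[carbon] == "C" and atoms[other] not in ("C", "H"):
                    blocked.add(carbon)
    return sorted(k for k, el in atoms.items() if el == "C" and k not in blocked)


# Acetic acid, CH3-C(=O)-OH, hydrogens omitted.
acid_atoms = {0: "C", 1: "C", 2: "O", 3: "O"}
acid_bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
acid_ics = isolating_carbons(acid_atoms, acid_bonds)

# Propene, CH2=CH-CH3: a carbon-carbon double bond does not block a carbon.
propene_atoms = {0: "C", 1: "C", 2: "C"}
propene_bonds = [(0, 1, 2), (1, 2, 1)]
propene_ics = isolating_carbons(propene_atoms, propene_bonds)
```

 In the acetic acid example only the methyl carbon is isolating; the
 carboxyl carbon is doubly bonded to oxygen and so belongs to the
 carboxyl Fragment.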

 Rule Change

    The  only  desirable rule change indicated from this  study   involves
 Fragments fused  in hetero-aromatic  rings.   The original rule (Ref. 13, p. 34)
 produced unnecessarily large Fragments for  the common purine analogs,   for
 example.    Furthermore, each  aromatic system seemed to  act as a unit as far
 as  proximity  effect is  concerned,   and   so  the  positive Factor   was  more
 closely  related  to how many  Fragments were  present,  not  to how closely  they
were  spaced.   Based on these observations  the revised  rule  states:    "All
 carbon  atoms  in  an aromatic ring are   isolating unless  they are  doubly
bonded to a hetero atom outside  the ring."

   Using  the  revised   definition,   the   Factor for   multiple occurrence  of
hydrophilic Fragments  (called 'proximity  effect'   in aliphatic systems)   is
proportional to   the number  of   Fragments present  and to the sum   of their
values:
                F = 0.22 \sum (f_1 + f_2)
                F = 0.32 \sum (f_1 + f_2 + f_3)
                F = 0.42 \sum

-------
                                                                     PAGE 11




    The  new aromatic  Fragment  definition  and multiple-Factor  have  been




 implemented on the  operating version of CLOGP,  except for  rings  in which




 'aromaticity'  is difficult to define.     For the purpose of calculating log




 P,   it  is expedient  to define an  aromatic ring as  any not  containing a




 saturated carbon atom,  but we have not devised a system to allow a  carbonyl




 group  to be accepted into an aromatic ring without   disturbing other parts




 of   the  logic   of  CLOGP-I.   The   present  version   recognizes a  di-vinyl




 attachment in   the case  of quinone  and a  styryl/aromatic attachment   for




 naphthoquinone.   Adequate agreement with observed values is obtained if an




 aromatic Fragment value is  taken  in the first instance and  the average of




 aromatic and di-aromatic taken in  the  second.






 Rule Simplifications






   When  work on  this contract  began,  calculation of  the Factor  in aliphatic




 systems  arising   from the  electronic  effect  of halogens  near an   H-polar




 group was  not well-defined.    See  Ref. 13, pp.  27-28.   Brandstrom14  recently




 published  a method for  calculating this effect  in the  Hammett  fashion  as  a




 product  of  (sigma x rho).    This   approach appears to treat halogen-halogen




 interactions  on  a  more  rational  basis  than  the geminal  and   vicinal




 corrections  incorporated in CLOGP,  but  it is no more accurate and much  more




 difficult   to  program.     In   Brandstrom's  examples   of  halogen-H-polar




 interactions,  the  (sigma  x rho)    product  works  well,   but in the wider




 selection  employed in  this  work it  does   not,   as will  be  apparent  below.




For  this reason we are  continuing with  a  pragmatic approach in  CLOGP.






   When  two isolating carbons  separate  a  halogen from a Fragment  capable of




H-bonding,  the  correction   Factor  is +0.55.    Normally   this Factor  should




only  be applied   once  for   each H-polar   Fragment even   if  a  beta-halogen

-------
                                                                    PAGE 12




appears on  more than one  of its valencies.    The early version  of CLOGP




introduced a Factor for EACH such occurrence.
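
    The difference between the two counting rules can be shown in a
 few lines.  This is a sketch under stated assumptions, not code from
 CLOGP: the function name and input layout are invented here, and each
 input number is taken to be the count of valencies of one H-polar
 Fragment that carry a beta-halogen.

```python
BETA_HALOGEN_FACTOR = 0.55  # from the text


def beta_halogen_correction(valencies_with_beta_halogen, once_per_fragment=True):
    """Total correction for a list of H-polar Fragments.

    valencies_with_beta_halogen: one integer per H-polar Fragment.
    once_per_fragment=True follows the rule stated in the text;
    False mimics the early behaviour of one Factor per occurrence.
    """
    if once_per_fragment:
        return sum(BETA_HALOGEN_FACTOR for n in valencies_with_beta_halogen if n > 0)
    return sum(BETA_HALOGEN_FACTOR * n for n in valencies_with_beta_halogen)


# Hypothetical solute: one H-polar Fragment with beta-halogens on two valencies.
current_rule = beta_halogen_correction([2])
early_rule = beta_halogen_correction([2], once_per_fragment=False)
```

 For this hypothetical case the stated rule gives a single +0.55,
 while counting each occurrence doubles the correction.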






   As expected,  the electronic effect of  a halogen on an H-polar Fragment




is greater when only one isolating carbon intervenes.  The Factors for each




type of occurrence which  had been measured by 1979 appeared  in Table IV-2




in Ref. 13 (p. 28).   All but two of these were handled by the early version




of CLOGP and all are now operational on the current Pomona version.






   To widen the scope of computer calculation of alpha-halogen effects, the




blanks in Table IV-2 (Ref. 13)  were filled in,  either by measurement (e.g.




the chloroacetic acids found in Table 1.) or by interpolation.  It was seen




that a  simplification was possible whereby  the Factor does not  depend on




halogen type and there  is need to distinguish only three  types of H-polar




fragments.   The overall effect can best  be appreciated from Fig.  1 which




plots the correction Factor (always positive)  against the number of alpha-




halogens.   It  can be  seen that  sulfonyl-containing Fragments  remain as




sensitive to the electronic enhancement with the addition of the second and




third alpha-halogen,  while the increment is less for other Fragment types.




This data does NOT fit the Brandstrom  (sigma x rho)  scheme.   It is hoped




that  as  additional partitioning  data  is  acquired for  this  structural




feature, three Fragment types will continue to suffice.   Implementation of




this system could not be included in CLOGP-I but is planned for CLOGP-II.






   One of  the earliest  constitutive factors  recognized as  necessary to




calculate log  P (oct/water)  from structure  was the electronic  effect of




aromatic substituents.      The first  attempt by this  author to  deal with




this  very  commonly-occurring and  often  sizeable  effect was  to  assign




'exalted'  values to certain susceptible Fragments when they appeared on the

-------
                                                                     PAGE 13

 same aromatic  ring as  one which  was strongly  electron-attracting.    See

 Table IV-1, p. 23, Ref. 13.   In implementing this in the earliest version of

 CLOGP the sigma level for a substituent  to qualify as an 'Inducer'  was set

 too low,   and  the unnecessary corrections almost offset  the desired  ones.

 This has  been changed in the present operating version of CLOGP-I,  but this

 feature still is in need of improvement.


    Both Fujita16,17  and Brandstrom14 have proposed a Hammett-like treatment

 of  this correction Factor.    Both proposals  are unsuitable  for  our present

 computer  needs,   because of lack  of  suitable rho and sigma values,   because

 of  the  complications of sorting out   appropriate meta and para distinctions

 (especially for  fused ring  systems),    and because ortho substitution  could

 not  be  treated.   Much  of the  Principal Investigator's time  under   this

 contract* was devoted to  working out  a   practical  compromise  between the

 'quantum  level'   approach   (tried  and  found    wanting)    and   the   too-

 sophisticated approach of Fujita-Brandstrom.


   The  octanol/water partition coefficient of   aromatic  solutes  can  deviate

 from the sum of  Fragment values  because of  electronic interaction,

 intramolecular  hydrogen bonding,   alkyl  substitution,   and   a

 special  ortho effect.    It was   shown that  in 400   examples of  aromatic

 solutes,  these four Factors could reduce the  deviation between  calculated

 and  observed  log P values by over  three-fold.
*The work on this phase was not  complete at the expiration of sub-contract
T-6415(7197)-029.   It was completed under  ERL-Duluth Grant CR 809295-01-0
and  submitted for  publication in  the  Journal of  the Chemical  Society,
Perkin II.    The journal article  was too lengthy  to be included  in this
report in its entirety,  but is  submitted as supplementary material.   The
summary included in this section should suffice for most purposes.

-------
                                                                    PAGE 14



Electronic Factor






   It was found  that  the electronic effect  could be dealt with  using the



Hammett  (sigma x rho)  product15 with the following simplifications:   1)  a



single sigma parameter could be used for ortho, meta and para interactions;



2)   most substituents could  be  assigned to  either  an  'Inducer' or  a



'Responder' class,   greatly  limiting  the need  for  considering  a  'bi-



directional' interaction;  3) all halogens could be assigned the same sigma



parameter;  and 4)  generalized substituent structures could be used,  each



class receiving the same sigma or rho value.  Rather than use the classical


sigma  constants derived  from ionization  equilibria,15  it  was decided  to



derive a set which might be more appropriate to partitioning equilibria.  A



program for successive approximations was written in APL   and applied to a



model set of 90 di-substituted benzenes chosen to eliminate or minimize bi-



directional interactions.    An average  of  sigma  (meta+para)  values  was



introduced as the first approximation from which the first level rho values



were calculated.   These were used to re-calculate the second level sigmas,



and the dialectic process continued until the  change in both sets was less



than 0.01  unit.   The entire process was  repeated using  sigma inductive



constants   instead of the meta/para average,  and for a  third time using



values  for Field  effect.    In  every  case the  hydrophobicity-oriented



sigma/rho  set  turned out  the same  within  .01  units.    The  greatest



differences from accepted Hammett sigma constants were found for the nitro,



sulfonyl and carboxaldehyde groups.  This set, enlarged with values for bi-



directional substituents (Inducer/Responders) appears in Table 2.
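
    The successive-approximation scheme described above alternates
 between solving for rho with sigma fixed and for sigma with rho fixed.
 The sketch below restates that idea in Python; it is not the APL program
 used in this work.  It assumes, for simplicity, that a deviation is
 available for every Inducer/Responder pair and that each deviation is
 modelled as the product sigma x rho.  All numbers are synthetic.

```python
def refine_sigma_rho(dev, sigma_start, tol=0.01, max_iter=100):
    """Alternate least-squares updates of rho and sigma.

    dev[i][j]: deviation (observed minus additive log P) for Inducer i
               paired with Responder j (complete grid ASSUMED)
    sigma_start: first-approximation sigma values, one per Inducer
    """
    sigma = list(sigma_start)
    n_ind, n_resp = len(dev), len(dev[0])
    rho = [0.0] * n_resp
    for _ in range(max_iter):
        ss = sum(s * s for s in sigma)
        new_rho = [sum(sigma[i] * dev[i][j] for i in range(n_ind)) / ss
                   for j in range(n_resp)]
        rr = sum(r * r for r in new_rho)
        new_sigma = [sum(new_rho[j] * dev[i][j] for j in range(n_resp)) / rr
                     for i in range(n_ind)]
        change = max(
            max(abs(a - b) for a, b in zip(new_rho, rho)),
            max(abs(a - b) for a, b in zip(new_sigma, sigma)),
        )
        rho, sigma = new_rho, new_sigma
        if change < tol:  # both sets changed by less than the tolerance
            break
    return sigma, rho


# Synthetic deviations built from made-up sigma and rho values.
made_up_sigma = [0.6, 0.3, 0.2]
made_up_rho = [1.0, 0.5]
dev = [[s * r for r in made_up_rho] for s in made_up_sigma]
sigma, rho = refine_sigma_rho(dev, sigma_start=[0.7, 0.4, 0.1])
```

 Only the products sigma x rho are fixed by the data; the individual
 values carry a common scale factor set by the starting approximation,
 which may be one reason different starting sets were compared.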






   Ideally,   the  effects of  Hammett parameters  are additive;   i.e.,  two



chlorines have twice the effect of one.  This did not apply to their effect

-------
                                                                     PAGE 15



 on hydrophobicity,  but  it was possible to apply the  same diminishing power



 series  to  all those  sigma  values  in  Table  2 which  were  studied  in



 multiples.   For two groups, the sum of sigmas was multiplied by 0.75;   for



 three, 0.60; for four or more, 0.35.   When the Inducer is any group except



 -N=,   the  rho values  of the  Responders are  averaged;  with  -N= as   the



 Inducer,   rho  values of Responders are  added until the product  reaches a



 maximum of 2.80.  See Table 4 for example calculations.
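
    The combination rules for multiple substituents can be sketched as
 follows.  This is an illustration under stated assumptions rather than
 the CLOGP implementation: a multiplier of 1.0 is assumed for a single
 group, the 2.80 ceiling is read as a cap on the final (sigma x rho)
 product, and the sigma and rho values in the example are hypothetical
 rather than taken from Table 2.

```python
SIGMA_MULTIPLIER = {1: 1.0, 2: 0.75, 3: 0.60}  # 1.0 for one group is ASSUMED
FOUR_OR_MORE = 0.35
RING_NITROGEN_CAP = 2.80  # read here as a cap on the sigma x rho product


def combined_sigma(sigmas):
    """Sum of Inducer sigmas scaled by the diminishing series."""
    factor = SIGMA_MULTIPLIER.get(len(sigmas), FOUR_OR_MORE)
    return factor * sum(sigmas)


def electronic_factor(sigmas, rhos, inducer_is_ring_nitrogen=False):
    """Electronic correction for a set of Inducers and Responders."""
    if inducer_is_ring_nitrogen:
        # -N= as Inducer: rho values are added, product capped at 2.80.
        return min(combined_sigma(sigmas) * sum(rhos), RING_NITROGEN_CAP)
    # Any other Inducer: rho values are averaged.
    return combined_sigma(sigmas) * (sum(rhos) / len(rhos))


# Hypothetical values, for illustration only.
two_inducers = electronic_factor([0.30, 0.30], [0.50, 1.00])
ring_n_case = electronic_factor([0.80], [2.00, 2.00], inducer_is_ring_nitrogen=True)
```

 The worked examples in Table 4 remain the authoritative guide where
 this reading of the rules is in doubt.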





 Alkyl Substitution





    A   comparison has  previously  been made  between  aliphatic and  purely



 aromatic  solutes  which showed a different  relationship between log P and


 molar volume.20  It might be expected,  therefore,   that a correction Factor,



 albeit a  small   one,   would be required  when alkyl chains are  attached to



 aromatic  rings.    Because the simplest example,  toluene,   did not  appear to



 require any  such correction,   the effect  was overlooked  for some time.    In



 the 400 solutes  used in this study,  over  60 exhibited  this feature,  and its



 existence  was established beyond reasonable doubt.





    To evaluate the  alkyl-aryl  effect,  an indicator variable was  added to



 the regression equation relating the observed log P to the 'simple  additive



 log P';  i.e.,  that which would result   from addition  of  Fragment values



without correction Factors.   This indicator  variable was  given the value  of
        *

 the number of  alkyl groups on  the   ring  system on  which   there was  already



present some other  substituent.   Thus  the variable would  be  0 for  toluene,



1 for p-cresol,  2  for  xylene,   and  3  for 3,5-dimethyl phenol  (i.e.,  1  for



-OH/3-CH3; 1 for -OH/5-CH3; and  1 for 3-CH3/5-CH3).  For 69  examples in the



training set, the alkyl-aryl Factor was evaluated as -0.17.

-------
                                                                     PAGE 16



 Ortho Effects






    The effect   which ortho  substitution has  upon most  reaction  rates  and



 equilibria is  so complex that  only a few  authors have dealt with  it in  the



 classical Hammett fashion.21  In terms of its hydrophobic effect, both



 Fujita  and Brandstrom   elected not to deal with it  in their first papers



 on electronic effects.  Ogino and Fujita22 did develop an equation for Δ log



 P in  2- and  2,6-disubstituted guanamines which relates  to the   problem.



 They showed that the sigma para value had to  be corrected by a Field Effect



 term for use in  the ortho position,   and that a   steric parameter was also



 necessary.   As  will be seen  below,    this information  proved  valuable in



 filling out the chart of Ortho Effects (Table 3)   by interpolation from  the



 rather sparse  data presently available.    Very strong  ortho effects show up



 in environmentally  important  compounds   from pesticides  to PCBs,    and in



 estimating   log P  values the   first  question  to be  asked  is whether   an



 intramolecular H-bond can be formed.






 Intramolecular H-Bonding






    The partition  coefficients  of   solutes   with substituents  in ortho



 position  are  generally   lower  than   if  they  were  in   the meta   or  para



 position, UNLESS they have the capability of forming an intramolecular H-bond to



which  the octanol/water  solvent pair  is   sensitive.    In that event a large



 positive correction is required.






   The question of the 'sensitivity' of the octanol/water solvent pair
needs to be addressed further.  The position of equilibrium in the
partitioning process depends on the free energy of solvation/de-solvation
as the solute passes from one phase to the other.  Free energy is, of
course, the difference between the enthalpy and entropy terms for the
process, and if these differ in the same direction and amount between the
free and H-bonded forms of the solute, then no difference in log P will
result.  These enthalpic-entropic differences are NOT the same for all
solvent pairs, and the H-bond Factor must differ accordingly.  Looking at
the problem from a slightly different perspective, we see that in the
heptane/water pair, heptane solvation is changed little whether or not the
nitro group in nitrophenol can H-bond with the hydroxyl; the water phase,
on the other hand, can lose two potential solvation forces (H-accepting and
H-donating) when this occurs.  In the octanol/water pair, both phases lose
solvating power when this occurs, with octanol losing more, perhaps,
because of the greater ordering necessary to orient the hydroxyl group
attached to a long alkyl chain.  The observed difference in log P between
o- and m-nitrophenol in the heptane/water system is +3 log units, but in
octanol/water it is -0.21.






    In evaluating the H-bond Factor, the appropriate (sigma x rho) product
was applied, but F-HB was allowed to account for all other interactions.
It was applied as an indicator variable in a regression equation of the
form:

        OLP = a(ALP) + b(rho x sigma) + c(F-HB) + d

where F-HB can take the value of 1 or 0.  In 15 solutes where a carboxyl
group was ortho to either an -OH or -NH- group, F-HB was found to have the
value of +0.63.  The qualifying pairs of substituents are seen in the
square (sub-matrix) in Table 3.






Negative Ortho Effect






   There are several plausible reasons for a reduction in log P when two
appropriate substituents are in close proximity on an aromatic ring.  There
is reason to expect that separated charges will have a positive effect on
log P if the distance between them exceeds a certain minimum, but will have
a negative effect if they are closer.  And if one or both of the
substituents is a polar group attached by a hetero atom with lone pair
electrons, then the other member of the pair, if bulky enough, can prevent
the first from attaining true planarity with the ring.  This would inhibit
delocalization and make the group more like one which is
aliphatic-attached; i.e., one with a lower Fragment value.






    Evaluated through regression  analysis,   with the negative  ortho Factor




 taking  integral values from   1  to 5,   59 solutes yielded   a value of -0.28.




 Table 3,   in matrix format,   shows the  number  of times this Factor must  be




 applied for   each  of  many substituent  pairs.    Keeping   in mind  that  the




 effect  arises  from both field (electronic)   and  steric forces,   Table 3 has




 been enlarged with interpolated values which appear  in italics.   An example




 of  a calculation using this  Factor appears  in Table  4.






 Summary of Aromatic Substituent  Interactions






   The log P of aromatic solutes can be closely approximated if, to the sum
of the appropriate Fragment values, one adds four correction Factors:

   (1)  sigma x rho (+ second sigma x rho if both substituents are I/R).

   (2)  -0.17 x (number of alkyl groups on already substituted ring).

   (3)  +0.63 x (number of ortho groups which can H-bond).

   (4)  -0.28 x (integer in Table 3 for appropriate ortho pairing).
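
The four-term summary above might be sketched as a single function.  This
is only an illustration under stated assumptions: the function and argument
names are hypothetical, the caller is assumed to have already looked up the
Fragment sum, the (sigma x rho) products and the Table 3 integer, and the
numbers in the example call are arbitrary rather than taken from a real
solute.

```python
def corrected_aromatic_log_p(fragment_sum, sigma_rho_terms=(),
                             n_alkyl=0, n_hbond_ortho=0, ortho_level=0):
    """Add the four aromatic correction Factors to a Fragment sum.

    fragment_sum     sum of the appropriate Fragment values
    sigma_rho_terms  one (sigma x rho) product, or two if both groups are I/R
    n_alkyl          alkyl groups on an already substituted ring
    n_hbond_ortho    ortho groups which can H-bond
    ortho_level      integer from Table 3 for the ortho pairing
    """
    return (fragment_sum
            + sum(sigma_rho_terms)     # Factor (1)
            - 0.17 * n_alkyl           # Factor (2)
            + 0.63 * n_hbond_ortho     # Factor (3)
            - 0.28 * ortho_level)      # Factor (4)


# Arbitrary illustrative inputs, not a real solute.
print(round(corrected_aromatic_log_p(2.0, [0.5], n_alkyl=1,
                                     n_hbond_ortho=1), 2))
```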

-------
                                                                     PAGE 19

                         IV. Use of CLOGP-I Program
 A. Documentation
    Two booklets (one describing overall strategy with examples, and a
 second alphabetically listing the subroutines) have been prepared by
 Pennsylvania State University and appear as a supplement to this report.
 The program changes made at Pomona, which ensure consistency of calculation
 regardless of the order in which structures are added, are included in the
 complete program (on tape, 1600 BPI) submitted as a supplement to this
 report.*


 B.  Program Options


    1.  The usual one of output to 'printer' or 'terminal'.


    2.  Degree of detail in output for calculation:  Penn State version gives
 three:  No debug (NDO); Calculation debug (CDO); and all debug (ADO).
 Since operator inspection of CLOGP-I output is ALWAYS recommended, the
 first option is dropped in the operational version provided by Pomona, and
 the others are called by (2N) and (IN) respectively.

*It should be a matter of record that these were major changes and not a
minor de-bugging.  It was largely due to this extra effort that the
original timetable had to be abandoned.  The problem was first appreciated
when an early version of CLOGP would fail to recognize a simple halogen
fragment after it had processed several multi-halogenated structures.
Correcting a re-initializing failure seemed to solve the problem.  However,
it cropped up again (with hydrogen-bonding Factors, ether oxygen Fragments,
etc.) when an attempt was made to develop performance statistics using the
Selected File of Log P values from the Pomona Data Base.  We were aware
that a similar version of CLOGP was being used by a commercial firm which
also was attempting calculations in long runs on large files.  Their output
was also judged seriously flawed, apparently from the same inconsistency
problem.  Serious efforts by this firm to correct the program problem were
not successful and were abandoned in favor of work on a new approach.
Efforts at Pomona were not successful either, in the time period of the
contract or its extension.  Nevertheless, work on it was continued, and the
program errors were found and remedied by September 1982.


   3. Structural Input

a.   Cursor-equipped  CRT   (Penn   State):   Atom  and  bond  entry  creates

structural diagram.  (Program is hardware-dependent.)

b. Wiswesser Line Notation  (Pomona CLOGP-I; requires PL/I compiler)

(i.)  Entered individually  (following program-prompt)

(ii.)  Entered from structural file using command 'Fragval'.*

c.  Entry  by SMILES (SLOGP,  courtesy  Dr.  David Weininger,   Duluth ERL)

SMILES is a  rapid,  easily-learned system of converting  a two dimensional

structural diagram into  a  linear  array of conventional  atomic symbols and

bonds which  is processed by  computer to  yield an ADAPT  connection table

(needed to drive CLOGP) and also to return, to a suitable CRT,  the diagram

so that encoding accuracy can be verified.   Other features are the same as

entry via WLN.   This entry system is  operational at ERL-Duluth and can be

implemented on any EPA installation of CLOGP-I.


*If a measured log P is stored in file, program will give clogp, obsv. log
P and deviation as output.  SAS (statistical analysis systems) treatment of
this data is seen in the Results section.  It would be desirable to
determine the number of times each Fragment and Factor type is called when
calculating the 'select' set and the average deviation associated with each
of them.  It is not practical to do this with CLOGP-I, but it is an
important objective for version II.  These statistics can be expected to
vary, of course, depending upon file orientation (i.e., drug-pesticide vs.
general organic raw materials, etc.).


C. Testing the Program

   With a program as complex as CLOGP it is advisable to verify
periodically that it processes all the Fragments and Factors as it did when
first put into operation.  In the event any changes are made, it is
essential to verify that other portions of the program were not disturbed.
It is a simple task to perform if the structures which appear in Table 6,
encoded in WLN, are kept as a file in permanent storage.  They appear as
the simplest structures which will call each Fragment and Factor.  They are
sorted on Fragment formula.  They are repeated in Table 7, sorted on FCON
number.
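
A check of this kind could be sketched as a comparison of stored and
freshly calculated values for the test file.  The sketch below is only an
assumption about how such a comparison might be organized; the function
name, the dictionary layout (WLN string mapped to calculated log P) and the
tolerance are all hypothetical and do not describe any facility of CLOGP-I.

```python
def changed_results(baseline, current, tolerance=0.005):
    """Return {wln: (stored, new)} for test structures whose value changed.

    baseline  values stored when the program was first put into operation
    current   values from the program version being verified
    A structure missing from `current` is reported with new value None.
    """
    changes = {}
    for wln, stored in baseline.items():
        new = current.get(wln)
        if new is None or abs(new - stored) > tolerance:
            changes[wln] = (stored, new)
    return changes


# Placeholder keys and values, purely for illustration.
stored = {"STRUCT-A": 1.00, "STRUCT-B": 2.00}
rerun = {"STRUCT-A": 1.00, "STRUCT-B": 2.50}
print(changed_results(stored, rerun))
```

An empty result would indicate that every Fragment and Factor exercised by
the test file is still processed as before.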






 D.  Need for Human Intervention

 1.  Program Warnings or Suggestions:  (values provided by program)

    (a) Correction for ionic form of carboxyl group.

    (b) Correction for zwitterion form of amino acid.

    (c) Correction for tertiary amine chains.

 2.  Known Corrections Not Yet Implemented (No warning)

    (a) Electronic effect on aromatic rings EXCEPT when R = -OH or -NH- and
 I = -NO2, -CN, -CF3, or -SO2(X); CLOGP-I averages all these as +0.77; for
 all other combinations see text and Table 2.

    (b) Alkyl-aryl effect - n(-0.17); see text.

    (c) Negative ortho effect - n(-0.28); text and Table 3.

    (d) Lactone = -0.9.

    (e) Alicyclic clusters = -0.45 (e.g. bornyl and adamantyl derivatives).
 Steroids (fused four-ring system) = -1.1; these also need special
 corrections for substitution in 11 & 17 positions.

    (f) N-oxide fragment value increased by 0.63 if adjacent to ring fusion
 as in quinoline or acridine.

    (g) If three adjacent Isolating Carbons in a chain have -OH and/or -NH-
 fragments attached, the proximity correction should be increased by 0.45;
 e.g. in chloramphenicol analogs.
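
An operator applying the fixed-value corrections of item 2 by hand is, in
effect, using a small lookup table.  A sketch follows; the dictionary keys
and the function name are hypothetical labels chosen for illustration, the
electronic correction of 2(a) is omitted because it requires Table 2, and
treating items (f) and (g) as simple additions to the calculated log P is
an assumption.

```python
# Fixed corrections from items 2(d)-(g); key names are illustrative only.
MANUAL_CORRECTIONS = {
    "lactone": -0.9,
    "alicyclic_cluster": -0.45,
    "steroid": -1.1,
    "n_oxide_adjacent_to_ring_fusion": 0.63,
    "three_adjacent_ic_with_oh_or_nh": 0.45,
}


def apply_manual_corrections(clogp, n_alkyl_aryl=0, ortho_level=0,
                             features=()):
    """Add the not-yet-implemented corrections to a CLOGP-I value."""
    total = clogp
    total += n_alkyl_aryl * -0.17   # item 2(b)
    total += ortho_level * -0.28    # item 2(c), integer from Table 3
    for name in features:           # items 2(d)-(g)
        total += MANUAL_CORRECTIONS[name]
    return total


# Arbitrary starting value, for illustration only.
print(round(apply_manual_corrections(2.0, features=["lactone"]), 2))
```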


 3. Anomalous Calculations to be Expected.


    A. Folded Conformations:

 N,N-disubstituted phenoxyacetamides provide a good example of this anomaly.

 Log P values go through a minimum at the diethyl analog and then rise

 normally at  least to  the dibutyl.    Conformation analysis  by CAMSEQ 21+

 indicates  that the  diethyl  is optimal  to fold  over  the benzene  ring,

 eliminating two hydrophobic surfaces from aqueous solvation.  Lengthening
 one or both alkyl chains necessarily exposes the extension to the solvent
 once again.  N,N-dialkylamide substituents on erythromycins also are
 overpredicted.  On the other hand, two or more linked hydrophilic rings,
 such as are found in clindamycin analogs, are underestimated.  Some
 sesquiterpene lactones also appear to behave in this fashion.  Half of
 them can be calculated very well, but half are over one log unit
 underpredicted.  Conformations which bring a carbonyl and hydroxyl group
 into close proximity (even though their separation by 'skeletal' route is
 great) are a possible explanation.  Folding or 'screening' may be the
 reason aliphatic rings or chains of eight or more carbons are likely to be
 calculated higher than observed, as are N,N-disubstituted amides with
 total chain length of eight or more.*

*Of course experimental difficulties are greater with these classes of
solutes, and one can never be sure whether the deviation is in the
measurement or in the calculation.

-------
                                                                     PAGE 23




    B.  Tautomerism:   Sometimes the observed log  P lies between the values




 calculated for  the two tautomeric  structures,  but  this need not  be the



 case.






    C.  Peri  Substituents:   In quinoline  analogs,  H-polar groups  on the




 8-position are generally more hydrophobic than usual.    There is not enough




 data  to   generalize  about  other   substituent  types,     including  the




 1,8-disubstituted naphthalenes.






    D.   Hydrophobic  Groups Between  Aromatic  Rings:    2,2'-disubstituted




 biphenyls  have been  mentioned   above.    They  are  less  hydrophobic  than




 calculated (up to one log unit or more).   The same  applies to halogens on a




 methyl  or ethyl group between phenyl  rings (as in DDT  analogs).






    E.    N-nitroso-ureas  are  a type  of fragment  which  CLOGP-I   sees as  a




 combination.    Proximity effects,   such   as  the halogen-H-polar  interaction



 ICF-17,  are counted twice.






    F.  The zwitterion correction (-2.4) is too great for an amino acid
 moiety connected to a very polar aromatic ring.






    G.  When a   benzene ring is totally substituted with  large halogens and




polar groups,  the presently known  interaction  effects  fail.






   H.  Solute  structures  containing a ring completely  surrounded by  other




rings (e.g. strychnine) may be calculated too  low.






   I. Oxy-N-heterocycles are poorly predicted.
                        V. Analysis of Test Results

-------
                                                                     PAGE 24




 A.  Data Set:  3517 'Selected' Log P values from the Pomona Medchem Parameter




 Data Base.






    These values were selected on the basis of:  a) reliability (measurement




 error,  if known; or agreement with other measurements.)      b)  small  or no




 correction  needed to obtain value for  uncharged species,   except for  a  few




 values  for  completely ionized solutes.






    It should be noted that the solutes in the Medchem Data Base are biased
 toward bioactive organics, i.e., chemicals with pharmacophores or
 toxiphoric moieties.  Undoubtedly this necessitates dealing with a wider
 variety of Fragments and Factors than would be present in the same number
 of ordinary industrial organic chemicals.  For this reason this test may
 well provide 'worst case' statistics.






    Before any statistical evaluation could be made, it was necessary to
 verify program consistency.  To accomplish this the entire 'selected' set
 was calculated and the values stored with corresponding WLNs; the set was
 then recalculated five times, each time after a random reshuffling of input
 order.  The program then saved the WLNs for which different values were
 recorded.  This procedure had to be repeated several times before all the
 programming errors could be located and corrected.  (All versions of CLOGP
 with dates earlier than Sept. 1982 are liable to produce inconsistent
 results.)
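
The reshuffling procedure just described can be sketched in a few lines.
This is not the code used at Pomona; it is a minimal illustration in which
the function name, the make_calculator argument (a hypothetical factory
returning a fresh one-structure-at-a-time calculator for each run) and the
fixed random seed are all assumptions.

```python
import random


def find_order_dependent(make_calculator, structures, n_shuffles=5, seed=0):
    """Return the structures whose value changes with input order.

    The set is calculated once in the given order and the values stored;
    it is then recalculated n_shuffles times, each in a fresh run after a
    random reshuffling, and every structure whose value differs is saved.
    """
    rng = random.Random(seed)
    calc = make_calculator()
    reference = {s: calc(s) for s in structures}
    suspects = set()
    for _ in range(n_shuffles):
        order = list(structures)
        rng.shuffle(order)
        calc = make_calculator()  # fresh run, as if the program were restarted
        for s in order:
            if calc(s) != reference[s]:
                suspects.add(s)
    return suspects


# A calculator with no carried-over state yields no suspects.
print(find_order_dependent(lambda: len, ["S1", "S22", "S333"]))
```

A calculator that carries state from one structure to the next, as the
early versions of CLOGP evidently did, would return a non-empty set.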






B.  Range of Values:  Measured values of  log P in the  'Selected  Set' ranged




from -3.31  and -3.21  (pentaglycine  and  glycine)  on the  low side  to 7.54  and




6.36  (hexachlorophene and DDT) on the high.






C.  Percent Fully Perceived:  775 of the 3517 selected solutes contained
 'excluded fragments' and elicited the message "unaccounted atoms."  This
 amounts to 22% of the structures and is very close to the original
 objective of the program's level of effectiveness.  It should be noted that
 with operator intervention this failure rate can be cut approximately in
 half, if a higher probable error can be accepted.  The three unprogrammed
 approximations which make this possible are:  a) the difference between
 aliphatic and aromatic attachment at any fragment valence bond is about one
 log unit (aromatic is higher), and few of the multi-valent fragments in
 CLOGP-I have values for all possible 'environments';  b) enlarging an
 already hydrophilic Fragment by fusing on another hydrophilic segment
 reduces the original value much less than addition of the two Fragment
 values and can be approximated as -0.3; e.g., adding -NH2 to -CONH- changes
 the aliphatic Fragment value from -2.71 to -2.50 if added to the right side
 and to -2.18 if added on the left;  c) many 'missing' Fragments differ from
 known ones by a hydrogen atom; a hydrogen atom on a Fragment usually has a
 higher value than when attached to an I.C. but can be approximated as
 0.40.*

*At the outset of this work, some consideration was given to incorporating
these features as options in the CLOGP program, but the time required to
correct the inconsistency problem precluded this.  They are planned for
version II.


 D.  Precision of Calculation:  This can be examined for each of three modes
 of operation:  (1) When the only output examined is the final calculated
 log P; i.e., the warnings and corrections integral to the output are
 ignored.  (2) When the corrections provided by the program are applied (see
 IV.D.1 above), and the anomalies listed in IV.D.3 above are removed.  (3)
 When, in addition to the steps in (2), the corrections listed in IV.D.2 are
 applied.  Because of the aforesaid bias of the Pomona Medchem Data Base
 (e.g., steroids and antibiotics heavily represented), the listed anomalies
 reduce the 'effective perception' of CLOGP-I to a greater extent than would
 be the case if tested on a file of general industrial organics.  In mode
 (3) it still calculates 70% of the structures, and perhaps half of the
 remainder could be satisfactorily estimated (within 0.8 log unit with 90%
 confidence) by operator assistance with procedures provided.






    Mode  (1):  The statistical analysis of  the results  of CLOGP-I  operating




 in a 'blind mode' is seen in Fig. 2.  The standard deviation of over 0.8




 log units is about twice that  of the  original  objective.   The mean of 0.076




 shows   that the   net corrections  which remain   to be  applied should  be




 positive,    especially   when one  considers that   a number of zwitterion




 corrections (-2.4, in program  prompt) have  NOT been applied.    The residual




 frequency   chart  shows   a fairly  normal distribution  with well  over  85%




 chance  of calculation within one log  unit.
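
The quantities quoted for each mode (mean residual, standard deviation,
and the chance of falling within a given window) could be computed along
the following lines.  The report's figures were produced with SAS; this
sketch is only an illustration, and the function name, the choice of the
sample standard deviation, and the convention residual = observed minus
calculated (inferred from the remark that a positive mean calls for
positive corrections) are assumptions.  The numbers in the example are
invented for demonstration and are not from the data set.

```python
from statistics import mean, stdev


def residual_summary(observed, calculated, window=1.0):
    """Summarize residuals (observed log P minus calculated log P)."""
    residuals = [o - c for o, c in zip(observed, calculated)]
    within = sum(1 for r in residuals if abs(r) <= window)
    return {
        "mean": mean(residuals),
        "std_dev": stdev(residuals),             # sample standard deviation
        "fraction_within": within / len(residuals),
    }


# Invented values, for demonstration only.
print(residual_summary([1.0, 2.0, 3.0, 5.5], [1.0, 2.5, 2.5, 4.0]))
```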






    Mode (2):   The   statistical analysis of the results   when  the operator




 follows  computer  prompts   and  eliminates readily recognized  anomalies




 appears in  Fig. 3.   It  cannot  be stressed too  strongly  that simple operator




 assistance  can  reduce  the   standard  deviation to   half the level achieved




when the program 'runs blind'.  The chances of the calculation falling




within  one  log  unit   of the  measured value are   now  greater than  98%.




However,  since aromatic  solutes  needing large corrections for electronic




effects are now included in the 'anomalous' category (even though the




corrections  are known),  the  calculable  percentage  drops to 66%.






    Mode (3):  With a very small investment of time and effort the operator
 of CLOGP-I can be instructed in the additional correction procedures given
 in Section III.  Use of these constitutes Mode (3), and the statistical
 analysis of these results is given in Fig. 4.  The standard deviation has
 been reduced almost to 0.35, which is respectable considering the diversity
 of structures in the Medchem Data Base (ranging from phenylglucosides to
 hexachlorophene).  The residual frequency chart indicates that one can
 expect 92% of the calculations to be within 0.6 log units of measured
 values.






    A plot  of residuals vs.   clogp  for Mode (3)   is seen in Fig.   5.   A




 regression  of this  data shows  that the  apparent downward  slope of  the




 points is  not very significant  (squared correlation coefficient  = 0.11).




 Nevertheless,  it seems to indicate that very hydrophilic solutes tend to be




 calculated low and very hydrophobic ones calculated high.   This may reflect




 very real physical limitations on this parameter on both ends of the scale.






    Table 5 contains the CLOGP output for the 5 solutes with the highest




 deviation as   calculated by   Mode (3),   together with   an analysis  of the




 reasons why  they  are  predicted poorly.











     Summary of Results:  With assistance from an operator who has been
 given a reasonable amount of instruction, CLOGP-I can meet or exceed the
 original objectives set forth in this contract; i.e., when operated in Mode
 (3) described above.  To reduce or eliminate this need for operator
 assistance, and to enable the program to be expanded as new measurements
 are made and new interaction effects perceived, an entirely new algorithm
 must be employed.  In spite of the failure of CLOGP-I to 'stand alone', it
 has proved invaluable in charting out a course for further development and
 will continue to be used in practical applications until an improved
 version becomes available.

-------
 BIBLIOGRAPHY

 1.  Leo, A., Hansch, C. and Elkins, D., Chem. Rev., 71, 525 (1971).
 2.  Mirrlees, M. S., et al., J. Med. Chem., 19, 615 (1976).
 3.  Henry, D., et al., ibid., 19, 619 (1976).
 4.  Yamana, T., et al., J. Pharm. Sci., 66, 747 (1977).
 5.  Unger, S. H., et al., ibid., 67, 1364 (1978).
 6.  Veith, G., et al., Water Res., 13, 43 (1979).
 7.  Bradshaw, J. and Latter, D., Glaxo Research, private communication
     (1982).
 8.  Garst, J. E. and Wilson, W. C., (I.) J. Chromatog.,
     submitted, (1982).
 9.  Garst, J. E., (II.) ibid.
 10. Van Duyne, R., et al., J. Phys. Chem., 71, 3427 (1967).
 11. Fujita, T., Iwasa, J. and Hansch, C., J. Am. Chem. Soc., 86, 5175
     (1964).
 12. Rekker, R., "The Hydrophobic Fragmental Constant", Elsevier,
     Amsterdam, (1977).
 13. Hansch, C. and Leo, A., "Substituent Constants for Correlation
     Analysis in Chemistry & Biology", Ch. IV, Wiley Interscience,
     N.Y., 1979 (appears as Appendix A of this report).
 14. Brandstrom, A., Acta Pharm. Suec., 19, 175 (1982).
 15. Hammett, L. P., "Physical Organic Chemistry", 2nd Ed., McGraw-Hill,
     N.Y., 1970.
 16. Fujita, T., in "Prog. Phys. Org. Chem.", A. Streitwieser and R. Taft,
     Eds., Wiley Interscience, in press.
 17. Fujita, T., J. Pharm. Sci., in press.
 18. Parameter Data Base, Pomona College Medchem Project, Issue
     [illegible], July 1982.
 19. Written by Steve Burns, Pomona College.
 20. Leo, A., Hansch, C. and Jow, P., J. Med. Chem., 19, 611 (1976).
 21. Charton, M., in "Prog. Phys. Org. Chem.", A. Streitwieser & R.
     Taft, Eds., Wiley Interscience, Vol. 8, p. 235 (1971).
 22. Ogino, A., Matsumura, S. and Fujita, T., J. Med. Chem., 23,
     437 (1980).
 23. Magee, P., Chevron Chemical and Peacock, S., Molecular Design,
     priv
-------
                               Table 1.
                     Compounds Measured by Shake-Flask
No.
 UL N

 K1U1

 E2E

 FR CMVN 1o,01


 FR DOV1MVR

 FXFFR C01MVR


 F4

 GR BVU1


 GR BV1

 GR COVIhVR

 GYGU1

 G1UiG -C

 G1U1G - 1

 L C666  I:V IV DQ GQ k
         M2M20 Nrf2M2Q

 L C666  BV TVJ DZ

 L C666  BV IVJ

 L6TJ AOV1  CMVNNOS.2G



 L6VTJ BM1  BR BG

 L66  BV  EVJ GO


 I.66J BMYZUS
NCR DUKV?,RU)2

NCR DUV1MVK
 VINYL  BROMIDE

 DIBROMOETHANE

 1(3'-FLUOROPHENYL)-3-MET
        HOXY-3-METHYLUREA

 P-FLUORDPHENYLHIPPURATE

 M-TRIFLUOROMETHYLPHENYL_
                HCPPURATF

 FLUOROBUTANF

 0-CHLOROBENZOTC ACID,
             hETHYL  ESTER

 0-CHL OROAf.ETOPHFNONF

 ri -CHLOkOPHEHYLHJ PPLIRATC

 V1NYLID1NE  UHLUR1DE

 1,2-DrCHLOROEFHYLENE

 1 ,2-DICHLOROETHYLENE

 ANTHRACENLDTME


 1 -AM J NOAt>ITHRAQUl NONE

 ANTHROQUINONE

 1(2'CLETHYL)-1-NO-3-(3'_
 CARBOMETHOXYCYCLOHEXYL  )
                     UREA

 KETAMINE

 5-HYDF
-------
                                                      PACE 2
 NC1U1

 NC20R

 QR  BV1


 QR  CG  Dfr

 QR  DMV1

 QR  DNU

 QVR BOR

 QVR BM1

 QVR L
-------
                                                      PAGE 3
RVM1VOK

RVR

RV1R

i'HR BVQ

SH1YZVQ

T C5  C6556/C-P/JP  C-
_3ACJ P  CX  EY  JXOV 0
 UTJ  BV« EUI FQ  M  NQ

T G5  D6  K666 CV  HO M
0 POTMTfcJ  1YU1  S01
                 T01

T307J D

T3DTJ BR

15-10- HOVY liU LUTJ_
             DU1 l-l L

T'5N Ci'J  BZ

T5NJ  A

T5NMVTJ  AR


TSNVTJ A1U1

75NTJ AVMRA H  E

T50 COTJ PR BOVM1

750 COTJ

T50J  BVH

T501J

15SUTJ

T5VMVJ

TS« BM DN FMVMVJ

FS6 IU1 DN FVri  ]NJ  HZ

TCJ6 Hh DHJ  CZ
PHENYLHTPPURATC

BENZOPHENONE

DEOXYBENZOIN

THIOSALICYL1C ACID

CYSTEINE

GIBBERELIC ACID



ROTENONE



PROPYLENE OXIDE

STYRENE OXIDF

COSTUNOLIDE


2-AMINOTHTAZOLC

N-METHYLPYRROLE

1-PHENYL-3-PYRAZOLIDJN_
                     ONE

N-VINYL-2-PYRROI.IDINONE

CISANILIDE

DIOXACARB

1,3~DIOXOLANE

2-FURALDEHYDE

TETRAHYDROFURAN

TETRAMETHYLENESULFONE

MALEIHIDE

XANTHINE

GUANTHE (AT I'H=13)

2-AMINOBENZJhTDAZOLr
4
4
4
4
6
0
4
4
4
2
4
7
4
4
4
4
4
6
5
4
6
10
4
4
2.31
3.12
3.19
2.39
-1.87
0.24
4.10
0.03
1 .61
2.09
0.33
1 .21
0.89
0.37
2.33
0.67
-0.37
0.41
0.46
-0.77
-0.29
-0.73
-0.91
0.91
.03
.02
.04
.03
.03
.10
.05
.02
.03
.01 8
.01
-.04
.02
.01
.02
.01
.03
.02
*
.04
.02
.or.
.01
.04

-------
   /* PNJ
        DO CHJ K01VN1
                Y&&1Y
 T56 EO DO CHJ G1U1V-
              _AT6NTJ
 T56 DO DO CHJ G1U1VZ
 T56 PO DV CHJ C C IQ

 T56 UCU&J

 T56 BOr&J C C IQ

 T56 Pi1 HNJ
 TAN CNJ B DZ ElhVNNO
                  &2G
 TANJ  HOR

 TANJ  B I) F

 TANJ  PI  2U


 TANJ  CC,'

 ToNJ  DNU

 TANJ  DVH


 TANNVNJ  AO D1R


 TANTJ  A- ALATfJ  AR

 TANTJ  AV1U1R

 TAVMrlVJ

 TAG CO LOTJ EOF

 TA,-', UNNNVJ DiS'l1
TAA PUPO EHJ C.V  COi

TAA CNJ CO

VHR Bf,
      1 -METHYLINDOLE

      N,N-DI-l-BUTYL-3,4-DIOXY
          HETHYLENECINNAMAMIDE

      N(3,4-METHYLENEDIOXY_
        CINNAMOYDPIPER1DINE

      a^-DIOXYMETHYLrNE.
                   CINNAMAMIDE

      3-KETOFURAN PHENOL

      2,3-DIHYDROBENZOFLIRANt

      CARBOFURANE PHENOL

      A-AZATHJANAPHTHbNE

      ACNU


      2-PHENOXYPYUTD1NE

      COLLIDTNf

      1 ,2-DKALPHA-PYftlDYL)
                      ETHYLENE

      3-HYDROXYPYRID1NE

      4-NITROPYR1DINE

      4-PYRID1NE  CARBOXALDEHYDE
     4-PENZYL-1,2,4-TR1AZ1NE-
                3-ONE-1-OXIDE

     PHENCYCLIDJNE

     N-CINNAMOYL PIPERJDINE

     MALEIC HYDRAZIDE:

     PARALDEHYDE

&01  ftZlNI-'HOi1 METHYL.

     S'ALITHION

     J-QUTNOL1NC-N--OXIDE

     O-CHLOROBIINZnLDIIIIYDF
F'AGE
4
4
4
8
4
5
5
4
3
3
4
4
4
3
3
3
2
4
6
5
3
4
4
3
4
2.72
4.37
2.82
1.40
1.87
2.14
2.08
1.74
0.94
2.39
1 .88
2.11
0.48
0.33
0.43
0.19
-0.01
2.74
-0.84
0.52
-.75
2.67
0.25
2.33

.01
.03
.02
.01
.01
.03
.02
.03
.01
.005
.04
.03
.01
.004
.02
.03
.05
.01
.02
.01
.03
.OS
.01
.04

-------
                                                      PAGE 5
VHR CXFFF

VH1U1

VH2

UNR B CNU  ENl-J

UNR BM1

UNR BNU

UNR BC)1

UNR HQ CNU ENU

UNR t
-------
 ZK  COV1MVR

 7.R  CVQ

 ZR  DI

 ZR  DSWMV01

 ZSUR COV1MVR


 ZSUR CSZU

 ZVH

 ZVR B01

 ZVR BVZ

 ZVR BZ

 ZVR C01MVR

 ZVYQ

 ZV1U1

 ZV1U1R

 ZV1VZ

 Z1R DNU

 1MV10R

 1MV1U1R

 1MY1R


 1 NR&R

 1NRJ.V10R


 1N1iV10k


1N1&V1U1R

1ON&1&VMR
 H-AMTNOPHLNYLHIPPURATi:

 M-AMINOBENZ01C  ACID

 4-IODOANILINE

 ASULAM

 M-SULFONAMTDOHHENYL_
                HIPPURATE

 1,3-BENZENEDISULFONAMIDE

 FORMAMIDE

 ANISAMIDE

 0-PHTHALAM1DE

 ANTHKANtLAMTDE

 M-CARfiOXAMlDOPHENYL_
                HIPPURATE
 LACTAriTDE

 ACRYLAMIIiE

 CINNAMAHIDE

 MALONAMIDE

 H-N1TROBENZYLAMINE

 N-METHYLPHENOXYAr,ETAMIDE

 N-hETHYLCINNAMAMTDE

 N-METHYLAMPHETAMINE (AT_
                  PH=13)

 N-METHYLDIPHLNYLAMINE

N-METHYLPHENOXYACET_
                 ANILIDE

N,N-DIMETHYLPHENOXY_
              ACETAHiDE

f^N-DIMETHYLCINNAMAMIDE

1-PHENYL-3-METHOXY-3-
              METHYLUKEA
PAGE
4
7.
3
8
9
4
A
4
4
4
3
5
4
4
5
3
4
3
3
4
3
4
4
A
6
1 .30
0.07
2.34
-0.27
0.84
-0.55
-1 .51
.084
-1 .73
0.35
1 .20
-1 .39
-0.67
1 .43
-2.01
1 .06
1 .02
1 .81
2.07
3.90
2.26
0.80
1 .73
1 .29

.0!J
.04
.02
v
.04
.04
.02
.01
.03
.03
.04
.06
.02
.01
.06
.01
.02
.01
.03
.05
.04
.2
.01
.02
10N1,%VMR D02R I)
   -(2' (4'-MF.THOXYPHENYL)
3.81

-------
                                                       PAGE 7
 lONI&Vrtk DU40R



 1QR  D

 iOR  C

 10R  CQ  E01

 10R  D

 1I3R  DOV1MVR

 10VR B


 1 02 OR

 1VMK COV1MVR


 U'OllJi

 1Y&OP04.S1R&OY

 20P5&2&SR

 2GVM1


 20VR BV02

 20V1U1

202

4N4&V10K
  CTIIOXY) PHFNYL) 3-ME7I-IOXY
          -3'-METHYL UREA

 N-1-MEO-N-1-METHYL-N'-3-
 (4-(4-PHENOXYBUTOXY)
              PHENYDUREA

 0-HETHYLAN1SOLE

 M-METHYLANISOLE

 3,5-DIMETHOXYPHENOL

 P-METHYLANISOLE

 P-METHOXYPHENYLHTPPURATE

 0-TOl.UIC  ACID. METHYL
                    ESTER

 2-METHOXYETHOXYBENZENE

 M-ACETAhIDOrHENYL_
               H1PPURATE

 VINYL ACETATE

 IBP

 FONOFO.V

 N-HETHYLCARBAMTC  ACID.
             ETHYL. ESTER

 DIETHYLPHTHALATn

 ETHYLACRYLATE

D1CTHYL ETHER

N,N-DIBUTYLPHENOXY_
             ACCTANILIDE
3.57
.03
3
3
4
4
4
4
4
4
4
2
4
4
4
4
4
4
2.74
2.66
1.64
2.81
2.28
2.75
1.73
1.70
0.73
3.47
3.94
0.34
2.47
1.32
1.00
3.23
*>
.01
.02
.06
.04
.04
.04
.04
.01
.01
.07
.03
.05
.06
.03
.07
* Concentration dependent; extrapolated to zero conc.

** Measured at pH 7.4; other values obtained at pH 5.4, 6.5, 9.0 and 14.0

-------
                                   TABLE 2.
                           Sigma and Rho Constants
No.
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
Sigma Rho Generalized Structure
0.84 0.00
0.71 0.00
0.65* 0.00
0.65 0.00
0.60 0.00
0.49 0.00
0.28 0.00
0.58 0.44*
0.51 0.27
0.32 0.35*
0.32 0.72
0.17* 0.50?*
0.50c 0.88c
0.00d 0.50d
0.00 0.61
0.00 1.06
0.00 1.08
-N-
-SO2F
-SO2-X
-CN
-NO2
-CF3
Halogens
-CHO
-C(=O)-X
-CONH-X
-O-X
-SO2NH-X
-S-X
-OH
-NH-X
Examples
a
pyridine, quinoline
X = alk, N(Me)2

F, Cl, Br, I

X = alk, OCH3, C6H5, N(Me)2
X = H, NH2, C6H5, alk
X = alk, CONHCH , CON (Me) ,
mCn V PnTO— allr^
_I>U_H, ru^u— aj.Kj_
£* /n &
X = H, C6H5
X = H, alk
-N(Me)2, -N=NN(Me)2
X = COMe, CON(Me)2, CHO, alk,
* Not determined by  successive approximation program.
a. Effect cut in half for Responders on non-hetero ring.
b. With original training set of 90 solutes, 0.51 was obtained.  With the set
   enlarged with bi-directional solutes, 0.50 gave coefficients for the Fσ term
   closer to unity.
c. Acts either as 'I' or 'R' but not both at the same time; i.e., it is not
   truly bi-directional;  exception is solute #208 in Table 5.
d. Not well characterized;  should be considered tentative.

-------
                                   TABLE 3.
                             Ortho Factor Levels

     [Matrix row and column headings illegible in source copy]

                      110131
            (1)
    ..
W = OMe, Me, N(Me)2
 X = H, NH2
 Y = CONHMe, COMe, Me, CON(Me)2,
     OCH2CO2H
 Z1 = CONH2
 Z2 = COMe
 *This level becomes  5  if Y = C(=O)CH3
 (  )=  borderline effect
 Within submatrix,1Hydrogen Bonds', F
                                      o
 t s anomalous; see  text
Italicized numbers are interpolated.
                        0
                        0
                        0
                        1
                        1
                        1
                        1
                                                      2  0 (0)    1
                                                   2
                                                   2
                                                   2  3
                                                   1
                                                        0  0
                       ,Intra-Mol.  , 0
                       'HYDROGEN    » 0
                     0, BONDS      ) 1
                       •            . 2
                                             (1)
                                                        1
                                    o
                                                           0(0) (0)
 1.
 2.
 3.
 4.
 5.
 6.
 7.
 8.
 9.
 10.
 11.
 12.
 13.
 14.
 15.
 16.
 17.
 18.
19.
20.
21.
22.

-------
                                  TABLE 4.
                        Multiple Electronic  Effects

     Solute                          OLP    ALP    Fσ                              Calc.   Dev.

  1. 2,3-dichloroaniline             2.78   2.32   .75(.28 + .28)(1.08)             2.77   +.01
  2. 3,4-dichloroaniline             2.78   2.32   .75(.28 + .28)(1.08)             2.77   +.01
  3. 2,4-dichlorophenol              3.08   2.88   .75(2)(.28)(1.06) - .28*         3.05   +.03
  4. 3,5-dichlorophenol              3.44   2.88   .75(2)(.28)(1.06)                3.33   +.11
  5. 2,4-dibromophenol               3.22   3.18   .75(2)(.28)(1.06) - .28*         3.34   -.12
  6. 3,5-dinitrobenzamide            0.83   0.12   .75(2)(.6)(.72)                  0.77   +.06
  7. 2-aminopyrimidine              -0.22  -1.63   .75(2)(.84)(1.08)               -0.27   +.05
  8. 2-aminopyrazine                -0.07  -1.45   .75(2)(.84)(1.08)               -0.09   +.02
  9. 2,6-dinitro-4-CF3-aniline       2.29   1.26   .6(.60 + .60 + .49)(1.08)        2.35   -.06
 10. 3-iodo-4-aminobenzoic acid      1.65   1.99   .75(.28 + .32)(1.08 + .35)/2
                                                        - 2(.28)*                   1.75   -.10
 11. 3-bromo-4-aminobenzoic acid     1.49   1.73   (as #10)                         1.49    0.0
 12. 3-chloro-4-aminobenzoic acid    1.33   1.58   (as #10)                         1.34   -.01
 13. 3-fluoro-4-aminobenzoic acid    1.297  1.01   .75(.28 + .32)(1.08 + .35)/2
                                                        - .28*                      1.05   -.24
 14. 2,3,4,6-tetrachlorophenol       4.10   4.30   .35(4)(.28)(1.06) - 2(.28)*      4.16   -.06
  *  see text and Table 3.

ALP = log P(pyrimidine) + 2 π(NH2)  +  π(CH2C6H5)

    =      -0.40        + 2(-1.23)  +     2.01         =  -0.85

Fσ  =  (n=2) coef.  x      Σσ       x       Σρ

    =    (0.75)     x  (.84 + .84)  x  (1.08 + 1.08)   =   2.72

                        obsv.  = 1.58     calcd =  1.87
                                                                         (C-1)

-------
                                Table 5.

              Solutes with Highest Deviations  —  Reasons
 1. FXFF1NNO&1XFFF                  Frag.  Sum  =  -2.94
                                    Factors:   9(ICF-3) = 9(-.12) = -1.08
      F     N-NO  F                           6(ICF-5) = 6(.53)  =  3.18
    F-C-CH2-N-CH2-C-F                           ICF-17 = 0.55    =  0.55

                                    Calc.  = -0.29;   Obsv. = 2.15

    Reasons:  CLOGP finds only one Fp   . (Halogen-H-Polar); should find
            two cases of B-Factor for  ~  three fluorines as in Fig. 1.

            Correct ICF (replacing #17) =  2(1.20) =  2.40

            Corrected Calc.  = 1.56.

    Conclusions:  -NNO fragment is very sensitive to σ; may be in class
                  like -SO2-X.

 2.  GR  CG B1UYV02&V02              Calc. = 4.897;   Obsv. = 2.69

                                   Reasons:  From parent and 2,4-Cl2 analog
                                             it can be determined that a
                                   negative ortho effect operates on 2,6-
                                   disubstituted styryls.  For the first
                                   ortho Cl, F = -0.80;  for the second, F =
                                   -1.20.
                                   Corrected Calc.  = 2.90

 3.  Q1X1Q1QMVMR                     Frag. Sum = -2.405
    HOCH2                           Factors:   7(ICF-3)  =  -.84
    HOCH2-C-NHCONH-C6H5                        2(ICF-4)  =  -.44
    HOCH2                                      3(ICF-12) =  3(.835)  = 2.505
                                    Calc. = -1.18;   Obsv.  = 0.43

    Reasons:  When H-polar fragments are on three adjacent ICs,  the present
             proximity correction is understated, as pointed out in section
             2(g), p. 17.  It occurs twice in  this  solute; additional
             correction = 2(.45) = 0.90.

             Corrected Calc.  = -0.28

4. G1U1X2&OVZ1UU1                  Calc. = 0.33;   Obsv.  = 1.71

            CH2-CH3                Reasons:  A new  positive Factor is  needed
   Cl-CH=CH-C-C≡CH                          when a  strong ethynyl  pi-electron
            O-CONH2                cloud is forced  close  to a carbonyl  oxygen.

-------
                            Table 5.  (cont.)
5. T B6566 B6/CO 4ABBC R BX FV HO PN GHT&&TTJ CQ JQ  P2U1   (Naloxone)

                                   calc.  = 0.682; Obsv.  = 2.09

                                   Reasons:   Since morphine analogs
                                             without the  alicyclic -OH
                                   are well-calculated, it is possible
                                   that this  group is not freely solvated.

   It should also be noted  that the measured  log P for naltrexone, with
   a cyclopropylmethylene replacing the allyl  on the nitrogen atom, is
   0.17 units lower when it is expected to be 0.5 units higher than the
   above.   This  might indicate either an unusual  conformation or an
   incorrect measurement for  naloxone.

   No conclusion possible with present  data.
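
The bookkeeping behind entries 1 and 3 of Table 5 can be reproduced as a fragment sum plus counted factors. The sketch below is illustrative only: the helper `clogp_total` is a hypothetical name, and the per-occurrence value of -0.22 for ICF-4 is inferred from the tabulated product 2(ICF-4) = -.44 rather than stated in the table.

```python
def clogp_total(fragment_sum, factors):
    """Fragment sum plus the sum of (count x value) over all factors."""
    return fragment_sum + sum(count * value for count, value in factors)


# Entry 1: 9(ICF-3) = 9(-.12), 6(ICF-5) = 6(.53), ICF-17 = 0.55
entry1 = clogp_total(-2.94, [(9, -0.12), (6, 0.53), (1, 0.55)])
# Corrected: ICF-17 replaced by 2(1.20)
entry1_corrected = clogp_total(-2.94, [(9, -0.12), (6, 0.53), (2, 1.20)])

# Entry 3: 7(ICF-3), 2(ICF-4), 3(ICF-12) = 3(.835)
entry3 = clogp_total(-2.405, [(7, -0.12), (2, -0.22), (3, 0.835)])
# Corrected: additional proximity correction of 2(.45)
entry3_corrected = entry3 + 2 * 0.45

for name, value in [("entry 1", entry1), ("entry 1 corrected", entry1_corrected),
                    ("entry 3", entry3), ("entry 3 corrected", entry3_corrected)]:
    print(f"{name}: {value:.2f}")
```

Run as written, this gives values that agree with the tabulated Calc. and Corrected Calc. figures (-0.29, 1.56, -1.18 and -0.28).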

-------
                      Table 6.

   FRAGMENT and  FACTOR VERIFICATION LIST
   Simplest Appropriate Structures in WLN
   Ordered by Fragment Formula

    E1$*1
    G1$*2
    GR$*32
    FR$*33
    WSFR$*34
A.  FFFFFSR$*227
    I1$*4
    IR$*35
    1N1&R$*36
    1NR&R$*67
C.  1N1&1R$*84
    T6NJ$*103
A.  T5NJ  A$*140
A.  T5NJ  AR$*104
    ONR$*37
    T6NJ  A0$*107
    WN1$*6
    WNR$*38
    WNO1$*198
    *WS1&NR&SW1$*210   INFINITE LOOP (i.e., logic error in program)
    T6NNJ$*141
    ONN1&1$**28
    ONN1&R$*200
    WNR DNUNN1&1$*218
A.  T3NTJ  A-  3/P.V/$*206
    202$*7
    WNR D02$*93
    T50J$*106
    101$*8
    OPR&R&R$*201
E.  T5SJ A0$*109
    10PO&R&01$*202
    10PS&01&OR$*208
    WS1&01$**1
    WSR&OR$*213
    10PO&01&01$*146
   m P n A. i ii ,x

-------
    1S1$*11
    1SR$*44
    RSR$*71
    RSSR$*226
    1M1$*13
    1MR$*46
    RMR$*72
    WNR DM1$*94
    T5MJ$*105
    WSR&M1$*209
    WS1&MR$*79
    T5MNJ$*143
    T5MNNJ$*142
    QR$*48
    Q1R$*85
    SHR$*49
    Z1$*17
    ZR$*50
    ZR DNW$*96
C.  Z1R$*86
    1SPZO&01$*205
    ZSWR$*51
    ZSWR DNW$*97
    RMMR$*228
    RMM1R$*229
    ZMR$**9
    ZSWMR$*214
    ZMSW1$*215
    1X$*18
    1XR$*52
    L66J$*111
    T66 BNJ$*112
    GYGUNR$*220
    NCR$*53
    NC1R$*87
    1VNU1$*20
    1NR&VR$*74
    10VN1&1$*190
    1N1&VOR$*191
A.  1VN1&NUNR$*178 (recognizes 1VNR&NUN1 instead)
    1V1$*22
    1VR$*56
    RVR$*75
    1V1R$*89
    WNR DV1$*98
E.  L6V DVJ$*115
    T6DYTJ BUS$*192
E.  OV1$*24 (gives 'ionic' correction for neutral fragment)
E.

-------
    WNR DV01$*99
    RVOR$*76
    T66 BOVJ$*114
    10V1R$*90
    R$*113
    MUYR&R$*221
    1NU1R$*144
    RNU1R$*145
    VHNU1$**15
    1MVR$*59
    1VMR$*81
    RVMR$*77
    T6MVJ$*116
    T5NOJ$**6
    10VM1$*157
    1VMOR$**18
    10VMR$*82
    T6MYTJ BUS$*186
    SUY1&MR$*187
    NCMR$*197
    1N1&VMR$*161
    ONN1&VMR$*164
    VH1$*27
    VHR$*62
    VH01$*193
    VHOR$*194
    QV1$*28
    QVR$*63
    QV1R$*91
    ZVR$*64
A.  ZV1R$*92
    ZVR DNW$*100
    QNU2$*26
    ZVOR$*188
    ZV01R$*189
    QMVR$**3
    2UNM1$*148
    1MVMR$*65
    RMVMR$*78
    T5MVMJ$*165
    ZVN1&R$*151
    ZVNR&R$*152
    ZVN1&NO$*167
    ZVMR$*66
    ZMVR$*179
    SUYZM1$*168
    SUYZMR$*169

-------
    I V V M» w | 7 :>
    RVVR$*196
    1VMV1$*181
    RVMVR$*182
E.  T5VMVJ$*183
    VHMV1$*184
    RVMNU1R$*180
    ZVMV1$*170
    ZVMV1R$*171
    ZVMNU2$*172
    ZVMNU1R$*173
    SUYZMNU1R$*174
    ZMVMNU1R$*175
    1VMVMV1$*176
    3H$*117
G.  1U1R$*122
    R1U1R$*154
    R1UU1R$*154
    1UU1$*125
    1Y$*126(ICF-4)
    QY$*127
A.  L66 B6 A B- C 1B ITJ$*128
    GYG$*129(ICF-5)
    GYGG$*130
    GXGGG$*131
    G2G$*132(ICF-6)
    1OPO&01&OR$*155(ICF-30)
    Q102$*133(ICF-30)
    10101$*133(ICF-7)
    10201$*134(ICF-12)
    T60 COTJ$*135(ICF-8)
    T60 DSTJ$*136
    T6M DOTJ$*153
    QVR BZ$*139
    QVR BQ$*156
    1UU1R$**21
    ZV2G$**25(ICF-17)
    T60TJ BQ$**26(ICF-9)
    T60TJ B01$**26
    T60TJ
    T60TJ C01$**27
H.  F
   FXFFVMR*#*#*8
   FXFFOR*####9
   FXFFSR*«###10
   FXFFV01***#*13
            6
   GYGVMR*#««#17
   Q1
   01

-------
   E1 VMR*****23
   Q2E*****24
   EYEVMR*****25
   FXFFSWR*****27
   G1 V1$w«**28
   E1 VR*****29
   QV1F*****3©
   QV1G*****31
   QVXGGG*****33
   E1V01*****34
   QV1E*****35
   T60TJ BQ$(ICF-9>
   T60 COTJ  BQ*(ICF-1©>
   T60TJ BQ  CQ$(ICF-16)
   Z1VQ*U
A. Value assigned; fragment not recognized.
B. Value correct; differs  from text (appears only on CLOGP  RESULTS which follows.)
C. Value assigned; fragment recognized but bonding assignment error.
D. Preferred value as shown (appears only on CLOGP RESULTS  which follows.)
E. Value assigned; fragment not recognized from WISCT assignment of aromatic
   or ionic bonds.
F. May need warning for alicyclic correction
G. Conjugated and non-conjugated values reversed.
H. Could replace warning with a value.
   $ ends WLN; the number of asterisks which follow identifies FCON list;
     i.e., $****1 is the first value in FCON-4
   (ICF-#) refers to identification of Factor as seen in CLOGP printout.
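
The entry notation described in the key above ("$" ends the WLN, the count of asterisks names the FCON list, the trailing number is the position in that list, and an optional "(ICF-#)" tag identifies a Factor) can be split mechanically. The following is a minimal sketch, not part of CLOGP; the function `parse_entry` and its returned field names are hypothetical, and it assumes a cleanly transcribed entry.

```python
import re

# WLN, "$", one or more asterisks, a list index, and an optional (ICF-n) tag.
_ENTRY = re.compile(
    r"^(?P<wln>[^$]+)\$(?P<stars>\*+)(?P<index>\d+)(?:\((?P<icf>ICF-\d+)\))?$"
)


def parse_entry(text):
    """Split one verification-list entry into WLN, FCON list, index and ICF tag."""
    match = _ENTRY.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable entry: {text!r}")
    return {
        "wln": match.group("wln"),
        "fcon": len(match.group("stars")),   # number of asterisks = FCON list
        "index": int(match.group("index")),
        "icf": match.group("icf"),           # None when no Factor tag is given
    }


print(parse_entry("QMVR$**3"))
print(parse_entry("GYG$*129(ICF-5)"))
```

Entries whose separator is damaged in the printout would raise `ValueError` here, which makes them easy to collect for manual review.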

-------
                     Table 7.
 FRAGMENTS in Simplest WLN
 Ordered by (FCON-1) Number
 E1$*1
 G1$*2
 F1$*3
 I1$*4
 WN1$*6
 202$*7
                                    Unused:
 OS1&1$*9
                                        cannot be retrieved
                                        dormant
 SH1$*16
 NCS1$*21
 1V1$*22
 1V01$*23
 OV1$*24
 QNU2$*26
 VH1$*27
 ZV1$*29
 ZVM1$*30
 ER$*31
 WSFR$*34
 IR$*35
 1N1&R$*36
 ONR$*37
 WNR$*38
 10PO&01&OR$*43
 QR$*48
                                    45:  cannot be retrieved
                                    47:  dormant
 ZR$*50
 ZSWR$*51
 1XR$*52
 NCR$*53
 NCSR$*55
 1VR$*56
 1OVR$*57
 OVR$*58
 1MVR$*59
 1MVOR$*61
 VHR$*62

-------
 1NR&R$*67
 ROR$*68
 WSR&R$*70
 RSR$*71
 RMR$*72
 WSR&MR$*73
 1NR&VR$*74
 RVR$*75
 RVOR$*76
 RVMR$*77
 RMVMR$*78
 1VN1&R$*80
 1VMR$*81
 10VMR$*82
 E1R$*83
 NC1R$*87
 NCS1R$*88
 1V1R$*89
 10V1R$*90
 QV1R$*91
 ZV1R$*92
 WNR D02$*93
 WNR DM1$*94
 ZR DNW$*96
 ZSWR DNW$*97
 WNR DV1$*98
 WNR DV01$*99
 ZVR DNW$*100
 T6NJ$*103
 T5NJ AR$*104
 T5MJ$*105
 T50J$*106
 T6NJ A0$*107
 T5SJ$*108
 T5SJ A0$*109
 1R$*110
 L66J$*111
 T66 BNJ$*112
 R$*113
 T66 BOVJ$*114
 L6V DVJ$*115
                                   95:  can be retrieved by:  WNR CQ; is obsolete.
                                  101:  can be retrieved by:  WNR CNW; obsolete.
                                  102:  can be retrieved by:  WNR C01; obsolete.
 3H$*117(ICF-3)
 L3TJ$*118(ICF-1)
 1U1$*121(ICF-2)
 1U1R$*122
 1UU1$*125
 1Y$*126(ICF-4)
 QY$*127
 L66 B6 A B- C 1B ITJ$*128
 GYG$*129(ICF-5)
 GYGG$*130
                                  123:  dormant
                                  124:
 G2G$*132(ICF-6)
 Q102$*133(ICF-30)
 10101$*133(ICF-7)
 10201$*134(ICF-12)
 T60 COTJ$*135(ICF-8)

-------
 T5NJ  A$*140
 T6NNJ$*141
 T5MNNJ$*142
 T5MNJ$*143
 1NU1R$*144
 RNU1R$*145
 10PO&01&01$*146
 WNR DOPO&01&01$*147
 2UNM1$*148
                                  138:  obsolete.
 ZVNR&R$*152
 T6M DOTJ$*153
-------
 *WS1&NR&SW1$*210   INFINITE LOOP
 WSR&OR$*213
 1N1&NUNR$*217
 WNR DNUNN1&1$*218
 GYGUNR$*220
 MUYR&R$*221
 QYR&UNQ$*222
 MUYZM1$*223
 RSSR$*226
 FFFFFSR$*227
 RMMR$*228
 RMM1R$*229

 (FCON-2)

 WS1&01$**1
 VHMR$**2
 QMVR$**3
 QMR$**4
 1VOR$**5
 T5NOJ$**6
 ZMR$**9
 VHN1&1$**15
 RNUNR$**16
 ZV01$**17
 1VMOR$**18
 ZYUS$**19
 ZYR&US$**20
 1UU1R$**21
 ZV2G$**25
T60TJ BQ$**26
T60TJ CQ$**27
ONN1&1$**28
                                    224:  dormant
                                    7: =-1.05;  dormant?
                                    8: =-1.83;  can't retrieve: SUYZN1&1
                                   10-14: empty
                                  22, 23:  can't retrieve with: 1V1UU1  or
                                           101UU1
                                      24:  empty

                                      29:  obsolete

-------
                               Fig. 1.
             Halogen-H-Polar  Interaction in Aliphatics

-------
             Fig. 2. CLOGP-I  Mode (1)

             S T A T I S T I C A L   A N A L Y S I S   S Y S T E M
                                        THURSDAY, SEPTEMBER 9, 1982

VARIABLE      N         MEAN      STD DEV          SUM      MINIMUM      MAXIMUM

OLOGP      2742   1.74209336   1.30126617  4776.820000  -4.08000000   7.54000000
CLOGP      2742   1.66663239   1.64775191  4569.906000  -4.71500000   5.10000000
RESID      2742   0.07546098   0.81211378   206.914000  -4.50500000   5.19700000

CORRELATION COEFFICIENT BETWEEN OLOGP AND CLOGP =  .87071
                                     R-SQUARED  =  .75813

                     FREQUENCY BAR CHART

MIDPOINT                     CUM.                  CUM.
 RESID          FREQ         FREQ     PERCENT    PERCENT

 -1.25           101          101        3.68       3.68
 -1.15            14          115        0.51       4.19
 -1.05            10          125        0.36       4.56
 -0.95            25          150        0.91       5.47
 -0.85            33          183        1.20       6.67
 -0.75            37          220        1.35       8.02
 -0.65            51          271        1.86       9.88
 -0.55            70          341        2.55      12.44
 -0.45            82          423        2.99      15.43
 -0.35           111          534        4.05      19.47
 -0.25           162          696        5.91      25.38
 -0.15           242          938        8.83      34.21
 -0.05           443         1381       16.16      50.36
  0.05           294         1675       10.72      61.09
  0.15           222         1897        8.10      69.18
  0.25           199         2096        7.26      76.44
  0.35           121         2217        4.41      80.85
  0.45            89         2306        3.25      84.10
  0.55            87         2393        3.17      87.27
  0.65            55         2448        2.01      89.28
  0.75            37         2485        1.35      90.63
  0.85            34         2519        1.24      91.87
  0.95            25         2544        0.91      92.78
  1.05            22         2566        0.80      93.58
  1.15            18         2584        0.66      94.24
  1.25           158         2742        5.76     100.00
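
The CUM. FREQ, PERCENT and CUM. PERCENT columns of these charts follow directly from the FREQ column and N, which gives a convenient check on individual digits. The sketch below is illustrative only; `frequency_columns` is a hypothetical helper, shown here on the first five Mode (1) bins with N = 2742.

```python
from itertools import accumulate


def frequency_columns(freqs, total):
    """Return (freq, cum. freq, percent, cum. percent) rows for a bar chart."""
    rows = []
    for freq, cum in zip(freqs, accumulate(freqs)):
        rows.append((freq, cum,
                     round(100 * freq / total, 2),
                     round(100 * cum / total, 2)))
    return rows


# First five residual bins of Mode (1), midpoints -1.25 through -0.85.
for row in frequency_columns([101, 14, 10, 25, 33], total=2742):
    print(row)
```

Passing the full list of 26 bin counts should end with a cumulative frequency equal to N and a cumulative percent of 100.00.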

-------
             Fig.  3.  CLOGP-I Mode (2)

             S T A T I S T I C A L   A N A L Y S I S   S Y S T E M
                                  16:36 THURSDAY,  SEPTEMBER 9,  1982

VARIABLE      N         MEAN      STD DEV          SUM      MINIMUM      MAXIMUM

OLOGP      2314   1.78780899   1.33427642  4136.990000  -3.69000000   7.54000000
CLOGP      2314   1.76274201   1.42505265  4078.985000  -4.29000000   7.31000000
RESID      2314   0.02506698   0.39138656    58.005000  -2.20700000   2.62300000

CORRELATION COEFFICIENTS BETWEEN OLOGP AND CLOGP =  .96189
                                       R-SQUARED =  .92523

                     FREQUENCY BAR CHART

MIDPOINT                     CUM.                  CUM.
 RESID          FREQ         FREQ     PERCENT    PERCENT

 -1.25             6            6        0.26       0.26
 -1.15             3            9        0.13       0.39
 -1.05             6           15        0.26       0.65
 -0.95            13           28        0.56       1.21
 -0.85            18           46        0.78       1.99
 -0.75            24           70        1.04       3.03
 -0.65            44          114        1.90       4.93
 -0.55            66          180        2.85       7.78
 -0.45            74          254        3.20      10.98
 -0.35           106          360        4.58      15.56
 -0.25           156          516        6.74      22.30
 -0.15           237          753       10.24      32.54
 -0.05           448         1201       19.36      51.90
  0.05           290         1491       12.53      64.43
  0.15           214         1705        9.25      73.68
  0.25           192         1897        8.30      81.98
  0.35           112         2009        4.84      86.82
  0.45            84         2093        3.63      90.45
  0.55            66         2159        2.85      93.30
  0.65            45         2204        1.94      95.25
  0.75            27         2231        1.17      96.41
  0.85            27         2258        1.17      97.58
  0.95            15         2273        0.65      98.23
  1.05            14         2287        0.61      98.83
  1.15             8         2295        0.35      99.18
  1.25            19         2314        0.82     100.00

-------
             Fig. 4.  CLOGP-I Mode (3)

             S T A T I S T I C A L   A N A L Y S I S   S Y S T E M
                                   12:50  FRIDAY,  SEPTEMBER  10,  1982

VARIABLE      N         MEAN      STD DEV          SUM      MINIMUM      MAXIMUM

OLOGP      2454   1.76568460   1.31167270  4332.990000  -3.69000000   7.54000000
CLOGP      2454   1.74323513   1.38963354  4277.899000  -4.29000000   7.31000000
RESID      2454   0.02244947   0.36280510    55.091000  -2.20700000   2.53000000

CORRELATION COEFFICIENTS BETWEEN OLOGP AND CLOGP  =  .96556
                                      R-SQUARED   =  .93231

                     FREQUENCY BAR CHART

MIDPOINT                     CUM.                  CUM.
 RESID          FREQ         FREQ     PERCENT    PERCENT

 -1.25             3            3        0.12       0.12
 -1.15             3            6        0.12       0.24
 -1.05             5           11        0.20       0.45
 -0.95             9           20        0.37       0.81
 -0.85            14           34        0.57       1.39
 -0.75            20           54        0.81       2.20
 -0.65            46          100        1.87       4.07
 -0.55            63          163        2.57       6.64
 -0.45            78          241        3.18       9.82
 -0.35           123          364        5.01      14.83
 -0.25           178          542        7.25      22.09
 -0.15           251          793       10.23      32.31
 -0.05           489         1282       19.93      52.24
  0.05           320         1602       13.04      65.28
  0.15           239         1841        9.74      75.02
  0.25           199         2040        8.11      83.13
  0.35           117         2157        4.77      87.90
  0.45            82         2239        3.34      91.24
  0.55            63         2302        2.57      93.81
  0.65            45         2347        1.83      95.64
  0.75            29         2376        1.18      96.82
  0.85            30         2406        1.22      98.04
  0.95            17         2423        0.69      98.74
  1.05            11         2434        0.45      99.19
  1.15             8         2442        0.33      99.51
  1.25            12         2454        0.49     100.00

-------
                                     Fig. 5.

              S T A T I S T I C A L   A N A L Y S I S   S Y S T E M
                                                WEDNESDAY, SEPTEMBER 8,  1982

            PLOT OF RESID*CLOGP    LEGEND:  A = 1 OBS, B = 2 OBS, ETC.

   [Scatter plot: RESID (vertical axis, -3.0 to 3.0) against CLOGP
    (horizontal axis, -5 to 7)]

NOTE:      103  OBS HIDDEN

-------