vvEPA
United States
Environmental Protection
Agency
Environmental Research
Laboratory
Athens GA 30613-7799
Research and Development
EPA/600/M-89/017 Aug.1989
ENVIRONMENTAL
RESEARCH BRIEF
Computer Prediction of Chemical Reactivity—The Ultimate SAR
Samuel W. Karickhoff.i Lionel A. Carreira,2 Clyde Melton.3 Valeta K. McDaniel.1
Andre N. Vellfno,3 and Donald E. Nute3
Introduction
Approximately 70,000 industrial chemicals are listed by
EPA's Office of Toxic Substances. As the evaluation and
management of the environmental and human risk
associated with the proliferation of these anthropogenic
chemicals becomes an increasingly urgent social priority,
there accrues a corresponding demand on physical and
biological scientists and engineers to provide effective
techniques for quantifying their release, fate, and potential
environmental damage. Historically, Federal health and
safety regulators (i.e., the U.S. Environmental Protection
Agency and the Occupational Safety and Health
Administration) have relied on field monitoring, toxico-
logical test data, and expert scientific knowledge to
condemn or vindicate a given chemical. Recent emphasis,
however, on more quantitative and comprehensive risk/
benefit analyses, coupled with the extension of the
regulators' "umbrella" (via the Toxic Substances Control
1 Environmental Research Laboratory, USEPA, Athens, GA 30613-
7799.
2Chemisry Department, University of, Georgia, Athens, GA 30602.
3Advanced Computational Methods Center, University of Georgia,
Athens, GA 30602
Act) over all manufactured chemicals (including those in
the pre-manufacturing stages) has required the develop-
ment of more sophisticated evaluation methods. These
methods must be capable of forecasting pollutant
behavior over wide ranges of environmental conditions,
often without the benefit of measured data specific to the
chemical or ecosystem in question.
Although a wide variety of approaches can be used in
such judgmental exercises, a knowledge of the relevant
chemistry of the compound in question is critical to any
assessment scenario. For volatilization, sorption and other
physical processes, considerable success has been
achieved in not only phenomenological process modeling
but also a priori estimation of requisite chemical param-
eters such as solubilities and Henry's Law constants.<1-3)
Although considerable progress has been made in pro-
cess elucidation and modeling for chemical processes
such as photolysis and hydrolysises), reliable estimates
of the related fundamental chemical constants (i.e., rate
and equilibrium constants) have been achieved for only a
small number of molecular structures. The values of
parameters, in most instances, must be derived from
measurements or from the expert judgment of specialists
in that particular area of chemistry. Parametric values
-------
have actually been measured for, perhaps, only about one
percent of the chemicals in the OTS inventory. Because
these measurements may easily cost $20,000 to $100,000
per chemical, estimation techniques for these pararqetric
values are very cost-effective. In any case, trained tech-
nicians and adequate facilities are not available for a
measurement effort involving thousands of chemicals^ We
describe here a prototype system for the estimation of
reactivity parameters that will cost the user only a few
minutes of computer time. ;
This work seeks to develop methods for the computer
estimation of fundamental reactivity parameters strictly
from molecular structure. Although the prototype system,
called SPARC (SPARC Performs Automated Reasoning in
Chemistry), presently deals only with the prediction of UV-
Visible light absorption spectra and pKa, the, techniques
discussed below are being extended to other reactivity
parameters such as hydrolysis rate constants. | Any
predictive method should be understood in terms of the
purpose for which it is conceived and should be structured •
by appropriate operational constraints. The methods
described herein are intended for what migh^ be
characterized as engineering applications in environrrjental
assessments. More specifically they provide: [
i
• an a priori estimate of reactivity parameters for dhem-
ical process models when measured data are unavail-
able, !
• guidelines for ranking a large number of chemical
parameters and processes in terms of relevance to the
question at hand, thus establishing priorities for meas-
urement or study, I
• an evaluation or screening mechanism for existing data
based on expected behavior, and guidelines for inter-
preting or understanding existing data and observed
phenomena. ;
The primary operational constraint is that the data
available for testing and calibrating predictive theories are
limited in both quantity and quality. This lack ofj data
recommends a less empirical-that is, more theoretical-
approach moderated by the operational axiom that
complexity not exceed need. In addition, predictive
capability should extend to the entire world of organic
chemicals. In particular, the theory should not be crippled
by the computation and calibration requirements of| large
(molecular weights greater than 200) polyfunctional
molecules. j
Scientific Computing
Until recently, scientific computing consisted almost
entirely of mathematical calculations. These programs,
although they often contain deep and involved mathe-
matics, do not typically express a fully developed scientific
theory. A scientific computer program, for example, might
yield a solution to Schrodinger's equation for many
particles using a self-consistent approximation method.
Such scientific computer programs can be thought; of as
tools or instruments that extend the power of a theory, but
they are not formulations of a theory.
The "new wave" computer technology of expert systems
provides for imbedding theoretical knowledge as well as
calculation algorithms into computer programs that, in
principle, can shorten the distance between scientific
theory and computer implementation. In most scientific
applications to date, however, expert systems have fallen
short .of actual theory implementation, relying heavily on
• pattera matching or correlational inferencing in predictive
strategy. Also, most of the current expert systems are
oriented to a specific application and are targeted primarily
to expert users within a particular scientific discipline.
Interested readers should consult recent reviews by
Gray<9> and Pierce and Hohne(10> in which existing expert
systems for molecular structure elucidation, chemical
synthesis, and molecular design are described.
. In the field of organic chemistry, an extensively developed
theoretical basis'exists already for estimating chemical
reactivity from molecular structure. This theoretical
knowledge base refers to the body of facts, general-
izations, models,.laws, and theories that form the basis for
mechanistic reasoning in physical organic chemistry.
Mechanistic reasoning refers to the process of analyzing a
chemical change in terms of more elementary component
processes, such as critical motions of certain electrons or
nuclei or sequential events through which the trans-
formation proceeds. (For indepth descriptions, readers
may consult physical organic chemistry texts such as that
of Lowry and Richardson.<11>)
The goal for our expert system is to capture the reasoning
process that an organic chemist might undertake in reac-
tivity analysis. The approach primarily involves deductive
reasoning and is theory/mechanism oriented. Computa-
tional procedures are based on existing mathematical
models of theoretical chemistry.
Chemical Modeling
Chemical properties describe molecules in transition, that
is, the conversion of a reactant molecule to a different
state or structure. For, a given chemical property, the
transition of interest may involve electron redistribution
within a single molecule or bimolecular union to form a
transition state or distinct product. The behavior of
chemicals depends on the differences in electronic
properties of the initial state of the system and the state of
interest. For example, a light absorption spectrum reflects
the differences in energy between the ground and excited
electronic states of a given molecule. Moreover, chemical
equilibrium—thus chemical equilibrium constants—de-
pends on the energy differences between the reactants
and products. Reaction rates, on the other hand, may
depend on the energies of a transition state relative to
either reactants or products.
For the reactions addressed in SPARC, these energy
differences are extremely small compared to the total
binding energies of the reactant involved. This presents a
problem for ab initio computational procedures that
calculate absolute energy values. Computing the relatively
small energy differences needed for the analysis of
chemical reactivity from absolute energies requires
-------
extremely accurate calculations. Although achievable for
small subclasses of molecules for certain reactivity
parameters, these methods cannot provide the major
computing thrust for SPARC considering the projected
scope and aforementioned constraints.
There are, however, methods known under the general
heading of "perturbation theories," that make it possible to
calculate energy differences directly. These theories treat
the final state as a perturbed initial state and the energy
differences, then, are determined by quantifying the
perturbation.
Perturbation methods can be used not only to predict
reactivity of a given chemical but also to compute
differences in reactivity for "similar" reactants. Although
they provide potentially more accurate and simpler
computations, the use of perturbation methods requires
considerable circumspection. One must be careful to:
• define the boundary conditions of an algorithm's
validity,
• choose appropriate reference states for calibration and
extrapolation, and
• define, in terms of molecular structure, the meaning of
terms like "chemical similarity."
These perturbation methods are ideally suited for expert
system application due to their extreme flexibility and
computational simplicity. The requisite conditions for
applicability, as well as the selection of appropriate
reference structures or reactions, can be easily built into
the computation control portion of the expert system.
SPARC Computational Procedures
An indepth description of SPARC procedures is beyond
the scope of this brief. The following, however, is a limited
description of the logic of our approach to chemical
parameter prediction and provides an overview of the
reasoning and computational procedures currently
incorporated in SPARC.
The basic philosophy is not to compute any chemical
property from "first principles." Rather, it is to utilize
directly the extensive knowledge base of organic
chemistry. Organic chemists have established the types of
structural groups or atomic arrays that impart certain types
of reactivity and have described, in "mechanistic" terms,
the effects on reactivity of other structural constituents
appended to the site of reaction.
To encode this knowledge base, a classification scheme
was developed that defines the role of structural
constituents in effecting or modifying reactivity.
Furthermore, models have been developed that quantify
the various "mechanistic" descriptions commonly utilized
in structure-reactivity analysis, such as induction,
resonance, and field effects. SPARC execution (Figure 1)
involves the classification of molecular structures (relative
to a particular reactivity of interest) and the selection and
execution of appropriate "mechanistic" models to quantify
reactivity.
The computational approaches in SPARC are a blending of
conventional linear free energy theory (LFET), structure
activity relationships (SAR), and perturbed molecular
orbital (PMO) methods. In general, SPARC utilizes LFET to
compute thermal properties and PMO theory to describe
quantum effects such as delocalization energies or
polarizabilities of n electrons. In reality, every chemical
property involves both quantum and thermal contributions
and necessarily requires the use of both perturbation
methods for prediction. These approaches have been
extensively developed and utilized in physical organic
chemistry. For detailed descriptions, readers should
consult texts by Leffler and Grunwald(12> and Hammett<13>
on LFET and SAR, Dewar(14> and Dewar and Dougherty!15)
on PMO theory, and the reviews of Taft et a/.(16> on SAR
applications.
Structure Classification
Reactivity assessment in SPARC begins with locating
potential sites within the molecule for a particular reaction
of interest. These reaction sites, which are termed reaction
centers (C), are in general the smallest subunit(s) to which
the reactivity of interest can be ascribed. Any molecular
structure appended to C is viewed as a "perturber,"
denoted P. All reactions to be addressed in SPARC (from
light absorption to hydrolysis) are analyzed in terms of
some critical equilibrium component:
PCn
f(AE)
PC,
(D
where C0 and Cf denote initial and "final" states of the
reaction center C, P is the "perturber"—the structure that
is presumed unchanged by the reaction, fA(E) denotes
some reaction parameter of interest that is a function of
the energy change (AE) of the reaction. For example, the
ionization of phenol is described by:
OH
(2)
where C0, Cf and P are -OH, -O— and the phenyl group,
respectively.
For light absorption, C0 and Cf are ground and excited
states of the reaction center and f(AE) is the intensity of
light absorption as a function of incident light energy (that
is, the absorption spectrum). For reaction kinetics, C0 and
Cf may denote the initial and transition states of the
reaction center, and f(AE) is the rate constant expressed
as a function of the energy required for achieving the
transition state.
The energy change is expressed in terms of contributions
of factored structural components. For thermal properties,
AE = AEn
8P (AEC
(3)
-------
User Input
Molecular Entry System:
—Smites
—Molecular Editor
Reactivity Parameter Type:
—Absorption Spectra
-pKa
Reaction Conditions
—Solvent
—Temperature
Internal Molecular
Structure Representation
—Prolog Database
Structural Classification
—Reaction Centers
—Perturbers
Process Descriptors
Algorithm Selection and Execution
Output
Graphics Display:
—Spectral Plot
Numerical Output
—pKa Values
Parameter Base
Output Display:
—Spectral Plots
—Numerical Values
Rgure 1. SPARC overview.
abti<
where AEc describes the intrinsic behavior of the reaction
center, and 6P(AEC) denotes some perturbation derived
from the appended structure, P. Changes in localized
bonding energies are incorporated in AEC andj are
assumed unchanged by P. For each reaction type, SPARC
catalogues reaction centers and appropriate charac-
terization data, f(AEc). These reference data are! not
calculated a priori, but are inferred directly from measured
data. Rgure 2 contains sample reaction centers for acid or
base ionization constants (pKa's) and UV-Visible [light
absorption. SPARC computes reactivity perturbations,
8p(Reac) that are then used to "correct" the intrinsic
behavior of the reaction center for the compound in
question. !
I
To facilitate quantitative modeling, 8p(Reac) is expressed
in terms of potential "mechanisms" for interaction of P, and
C.
8P (Reac) = 8e (Reac) + 8,(Reac)
(4)
where subscripts e and r denote electrostatic |and
resonance interactions, respectively. Electrostatic ifiter-
actions derive from local electric dipoles or charges [in P
(fixed or induced) interacting with charges or dipoles |n C.
Because 8p(Reac) describes changes in the reaction
C0-»Cf effected by P, 8e(Reac) represents the difference
in the electrostatic interactions of P with the two states, C0
versus C(. Resonance interactions involve the delocal-
ization of n electrons into or out of C, but again, 8r(Reac)
describes the change in electron delocalization accom-
panying the reaction. Additional perturbations include
specialized interactions of structural elements of P
contiguous to the reaction center such as H-bonding or
steric blockage of access to C for another molecule
(solvating or reacting).
The following examples are indicative of the extrapolation
capability of SPARC. Figure 3 shows calculated and
measured UV-Visible absorption spectra of sexiphenyl
isomers, each of which was calculated as a "perturbed"
benzene. The para isomer shows extended n conjugation
throughout the six-membered ring, whereas the meta
isomer shows only pair-wise ring coupling. In the ortho
isomer, steric twisting of the essential single bonds
removes, to a large extent, ring coupling. These spectra
reflect the ability of SPARC to consider both electrometric
and steric factors in resonance perturbations.
Figure 4 shows predicted and measured pKa's for the
reaction center, -OH, in a variety of molecular structures.
These compounds demonstrate the extendability of
SPARC procedures to inorganic compounds and to both
aliphatic and aromatic organic compounds. Figure 5 shows
similar information for the phosphono reaction center.
Model Verification and Testing
In chemistry, as with all physical sciences, one can never
determine the "validity" of any predictive model with
absolute certainty. This is a direct consequence of the
empirical nature of the science. The closest one can get to
"truth" in chemistry is to make use of the established laws
-------
Light Absorption
II -II* - rigid structures
- ethylene
- benzene
- napthalene
ri - n* - NR2
-OH
-SH
- N (In ring)
- S (In ring
-C = O
-C = S
-C-NH
Acids - OH
-SH
-CO2H
-PO2H
1
- B(OH)2
-SO2H
- SeO3H
- AsO2H
1
Bases - NR2
- N (In ring)
-C = N -
-C = O
Figure 2. Reaction centers for light absorption and
acid/base equilibrium.
and theories, whose validity derives not from logic but
from experience, having withstood exhaustive challenges
or attempts to falsify. This established theoretical
knowledge provides our best description of what the
molecular world is really like. Because SPARC is expected
to predict reaction parameters for processes for which little
or no relative data exist for corroboration, "validity" must
derive from the efficacy of the model constructs in
"capturing" or reflecting the existing knowledge base of
chemical reactivity.
In every aspect of SPARC development, from choosing
the programming environment to building model
algorithms or rule bases, testability was an important
criterion. As discussed earlier, the capability to extrapolate
and/or to avoid situation-specific or ad hoc descriptions
was a primary goal for SPARC. The basic mechanistic
models were designed and parameterized so as to be
portable, in principle, to any type of chemistry or chemical
structure. This extrapolatability enhances testability in
several important ways. First, as the diversity of structures
and the chemistry that is addressable increases, so does
the opportunity for failure. More importantly, however, in
testing against the theoretical knowledge of reactivity,
specific situations can be chosen that offer specific
challenges. This is particularly important when testing
performance in areas where existing data are limited or
where additional data collection may be required. Finally,
this expanded prediction capability allows one to choose,
for exhaustive testing, the reaction parameters for which
large and reliable data sets do exist to test against. Test
data sets are presently being encoded, including UV-
Visible absorption spectra (about 5000 compounds) and
ionization pKa's (about 18,000 compounds). These
unbiased and unscreened data sets will provide an
exhaustive test of performance covering a broad domain of
chemistry.
Is SPARC verifiable? SPARC was designed to optimize
falsifiability. A more pertinent question from a pragmatic
viewpoint is what happens when SPARC fails? In general,
failures to predict derive not by happenstance, but from
errors or inadequacies in conceptualization or theory
implementation. Again, one must resort to the theoretical
knowledge base of chemical reactivity to determine the
source(s) of failure. The "mechanistic" output of SPARC
should aid in the process. In addition, the modular design
p - Sexiphenyl
Sexiphenyl isomers
37500
Frequency (cm'1)
20000
Figure 3. Predicted (solid lines) and measured spectra of Sexiphenyl isomers.
-------
Inorganic and Organic reaction with the Reaction Center, -OH
o
H
EtOH
16.0(15.9)
Me OH
15.5(15.5)
IJOBr
H.O
14.0(14.0)
a 9.4(9.4)
P 9.5(9.5)
Figure 4. Predicted and measured (values in paren-
theses) pKa's for compounds containing the
reaction center, -OH. For example, the phony!
group and -OH react to produce phenol
(indicated by*) with a predicted value of 10.0,
which compares with the measured value of
10.0. Further reaction with bromine produces
o-, m- and p- bromophenol to give predicted
and measured values as shown. A typical
inorganic addendum is OH with the reaction
center to produce H202 (indicated by #) for a
predicted pKa of 11.7 as compared to the
measured pKa of 11.6. Note: there are no
measured pKa values for the various bromine-
substituted benzylalcohols (indicated by @).
and programming environment of SPARC facilitates
modifying or adding to model algorithms or the rule bases.
This provides, we hope, for coherent growth or
advancement in predictive capability. In this capacity, the
SPARC approach also can serve as a research tool for
resolving conflicting viewpoints or perhaps ultimately
advancing the field of reactivity description.
Projection
The methods described above predict UV-Visible light
absorption spectra, ionization pKa's, and hydrolysis
reaction rates. The prototype computer program currently
runs on a widely used minicomputer and uses
commercially available Prolog and Fortran compilers.
Plans are being developed for a PC version. Future
development in the modeling will include:
For Light Absorption Spectra—
• expand the reaction center database to include all
commonly encountered "rigid" n structures,
Inorganic and organic reaction with the Reaction Center, . p . O - H
HPO4-
12.2(12.0)
NHjPO3H2
3.0(3.0)
CH3PO3HZ
2.7(2.4)
H3PO4
2-1(2.1)
CH3PO3H-
7J (7.1-7.77)
2.1 (1.9)
-------
This expert system should be a boon to any Agency Office
or Region, state or other environmental group needing to
predict the concentration of a pollutant in a particular
environment for exposure assessment. It should be
valuable to Agency regulators in their modeling efforts
relative to the assessment of land disposal of wastes,
compliance monitoring, development of remedial action
plans, implementation of premanufacturing notice
regulations, or in any area where the chemical reactivity of
pollutants is important in exposure prediction. SPARC can
estimate reactivity parameters at less cost, with greater
accuracy, and with a broader scope than any conventional
technology.
References
1. Lyman, W. J., W. E. Reehl, and D. H. Rosenblatt.
1982. Handbook of Chemical Property Estimation
Methods: Environmental Behavior of Organic
Chemicals. McGraw-Hill, New York, NY.
2. Yalkowski, S. H. and S. C. Valfani. 1980. Solubility
and Partitioning I: Solubility of Non-electrolytes in
Water. Pharmaceutical So/., 69:912-913.
3. Leo, A. J. 1975. Calculation of Partition Coefficients
Useful in Evaluation of the Relative Hazards of Various
Chemicals in the Environment. /. J. C. Symposium on
Structure Activity Correlations in Studies of Toxicity
and Bio-concentrations with Aquatic Organisms.
(Veith, G. D., ed.), International Joint Commission,
Windsor, Ontario.
4. Zepp, R. G. 1982. Experimental Approaches to
Environmental Photochemistry. In: Handbook of Envi-
ronmental Chemistry (Hutzinger, O., ed.), Vol. 2(B)
Springer-Verlag, New York, NY, pp. 19-42.
5. MacKay, D., A. Bobra, W. Y. Shiu, and S. H.
Yalkowski. 1980. Partition Coefficients. Chemosphere,
9:701-711.
6. Zepp, R. G. and D. M. Cline. 1977. Rates of Direct
Photolysis in the Aquatic Environment. Environmental
Science and Technology, 11:359-366.
7.'Wolfe, N. L, R. G. Zepp, J. A. Gordon, G. L.
Baughman, and D. M. Cline. 1977. Kinetics of
Chemical Degradation of Malathion in Water. Environ-
mental Science and Technology, 11:88-100.
8. Smith, J. L, W. R. Mabey, N. Bohanes, B. B. Hold, S.
S. Lee, T. W. Chou, D. C. Bomberger, and T. Mill.
1978. Environmental Pathways of Selected Chemicals
in Freshwater Systems: Part II, U. S. Environmental
Protection Agency, Athens, GA, EPA/600/7-78/074.
9. Gray, Neil A. B. 1986. Computer Assisted Elucidation.
Wiley and Sons, New York, NY.
10. Pierce, T. H. and B. A. Hohne (eds.). 1986. Artificial
Intelligence Applications in Chemistry, American
Chemical Society, Washington, DC, Symposium
Series, No. 306.
11. Lowry, T. H. and K. S. Richardson. 1987. Mechanisms
and Theory in Organic Chemistry, 3rd ed., Harper &
Row, New York, NY.
12. Leffler, J. E. and E. Grunwald. 1969. Rates of
Equilibria of Organic Reactions. Wiley and Sons, New
York, NY.
13. Hammett, L. P. 1970. 2nd ed. Physical Organic
Chemistry, McGraw Hill, New York, NY.
14. Dewar, M. J. S. 1969. The Molecular Orbital Theory of
Organic Chemistry, McGraw Hill, New York, NY.
15. Dewar, M. J. S. and R. C. Dougherty. 1975. The PMO
Theory of Organic Chemistry. Plenum Press, New
York, NY.
16. Taft, R. W. (ed.). 1987. Progress in Organic Chem-
istry, Vol. 16. Wiley and Sons, New York, NY.
17. Karickhqff, S. W., Lionel A. Carreira, Clyde Melton,
Valeta K. McDaniel, Andre N. Vellino, and Donald E.
Nute. Predicting Chemical Reactivity By Computer,
Part I: Approach. Submitted for publication. 1989.
18. Karickhoff, S. W., Lionel A. Carreira, Clyde Melton,
Valeta K. McDaniel, Andre N. Vellino and Donald E.
Nute. Predicting Chemical Reactivity by Computer!
Part II: UV-Visible Light Absorption. Submitted for
publication. 1989.
19. Karickhoff, S. W., Lionel A. Carreira, Clyde Melton,
Valeta K. McDaniel, Andre N. Vellino, and Donald E.
Nute. Predicting Chemical Reactivity By Computer,
Part III: lonization pKa. Submitted for publication.
1989.
-------
United States
Environmental Protection
Agency
Center for Environmental Research
Information
Cincinnati OH 45268
Official Business
Penalty for Private Use $300
EPA/600/M-89/017
------- |