EPA/600/R-03/030
March 2003
PREDICTION OF CHEMICAL REACTIVITY
PARAMETERS AND PHYSICAL PROPERTIES OF
ORGANIC COMPOUNDS FROM MOLECULAR
STRUCTURE USING SPARC
By
S.H. Hilal and S.W. Karickhoff
Ecosystems Research Division
Athens, Georgia
and
L.A. Carreira
Department of Chemisty
University of Georgia
Athens, GA
National Exposure Research Laboratory
Office of Research and Development
U.S. Environmental Protection Agency
Research Triangle Park, NC 27711
-------
DISCLAIMER
The United States Environmental Protection Agency through its Office of Research
and Development partially funded and collaborated in the research described here under
assistance agreement number 822999010 to the University of Georgia. It has been subjected
to the Agency peer and administration review process and approved for publication as an
EPA document.
ABSTRACT
The computer program SPARC (SPARC Performs Automated Reasoning in Chemistry) has
been under development for several years to estimate physical properties and chemical reactivity
parameters of organic compounds strictly from molecular structure. SPARC uses computational
algorithms based on fundamental chemical structure theory to estimate a variety of reactivity
parameters. Resonance models were developed and calibrated on more than 5000 light absorption
spectra, whereas electrostatic interaction models were developed using more than 4500 ionization
pKas in water. Solvation models (i.e., dispersion, induction, dipole-dipole, hydrogen bonding, etc.)
have been developed using more than 8000 physical property data points on properties such as
vapor pressure, boiling point, solubility, Henry's constant, GC retention times, Kow, etc. At the
present time, SPARC predicts ionization pKa (in the gas phase and in many organic solvents
including water as function of temperature), carboxylic acid ester hydrolysis rate constants (as
function of solvent and temperature), Ei/2 reduction potential (as function of solvents, pH and
temperature), gas phase electron affinity and numerous physical properties for a broad range of
molecular structures.
-------
FOREWORD
Recent trends in environmental regulatory strategies dictate that EPA will rely heavily on
predictive modeling to carry out the increasingly complex array of exposure and risk assessments
necessary to develop scientifically defensible regulations. The pressing need for multimedia,
multistressor, multipathway assessments, from both the human and ecological perspectives, over
broad spatial and temporal scales, places a high priority on the development of broad new modeling
tools. However, as this modeling capability increases in complexity and scale, so must the inputs.
These new models will necessarily require huge arrays of input data, and many of the required
inputs are neither available nor easily measured. In response to this need, researchers at ERD-
Athens have developed the predictive modeling system, SPARC, which calculates a large number
of physical and chemical parameters from pollutant molecular structure and basic information about
the environment (media, temperature, pressure, pH, etc.). Currently, SPARC calculates a wide
array of physical properties and chemical reactivity parameters for organic chemicals strictly from
molecular structure.
Rosemarie C. Russo, Ph.D.
Director
Ecosystems Research Division
Athens, Georgia
in
-------
TABLE OF CONTENTS
1. GENERAL INTRODUCTION 1
2. SPARC COMPUTATIONAL METHOD 5
3. CHEMICAL REACTIVITY PARAMETERS 6
3.1. Estimation of lonization pKa in Water 7
3.1.1. Introduction 7
3.1.2. SPARC's Chemical Reactivity Modeling 8
3.1.3. lonization pKa Computational Approach 9
3.1.4. lonization pKa Modeling Approach 11
3.1.4.1. Electrostatic Effects Models 12
3.1.4.1.1. Field Effects Model 13
3.1.4.1.2. Mesomeric Field Effects 17
3.1.4.1.3. Sigma Induction Effects Model 19
3.1.4.2. Resonance Effects Model 20
3.1.4.3. Solvation Effects Model 21
3.1.4.4. Intramolecular H-bonding Effects Model 23
3.1.4.5. Statistical Effects Model 24
3.1.4.6. Temperature Dependence 24
3.1.5. Results and Discussion 25
3.1.6. Training and Testing of lonization pKa calculator 28
3.1.7. Conclusion 31
3.2. Estimation of Zwitterionic Equilibrium Constant, Microscopic
Constants Molecular Speciation, and Isoelectric Point 32
3.2.1. Introduction 33
3.2.2. Calculation of Macroconstants 33
3.2.3. Zwitterionic Equilibria:Microscopic Constant 34
3.2.4. Speciation-Two lonizable Sites 36
3.2.5. Speciation of Multiple lonization Sites 41
3.2.6. Isoelectric Points 48
3.2.7. Conclusion 50
3.3. Estimation of Gas Phase Electron Affinity 51
3.3.1. Introduction 51
iv
-------
3.3.2. Electron Affinity Computational Methods 51
3.3.3. Electron Affinity Models 52
3.3.3.1. Field Effects Model 54
3.3.3.2. Sigma Induction Effects Model 55
3.3.3.3. Resonance Effects Model 56
3.3.4. Results and Discussion 56
3.3.5. Conclusion 61
3.4. Estimation of Ester Hydrolysis Rate Constant 62
3.4.1. Introducti on 62
3.4.1.1. Base-Catalyzed Hydrolysis 62
3.4.1.2. Acid-Catalyzed Hydrolysis 63
3.4.1.3. General-Catalyzed Hydrolysis 64
3.4.2. SPARC Modeling Approach 64
3.4.3. Hydrolysis Computational Model 65
3.4.3.1. Reference Rate Model 66
3.4.3.2. Internal Perturbation Model 67
3.4.3.2.1. Electrostatic Effects Models 78
3.4.3.2.1.1. Direct Field Effect Model 68
3.4.3.2.1.2. Mesomeric Field Effects Model 69
3.4.3.2.1.3. Sigma Induction Effects Model 70
3.4.3.2.1.4. R,Effects Model 70
3.4.3.2.2. Resonance Effects Model 71
3.4.3.2.3. Steric Effect Model 72
3.4.3.3. External Perturbation Model 73
3.4.3.3.1. Solvation Effect 73
3.4.3.3.1.1. Hydrogen Bonding 73
3.4.3.3.1.2. Field Stabilization Effect 75
3.4.3.3. Temperature Effect 76
3.4.4. Results and Discussions 76
3.4.5. Conclusion 80
4. PHYSICAL PROPERTIES
4.1. Estimation of Physical Properties 81
4.2. Physical Properties Computational Approach 82
4.3. SPARC Molecular Descriptors 83
4.3.1. Average Molecular Polarizability 83
4.3.1.1. Refractive Index 84
4.3.1.2. Molecular Volume 87
4.3.1.3. Microscopic Bond Dipole 88
4.3.1.4. Hydrogen Bonding 89
4.4. SPARC Interaction Models 91
4.4.1. Dispersion Interactions 91
4.4.2. Induction Interactions 92
4.4.3. Dipole-Dipole Interaction 93
-------
4.4.4. Hydrogen Bonding Interactions 94
4.4.5. Solute-Solvent Interactions 95
4.5. Solvents 97
4.6. Physical Process Models 98
4.6.1. Vapor Pressure Model 98
4.6.2. Activity Coefficient Model 101
4.6.3. Crystal Energy Model 102
4.6.4. Enthalpy of Vaporization 104
4.6.5. Temperature Dependence of Physical Process Models 105
4.6.6. Normal Boiling Point 107
4.6.7. Solubility 108
4.6.8. Mixed Solvents 109
4.6.9. Partitioning Constants 110
4.6.9.1. Liquid/Liquid Partitioning 111
4.6.9.2. Liquid/Solid Partitioning 112
4.6.9.3. Gas/liquid (Henry's constant) Partitioning 113
4.6.9.4. Gas/Solid Partitioning 113
4.6.10. Gas Chromatography 114
4.6.10.1. Calculation of Kovats Indices 116
4.6.10.2. Unified Retention Index 117
4.6.11. Liquid Chromatography 118
4.6.12. Diffusion Coefficient in Air 120
4.6.13. Diffusion Coefficient in Water 121
4.7. Conclusion 122
5. PHYSICAL PROPERTIES COUPLED
WITH CHEMICAL REACTIVITY MODELS
5.1. Henry's Constant for Charged Compounds 123
5.1.1. Microscopic Monopole 124
5.1.2. Induction-Monopole Interaction 124
5.1.3. Monopole-Monopole Interaction 125
5.1.4. Dipole-Monopole Interaction 125
5.1.5. Hydrogen Bonding Interactions 126
5.2. Estimation of pKa in the Gas Phase and in non-Aqueous Solution 126
5.3. Ei/2 Chemical Reduction Potential 127
5.4. Chemical Speciation 129
5.5. Hydration 130
5.6. Process Integration 133
5.7. Tautomeric Equilibria 134
5.8. Conclusion 136
6. MODEL VERIFICATION AND VALIDATION 13 8
VI
-------
7. TRAINING AND MODEL PARAMETER INPUT 139
8. QUALITY ASSURANCE 139
9. SUMMAY 140
10. REFERENCES 143
11. GLOSSARY 147
12. APPENDIX 151
vn
-------
1. GENERAL INTRODUCTION
The major differences among behavioral profiles of molecules in the environment are
attributable to their physicochemical properties. For most chemicals, only fragmentary knowledge
exists about those properties that determine each compound's environmental fate. A chemical-by-
chemical measurement of the required properties is not practical because of expense and because
trained technicians and adequate facilities are not available for measurement efforts involving
thousands of chemicals. In fact, physical and chemical properties have only actually been measured
for about 1 percent of the approximately 70,000 industrial chemicals listed by the U.S. Environmen-
tal Protection Agency's Office of Prevention, Pesticides and Toxic Substances (OPPTS) [1]. Hence,
the need for physical and chemical constants of chemical compounds has greatly accelerated both in
industry and government as assessments are made of potential pollutant exposure and risk.
Although a wide variety of approaches are commonly used in regulatory exposure and risk
calculations, knowledge of the relevant chemistry of the compound in question is critical to any
assessment scenario. For volatilization, sorption and other physical processes, considerable success
has been achieved in not only phenomenological process modeling but also a priori estimation of
requisite chemical parameters, such as solubilities and Henry's Law constants [2-9]. Granted that
considerable progress has been made in process elucidation and modeling for chemical processes
[10-15], such as photolysis and hydrolysis, reliable estimates of the related fundamental thermody-
namic and physicochemical properties (i.e., rate/equilibrium constants, distribution coefficient,
solubility in water, etc.) have been achieved for only a limited number of molecular structures. The
values of these latter parameters, in most instances, must be derived from measurements or from the
expert judgment of specialists in that particular area of chemistry.
-------
Mathematical models for predicting the transport and fate of pollutants in the environment
require reactivity parameter values—that is, the physical and chemical constants that govern
reactivity. Although empirical structure-activity relationships have been developed that allow
estimation of some constants, such relationships are generally valid only within limited families of
chemicals. Computer programs have been under development at the University of Georgia and U.S.
Environmental Protection Agency for more than 12 years that predict a large number of chemical
reactivity parameters and physical properties for a wide range of organic molecules strictly from
molecular structure. This prototype computer program called SPARC (SPARC Performs
Automated Reasoning in Chemistry) uses computational algorithms based on fundamental chemical
structure theory to estimate a variety of reactivity parameters [16-26]. This capability crosses
chemical family boundaries to cover a broad range of organic compounds. SPARC presently
predicts numerous physical properties and chemical reactivity parameters for a large number of
organic compounds strictly from molecular structure, as shown in Table 1.
SPARC has been in use in Agency programs for several years, providing chemical and
physical properties to Program Offices (e.g., Office of Water, Office of Solid Waste and Emergency
Response, Office of Prevention, Pesticides and Toxic Substances) and Regional Offices. Also,
SPARC has been used in Agency modeling programs (e.g., the Multimedia, Multi-pathway, Multi-
receptor Risk Assessment (3MRA) model and LENS3, a multi-component mass balance model for
application to oil spills) and to state agencies such as the Texas Natural Resource Commission. The
SPARC web-based calculators have been used by many employees of various government
agencies, academia and private chemical/pharmaceutical companies throughout the United States.
The SPARC web version performs approximately 50,000-100,000 calculations each month. (See
the summary of usage of the SPARC web version in the Appendix).
2
-------
Although the primary emphasis in this report, and throughout the development of the
SPARC program, has been aimed at supporting environmental exposure and risk assessments, the
SPARC physicochemical models have widespread applicability (and are currently being used) in
the academic and industrial communities. The recent interest in the calculation of physicochemical
properties has led to a renaissance in the investigation of solute-solvent interactions. In recent ACS
conferences, over one third of the computational chemistry talks have dealt with calculating
physical properties and solvent-solute interactions.
The SPARC program has been used at several universities as an instructional tool to
demonstrate the applicability of physical organic models to the quantitative calculation of
physicochemical properties (e.g., a graduate class taught by the late Dr. Robert Taft at the
University of California). Also, the SPARC calculator has been used for aiding industry (such as
Pfizer, Merck, Pharmacia & Upjohn, etc.) in the areas of chemical manufacturing and
pharmaceutical and pesticide design. The speed of calculation allows SPARC to be used for on-
line control in many chemical engineering applications. SPARC can also be used for custom
solvent and mixed solvent design to assist the synthesis chemist in achieving a particular product or
yield.
SPARC costs the user only a few minutes of computer time and provides greater accuracy
and a broader scope than is possible with conventional estimation techniques. The user needs to
know only the molecular structure of the compound to predict a property of interest. The user
provides the program with the molecular structure either by direct entry in SMILES (Simplified
Molecular Input Line Entry System) notation, or via the CAS number, which will generate the
SMILES notation. SPARC is programmed with the ALS (Applied Logic Systems) version of
Prolog (PROgramming in LOGic).
3
-------
Table 1. SPARC current physical and chemical properties estimation capabilities
Physical Property & Molecular Descriptor
Molecular Weight
Polarizability
a, (3 H-bond
Microscopic local bond dipole
Density
Volume
Refractive Index
Vapor Pressure
Viscosity
Boiling Point
Heat of Vaporization
Heat of formation
Diffusion Coefficient in Air
Diffusion Coefficient in Water
Activity Coefficient
Solubility
Gas/Liquid Partition
Gas/Solid Partition
Liquid/Liquid Partition
Liquid /Solid Partition
GC Retention Times
LC Retention Times
Status
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Mixed
Yes
Yes
UD
Mixed
Mixed
Yes
Yes
Yes
Mixed
Yes
Mixed
Yes
Mixed
Reaction Conditions
Temp
Temp
Temp
Temp
Temp
Temp
Press
Temp
Temp
Temp, Press
Temp
Temp, Solv
Temp, Solv
Temp, Solv
Temp, Solv
Temp, Solv
Temp, Solv
Temp, Solv
Temp, Solv
Chemical Reactivity
lonization pKa in Water
lonization pKa in non-Aqueous Solution.
lonization pKa in Gas phase
Microscopic lonization pKa Constant
Zwitterionic Constant
Molecular Speciation
Isoelectric Point
Electron Affinity
Ester Carboxylic Hydrolysis Rate Constant
Hydration Constant
Tautomer Constant
Ey2 Chemical Reduction Potential
Yes
Mixed
Mixed
Yes
Yes
Yes
Yes
Mixed
Yes
Mixed
Mixed
Mixed
Temp, pH
Temp, Solv
Temp
Temp, Solv, pH
Temp, Solv, pH
Temp, Solv, pH
Temp, Solv, pH
Temp , Solv
Temp , Solv
Temp, Solv, pH
Temp, Solv, pH
Yes : Already tested and implemented in SPARC
Mixed : Some capability exists but needs to be tested more, automated and/or extended.
UD: Under Development at this time
Press : Pressure, Temp: Temperature, Solv: Solvent
a: proton-donating site, P: proton-accepting site.
-------
2. SPARC COMPUTATIONAL METHODS
SPARC does not do a "first principles" computation; rather, SPARC seeks to analyze
chemical structure relative to a specific reactivity query in much the same manner as an expert
chemist would do. Physical organic chemists have established the types of structural groups or
atomic arrays that impact certain types of reactivity and have described, in "mechanistic" terms, the
effects on reactivity of other structural constituents appended to the site of reaction. To encode this
knowledge base, a classification scheme was developed in SPARC that defines the role of structural
constituents in affecting reactivity. Furthermore, models have been developed that quantify the
various "mechanistic" descriptions commonly utilized in structure-activity analysis, such as
induction, resonance and field effects. SPARC execution involves the classification of molecular
structure (relative to a particular reactivity of interest) and the selection and execution of appropriate
"mechanistic" models to quantify reactivity.
The SPARC computational approach is based on blending well known, established
methods such as SAR (Structure Activity Relationships) [27, 28], LFER (Linear Free Energy
Relationships) [29, 30] and PMO (Perturbed Molecular Orbital) theory [31, 32]. SPARC uses
SAR for structure activity analysis, such as induction and field effects. LFER is used to estimate
thermodynamic or thermal properties and PMO theory is used to describe quantum effects such as
charge distribution delocalization energy and polarizability of the TT electron network. In reality,
every chemical property involves both quantum and thermal contributions and necessarily requires
the use of all three methods for prediction.
A "toolbox" of mechanistic perturbation models has been developed that can be
implemented where needed in SPARC for a specific reactivity query. Resonance perturbation
models were developed and calibrated using light absorption spectra for more than 5000
5
-------
compounds [1, 16], whereas electrostatic interaction perturbation models were developed using
ionization pKas in water for more than 4500 compounds [17-22]. Solvation perturbation models
(i.e., dispersion, induction, H-bond and dipole-dipole) have been developed using physical
properties data such as vapor pressure, boiling point, solubility, distribution coefficient, Henry's
constant and GC chromatographic retention times for more than 8000 compounds [21, 23, 24].
Ultimately, these mechanistic components will be fully implemented for the aforementioned
chemical and physical property models, and will be extended to additional properties such as
hydrolytic and redox processes.
Any predictive method should be understood in terms of the purpose for which it is
developed, and should be structured by appropriate operational constraints. SPARC's predictive
methods were designed for engineering applications involving physical/chemical process modeling.
More specifically, these methods provide:
1. an a priori estimate of the physicochemical parameters of organic compounds for physical
and chemical fate process models when measured data are not available,
2. guidelines for ranking a large number of chemical parameters and processes in terms of
relevance to the question at hand, thus establishing priorities for measurements or study,
3. an evaluation or screening mechanism for existing data based on "expected" behavior,
4. guidelines for interpreting or understanding existing data and observed phenomena.
3. CHEMICAL REACTIVITY PARAMETERS
Molecular structures are broken into functional units with known chemical properties
called reaction centers, C. The intrinsic behavior of each reaction center is then "adjusted" for the
-------
compound in question by describing mechanistically the effect(s) on reactivity of the molecular
structure(s) appended to each reaction center using perturbation theory.
The SPARC chemical reactivity models have been designed and parameterized to be
portable to any chemical reactivity property and any chemical structure. For example, chemical
reactivity models are used to estimate macroscopic/microscopic ionization pKa in water. The same
reactivity models are used to estimate:
1. zwitterionic constant, isoelectric point, titration curve and speciation fractions as a function
ofthepH,
2. ionization pKa in the gas phase,
3. ionization pKa in non-aqueous solution,
4. gas phase electron affinity,
5. carboxylic acid ester hydrolysis rate constant in water and in non-aqueous solution.
3.1. Estimation of Ionization pKa in Water
3.1.1 Introduction
A knowledge of the acid-base ionization properties of organic molecules is essential to
describing their environmental transport and transformations, or estimating their potential
environmental effects. For ionizable compounds, solubility, partitioning phenomena and chemical
reactivity are all highly dependent on the state of ionization in any condensed phase. The ionization
pKa of an organic compound is a vital piece of information in environmental exposure assessment.
It can be used to define the degree of ionization and resulting propensity for sorption to soil and
sediment that, in turn, can determine a compound's mobility, reaction kinetics, bioavailability,
complexation, etc. In addition to being highly significant in evaluating environmental fate and
7
-------
effects, acid-base ionization equilibria provide an excellent development arena for electrostatic
interaction perturbation models. Because the gain or loss of protons results in a change in molecular
charge, these processes are extremely sensitive to electric field effects within the molecule.
Numerous investigators have attempted to predict ionization pKa's using various
approaches such as ab initio [33, 34] and semiempirical [35, 36] methods. The energy differences
between the protonated and the unprotonated states are small compared to the total binding
energies of the reactants involved. This presents a problem for ab initio computational methods
that calculate absolute energy values. Computing the relatively small energy differences needed
for the analysis of molecular chemical reactivity from the absolute energies requires extremely
accurate calculations. Hence, the aforementioned calculation methods are generally limited to a
small subclass of molecules. A more aggressive attempt was made by Klopman et. al., [37, 38].
They estimated the pKa's for about 2400 molecules (R2 = 0.846) based on QSAR using the Multi-
CASE program. Despite the relatively large number of pKa's estimated, their calculator was
limited to only the first ionization site pKa [38] for compounds processing multiple sites.
Unfortunately, up to now no reliable method has been available for predicting pKa over a
wide range of molecular structures, either for simple compounds or for complicated molecules such
as dyes. The SPARC pKa calculator has been highly refined and has been exhaustively tested. In
this report, the calculation 'toolbox' will be described, along with testing results to date.
3.1.2. SPARC's Chemical Reactivity Modeling
Chemical properties describe molecules in transition, that is, the conversion of a reactant
molecule to a different state or structure. For a given chemical property, the transition of interest
may involve electron redistribution within a single molecule or bimolecular union to form a
-------
transition state or distinct product. The behavior of chemicals depends on the differences in
electronic properties of the initial state of the system and the state of interest. For example, a light
absorption spectrum reflects the differences in energy between the ground and excited electronic
states of a given molecule. Chemical equilibrium constants depend on the energy differences
between the reactants and products. Electron affinity depends on the energy differences between
the LUMO (Lowest Unoccupied Molecular Orbital) state and the HOMO (Highest Unoccupied
Molecular Orbital) state.
For any chemical property addressed in SPARC, the energy differences between the initial
state and the final state are small compared to the total binding energy of the reactants involved.
Calculating these small energy differences by ab initio computational methods is difficult, if not
impossible. On the other hand, perturbation methods provide these energy differences with more
accuracy and with more computational simplicity and flexibility than ab initio methods.
Perturbation methods treat the final state as a perturbed initial state and the energy differences
between these two energy states are determined by quantifying the perturbation. For pKa, the
perturbation of the initial state, assumed to be the protonated form, versus the unprotonated final
form is factored into the mechanistic contributions of resonance and electrostatic effects plus other
perturbations such as H-bonding, steric contributions and solvation.
3.1.3. lonization pKa Computational Approach
Molecular structures are broken into functional units called the reaction center and the
perturber. The reaction center, C, is the smallest subunit that has the potential to ionize and lose a
proton to a solvent. The perturber, P, is the molecular structure appended to the reaction center, C.
The perturber structure is assumed to be unchanged in the reaction. The pKa of the reaction center
9
-------
is either known from direct measurement or inferred indirectly from pKa measurements. The pKa of
the reaction center is adjusted for the molecule in question using the mechanistic perturbation
models described below.
Like all chemical reactivity parameters addressed in SPARC, pKa is analyzed in terms of
some critical equilibrium component:
P-C; ^ - * P-Cf
where C; denotes the initial protonated state, Cf is the final unprotonated state of the reaction center,
C, and P is the "perturber". The pKa for a molecule of interest is expressed in terms of the
contributions of both P and C.
where (pKa)c describes the ionization behavior of the reaction center, and 5p(pKa)c is the change in
ionization behavior brought about by the perturber structure. SPARC computes reactivity
perturbations, 5p(pKa)c, that are then used to "correct" the ionization behavior of the reaction center
for the compound in question in terms of the potential "mechanisms" for interact on(s) of P and C as
Sp(pKa)c = SelepKa+SrespKa + SsoipKa+...
where 5respKa, 5eiepKa and 5soipKa describe the differential resonance, electrostatic and solvation
effects of P on the protonated and unprotonated states of C, respectively. Electrostatic interactions
are derived from local dipoles or charges in P interacting with charges or dipoles in C. 5eiepKa
represents the difference in the electrostatic interactions of the P with the two states. 5respKa
describes the change in the delocalization of n electrons of the two states due to P. This
delocalization of n electrons is assumed to be into or out of the reaction center. Additional
10
-------
perturbations include direct interactions of the structural elements of P that are contiguous to the
reaction center such as H-bonding or the steric blockage of solvent access to C.
3.1.1.4. lonization pKa Modeling Approach
The modeling of the perturber effects for chemical reactivity relates to the structural
representation S—iRj—C, where S—;Rj is the perturber structure, P, appended to the reaction center,
C. S denotes substituent groups that "instigate" perturbation. For electrostatic effects, S contains
(or can induce) electric fields; for resonance, S donates/receives electrons to/from the reaction
center. R links the substituent and reaction center and serves as a conductor of the perturbation
(i.e."conducts" resonant TT electrons or electric fields). A given substituent, however, may be a part
of the structure, R, connecting another substituent to C, and thus functions as a "conductor" for the
second substituent. The i and j denote anchor atoms in R for S and C, respectively.
For each reaction center and substituent, SPARC catalogs appropriate characteristic
parameters. Substituents include all non-carbon atoms and aliphatic carbon atoms contiguous to
either the reaction center or a pi-unit. Some heteroatom substituents containing pi groups are treated
collectively as substituents (e.g. -NO2, -C=N, -C=O, -CC^H, etc.). The specification of these
collective units as substituents is strictly facilitative. The only requisites are that they be structurally
and electronically well-defined (charge and/or dipolar properties are relatively insensitive to the
remainder of the perturber structure). Also, these units must be terminal with regard to resonance
interactions (no pass-through conjugation). All hydrogen atoms are dropped and "bookkept" only
through atom valence. An isoelectronic carbon equivalent plus an appended atom, Q, replace
heteroatom substituents in these TT units. For example -C=O- becomes C=C-Q, which is now
treated in SPARC as perturbed ethylene.
11
-------
In computing the contribution of any given substituent to 5p(pKa)c, the effect is factored into
three independent components for the structural components C, S, and R:
1. substituent strength, which describes the potential of a particular S to "exert" a given effect.
(Independent of the property, C and R),
2. molecular network conduction, which describes the "conduction" properties of the
molecular structure R, connecting S to C with regard to a given effect, (Independent of the
property, C and S), and
3. reaction center susceptibility, which rates the response of C to the effect in question
(depends on the property, independent of S and R).
The contributions of the structural components C, S, and R are quantified independently.
For example, the strength of a substituent in creating an electrostatic field effect depends only on
the substituent regardless of the C, R, or property of interest. Likewise, the molecular network
conductor R is modeled so as to be independent of the identities of S, C, or the property being
estimated. The susceptibility of a reaction center to an electrostatic effect quantifies only the
differential interaction of the initial state versus the final state with the electrical field. The
susceptibility gauges only the reaction Cmitiai - Cfmai and is completely independent of both R and S.
This factoring and quantifying of each structural component independently provides parameter
"portability" and, hence, permits model portability to all structures and, in principle, to all types of
reactivity.
3.1.4.1. Electrostatic Effects Models
Electrostatic effects on reactivity derive from charges or electric dipoles in the appended
perturber structure, P, interacting through space with charges or dipoles in the reaction center, C.
Direct electrostatic interaction effects (field effects) are manifested by a fixed charge or dipole in a
12
-------
substituent interacting through the intervening molecular cavity with a charge or dipole in the
reaction center. The substituent can also "induce" electric fields in R that can interact
electrostatically with C. This indirect interaction is called the "mesomeric field effect". In addition,
electrostatic effects derived from electronegativity differences between the reaction center and the
substituent are termed sigma induction. These effects are transmitted progressively through a chain
of a-bonds between atoms. For compounds containing multiple substituents, electrostatic perturba-
tions are computed for each singly and summed to produce the total effect.
With regard to electrostatic effects, reaction centers are classified according to the
electrostatic change accompanying the reaction. For example, monopolar reactions proceed with a
change in net charge (5qc ^ 0) at the reaction center and are denoted Cm; dipolar reactions, Cd,
produce no net change in charge but involve a change in the dipole moment (5|j,c ^ 0, 5qc = 0, etc.).
The nature and magnitude of electrostatic change accompanying a reaction determine the
"susceptibility" of a given reaction to electric fields existing in structure, P.
3.1.4.1.1. Field Effects Model
For a given dipolar or charged substituent interacting with the change in the charge at the
reaction center, the direct field effect may be expressed as a multipole expansion
8»cl-
rJ.
where qs is the charge on the substituent, approximated as a point charge located at point, s7; |j,s is the
substituent dipole located at point s (this dipole includes any polarization of the anchor atom i
effected by S); qc (5|j,c) is the change in charge (dipole moment) of the reaction center
13
-------
accompanying the reaction, both presumed to be located at point c; 9CS is the angle the dipole
subtends to the reaction center; De is the effective dielectric constant for the medium; and rcs (rc/) is
the distance from the substituent dipole (charge) center to the reaction center.
In modeling electrostatic effects, only those terms containing the "leading" nonzero electric
field change in the reaction center are retained. For example, acid-base ionization is a monopole
reaction that is described by the first two terms of the preceding equation; electron affinity is
described by only the second term, whereas the dipole change in H-bond formation is described by
the third and fourth terms.
Once again, in order to provide parameter "portability" and, hence, effects-model portability
to other structures and to other types of chemical reactivity, the contribution of each structural
component is quantified independently:
field
(PKa )c = Pele ° ~P = P ele &c, FS
where ap characterizes the field strength that the perturber exerts on the reaction center. peie is the
susceptibility of a given reaction center to electric field effects that describes the electrostatic change
accompanying the reaction. peie is presumed to be independent of the perturber. The perturber
potential, op, is further factored into a field strength parameter, F (characterizing the magnitude of
the field component, charge or dipole, on the substituent), and a conduction descriptor, acs, of the
intervening molecular network for electrostatic interactions. This structure-function specification
and subsequent parameterization of individual component contributions enables one to analyze a
given molecular structure (containing an arbitrary assemblage of functional elements) and to "piece
together" the appropriate component contributions to give the resultant reactivity effect. For
14
-------
molecules containing multiple substituents, the substituent field effects are computed for each
substituent and summed to produce the total effect as
The electrostatic susceptibility, peie, is a data-fitted parameter inferred directly from
measured pKas. This parameter is determined once for each reaction center and stored in the
SPARC database. In parameterizing the SPARC electrostatic field effects models, the ionization of
the carboxylic acid group was chosen to be the reference reaction center with an assigned peie of 1.
For all the reaction centers addressed in SPARC, electrostatic interactions are calculated relative to a
fixed geometric reference point that was chosen to approximate the center of charge for the
carboxylate anion, rcj = 1.3 unit, where the length unit is the aromatic carbon-carbon length (1.40 A).
The peie for the other reaction centers (e.g., OH, NR.2) reflect electric field changes for these
reactions gauged relative to the carboxylic acid reference, but also subsumes any difference in
charge distribution relative to the reference point, c.
With regard to the substituent parameters, each uncharged substituent has one field strength
parameter, F^, characterizing the dipole field strength; whereas, a charged substituent has two, Fq
and Pp.. Fq characterizes the effective charge on the substituent and F^ describes the effective
substituent dipole inclusive of the anchor atom i, which is assumed to be a carbon atom. If the
anchor atom i, is a noncarbon atom, then F^ is adjusted based on the electronegativity of the anchor
atom relative to carbon. The effective dielectric constant, De, for the molecular cavity, any
polarization of the anchor atom i affected by S, and any unit conversion factors for charges, angles,
distances, etc. are included in the F's.
15
-------
Initially, the distances between the reaction center and the substituent, rcs, for both charges
and dipoles are computed as the summation of the respective distance contributions of C, R and S as
- v .. -+- v .
I 1] ' ' IS
In some cases, such as in ring systems, this "zero-order" distance is adjusted (see below) for direct
through-space interactions of S and C as opposed to interactions through the molecular cavity.
However, these adjustments are significant only when C and S are ortho or perri (e.g., 1, 8-
substituted naphthalene) to each other:
where A is an adjustment constant assumed to depend only on bond connectivity into and out of the
R-TT, unit (e.g., points i and j). For R-TT units recognized by SPARC, "A factors" for each pair (i,j)
are empirically determined from data (or inferred from structural similarity to other R-TT units). The
distance through R (r;j) is calculated by summation over delineated units in the shortest molecular
path from i to j. All aliphatic bonds contribute 1.1 unit; double and triple bonds contribute 0.9 and
0.8 units, respectively. For ring systems, SPARC contains a template listing distances between each
constituent atom pair as illustrated in Table 2. The dipole orientation factors, cos9;j, are presently
ignored (set to 1.0) except in those cases where S and C are attached to the same rigid R-TT unit. In
these latter situations, cosGyS are assumed to depend solely on the point(s) of attachment, (i,j), and
are pre-calculated and stored in SPARC databases.
The strength of the electrostatic interaction between S and C depends on the magnitude and
relative orientation of the local fields of S and C and the dielectric properties and distances through
the conducting medium. All uncharged dipole substituents and positively charged substituents will
16
-------
increase the acidity of any acid, no matter what the charge, and hence, exert a +F. For a negatively
charged substituent, the dipole field component tends to lower the pKa, whereas the negative charge
field component tends to raise the pKa.
C
Position on ring
Molecule Reaction Center
benzene 1
1
1
naphthalene 1
1
1
1
1
1
1
2
2
2
2
2
2
Geometry parameters
Substituent
2
3
4
2
3
4
5
6
7
8
1
3
4
5
6
7
r»
1.0
1.7
2.0
1.0
1.7
2.0
2.6
3.0
2.7
1.7
1.0
1.0
1.7
3.0
3.6
3.4
A,
0.25
0.87
1.00
0.25
0.87
1.00
0.73
0.63
0.64
0.47
0.25
0.25
0.81
0.63
0.98
0.80
CO*
0.53
0.88
1.00
0.53
0.88
1.00
0.81
0.83
0.81
0.77
0.53
0.53
0.91
0.83
0.96
0.84
17
-------
3.1.4.1.2. Mesomeric Field Effects
As mentioned in the previous section, a substituent can also "induce" electric fields in the R
that can interact electrostatically with C. This indirect interaction is called the "mesomeric field
effect". For example, the amino group in the structure below exerts a +F direct effect that should
normally lower the pKa; however, the observed effect is exactly the opposite. The measured pKa of
m-amino pyridine is 6.1, and is greater than the pKa of pyridine (5.2). In this case, the NF^ induces
charges ortho and para to the in-ring N. These charges interact indirectly with the dipole of the
nitrogen in the ring and result in a net increase in the pKa.
* NH2
The contribution of the mesomeric field can be estimated as a collection of discrete charges,
QR, with the contribution of each described by the following equation. As is the case in modeling the
direct field effects, the mesomeric effect components are resolved into three independent elements
for S, R, and C as
where MF is a mesomeric field effect constant characteristic of the substituent S. It describes the
ability or strength of a given substituent to induce a field in Rn. qR describes the location and
relative charge distributions in R, and peie describes the susceptibility of a particular reaction center
to electrostatic effects. Since the reaction center can not discriminate the sources of the electric
fields, Peie is the same as that described previously in discussions of the direct field effects.
18
-------
In modeling the mesomeric field effect, the intensity and the location of charges in R depend
on both the substituent and the Rn network involved. The contributions of S and Rn are resolved by
replacing the substituent with a reference probe or NBMO (NonBonded Molecular Orbital) charge
source. This NBMO reference source for SPARC was chosen to be the methylene anion, -CH2", for
which the charge distribution in any arbitrary Rn network can be calculated.
The mesomeric substituent strength parameter describes the ^-induction ability of a
particular substituent relative to the CH2". The magnitude of a given substituent MF parameter
describes the relative field strength, whereas the sign of the parameter specifies the positive
(electron withdrawing such as NO2) or negative (electron donating such as NR2) character of the
induced charge in Rn. The total mesomeric field effect for a given substituent is given by:
SMF(PKa)c =
k rkc
where qik is the charge induced at each atom k, with the reference probe attached at atom i,
calculated using PMO theory, r^ is the through-cavity distance to the reaction center as described
previously for direct fields. Because induction does not change total molecular charge, the sum of
all induced charges must be zero. This is achieved by placing, at the location of the substituent, a
compensating charge, qs, equal to but opposite to the total charge distributed within the Rn network.
3.1.4.1.3. Sigma Induction Effects Model
Sigma induction derives from electronegativity differences between two atoms. The
electron cloud that bonds any two atoms is not symmetrical, except when the two atoms are the
same and have the same substituents; hence, the higher electronegativity atom will polarize the
19
-------
other. This effect is transmitted progressively between atoms, and dies off rapidly with distance, i.e.
~0.4n, where n is the number of bonds through which the effect is transmitted.
The interaction energy of this effect depends on the difference in electronegativity between
the reaction center and the substituent and on the number of substituents bonded to the reaction
center. Sigma induction effects are resolved into two independent structural component
contributions: that of the substituent, S, and that of the reaction center, C.
SSigma(pKa}c = peleT.(xs-Xc}NB
where peie is the susceptibility of a given reaction center to electric field effects. Once again,
because the reaction center cannot discriminate the source of the electric fields, peie is the same as
that described for the direct field effect. Xc is the effective electronegativity of the reaction center.
Xs is the effective electronegativity of the substituent. NB is data-fitted parameter that depends on
number of the substituents that are bonded directly to the reaction center. The electronegativity of
reaction centers and substituents referenced to the electronegativity of the methyl group, chosen to
be the reference group for this effect.
3.1.4.2. Resonance Effects Model
Resonance involves variations in charge transfer between the TT system and a suitable orbital
of the substituent. The interaction of the substituent orbital with a 7r-orbital of a reaction center can
lead to charge transfer either to or from the reaction center. Electron withdrawing reaction centers
will localize the charge over itself. As a result the acidic state will be stabilized more than the basic
state making these compounds less acidic. For electron donating reaction centers, resonance will
stabilize the basic state more than the acidic state and lower the pKa.
20
-------
Resonance stabilization energy in SPARC is a differential quantity, related directly to the
extent of pi electron delocalization in the neutral state versus the ionized state of the reaction center.
The source or sink in the perturber P, may be either the substituents or R-TT units contiguous to the
reaction center. As with the case of electrostatic perturbations, structural units are classified
according to function. Substituents that withdraw electrons are designated S+ while electron
donating groups are designated S-. The R-TT units withdraw or donate electrons, or serve as a
"conductor" of TT electrons between resonant units. Reaction centers are likewise classified as C+ or
C-, denoting withdrawal or donation of electrons, respectively.
In SPARC, the resonance interactions describe the delocalization of an NBMO electron or
electron hole out of the initial state, (Q) or final state, (Cf) into a contiguous R-TT or conjugated
substituent(s). To model this effect, a surrogate electron donor, CH2", replaces the reaction center.
The distribution of NBMO charge from this surrogate donor is used to quantify the acceptor
potential for the substituent and the molecular conductor. The resonance perturbation of the initial
state versus the final state for an electron-donating reaction center is given by:
where (Aq)c is the fraction loss of NBMO charge from the surrogate reaction center calculated based
on PMO theory (see Appendix). pres is the susceptibility of a given reaction center to resonance
interactions. pres quantifies the differential "donor" ability of the two states of the reaction center
relative to the reference donor CH2". In the parameterization of resonance effects, resonance
strength is defined for all the substituents (i.e., the ability to donate or receive electrons); resonance
susceptibility is defined for all the reaction centers; and resonance "conduction" in R,i networks is
21
-------
modeled so as to be portable to any array of Rn units or to the linking of any resonant source or sink
group.
3.1.4.3. Solvation Effects Model
If a base is more solvated than its conjugate acid, its stability increases relative to the
conjugate acid. For example, methylamine is a stronger base than ammonia, and diethylamine is
stronger still. These results are easily explainable due to the sigma induction effect. However,
trimethylamine is a weaker base than dimethylamine or methylamine. This behavior can be
explained due to the differential hydration of the reaction center of interest and the reaction center.
The initial and the final states of the reaction center frequently differ substantially in degree
of solvation, with the more highly charged moiety solvating more strongly. Steric blockage of the
reaction center can be distinguished from steric-induced twisting of the reaction center in electron
delocalization interaction models. Differential solvation is a significant effect in the protonation of
organic bases (e.g., -NEk, in-ring N, =N), but is less important for acidic compounds except for
highly branched aliphatic alcohols.
In SP ARC's reactivity models, differential solvation of the reaction center is incorporated
in (pKa)c, pres and peie. If the reaction center is bonded directly to more than one hydrophobic group
or if the reaction center is ortho orperri to hydrophobic substituent, then 5soiv(pKa)c must be
calculated. The 5soiv(pKa)c contributions for each reaction center bonded directly to more than one
hydrophobic group are quantified based on the sizes and the numbers of hydrophobic groups
attached to the reaction center and\or to the number of the aromatic bridges that are approximate to
the reaction center using the following equation:
22
-------
where psoiv is the susceptibility of the reaction center to differential solvation due to steric blockage
of the solvent, v are the solid angles occluded by the hydrophobic P that is bonded directly (i),
ortho (j), orperri (k) to the reaction center, respectively.
3.1.4.4. Intramolecular H-Bonding Effects Model
Intramolecular hydrogen bonding is a direct site coupling of a proton donating (a) site with
a proton accepting (|3) site within the molecule. Reaction centers might interact with substituents
through intramolecular H-bonding and thus impact the pKa. The initial, Q, and final, Cf , states of
the reaction center frequently differ substantially in degree of hydrogen bonding strength with a
substituent.
In aromatic, TT-ring or 7r-aliphatic (i.e., diguanide) systems where the reaction center is
contiguous to the substituent and where a stable 5 or 6 member ring may be formed, 5H-B(pKa)c
must be estimated. 5H-B(pKa)c is a differential quantity that describes the H-bonding differences of
the initial versus the final state of a reaction center with a substituent, and is given by:
where HBC is the H-bond contribution for C-S when C and S adjacent to each other, S; is a
reduction factor for steric-induced twisting of C, and MLS is either 1 or 0.7 for aromatic and TT-ring
systems, respectively. For a reaction center that might H-bond with more than one substituent, the
H-bonding contribution for each substituent is calculated and the strongest contributor to H-bond is
selected.
23
-------
3.1.4.5. Statistical Effects Model
All the SPARC perturbation models presented thus far describe the ionization of an acid at
a single site. If a molecule contains multiple equivalent sites, a statistical correction is required. For
example, if a first ionization constant, K, is computed for a single site, and if the molecule has N
such sites, then
Nb
where a and b refer to the acid and conjugate base sites, respectively.
3.1.4.5.6. Temperature Dependence
For processes that can be modeled in terms of some equilibrium (or pseudo equilibrium
component) the temperature dependance can be expressed by the Van't Hoff representation:
where AC and Bc are the entropic and the enthalpic van't Hoff coefficients for the reaction center,
and 5n and 5S are enthalpic and entropic perturbations, respectively. To date, all perturbations have
been assumed to be predominantly enthalpic. The van't Hoff factors (A and B) can be derived
from temperature data for the reaction center or inferred from simple structures with minimal
perturbational contributions. An example of the temperature dependence of pKa for the amino
reaction center is shown in Figure 1 . When the enthalpic perturbation cancels the B parameter as
in the para nitroaniline example, little or no temperature dependence is observed. Some systems
may have perturbations large enough to change the sign of the slope of the pKa temperature
dependence.
24
-------
3.1.5.
Results and Discussion
To date, the approach used in SPARC to predict chemical reactivity parameters has been
applied to UV-visible spectra, pKa in water, electron affinity and carboxylic acid ester hydrolysis
rate constants. The computational algorithm is based on structure query. This involves simply
combining perturbation potentials of perturber units with reaction susceptibilities of the reaction
center. It is important to reemphasize that the reaction parameters describing a given reaction
center (Table 4) are the same regardless of the appended molecular structures. Likewise, for
substituents, the parameters in Table 3 are independent of the rest of the molecule. This structure
factoring and function specification enables one to construct, for a given reaction center of interest,
essentially any molecular array of appended units, and to compute the resultant reactivity.
03
a,
J]
10
9
6 -
2 -_
methyl a mine
4-nitro aniline
31 32 33 34 35
(1/T) *
Figure 1. pKa temperature dependence for selected molecules
36
37
25
-------
Table 3. SPARC Substituent Characteristics Parameters
Substituent
CO2H
co2-
AsO3FT
AsO3'2
AsO2H
PO3FT
PO3"2
BO2H2
so3-
OH
SH
O"
S'
NR2
NR2H+
CH4
NO2
NO
CN
OR
SR
I
Br
Cl
F
in-ring N
in-ring NH+
S02
=N
=NH+
=O
Fs
2.233
1.639
0.300
0.600
1.000
0.600
0.600
1.078
6.315
1.506
2.931
1.913
1.727
1.190
3.978
-1.10
7.460
6.714
5.649
2.138
2.323
4.270
3.756
3.622
3.164
5.310
1.379
6.451
1.533
2.000
3.195
Fq
0.000
-0.603
-0.500
-1.000
-2.000
-0.786
-2.500
0.000
-1.224
0.000
0.000
-1.566
-1.537
0.000
0.779
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
3.785
0.000
0.000
1.000
0.000
MF
0.687
0.560
0.500
0.300
0.000
0.400
0.400
1.010
2.491
-3.116
-1.871
-3.546
-1.437
-4.939
-2.505
-2.065
2.515
4.127
3.141
-4.767
-1.234
0.000
-0.031
-0.066
-1.718
0.929
6.995
2.038
0.544
2.800
1.584
E,
0.072
2.978
0.190
0.150
0.080
0.220
0.840
1.484
1.407
7.240
3.000
11.00
9.368
17.42
21.70
0.129
3.677
1.691
3.196
1.987
1.952
4.928
3.012
1.498
0.800
2.055
8.708
4.176
4.918
2.600
2.281
rls
0.80
1.00
1.20
1.20
0.80
1.20
1.20
0.80
0.80
0.80
0.80
-0.50
-0.50
0.70
0.50
-0.63
1.00
1.00
0.80
0.80
0.80
0.75
0.70
0.65
0.65
0.00
0.00
0.80
0.00
0.00
0.00
x.
3.43
2.68
2.60
2.60
2.60
3.32
2.90
2.40
2.82
2.76
2.76
3.01
3.34
2.58
3.23
2.30
3.79
3.80
3.71
2.90
2.80
2.95
3.19
3.37
3.67
3.30
3.80
3.60
3.80
3.80
3.60
26
-------
Table 4. Reaction Center Characteristics Parameters
c
CO2H
AsO2H
P02H
POSH
PS2H
BO2H2
SeO3H
S03H
OH
SH
NR2
in-ring N
=N
Pelc
1.000
0.653
0.489
0.291
0.101
0.355
1.207
0.451
2.706
2.195
3.571
5.726
5.390
Pres
-1.118
-0.817
-0.394
-0.402
-0.802
-0.050
-0.400
-4.104
18.44
4.348
19.36
-11.279
-4.631
Xc
2.60
2.22
2.72
2.69
2.63
3.04
2.30
2.09
2.49
2.76
2.40
2.31
2.47
(pKa)c
3.75
6.63
2.23
1.55
1.96
8.32
4.64
-0.10
14.3
7.40
9.83
2.28
5.33
Carbon and Nitrogen acid parameters are included in this table
The perturbations of some reaction centers such as oxy acids are small, whereas OH, NR2,
in-ring N and =N reaction centers have large perturbations. For example, the perturbation of the
OH in the molecule below may be large as 12 pKa units. The resonance and the electrostatic
contributions of the two nitro and the =N groups substantially overcome the H-bond contributions
of the OH with either the =N or the nitro groups making the pKai extremely acidic. On the other
hand, the field effect of the negatively charged groups (SOs" and O") and the H-bond of the second
OH with the =N will raise the second pKa2 and overcome the resonance contribution and the field
effect of the uncharged groups.
pKa2
Observed
2.0
12.2
SPARC
1.7
12.1
OH
OH SO3
-------
3.1.6. Testing and Training of lonization pKa
The ionization pKa calculator was trained on some 2400 compounds involving all the
substituents and reaction centers shown in Table 3 and 4. The overall training set RMS deviation
was 0.36 pKa units. In addition, the SPARC pKa calculator has been tested on 4338 pKas
(excluding carbon acid) for some 3685 compounds, including multiple pKa's up to the sixth pKa
spanning a range of over 30 pKa units as shown in Figure 2. The overall RMS deviation error for
this large test set of compounds was found to be 0.37 pKa units. While it is difficult to give a
precise standard deviation of a SPARC calculated value for any individual molecule, in general,
SPARC can calculate the pKa for simple molecules such as oxy acids and aliphatic bases and acids
within ±0.25 pKa units; ±0.36 pKa units for most other organic structures such as amines and acids;
and ±0.41 pKa units for =N and in-ring N reaction centers. For complicated structures where a
molecule has multiple ionization sites (N > 6) such as azo dyes, the expected SPARC error is ±0.65
pKa units.
While the pKa for simple structures can be measured to better than 0.1 pKa units in the same
laboratory. The interlaboratory RMS deviation error among the observed pKa for simple organic
molecules reported by IUPAC was not better than 0.3 pKa units even for simple carboxylic acid
derivatives (see Table 5). For complicated structures, especially those with multiple ionization sites,
the RMS deviation was much higher. For example, SPARC was used/tested to estimate 358 pKa's
for 214 azo dyes [18]. For these compounds, the SPARC calculated RMS deviation was 0.63 pKa
units. The experimental error reported by IUPAC for azo dyes was as high as 2 pKa units [18]. The
IUPAC reported RMS interlaboratory deviation between observed values of pKa for azo dyes,
where more than one measurement was reported was 0.64 [18]. Several examples of
interlaboratory error for simple and relatively complicated molecules are shown in Table 5. We,
28
-------
therefore, believe that the errors in SPARC-calculated values are comparable to experimental error,
and perhaps better for these complicated molecules. We also note that the diversity and complexity
of the molecules used for pKa model development and testing has been drastically increased in the
last few years in order to develop more robustness. A summary of the statistical parameters for the
SPARC ionization pKa in water calculator is shown in Table 6. For a sample hand calculation see
reference 19.
In this rigorous test, almost all the organic molecules reported in the IUPAC series were
included. The only compounds that were removed for this test were those that:
1. Form covalent hydrates. These include many of the multiple in-ring N compounds such as
quinazoline and pteridine. See hydration rate section.
2. Are known to tautomerize, e.g., molecules such as methyl-substituted imidazole. See
tautomeric constant section.
3. Carbon acid reaction center where the perturbations for this group are very large, and the
measurement standard deviation is not better than 1 unit. For example, the pKa's for
methane, nitro-methane, tri-nitro-methane are 52, 10, 3.6, respectively. (SPARC calculates
the pKa for carbon acid within ±1.3 pKa units).
4. SPARC has not yet been designed to calculate, such as quaternary amines.
SPARC also may not be able to discriminate positional substituents effects for an oxy acid
reaction center (where the perturbations are extremely small) in structures such as 3- or 4- S-CeFLj-
YC where Y is some side chain intervening between the benzene ring (e.g., Y = (CH2)X) and the
reaction center, (C=CO2H). SPARC can discriminate these effects for other reaction centers, C,
such as NR.2 as shown in Table 7.
29
-------
Table 5. Intel-laboratory Measurements Error Range in pKa For Simple and
Relatively Complicated IUPAC Molecules
Moleculea
Phenol
2-methylphenol
3 -methyl phenol
4-methylphenol
Citric acid
Dibutylpropanedioic acid
Biphenyl-2-ol
Uracil
4-Dimethylamino-azobenzene
4-Dimethylamino-4'hydroxy- azobenzene
2,4-Diamino-l,3,5-triazine
Range
pKa
pKa
pKa
pKa
pKi
pK2
pK3
pKi
pK2
pKa
pKi
pK2
pKi
pK2
pKi
pK2
pKa
of Measurements
9.78 -
10.10 -
9.82 -
10.02 -
2.79 -
4.11 -
5.34 -
1.89 -
7.19 -
10.01 -
9.38 -
12.0 -
3.2
-4.50 -
-1.81 -
-2.3
3.90 -
10.02
10.33
10.10
10.28
3.13
4.78
6.43
2.64
7.70
11.3
9.51
14.2
3.50
(-1.3)
2.95
3.40
5.88
a* In compiling pKa's for this study, it was necessary to compile data from many laboratories.
We used lUPAC-screened data, but even these data had relatively large variation, even for simple
molecules as shown above.
Table 6. Statistical Parameters of SPARC pKa Calculations
Set
Simple organic comp.
Azo dyes comp.
IUPAC comp.
Training
793
50
2500
R2
0.995
0.991
0.994
RMS
0.235
0.550
0.356
Test
2000
273
4338
R2
0.995
0.990
0.994
RMS
0.274
0.630
0.370
30
-------
Table 7. Observed vs. Calculated pKa for 3 and 4-S-C6H4-(CH2)x-C
c
s
X
1
2
3
4
NR2
4-OC
Obs. Cal.
5.3 5.11
9.6 9.52
9.92
10.06
10.13
3-OC
Obs. Calc.
4.3 4.43
9.15 9.31
9.77
9.96
10.04
CO?H
4-C1
Obs. Cal.
3.98 3.76
4.19 4.35
4.65 4.59
4.66
4.69
3-C1
Obs.
3.83
4.14
4.58
Cal.
3.65
4.34
4.59
4.66
4.69
03
X.
a.
O
01
<
Q_
W
1 8 -
-2 -
-1 2
-1 2
-2
1 8
Observed pKa
Figure 2. SPARC-calculated versus observed values for 4338 pKa's in water of 3685 organic com-
pounds. The overall RMS deviation was equal to 0.37. This test does not include carbon acid. The
majority of the molecules are complex, e.g., some of the molecules have 8 different ionization sites
(azo dyes).
3.1.7.
Conclusion
The pKa models are the most robust and most highly tested of the SPARC models. The
models are fully implemented and are executing in code. However, the real test of SPARC does not
lie in its predictive capability for pKa's but is determined by the extrapolatability of these models to
other types of chemistry. The SPARC chemical reactivity models used to predict ionization pKa in
water have been successfully extended to calculate many other properties (see next section).
31
-------
3.2. Estimation of Microscopic, Zwitterionic lonization Constants, Isoelectric Point
and Molecular Speciation
3.2.1. Introduction
Determination of microscopic constants and zwitterionic ratios has played an important
part in understanding the ionic composition of many biologically active molecules, particularly
since all proteins fall into this class. The chemical and biological activities of these substances vary
with the degree of ionization. For this reason, accurate knowledge of the ionization constants for
zwitterionic substances is a prerequisite to an understanding of their mechanism of action in both
chemical and biological processes.
Unfortunately, microscopic constants have been determined for less than 100 compounds,
and for only a very few of these molecules has the zwitterionic constant been determined or
calculated [39-43]. Moreover, determination or calculation of the fraction of the various
microscopic species as a function of pH has been reported in the literature for less than a dozen
molecules. Most of these measurements were restricted to aliphatic amino acid derivatives and only
for simple, two ionization site molecules such as glycine and cysteine (where the CC^H is already
ionized). Benesch [40] calculated the relative concentration of the four microscopic forms for
cysteine where the carboxylic acid group(s) was ionized in all the forms. He found that the
concentration ratio of the "S-R-NH3+ species to the HS-R-M^ species at any given pH was approxi-
mately 2 to 1 rather than 1 to 1 as suggested by Grafius [40]. This difference indicates the
magnitude of the uncertainty involved in the various approximations made to calculate the
microscopic constants and the relative concentration of the different species. As noted earlier, only
a very few of the total number of microconstants needed to characterize the equilibria have been
32
-------
measured or calculated. For example, only two microconstants have been determined for molecules
with 4-ionizable sites such as DOPA and Epinephrin [42]. Estimation or measurement of the
microscopic constants and relative concentration of the various species for such compounds is an
extremely difficult task.
The SPARC pKa calculator can be used to estimate the microscopic constants for almost
any molecule of interest strictly from molecular structure. Hence, the microscopic ionization
constants, the zwitterionic constant and the fraction of the various microscopic species as function
of pH can be estimated without approximations such as limiting the number of species considered.
The titration curves (charge versus pH) can also be calculated using the same reactivity models.
3.2.2. Calculation of Macroconstants
A Bronsted acid is defined as a proton donor and a Bronsted base as a proton acceptor. The
acid-base ionization properties in solution are generally expressed in terms of ionization constants
(pKas) that describe the tendency for an acid to give up a proton to a solvent or the affinity of a base
for a hydrogen ion. The strength of an acid in a solvent is measured by the ionization constant for
the reaction. Many molecules of great importance in chemistry and biochemistry contain more than
one acidic or basic site, and some macromolecules such as amino acids, peptides, proteins and
nucleic acids may contain hundreds of such groups. These latter molecules may exist in a great
number of distinct ionization states. The acidic groups are uncharged in strongly acidic solutions
and negatively charged in sufficiently alkaline solutions. The basic groups are positively charged
(protonated) in a strongly acidic solution and are uncharged in sufficiently alkaline solution. For a
bifunctional acidic compound the ionization equilibria are usually written as
33
-------
H2A «4 ^ FT + HA' Ki = [HA'][H"]/[H2A]
HA' «* + H+ + A'2 K2=[A'2][H+]/[HA-]
where the constant concentration of the solvent has been absorbed in K. The pKi and pK2 (pKa =
- log Ka) are commonly evaluated from a pH titration or spectroscopic measurement. These
measured pKa's are termed macroscopic constants because they often only describe a composite of
the processes which are actually occurring in solution. The actual donor sites where the protons
reside are not specified and may not be unique. Thus, a solution "species" such as H2A may, in fact,
consist of several H2A species with protons occupying different basic sites in each of the species.
On the other hand, microconstants are the equilibrium constants for equilibria involving individual
species in solution. For these complex compounds, microconstants may or may not be capable of
being either measured or determined distinctly.
3.2.3. Zwitterionic Equilibria: Microscopic Constant
Many molecules contain both an acid and a base functionality, but these sites are not able
to ionize simultaneously. Such molecules are usually referred to as amphoteric. Amino phenols are
good examples of amphoteric molecules. When the pH is very low, the cationic species
predominates while at high pH the anionic species predominates. At intermediate pH's the
molecule exists in the neutral form. Other substances contain both acid and base functionality
where both the base and the acid sites may be simultaneously ionized to form an internal salt. These
substances are referred to as zwitterionic or dipolar ions. The amino acids are an example of
molecules that can exist as zwitterions. At low pH and high pH the cationic species and the anionic
species predominate, respectively, as in the case of the amino phenol. But unlike the amphoteric
amino phenol, the internal salt predominates as an intermediate species over a wide range of pH.
34
-------
Actually, the zwitterion and the isomeric uncharged molecule are in equilibrium in aqueous
solutions. The nature of this equilibrium depends on the acid and the base strength of the ionizing
groups involved. For a molecule with two ionizable sites, such as glycine, this process can be
represented diagrammatically below.
1kzw NhbChbCQf
12 " NH2CH2C
Each ionizable group has two microscopically different ionization pathways (ki and ki2, for
loss of the first hydrogen and k2 and k2i for loss of the second hydrogen). Each group has two
constants associated with its ionization, one when the other group is ionized and one when the other
group is not ionized.
When the pKa's of the ionizing groups are arithmetically far apart (as those of glycine
shown in Figure 3) knowledge of the two macroscopic constants, KI and K2, is enough to calculate
speciation as a function of pH. When the two pKa values lie within 3 pKa units of one another, such
as in the case of N-phenylglycine (as shown in Figure 4), a more detailed survey of the problem
becomes necessary. In such a case, the two macroscopic constants, KI and K2, cannot fully describe
the equilibria denoted by the four microscopic constants: ki, ki2, k2i, k2 and the zwitterionic
constant kzw. However, the macroscopic and microscopic equilibrium constants are closely related
by the following equations:
KI = ki + ki2
35
-------
1 / K2 = I/ k2i + 1 / k2
The four microscopic ionization constants (k;) involving the individual, microscopically
distinct species describe precisely the acid-base chemistry of such a system at the molecular level
while the two macroscopic pKas provide an incomplete specification of the equilibria. It should be
noted that the four constants are not independent but are subject to the relation ki ku = k2i k2. In
order to calculate molecular speciation as a function of pH, kzw, must be calculated. kzw may be
determined using the left or the right path of the above scheme. Since both thermodynamic paths
give the zwitterion product, kzw may be expressed as function of the microscopic constants within
any loop as
**!
1
K
The integrity of the pKa calculator in SPARC can be checked by calculating kzw using the
two different loops. The RMS deviation in calculated versus measured pkzws for the cases tested to
date is 0.5 pKa units [22]. This value is what one would expect from a calculation requiring two
pKa calculations (0.37 * A/2). kzws calculated from different thermodynamic paths are averaged over
the number of thermodynamic paths available.
3.2.4. Speciation of Two lonizable Sites
In the pH range from 4 to 10, more than 99% of glycine in solution exists in a zwitterionic
form where both the carboxylic and the amine groups are simultaneously charged. Over a wide pH
range only 3 microscopic species have significant concentrations as is shown in Figure 3. The
36
-------
concentration of the fourth species (neutral) is negligible (below 1%) over the entire pH range. The
ratio of the zwitterionic concentration to the neutral concentration is about 4 * 105. The
macroconstants in a figure such as Figure 3 occur when the fraction of the ionizing group of interest
is reduced to 50%. So, the left and the right-hand side where the fraction is equal to 50% are the
macroscopic pKCo2H and the pKNH2, respectively. The microconstants interconnecting the species
of interest occur where the species curves intersect. In the case where the two macroconstants are
very far apart, the two macroconstants pKi and pK2 are equal to the microconstants pki and pk2i,
respectively, as shown in Figure 3. Hence, the two macroconstants can satisfactorily describe the
equilibrium in this instance.
Fraction
1 .20E + 00 n
8.00E-01
4.00E-01
On n p i n n
1 2 3
\ / \ /
\ / \ /
\ \
\ \
i i
\ \
\ \
\ \
/ \ / \
_y v, / \
.U U t + U U 1
0 5 10 15
PH
Figure 3. Fraction of the major microscopic species of glycine as a function of pH. The two
macroscopic pKas are far apart and are equal to the microscopic constants (pki2andpk2i)- The number
on the top of each curve corresponds to the microscopic species: 1 is positively charged; 2 is the
zwitterion; and 3 is the negatively charged species.
37
-------
c
o
"•5
o
CC
1 .OOE + 00
6.00E-01
2.00E-01
-2.00E-01
2Z
\2N
8
10
PH
Figure 4. Fraction of the major microscopic species of N-phenylglycine as a function of pH
The numbers on the top of each curve correspond to the microscopic species. 1, 3, 2N, 2Z
corresponds to the positive, negative, neutral and the zwitter ion species, respectively.
On the other hand, N-phenyl substitution of glycine substantially lowers the macroscopic
constant of the amine group due to resonance contributions. As a result, the amine and the
carboxylic acid have comparable hydrogen ion affinities and both functional groups make
important contributions to the hydrogen ion concentration (i.e., appreciable concentrations of the
acidic and the basic forms of both functional groups are present in solution simultaneously). The
macroconstants become more nearly equal (within 2 pKa units) and the ratio of the zwitterionic
species to the neutral species in solution decreases as indicated in Table 8. In this case the
macroconstants are not equal to the microconstants (as is shown in Figure 4) and the equilibrium
cannot be satisfactory described by pKi and pK2. Substituents with a large dipole moment, such as
a nitro or cyano group, will further decrease the zwitterionic ratio due to electrostatic and/or
resonance effects. For example, m-nitro- and m-cyano-phenylglycines exist in aqueous solution
predominantly in the non-zwitterionic form due to the large electrostatic effect of the dipolar
38
-------
group. Compounds with weaker dipole substituents, such as methyl- and methoxy phenylglycines,
exist largely in the zwitterionic form. Substituent effects on pKa are illustrated in Figure 5 for
glycine, N-(phenyl) glycine, N-(m-nitrophenyl) glycine and N-(m-methoxyphenyl) glycine where
the charge lost in the molecule as function of the pH is plotted. The zwitterion ratio kzw is very
dependent on the nature of the substituent in these molecules. The proportion of zwitterions in
aqueous solution is governed by the effect of the substituent on the pKa's, and can vary
substantially as shown in Table 8. Table 8 shows the observed versus SPARC-calculated
microscopic ionization constants, pky, and zwitterionic constants, pkzw, for two molecules with
ionizable site.
O -
pH
Figure 5. Total charge versus pH titration curves for (1) N-(m-nitrophenyl) glycine, (2) N-(phenyl)
glycine, (3) N-(m-methoxyphenyl) glycine and (4) glycine.
39
-------
Table 8. Observed vs. SPARC calculated microscopic constants for molecules with 2 ionizable site.
Obs. Calc. Obs. Calc. Obs. Calc. Obs. Calc.
Molecule
Glycine
Phenylglycine
m-NO2-
m-CN-
m-Cl-
m-COMe-
p-Cl-
m-OMe-
m-Me-
p-Me-
p-OMe-
4-Aminobenz. acid
4-Dimethaminobnz
3-Aminobenz. acid
Picolinic acid
Nicotinic acid
Isonicotinic acid
Tyrosine ethylester
5-Thiomethyl-Imid.
2-Thiomethyl-Imid.
l-Me-2-thioMeImid.
Tyramine
N,N-dimethyl-
DOPA
Dopamine
pkz
pki
pki2
pk2
-5.60
-0.26
0.90
0.85
0.36
-0.21
-0.32
-0.70
-1.00
0.93
0.62
-0.43
-1.15
-1.00
-1.40
1.15
-0.46
-0.36
-0.4
-0.09
—
—
-5.60
-0.58
1.35
1.10
0.40
0.07
0.05
-0.27
0.70
-0.90
-0.94
0.87
0.42
-0.72
-1.2
-1.18
-1.12
1.59
2.0
0.25
0.37
-0.76
-0.35
—
—
2.35
2.03
2.01
2.03
1.99
2.09
2.06
2.05
2.12
3.40
3.28
3.22
1.04
2.11
1.86
9.63
7.72
6.50
6.47
9.58
9.03
8.97
8.90
2.15
2.15
2.00
2.00
2.12
2.11
2.10
2.16
2.15
2.16
2.19
3.71
3.74
3.40
1.37
2.42
2.61
9.10
8.51
6.81
6.80
9.43
9.4
9.1
8.86
8.00
2.29
0.06
0.28
1.10
1.20
1.61
1.89
2.38
2.75
3.11
2.47
2.66
3.65
2.21
3.13
3.26
7.33
6.57
6.96
6.83
9.62
7.8
2.67
0.50
0.80
1.63
1.90
1.95
2.33
2.83
3.00
3.10
2.83
2.98
4.00
2.67
3.46
3.56
7.3
5.91
6.91
6.90
9.97
9.56
9.61
—
9.80
4.22
2.99
3.05
3.51
3.74
4.43
4.77
5.07
3.90
4.28
4.66
5.29
4.77
4.84
8.36
9.13
8.75
10.68
10.31
9.17
10.1
9.70
4.60
2.50
2.70
3.55
3.87
3.88
4.26
4.76
4.90
5.00
3.80
4.39
4.79
5.28
4.87
4.78
8.35
8.34
9.31
9.40
11.02
10.68
9.15
9.93
4.43
3.96
3.75
3.78
3.90
3.87
3.89
3.95
4.00
4.07
4.07
4.83
4.9
4.23
4.12
3.75
3.44
9.51
8.67
8.39
9.40
—
4.20
3.95
3.80
3.80
3.86
3.87
3.86
3.90
3.96
3.95
3.98
4.61
4.51
4.02
4.12
3.52
3.48
9.75
9.91
8.70
8.70
10.1
10.0
9.42
—
40
-------
3.2.5. Speciation of Multiple lonization Sites
Martin, et. al [41] measured the 12 microscopic constants for tyrosine. They used various
approximations to estimate the fraction of all the tyrosine species present in which the hydroxy
group was ionized. They assumed that the macroscopic constant KI was equal to the microscopic
constant ki for the CC^H group and that the ionization of the hydroxy group was completely
independent of the ammonium group. In addition, they assumed that the molar extinction
coefficients for several species were identical. Martin [42] also used this approach to calculate the
speciation of DOPA (1-3,4-dihydroxyphenylalanine), that is where the phenolic groups are ionized,
as a function of pH. To the best of our knowledge, estimation of all the different microscopic
species for molecules having four or more ionizable sites has not been reported.
For a molecule that has N ionizable sites, there are N macroscopic ionization constants that
can be measured. There are, however, 2N-1 * N microscopic ionization constants and 2N
microscopically different species or states. For example, tyrosine in a strongly acidic solution
contains 3 ionizable protons attached to a carboxyl, aromatic hydroxyl and ammonium group.
Since each of the three groups may exist in either of two states, tyrosine may exist in 8 (23)
microscopically different forms. The most positive of these 8 states is the cation, with net charge Z
=1; the most negative is the divalent anion, with Z = -2. Each of the two intermediate states of net
charge Z =0 and Z = -1, respectively, can have three microscopically different forms. Each of the
ionizable groups in tyrosine is characterized by four microconstants, since the tendency of each
group to accept or donate a proton depends on the ionization state of the other two groups. Hence,
there are 12 (3 * 22) microscopic ionization constants connecting the 8 species. Three macroscopic
ionization constants (Ki, K2, KS) for tyrosine have been determined experimentally from titration
and spectroscopic data [43].
41
-------
For tyrosine, only 5 of the 8 species ever have appreciable concentration [22]. The fraction
of each of the microscopic species formed by a molecule with three ionizable sites (e.g. tyrosine or
cysteine) can be expressed as function of pH in terms of the microconstants. If we start from the
neutral species (uncharged species) rather than from the positively charged species, the fraction of
any microscopic species for a molecule having N ionizable sites can be expressed in general as
Dij...k/Dx where DT can be expressed [22] as
kt [H+]L> £ Z *<• kv[H + ]L> 2S- S *< kV" **••* (H + JL«*
T 0! I! 2! N!
and Ly..k is the charge of the final state (ij..k state). The factorial is the number of different
thermodynamic paths that lead to the ij..k state and Dy k is one of terms in the denominator. For
example, the fraction of neutral species would be 1/Dx and the fraction of a singly ionized species
would be ki*[H+]Li/Dx.
The fraction of any distinct species as function of pH (fraction-species curve) can be
determined from the Equation above. Whenever the total net charge of two (or more) charged
species are equal, the maximum of the corresponding fraction-species curves will occur at the same
pH. This can be shown by estimating the ratio of the fraction of any two equally charged species
using the above equation. The H ion dependence will cancel and the ratio of the two fractions will
be totally independent of pH. In addition, the titration curve (charge curve) can determined by
multiplying the fraction-species curve by the charge on the species and summing over all species
(Figure 5). The macroscopic pKas can be determined by taking the first derivative of the titration
curve [22]. Table 9 shows the observed versus SPARC calculated microscopic constants for
several systems containing 3 ionizable sites. The notation for the macroconstants follows the
scheme first proposed by Hill [44] and used later by R. Martin, et. al [43]. The ionizing group of
interest is indicated in the microscopic pk by the last number in the subscript. Any number
42
-------
preceding this in the subscript denotes another specified group in the molecule that already exists in
the basic form when the ionization under consideration is taking place. Thus, pks2 denotes the pk
value for the ionization of the OH group when the NH3+ group has already been converted to the
conjugate base -NH.2- Since the number 1 does not appear in the subscript 32, its absence denotes
that group 1, the carboxyl, is still in the un-ionized form during the reaction corresponding to the pk
value in question.
Table 10 shows the observed versus the SPARC-calculated microscopic pk;s for
glutathione where two CC^H groups, an SH and a NHs+ can be ionized simultaneously in solution.
Since each of the four groups may exist in either of two states (acidic or basic) the molecule may
exist in 24 states. To describe the population of the possible 16 microscopic species, 32
microscopic ionization constants are required (see reference 22). However, only 8 of these
constants have been measured for glutathione. In general, for N possible ionizable sites in a
molecule there are NI microconstants that lead to a molecular state of ionization of IS where IS is
the number (
-------
Table 9. Observed vs. SPARC-calculated value for the microscopic pk; s for 3 ionizable site molecules.
Tyrosine
Obs. Calc.
Cysteine
Obs. Calc.
Cysteine
glycine
Obs. Calc.
Cysteine
ethyl ester
Obs. Calc.
2.21 2.00
1.71 1.80
5.17
pk2i
pk3i
pk23i
pk2
pki2
pk32
2.
4.
4.
9.
9.
9.
.61
.37
.77
.31
.71
.91
pki32 10.3
pk3
pko
pk23
pki23
7.
9.
7.
9.
.19
.35
.79
.95
2.30
3.90
4.20
9.30
9.60
10.0
10.3
7.30
9.60
8.40
10.6
2.79
3.80
4.74
7.45
8.53
9.50
10.0
6.77
8.60
8.41
10.36
2.40
3.80
4.40
7.80
8.20
9.09
10.0
6.70
8.86
8.60
10.5
3.40
3.50
7.
7.87 7.
8.
9.45 9.
6.
7.14 6.
8.
8.75 8.
.10
.40
.90
.20
.50
.88
.43
.80
2.
4.
4.
.62
.30
.74
7.45 7.08 3.85
4.
9.09 8.88 4.
5.
6.77 6.29 7.
9.
8.41 8.38 7.
9.
.32
.65
.09
.04
.19
.84
.96
2.
4.
4.
4.
4.
4.
4.
7.
9.
8.
.30
.10
.20
.20
.30
.60
.80
.87
.50
.40
10.0
44
-------
Table 10. Observed vs. SPARC calculated microscopic ionization constants for glutathione.
CO2H(1) CO2H(2) SH(3) NH3+(4)
Obs. Calc. k Obs. Calc. k Obs. Calc. k Obs. Calc
pki 2.09 1.92 pk2 3.12 3.26 pk3 .... 7.94 pk4 .... 7.04
pk2i
pk3i
pk4i
pk24i
pk23i
pk34i
2.33 1.98
2.06
3.84
3.91
2.12
4.10
pki2
pk32
pk42
pki32
pki42
pk342
3.36 3.31
3.50
3.35
3.56
3.41
3.60
pko
pk23
pk43
pki23
pk243
pki43
8.14
8.21
8.24
8.93 8.37
8.49
8.43
pkn
pk24
pk34
pki24
pki34
pk234
8.65
7.31
7.64
9.13 8.88
9.26
7.86
pk342i ... 4.10 pki342 .... 3.65 pki243 9.08 8.91 pki234 9.28 9.50
45
-------
3
4+
r
L_
^ 1
*-
h
6
i
*
i
2
4
)
7
*
A
k2
~
k42
4
k/n
*"
k,,
i 1
. J
1
2
3
4 +
ks2
1
k3
1
p
kis
k
k:
3
1
1
2
3 -
4+
4
1
2
3
'
-
-
k43
_
K-|
k4
7
i
i
2
3
4
i
4 ,
1 -
2
3
k123
3 -
4 +
k41
3-
4
^231
k24
' k
K-|24
L ^.
1
kl32
34
1
i
4
r
1
4
!
k
k
r
1 k142
1243
t t
1
2 -
3 -
4+
1 -
2 "
1
-rf
r t
r
2
4+
!_
k243
k 1
k234 _
k3421
4
1
f
4
'
k:
"•^O
k
542
ks4
<—'>•
kl34
kl34
4
143
1
1
2
3
4
2 i
r
1
o
o
CO2HCHCH2CH2CNHCHCNHCH2CO2H
NH3+ CH2SH
Glutathione
Figure 6. The 16 microscopic states of glutathione and the 32 microscopic ionization
constants that interrelate them. See ref. 22 for more details.
46
-------
Another example, hemimellitic acid (1, 2, 3-benzenetricarboxlic acid ) presents unusual
ionization behavior. The micro constants and the observed macroscopic constants are not identical.
The first ionization step favors leaving the molecule ionized at the 2 position and is stabilized by
hydrogen bonding with the carboxylic groups in positions 1 and 3. The second step is more
complicated. Here, the most stable di-anion is the species ionized at positions 1 and 3. This
minimizes electrostatic interactions. Going from the molecule ionized at the 2 position to a di-anion
ionized at the 1 and 3 positions is not a simple one proton loss. This process involves three protons
as shown in Figure 7. Figure 7 also shows the calculated microscopic species distribution of
hemimellitic acid as function of pH. The thermodynamic steps for the different two ionization
paths, the SPARC calculated results for each step (the micro constants), the first and the final step
(the observed macro constants) are shown in Figure 8.
^-__
pH
Figure 7. The calculated fraction of the eight microscopic species of Hemimellitic
acid versus pH. 4 of the microscopic species are shown on the top of the graph. Graphs 1 & 2
show the other 4 microscopic species; each having two different symmetrical species lying on
the top of each other: O'OO/OOO" and O'O" O/O O" O", respectively
47
-------
1.
.CO -11 CO;H
2.
•CO,
CO: H CO-"
:H OJ-H ^/fM• H
4.9
4,4
3,2
17 (3,8-4.2!
2,53 (2,6-2,8)
CO-H
a- o
,CO:H
CO-H
CO-H
- n ,^v
o
CO-'
5,76
3,1
3.8(18-4.2)
Figure 8. The three ionization macroscopic pKas for hemimellitic acid. The second macroscopic pKa
involves three different microscopic constants that are shown to the side of each step. The Two paths
are calculated within 0.1 pKa units of each other. The observed macroscopic are the numbers between
the bracket. The others are SPARC-calculated pKas.
3.2.6.
Isoelectric Points
Many molecules, such as amino acids, peptides, and proteins, contain both acidic and basic
groups. The acidic sites for these molecules are uncharged in strongly acidic solutions and are
negatively charged in sufficiently alkaline solutions. In contrast, the basic groups are positively
charged in strongly acid solution, and the conjugate bases are uncharged in sufficiently alkaline
48
-------
solution. Therefore, in an electric field such a molecule migrates as a cation in strongly acid
solution and as an anion in strongly alkaline solution. At some intermediate pH value, the mean
net molecular charge, Z, must attain the value zero, and the molecule will remain stationary in an
electric field. The pH value at which this occurs is known as the isoelectric point. Edsall and
Wyman [45] points out "for most simple ampholytes of this type, such as glycine or phenylglycine,
pKi and pK2 are so far apart that there is not merely an isoelectric point, but a broad zone of pH
values in which the ampholyte is practically isoelectric". However, Edsall showed that the
theoretical ioselectric point (pHi) for a two ionizable sites molecule, such as those mentioned
above, is given by
For polyvalent ampholytes, Edsall showed that for molecules containing 3 ionizing groups
where one of the macroscopic pKa's is far apart from the other two pKas, a reasonable approxima-
tion can be made to calculate the isoelectric point. For example the isoelectric point for lysine may
simply be expressed as (pK2 + pK.3)/2. Unfortunately, for more complicated systems, a very rough
approximation has to be made. With SPARC, the isoelectric point can be estimated by plotting the
fraction of neutral or zwitterionic species versus pH (e.g. Figure 3, 4). The pH at the middle of the
zwitterionic (or any other species where the total net charge of the molecule is zero) range is
labeled as the isoelectric point. The observed versus SPARC-calculated isoelectric points for
several molecules are shown in Table 1 1 .
The procedure for calculating the zwitterionic equilibrium constant has been automated by
allowing the user to specify all the zwitterionic acid-base pairs directly at the level of SMILES
input. For example, the input for B-amino propionic acid would be N@+CC(=O)O@-. The
SPARC molecular parser now recognizes the @ as a request to perform an automatic calculation of
49
-------
the zwitterionic constant along with the four relevant pKas. The user can also generate plots of the
relative concentrations of neutral, cationic, anionic and zwitterionic species as a function of pH.
Table 11. SPARC-Calculated Isoelectric Points.
No
1
2
O
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
34
Molecule
Glycine
Cysteine
Lysine
Glutamic acid
Penicillamine
Phenylglycine
m-NO2-
m-CN-
m-Cl-
m-COMe-
p-Cl-
m-Ome-
m-Me-
p-Me-
p-OMe-
Thiazolidine-4-carboxylic acid
2-Methyl-
2,2-Dimethyl
5,5-Dimethyl
2,5,5-Trimethyl
2,2,5, 5-Tetramethyl
2-Ethyl-2-methyl-
2-Ethyl-2, 5, 5 -trimethyl
Niflumic acid
Obs.
6.0
5.1
10
3.2
4.9
3.1
1.9
2.0
2.5
2.5
2.7
2.9
3.2
3.4
3.6
3.9
4.4
4.2
4.2
4.1
4.2
5.2
5.2
3.30
Calc.
5.8
5.0
9.8
3.2
5.0
3.2
2.0
2.1
2.6
2.6
2.8
3.0
3.3
3.4
3.4
4.4
4.6
4.7
4.4
4.5
4.7
4.7
4.7
3.1
3.2.7.
Conclusion
The SPARC chemical reactivity models used to estimate ionization pKa in water can
also predict zwitterionic and microscopic ionization constants, pk;, of organic molecules with
multiple ionization sites that are as reliable as most experimental measurements. The
corresponding complex speciation for these molecules as a function of pH and the titration
curve can also be estimated using the same models without modification. The chemical
reactivity models are fully implemented and are executing in code.
50
-------
3.3. Estimation of Gas Phase Electron Affinity
3.3.1. Introduction
One of the fundamental properties of gaseous negative ions is the lowest energy required to
remove an electron. This energy is called the electron affinity (EA). The electron affinity of a
molecule plays an important role not only in gas-phase ions, but also in condensed-phase and
charge-transfer complexes in chemistry, biology and physics. Although gaseous negative ions in
molecules were first observed around 1900 and have been studied extensively since then, the first
reliable EA was not obtained until the early 1960's for O2. Since that time, numerous methods have
been utilized to predict and measure electron affinity. Despite the availability of a large number of
methods and the fundamental importance of electron affinity values, the complexity of molecular
negative ions and the inherent difficulties in determining electron affinity have prevented the
determination of this important property for many molecules of interest. Wide disagreement also
exists among reported values.
3.3.2. Electron Affinity Computational Methods
The electron affinity property of a molecule describes the conversion of the neutral molecule
to a molecular negative ion when both the neutral molecule, E, and the negative ion, E", are in their
most stable state. Electron affinity is defined as the difference in energy between a neutral molecule
plus an electron at rest at infinity and the molecular negative ion when both the neutral molecule
and the negative ion are in their ground electronic, vibrational and rotational states.
The added electron enters the LUMO (Lowest Unoccupied Molecular Orbital), which in the
negative ion becomes the HOMO (Highest Occupied Molecular Orbital). The lower the energy of
51
-------
the LUMO, the greater will be the electron affinity and vice versa. The energy differences between
the LUMO state and the HOMO state are small compared to the total binding energy of the reactant
involved; hence, perturbation theories can be used to calculate the energy differences between these
two states. Perturbation theory treats the final state as a perturbed initial state and the energy
difference between these two energy states determined by quantifying the perturbation. The
perturbation of the HOMO state versus the LUMO state is factored into mechanistic components of
resonance, field and sigma induction contributions.
The successful application of the SPARC chemical reactivity models developed for pKato
the calculation of electron affinities demonstrates the power of the molecular toolbox approach.
Knowledge and models developed in the arena of ionization pKa were directly applied to another
chemical reactivity problem. Similar to SPARC's approach for calculating ionization pKa,
molecular structures are broken into functional units having known intrinsic electron affinities
(EA)C. The intrinsic behavior is then adjusted for the molecule in question using the mechanistic
perturbation models described for ionization pKa.
3.3.3. Electron Affinity Models
As was the case for pKa, the SPARC computational procedure starts by locating the
potential sites within the molecule at which a particular reaction of interest could occur. In the case
of EA these reaction centers, C, are the smallest subunit(s) that could form a molecular negative ion.
Any molecular structure appended to C is viewed as a "perturber" (P). All reactions to be addressed
in SPARC are analyzed in terms of critical equilibrium components:
P-Ci ^ ^ P-Cf
52
-------
where C; and Cf denote the initial state and the final state or the LUMO state and the HOMO state of
the reaction center, respectively; P is the structure that is presumed unchanged by the reaction; and
EA denotes the electron affinity reaction. Electron affinity is expressed as a function of the energy
required to add an electron to the LUMO state. It represents the energy difference between the
neutral molecular state and the molecular negative ion state. To model this energy difference,
electron affinity is expressed in terms of the summation of the contributions of all the components,
perturber(s) and reaction center(s), in the molecule:
c=l
where the summation is over n, which is defined as the number of subunits that could form a
molecular negative ion or simply as the number of reaction centers in the molecule. (EA)C is the
electron affinity for the reaction center. The electron affinity of the reaction center is assumed to be
unperturbed and independent of P. 5P(AEA)C is a differential quantity that describes the change in
the EA behavior affected by the perturber structure. 5P(AEA)C is factored into mechanistic
contributions as
dp (MA )c = S(A EAfield) + S(A EAres) + S(A EAslg)
where AEAfieid describes the difference in field interactions of P with the two states, AEAres
describes the change in the delocalization of pi electrons of the two states due to P (this
delocalization of n electrons is assumed to be into or out of the reaction center), and AEAs;g
describes the change in sigma induction of P with the two states.
53
-------
3.3.1.1. Field Effects Model
The modeling of the substitents field effects relates to the structural representation S--;Rj— C,
where S~;Rj is the perturber structure, P, appended to the reaction center C, i and j denote the
anchor atoms in R for the substituent S and reaction center C, respectively.
Field effects derive from charges or electrical dipoles in the appended structure P,
interacting with the charges or the dipoles of the reaction center, C, through space. The molecular
conductor, R, acts as a low dielectric conductor. This effect follows from the fact that the bonds
between most atoms are not completely covalent, but possess a partial ionic character that imposes
electrical asymmetry either in the substituent or the reaction center bonds. The field interaction
between S and C depends on the magnitude and the relative orientation of the local fields of S and
C, the dielectric properties of the conduction medium and the distances through the molecular
cavity. The field effect of a given S is given as
field - - j— -
Vcs LJe
where |j,s is the substituent dipole located at point S; 5qc is the change in charge of the reaction
center accompanying the reaction, presumed to be located at point C; 9SC gives the orientation of the
substituent dipole relative to the reaction center; r's are the appropriate distances of separation; and
De gives the effective dielectric constant for the intervening conduction medium.
Field effects are resolved into three independent structural component contributions
representing the change in dipole field strength of S, a conduction factor of R, and the change in the
charge at C as
54
-------
where ap describes the potential of P to "create" an electric field irrespective of C, and peie is the
susceptibility of a given reaction center to electric field effects that describes the change in the
electric field of the reaction center accompanying the reaction. The perturber potential, op, is further
factored into the field strength parameter, Fs, (describing the magnitude of the dipole field
component on the substituent) and a conduction descriptor,
-------
respectively. NB is data-fitted parameter that depends on number of the substituents that are
bonded directly to the reaction center, C. Both NB and x§ are the same as those used for ionization
pKa calculations.
3.3.3. Resonance Model
Resonance involves the delocalization of pi electrons into or out of the reaction center. This
long-range interaction is transmitted through the 71 -bond network. The resonance reactivity
perturbation, pres(AEA), is the differential resonance stabilization of the initial versus final state of
the reaction center. It is a differential quantity, related directly to the extent of electron
delocalization in the neutral state versus the molecular ion state of the reaction center. The source
or sink in P may be the substituents or R-pi units contiguous to the reaction center.
As explained in the estimation of ionization pKa, a surrogate electron donor, CH2", replaces
the reaction center. The distribution of NBMO charge from this surrogate donor is used to quantify
the acceptor potential for the perturber structure, P. The reactivity perturbation is given by:
where (Aq)c is the fraction loss of NBMO charge from the surrogate reaction center, and the
susceptibility, pres, of a given reaction center to resonance quantifies the differential "donor" ability
of the two states of the reaction center relative to the reference donor CH2".
3.3.4. Results and Discussion
In modeling any property in SPARC, the contributions of the structural components C, S,
and R are quantified (parameterized) independently. For example, the strength of a substituent in
creating an electrostatic field effect depends only on the substituent regardless of the C, R, or
56
-------
property of interest. Likewise, the molecular network conductor R is modeled so as to be
independent of the identities of S, C, or the property being estimated. Hence, S and R parameters
for electron affinity are the same as those for pKa. The susceptibility of a reaction center to an
electrostatic effect quantifies only the differential interaction of the initial state versus the final state
with the electrostatic fields. The susceptibility gauges only the reaction CmMai -Cfmai and is
completely independent of both R or S. For instance, for electron affinity the electrostatic
susceptibility reflects the electrostatic perturbations of the LUMO state versus the HOMO state,
which once again is totally independent of C or R. Thus, no modifications in pKa models or extra
parameterization for either S or R are needed to calculate electron affinity from pKa models, other
than inferring the electronegativity and susceptibility of electron affinity reaction centers to
electrostatic and resonance effects.
Figure 9 shows a sample calculation of EA for 4-chloronitrobenzene. SPARC first computes
the resonance and electrostatic perturbations of the appended substituent Cl para to the reaction
center NO2 through the molecular conductor of the benzene ring. Next SPARC computes the
perturbation of the appended substituent NO2 para to the reaction center Cl through the benzene
ring. Finally, SPARC sums these perturbations with the base electron affinities for NO2 and Cl.
The susceptibility of the NO2 and Cl reaction center for resonance and electrostatic effects are
shown in Table 12. The substituent parameters and the distance between NO2 and Cl are obtained
as described for pKa.
Benzene is the progenitor of the other aromatic compounds. The added electron enters the
LUMO state, which in the negative ion becomes the SOMO (Singly Occupied Molecular Orbital).
Since the LUMO energy is still high, a stable negative ion is not formed in the gas phase. The
LUMO energy can be lowered and stabilized either by expansion of the TT conjugation or by
57
-------
introduction of electron-withdrawing substituents. For unsubstituted benzene, naphthalene and
anthracene, the electron affinity is seen to increase significantly due to increase in the resonance
contributions in going from benzene to anthracene. A positive electron affinity (a stable negative
ion) is observed only for anthracene.
Table 12. Reaction center characteristic parameters
Reaction Center
-NO2
-ON
-C=O
in-ring N
-NR2
-OH
-OR
-CH3
-F
-Cl
-Br
Pele
-0.05
-0.03
-0.02
-0.16
0.02
0.11
0.06
0.01
-0.01
-0.02
-0.05
Pres
2.0
1.2
3.2
0.76
-1.01
-1.80
-0.67
-0.14
-0.11
-0.09
-0.07
Xcs
3.28
—
—
—
—
—
—
2.30
4.35
4.15
3.95
(EA)C
0.51
0.29
-0.39
0.20
0.17
0.42
0.24
0.01
0.16
0.25
0.26
Introducing 7r-electron-withdrawing substituents such as cyano, nitro, aldehyde and ketone
strongly perturbs the benzene and raises the electron affinity significantly. The LUMO states for
these substituents are close to the degenerate TT benzene LUMOS. This leads to a strong interaction
between the LUMO states of these substituents and one of the degenerate LUMO states of benzene,
which in turn, lowers and stabilizes the energy of the LUMO states of these molecules. The order
of the resonance stabilizing effect of the LUMO state of these substituents is -C=O > -NO2 > -C=N
> in-ring N (aromatic nitrogen, such as pyridine, quinoline, and acridine). For benzene substituted
by any electron-donating groups like F, Cl, Br, NR.2, OH and OR, the perturbations of the LUMO
state versus the HOMO state are extremely small. Hence, the electron affinity for these molecules
58
-------
will be close to the electron affinity of benzene (-1.2 eV). The same trend was found for all of
these groups when attached to any other aromatic or ethylenic compound. The case for in-ring N is
similar. The substitution of N for one of the carbons in a benzene ring decreases the electron
density in the TT -type SOMO state, stabilizing the LUMO state and increasing the electron affinity
for pyridine by 0.50 eV. The LUMO energy for pyridine is still relatively high, so that its electron
affinity is -0.7 eV and no stable negative ion is formed in the gas phase state. The LUMO energy in
n-aromatic molecules can be lowered and stabilized by expansion of the n conjugation. Similar to
carbon-aromatic systems, the electron affinity is seen to increase significantly in the order of
pyridine, quinoline, and acridine. A positive electron affinity is expected for quinoline and acridine.
E.A(4-chloronitrobenzene) - 5(EA)NO2 + 5(EA)d
Reaction center NO2 Cl
Substituent Cl NO2
Molecular network benzene benzene
EAsub + S sub ^EA field + Ssub kEAres
ris
5(EA)N2 = 0.51 + - ' 2 + (2.0 X 0.236) = 0.962
N°2 (1.3 + 2.0 + 0.65 /
5(EA)l - 0.25 + -- ' 2 + (-0.09 x 0.273) - 0.217
cl (1.3 + 2.0 + 1)
EA4-chloronitrobenzene = 0.962 + 0.217 = 1.18
Figure 9. Sample calculation of 4-chloronitrobenzene EA.
Benzoquinone has a high electron affinity (1.9 e. V.). Expansion of the pi system from
benzoquinone to naphthoquinone to anthraquinone leads to a decrease in the electron affinity. Both
59
-------
Hiickel MO and STO-3G calculations predict a lower LUMO energy for benzoquinone relative to
naphthoquinone. This agrees with experimentally measured electron affinity and the order is
opposite to that observed for benzene, naphthalene and anthracene. Introduction of electron
withdrawing groups increases the electron affinity of benzoquinone compared to benzene. On the
other hand, methyl or alkyl group substitution leads to a decrease in the electron affinity.
Cyclic, unsaturated dicarbonyls such as maleic anhydrides, maleimides and cyclo-
pentenedione form a long-lived negative ion in the gas phase. Similar to benzo-, naphthoquinone,
the extra electron in these systems enters the LUMO, which is a TT orbital resulting from a
combination of TT c.c and TT co. Their LUMO energies are higher, which leads to lower EA for these
compounds relative to the quinones. Thus, the electron affinity of maleic anhydride is lower than
that for benzoquinone. The electron affinity of the oxy compounds is larger than that of the NH and
CH2 bridged structures. The EA decreases as the electronegativity of the bridging atom decreases,
i.e. in the order of OH, Ntfe, and CH2. Substitution of a methyl group destabilizes the LUMO's of
the quinones and the anhydrides. The substitution of electron-withdrawing substituents leads to
significant increases in electron affinity.
The SPARC-calculated electron affinity for alkene compounds is close to -2.0 eV, the same
as the electron affinity for ethylene. Methyl substitution effects on electron affinity are small; in
general, electron affinity will decrease by almost 0.04 eV per substituent. 1,2-Dicyanoethylene and
tetracyanoethylene (TCNE) are compounds of high electron affinity that are often involved as
electron acceptors in charge-transfer complexes. The additional electron in the negative ion enters
the LUMO, which is the TT* orbital of ethylene lowered by conjugation with the electron-
withdrawing cyano groups.
60
-------
p-Quinodimethane is expected to have a negative electron affinity similar to benzene.
Cyano substitution lowers the LUMO state substantially and increases the electron affinity, hence,
tetracyanoquinodimethane (TCNQ) electron affinity is as high as 3 eV. Fluoro substitution in
TCNQ will lower the LUMO state even more resulting in higher electron affinity, e.g., tetrafluoro-
tetracyanoquinodimethane at 3.2 eV. Pyrrole and furan both have a large negative electron affinity.
The methoxy and the amine groups raise the LUMO state thereby lowering the electron affinity for
these compounds more than the ethylene electron affinity. Figure 10 shows SPARC-calculated
versus observed electron affinities. The RMS deviation was found to be 0.14 eV, which is about
equal to the measurement error in charge transfer experiments [17].
-i -
o
.]
-2
-3
-3-2-1 O I 2 3 4
Observed
Figure 10. SPARC-calculated versus observed gas phase electron affinities in eV. The RMS
deviation for was 0.14 eV.
3.3.5.
Conclusion
The SPARC electron affinity models have been tested on all of the reliable (charge transfer
equilibria) data available in the literature. The models are fully implemented and are executing in
code. These models have been tested to the maximum extent possible given the limited set of direct
observations.
61
-------
3.4. Estimation of Ester Carboxylic Acid Hydrolysis Rate Constants
3.4.1. Introduction
Hydrolysis of organic chemicals is a transformation process in which an compound, RX,
reacts with water, forming a new carbon-oxygen bond and the cleaving of a carbon-X bond in the
original molecule:
R-X 2 » R-OH + X" + H+
Hydrolysis is one of the most important reactions of organic molecules in aqueous environments,
and is a significant environmental fate process for many organic chemicals. It is actually not one
reaction but a family of reactions involving compound types as diverse as alkyl halides, carboxylic
acid esters, phosphate esters, carbamates, epoxides, nitriles, etc. This study seeks to apply SPARC
chemical reactivity models to estimate hydrolysis rate constants for carboxylic acid esters strictly
from molecular structure. In the near future, these same models will be used to predict hydrolysis
rate constants for other groups such as alkyl halides and phosphate esters.
The general structure for carboxylic acid esters is represented by RiC(=O)OR2, where RI
and R2 are organic substituents. These R substituents can be substituted alkyl chains, phenyl groups
or heteroatoms. Carboxylic acid esters are used industrially to make flavors, soaps, herbicides,
pesticides, etc. Carboxylic acid esters undergo hydrolysis through three different mechanisms;
base, acid and general base-catalyzed ester hydrolysis.
3.4.1.1. Base-Catalyzed Hydrolysis
The base-catalyzed or alkaline hydrolysis of esters generally takes place via a BAC2
mechanism as shown below. BAc2 stands for base-catalyzed, acyl-oxygen fission and bimolecular
reaction. It is similar to the SN2 reaction, and occurs when the hydroxide ion attacks the carbonyl
62
-------
carbon of an ester to give the carboxylic acid and alcohol. In addition, alkaline hydrolysis of esters
may also occur through other mechanisms, such as BACI (base-catalyzed, acyl-oxygen fission,
unimolecular), BALI (base-catalyzed, alkyl-oxygen fission, unimolecular) and BAL2 (base-catalyzed,
alkyl-oxygen fission, bimolecular). However, BAC2 is the most common mechanism for alkaline
hydrolysis of esters and it usually masks all the other plausible mechanisms.
o
l RI ^ RK /°
J H20 'v / H20 \ / H20
C//,O - CH ^ RjCOOH + R2OH + OH"
3.4.1.2. Acid-Catalyzed Hydrolysis
The acid catalyzed hydrolysis of esters takes place via AAc2 mechanism as shown below.
AAC2 stands for acid-catalyzed, acyl-oxygen fission and bimolecular reaction. It is also similar to
the SN2 reaction. It occurs when a positive hydrogen ion catalyzes the ester and the water molecule
attacks the carbonyl carbon of the ester to give the carboxylic acid and alcohol. In addition, the
acid-catalyzed hydrolysis of esters may also take place by other mechanisms, such as AAci (acid-
catalyzed, acyl-oxygen fission, unimolecular), AALI (acid-catalyzed, alkyl-oxygen fission,
unimolecular) and AAL2 (acid-catalyzed, alkyl-oxygen fission, bimolecular). However, AAc2 is the
general mechanism for acid-catalyzed hydrolysis of esters and it usually masks all the other possible
mechanisms.
CM
HI
OH u OH
O H+ / / I
•• H H20 /^/ H?0 | ,^ R20
CTP—O_ + >.R,COOH + R9OH
OR2 OR2
\ M dC
°R2 H WOR5
63
-------
3.4.1.3. General Base-Catalyzed Hydrolysis
The general base-catalyzed hydrolysis of esters takes place via BAc2 mechanism as shown
below. BAC2 stands for base-catalyzed, acyl-oxygen fission, bimolecular reaction. It is too similar
to the SN2 reaction. It occurs when a base (B:) abstracts a hydrogen atom from a water molecule
thereby releasing a hydroxide ion that eventually attacks the carbonyl carbon of the ester to yield the
carboxylic acid and alcohol. The base, B:, stands for any base, such as ammonia, acetate ion,
imidazole and so on. In case of neutral hydrolysis, B: represents the water molecule.
D P 8
C^ H2° I H20 ^ || BH
R ^T ^OR - ^ | ^O - ^C. , RjCOOH + R2OH + B:
1 J&3 2 Rl I OR2 «S N£V
H H (^
^ V-OR2"
3.4.2. SPARC Modeling Approach
Reaction kinetics were quantitatively modeled within the chemical equilibrium framework
described previously for ionization pKa in water. It was assumed that a reaction rate constant could
be described in terms of a pseudo-equilibrium constant between the reactant (initial) and transition
(final) states. SPARC therefore follows the modeling approach described previously for ionization
pKa, in water. For these molecules/chemicals to be modeled, reaction centers with known intrinsic
reactivity are identified and the reaction rate constants expressed (ion energy terms) by perturbation
theory as
64
-------
where log k is the log of the rate constant of interest; log kc is the log of the intrinsic rate constant of
the reaction center and Ap log kc denotes the perturbation of the log rate constant due to appended
structure. These rate constants are expressed in the appropriate second order form inclusive of
catalytic effects. A given reaction center may have two or more appendages or perturbing units.
For example, a carboxylic acid ester has two appendages and a phosphate ester three. With the
exception of steric effects, the perturbations associated with these appendages are modeled
independently and simply summed. Each perturbation is factored into the mechanistic components
of resonance, steric and electrostatic effects. All that is required for the perturbation calculations are
data-fitted susceptibilities of the log k rates to these mechanisms, and knowledge of the electrostatic
change at the reaction center (monopole, dipole, etc.) associated with the formation of the transition
state to determine the drop off (1/r2) of electrostatic interactions. The latter can be inferred from
knowledge of the reaction mechanism or can be determined by an optimized data fit. The log kc for
the reaction center can be either measured directly or can be data-fitted.
3.4.3. Hydrolysis Computational Models
The computational model for hydrolysis of carboxylic acid esters is categorized into three
sub models, namely, reference rate, internal perturbation and external perturbation models. The
reference rate model calculates the hydrolysis rate constant for the smallest ester compound, which
excludes internal perturbation and steric effects. The internal perturbation model calculates the
perturbation of the reference hydrolysis rate constant due to the internal perturbation interactions
between the reaction center and the substituents. The internal perturbation interactions include
steric, resonance and electrostatic effects. Finally, the external perturbation model calculates the
solvation contributions to the hydrolysis rate constant due to solute-solvent interactions. The
65
-------
hydrolysis rate constant contributions from these three models are then added to give the total
calculated hydrolysis rate constant as
log& , = logk + S^logk + SFP\ogk
O Hydrolysis O c *-^^Oc £P O c
where log kc describes the hydrolysis behavior of the reaction center "reference rate", in this study
the hydrolysis rate constant for an unperturbed methyl formate. 8n> log kc is the change in
hydrolysis behavior brought about by the perturber structure. SPARC computes the internal
reactivity perturbations, 5n> log kc, that is then used to "correct" the hydrolysis behavior of the
reaction center for the compound in question in terms of the potential "mechanisms" for the
interaction of P and C. The last term describes the external perturbation of the solvent with both the
initial and the final state. Specifically, 5Ep log kc describes the change in the solvation of the initial
state versus the transition state due to H-bond and field stabilization effects of the solvent.
3.4.3.1 Reference Rate Model
The reference rate, log kc, is the hydrolysis rate constant for the smallest ester compound,
that resembles the structure of reaction center C(=O)O. The reference rate is free of any internal
perturbation interactions, such as resonance and electrostatic effects. Neither does it show steric
effects. However, it is dependent upon the temperature. As the temperature increases, the reference
hydrolysis rate increases. SPARC expresses the reference rate, log kc, as function of the
temperature and enthalpic and entropic contributions as
log kc = A + logTk + Refi + Ref2/Tk
where A is the log of the pre-exponential factor, Tk is the temperature in Kelvin, Refi is the entropic
contribution to the rate and Ref2 is the enthalpic contribution. The AI Refi and Ref2 are data-fitted
parameters that are assumed to be the same for all molecules, solvents and temperatures.
-------
3.4.3.2 Internal Perturbation Models
Like all chemical reactivity properties addressed in SPARC, molecular structures are broken
into functional units called reaction center and the perturber. The reaction center, C, is the smallest
subunit that has the potential to hydrolyze. The perturber, P, is the molecular structure appended to
the reaction center, C. The perturber structure is assumed to be unchanged in the hydrolysis
reaction. The reference hydrolysis rate of the reaction center is adjusted for the molecule in
question using the mechanistic perturbation models described below. The perturbation of the log of
hydrolysis rate for a molecule of interest is expressed in terms of mechanistic perturbations as :
where the 5eie logkc and 5res logkc terms describe the electrostatic and resonance perturbations of the
initial and the transition state (activation energy), respectively, caused by molecular substituents (P).
Electrostatic interactions are derived from local dipoles or charges in P interacting with charges or
dipoles in C. 5res logkc describes the change in the delocalization of TT electrons of the two states due
to P. This delocalization of TT electrons is assumed to be into or out of the reaction center. 5stericlogkc
describes the change in the steric blockage of solvent access to C of the initial state versus the
transition state.
As we described in detail earlier, the modeling of the perturber effects for chemical
reactivity relates to the structural representation S-R-C, where S-R is the perturber structure, P,
appended to the reaction center, C. S denotes a substituent group that "instigates" the perturbation.
For electrostatic effects, S contains (or can induce) electric fields; for resonance, S donates/receives
electrons to/from the reaction center. R links the substituent (S) and reaction center (C) and serves
as a conductor of the perturbation (i.e., "conducts" resonant n electrons or electric fields).
67
-------
3.4.3.2. 1 . Electrostatic Effect Models
The electrostatic effect is a phenomenon of interaction of dipoles or charges of the
substituent (S) with the dipoles or charges of the reaction center (C). The types of electrostatic
effects that can occur between the substituent (S) and the reaction center (C) are direct field,
mesomeric field , R-TT, and sigma induction effects.
3.4.3.2.1.1. Direct Field Effect Model
The direct field effect occurs when the electrical dipole/charge of the substituent (S)
interacts with the dipole/charge of the reaction center (C) through space. Since the transition states
for base and general base-catalyzed hydrolysis of esters are negatively charged, a dipole will
increase the hydrolysis rate constant for these reactions. In contrast, the transition state for acid
hydrolysis of esters is positively charged and a dipole will decrease the hydrolysis rate constant in
this situation. Field effects are resolved into three independent structural component contributions
representing the change in dipole field strength of S, a conduction factor of R, and the change in the
charge at C as:
where, 5eie log Kc is the direct electrostatic effect due to the interaction between the dipole of the
perturber and the reaction center. peiec is the susceptibility of the reaction center to electrostatic
effects and is presumed to be independent of the perturber. ap is the field strength that the perturber
exerts on the reaction center, which can be calculated as shown previously for ionization pKa. The
perturber potential, op, is further factored into a field strength parameter, Fs, (describing the
magnitude of the dipole field for the substituent) and a conduction descriptor, acs, of the intervening
-------
molecular network for the electrostatic interactions. The effective dielectric constant, De, for the
molecular cavity, any polarization of the anchor atom affected by S, and any unit conversion factors
are included in the field strength parameter, Fs. The distances among the various components and
orientation angle are calculated as described earlier in this report.
3.4.3.2.1.2. Mesomeric Field Effect Model
A mesomeric field (MF) is generated when an electron-withdrawing or donating group
induces charges on the conductor R. An electron-withdrawing group creates positive charges on the
conductor, while an electron-donating group creates negative charges. Since the transition states of
base and general base-catalyzed hydrolyses are negatively charged, the electron-withdrawing
groups will increase the hydrolysis rate constant because the induced positive charges on the
conductor will stabilize the negative charges of the reaction center. On the other hand, induced
negative charges will have an opposite effect. The MF effect due to interaction between either the
electron withdrawing or donating group with the reaction center is given as.
£M>g£ = peleMFT. qik/rtc
where SMF log kc is the mesomeric field effect due to the interaction between the substituent induced
charges on the molecular conductor and the reaction center. peie is the susceptibility of the reaction
center, which is assumed to be independent of the substituent. MF is the mesomeric field constant
characteristic of the substituent. It describes the ability of the substituent to induce charges.
Substituent MF values have been calculated for ionization pKa and were shown in Table 3. q^ is the
charge induced at each atom k on R, and is calculated using PMO theory, rkc is the through-cavity
distance between the induced charge on atom k and the reaction center.
69
-------
3.4.3.2.1.3. Sigma Induction Effect Model
Sigma induction occurs due to the difference in electronegativity between the reaction center
and the substituents. For base and general base-catalyzed hydrolyses, the reaction center has a large
electronegativity and methyl substituents, for example, will move charge or electrons into the
reaction center and decrease the hydrolysis rate constant. The acid-catalyst hydrolysis reaction
center, on the other hand, is less electronegative and the induced perturbations are always quite
small. The sigma induction is a short range effect. We have calculated these effects up to two
atoms from the reaction center and consider effects further a way to be negligible. The sigma
induction due to an electronegativity difference between the reaction center and the substituent is
calculated using the following equation.
§Sigma log kc = pdec Z (x,-Xc}NB
where peiec is the susceptibility of the reaction center to electrostatic. Xc and x§ are the
electronegativity of the reaction center and the substituents, respectively. NB is data-fitted
parameter that depends on number of the substituents that are bonded directly to the reaction center,
C. Both NB and x§ values are the same as those used for ionization pKa calculations.
3.4.3.2.1.4. Rn Effect Model
The Rn effect is similar to sigma induction, except it involves 7r-electrons instead of a-
electrons. The magnitude of the reactivity perturbation, 5n log kc, depends upon the difference in
the electronegativity of the atom of the TT group and that of the reaction center to which it is
attached. Since the differential induction capability is highly correlated with peie, SPARC uses a
70
-------
simple model to quantify the effect, requiring a minimum computation and only one extra
parameter:
S^ogkc = peteax
where an is a data-fitted parameter. The reaction center is classified as a C+ group (at the carbon of
the carbonyl) that withdraws electrons and a C- group (at the acyl oxygen) that donates electrons to
a reference point. If the 7r-system is attached to the C+ group, the Rn effect contributes negatively or
lowers the hydrolysis rate constant. In contrast, if the TT-system is attached to the C- group, the Rn
increases the hydrolysis rate constant.
3.4.3.2.2. Resonance Effect Model
Resonance is a phenomenon of TT-electrons moving in or out of the reaction center.
Resonance stabilization energy in SPARC is a differential quantity, related directly to the extent of
electron delocalization in the initial state versus the transition state of the reaction center. The
source or sink in P may be substituents or R-TT units contiguous to the reaction center. Substituents
that withdraw electrons from a reference point, e.g., CIHk", are designated S+ and those that donate
electrons are designated S-. The R-TT units withdraw or donate electrons or may serve as
"conductors" of 7r-electrons between resonance units. Reaction centers are likewise classified as C+
(carbonyl carbon) and C- (acyl oxygen) denoting the withdrawing and donating of electrons,
respectively. The distribution of NBMO charge from a surrogate donor, CIHk", is used to quantify
the acceptor potential for the perturber structure, P. The resonance perturbation is given by:
where pres is the susceptibility of the reaction center to resonance interactions. That is, quantifies
"donor" ability of the two states of C relative to CH2" . Aqc is the fraction loss of NBMO
71
-------
(nonbonding molecular orbital) charge from the surrogate reaction center calculated based on PMO
theory. The reaction center has two different press one is for the oxygen of the ester when it is
attached to 7r-networks and the other is for the carbonyl when it is attached to 7r-networks.
Resonance plays two important roles in ester hydrolysis. The major impact is that of resonance
stabilization of the leaving group. Thus, Tt-networks attached to the oxygen of the ester reaction
center have a pronounced effect and greatly increase the hydrolysis rate. rc-networks attached to the
carbonyl tends to destabilize the leaving group and reduce the hydrolysis rate.
3.4.3.2.3. Steric Effect Model
The normal trend for steric effect is that as the bulkiness of a substituent increase, the steric
effect increases. Thus, the steric effect always decreases the hydrolysis rate constant. Comparing
the steric effect on hydrolysis rate constant in various solvents, we observe the trend of lesser steric
effect in pure water than in other mixed solvents. The reason for this trend is that pure water
solvates the substituents more and aligns the structure of the esters in suitable position for the
attacking hydroxide ion or water molecule. On the other hand, the mixed aqueous solvents only
partially solvate the substituents and thus deform the structure of esters, creating a hindrance to
attack from the hydroxide ion or water molecule. Therefore, the reaction does not proceed at a
normal rate and the hydrolysis rate constant decreases. Steric effects will include blockage of
reactant access and strain in achieving the transition state. SPARC expresses the steric
contributions as:
f-\
LJ e
-------
where Vs is the sum of the appendage sizes, Vthresh is a threshold size for onset of steric effects, and
Vex is an excluded (cavity) volume between pairs of appendages (zero for formates, 2 vex for
acetates, 3 vex for trialkyl phosphates). psteric is the steric susceptibility, De is the dielectric constant
for the solvent and Tk is the reaction temperature.
3.4.3.3. External Perturbation Models
3.4.3.3.1. Solvation Effect Model
Solvation effects for ester hydrolysis include both hydrogen bonding and field stabilization
effects. Hydrogen bonding gauges the hydrogen acceptor effect (alpha) and hydrogen donor effect
(beta) of the ester, while the field stabilization interaction describes the effect of dielectric constant
of the solvent on the hydrolysis rate constant.
3.4.3.3.1.1. Hydrogen Bonding
Hydrogen bonding is a direct site coupling of a proton-donating site of one molecule with a proton-
accepting site of another molecule. The H-bond energy is resolved into a proton-donating site, a,
and proton-accepting site, |3, which in the SPARC models are presumed to be independently
quantifiable. If the transition state is more solvated or stabilized by the hydrogen bonding than the
initial state, the hydrolyses rate constant increases. The negatively charged transition states of base
and general base catalyzed hydrolysis are strongly solvated or stabilized by solvent alphas, while
the betas play a minor rule. Thus, one might conclude that strong alpha solvent should increase the
base-catalyzed hydrolysis rate constant. However, the strong alpha solvents not only solvate the
transition states, but also solvate the attacking hydroxide ion. This tends to stabilize the initial state
more than the transition state as shown in the Figure 11. Therefore, strong alpha solvents actually
73
-------
tend to decrease the base-catalyzed hydrolysis rate constants. On the other hand, the strong beta
solvents interact with the alpha sites freeing-up the hydroxide ions to react with the esters and
tending to increase the base-catalyzed hydrolysis rate constant. For acid-catalyzed hydrolysis, both
the alpha and beta sites stabilize the initial state more than the transition state. Therefore, hydrogen
bonding decreases the acid-catalyzed hydrolysis rate constant. Furthermore, the alpha and beta
impacts on the hydrolysis rate constant in various mixed solvents depend on the relative amount of
alpha and beta sites available in those solvents. Water has an almost equal strength of alpha and
beta, whereas mixed solvents generally have weaker alpha and stronger beta. Consequently, the
alpha contribution from pure water to hydrolysis rate constant in base and general base catalyzed
hydrolysis should be lower or more negative than from the mixed solvents. The beta contribution,
on the other hand, should be higher in mixed solvents than in pure water. SPARC expresses alpha
and beta H-bond contribution as:
/ 1 T^1 T7~ 1 \ O
pA a (I — rv VOL ) pB p
alpha = beta =
7\ 7\
where alpha (beta) is the hydrogen accepter (donor) effect of the solute ester, a (P) is the hydrogen
donating (accepting) value of the solvent. PA, and PB are the susceptibility for a and |3 of the
solvent, respectively. These are data-fitted parameters. Fvis another data-fitted parameter for the
volume of alpha of the solvent. Vol is the volume of the solvent and Tk is the temperature in
Kelvin. Both a and |3 of the solvent are calculated as pseudo pKa's, with the electrostatic
component treated as a dipole transition
3.4.3.3.1.2. Field Stabilization Effect
The field stabilization effect is given as
74
-------
S PS log kc =
PFS
Tk (De + Damp )
where PFS is the susceptibility of the solvent to solvation due to the dielectric constant of the solvent
and is a data-fitted parameter. De is the dielectric constant of the solvent and Damp is a damping
adjusting factor. The dielectric constant impacts both the solvation of initial and transition states of
the reactants in the hydrolysis reaction. Hence, dielectricity of the solvent solvates or stabilizes the
initial state more than the transition state. Thus, the field stabilization effect decreases the
hydrolysis rate constant. Comparing the dielectricity effect on hydrolysis rate constant in different
mixed solvents, we observe that the dielectricity or field stabilization effect reduces the rate constant
less in pure water than in mixed organic aqueous solvents. The reason for this phenomenon is that
the high dielectricity of pure water stabilizes the transition state more than mixed aqueous organic
solvents which have less overall dielectricity.
Energy
Transition state
Initial state
vl/ /
E. e acti on c o or din ate
Figure 11. The effect of alpha sites on the initial and transition states. Alpha sites tremendously
solvate the hydroxide ion and stabilize the initial state more than the transition state. As a result,
alpha sites decrease the hydrolysis rate constant.
75
-------
3.4.3.3. Temperature Effect
Steric, hydrogen bonding and field stabilization effects decrease with increasing
temperature, and the rate of hydrolysis of organic compounds increases with temperature. The
quantitative relationship between the hydrolysis rate constant and temperature is frequently
expressed by the Arrhenius equation as
k = A e
where k is the hydrolysis rate constant, A is the pre-exponential factor or the frequency factor, Ea is
the activation energy (the minimum energy required to form a product from the reactants), R is the
gas constant and T is the absolute temperature. As shown previously, temperature dependence was
also incorporated in to the field stabilization, alpl^eta and steric effect relationships. For example,
recall that, as the temperature increases, the steric effect decreases. The reason for this is that as the
temperature increases the reactants tend to mimic gas phase structure, producing minimal or null
steric effect for hydrolysis of esters.
3.4.3.4. Results and Discussions
For any property estimated by SPARC, the contributions of the structural components C
(reaction center), S (substituent), and R (molecular conductor) are quantified independently. For
example, the strength of a substituent (S) in creating an electrostatic field effect is assumed to
depend only on the substituent regardless of the C, R, or property of interest. Likewise, R is
modeled so as to be independent of the identities of S, C, or the property being estimated. Hence, S
and R parameters for hydrolysis rate are the same as those for ionization pKa in water. The
susceptibility of a C to an electrostatic effect quantifies only the differential interaction of the initial
76
-------
state versus the final state with the electrostatic fields. The susceptibility gauges only the reaction
Cinitiai -C transition state and is completely independent of both R or S. Thus, no modifications in any of
the pKa models, or extra parameterization for either substituent (S) or the molecular conductor (R)
are needed to calculate hydrolysis rate constant using the ionization pKa models in water, other than
inferring the electronegativity and susceptibilities of the carboxylic acid ester hydrolysis rate
constant to electrostatic, resonance, steric and solvation effects. A sample calculation for the acid
catalyzed hydrolysis rate constant for p-nitrophenyl acetate in water at 25° C is shown in Figure 12.
Figures 13-15 show SP ARC-calculated versus observed values for hydrolysis rate constants of
esters undergoing base, acid and general base catalyzed hydrolysis, respectively in six different
solvents and various temperatures, respectively. These sets represent 321, 416 and 50 unique esters
in base, acid and general base catalyzed hydrolysis of esters respectively. Because several of the
esters were measured under different conditions (solvents, temperatures, etc) there were 653, 667
and 150 base, acid and general base catalyzed calculations performed. The RMS deviation of the
SPARC-calculated versus measured carboxylic ester hydrolysis rate constant values for these three
hydrolysis mechanisms were 0.37, 0.37 and 0.39 log units, respectively.
77
-------
Reaction Center peie pres Xc Const A Refi Ref2 pFs
Acid Hydrolysis -0.87 -0.4,1.11 1.8 1.7 3.36 3.886 -3070.8 -0.051* 107
Base Hydrolysis 4.5 -0.8,3.9 2.55 1.7 3.36 -3.522 -1443.4 0.18 * 107
GBase Hydrolysis 5.2 -0.6,2.6 2.65 1.7 3.36 -5.249 -3479.7 -0.15 * 107
logkc= 3.36±+ log (298. 15) + 3.886
Pelec
s loa lr 0 87
/
/
NB
C 1 y-^rr 1r
* ;°2/
(2.3-1.8) 7.46x1
/ (1.3\-1)2 ' (1.3 + 2 + 1)3
/ \ \
/ \ \
TIC Xs I
Sigma Field
PFS
-O.OSlxio7
298.15(78.54+32.6)
Log k hydrolysis =
De(H2O) Damp
Field Stabilization
+ (-3070.8/298.15) = -0.59
^cos9cs NBMO Charges Constan
( 0.078 0.078 3x 0.078 ^
i n nn° i °~ i ° si s " i 7
x |^ (1.73+1+1.3) (1.3 + 1) (1. 3 + 2 + 1) J /
\ \ / /
\ /MF
is OJE r!3 r!4 JVLTNO2
J^ Mesomeric
psteric V's pa a Vol pp
PSteric Pa Pp Damp
-310935 -1821 944.6 32.6
-328642 4205 -2119.2 32.6
-114969 -373 5598.3 32.6
t pres
0 78 xl 1 1 0 07^
Resonance
P
-310935x0.0687 -1821x0.384(1-0x0.07) 944.6x0.382 ^ ^
298(78.54 + 32.6) 298
Steric Hydrogen bonding
298
-0.59 + (-0.075) + (-3.33) = 3.995 (Observed is 3.9)
Figures 12. Sample calculations of hydrolysis rate constant acid catalyzed media for p-nitrophenyl acetate in water at 25° C. Only the reaction center parameters are
trained on hydrolysis rate constant (in bold and they are showing at the top of the Figure). The substituent, molecular conductor parameters and distances
between the various components (indicated by lines) are the same as of ionization pKa and they are shown in Table 2 and 3, see text. The a and P are
for the solvent, water, and they are calibrated using SPARC physical process models. See next section.
78
-------
"O
0)
_o
ra
9
o
o;
a.
V)
Figure 13. SPARC-calculated versus observed log hydrolysis rate constants for alkaline hydrolysis
in six different solvents, respectively. The RMS deviation error is 0.37 and R2is 0.96.
•a
a>
(0
O
6
a:
<
a.
(O
0
-2
-4
-6
-8
-10
-10
-8
-6
-4
-2
0
Observed (M'V1)
Figure 14. SPARC-calculated versus observed log hydrolysis rate constants for acid hydrolysis in
six different solvents and at different temperatures. The RMS deviation is 0.37 and R2is 0.97.
79
-------
_re
3
_O
re
O
6
-2
-7
-12
-12
-7 -2
Observed (M~1s~1)
Figure 15. SPARC-calculated versus observed log hydrolysis rate constants for general alkaline
base hydrolysis in six different solvents and at different temperatures. The RMS deviation is 0.39
and R2 is 0.97.
Table. 13 Statistical Parameters of SPARC Hydrolysis Rate Constant (M'V1)
Solvent
Water
Acetone/Water
Ethanol/Water
Methanol/Water
Dioxnae/Water
Aceteonitrile/Water
Total Molecules
Base
No RMS R2
142
143
105
150
90
24
654
0.39 0.98
0.34 0.83
0.29 0.83
0.36 0.78
0.47 0.75
0.3 0.97
0.37 0.96
Acid
No RMS R2
383
208
39
22
15
N/A
667
0.36 0.98
0.33 0.96
0.17 0.98
0.22 0 .95
0.16 0.87
0.37 0.97
Gbase
No RMS R2
51
73
9
N/A
17
N/A
150
0.34 0.98
0.36 0.96
0.1 0.99
0.47 0.67
0.39 0.97
3.4.5.
Conclusion
SPARC's reactivity models, used to calculate both ionization pKa and electron affinity,
have been successfully extended to calculate hydrolysis rate constants for carboxylic acid esters.
These models have been tested to the maximum extent possible as function of temperature and for
single and mixed solvent systems on all the reliable data available in the literature. Further
extension of these reactivity models is currently under development to calculate hydrolysis rate
constants for phosphate ester compounds.
80
-------
4. PHYSICAL PROPERTIES
4.1. Estimation of Physical Properties
In SPARC, all the physical property estimations derive from a common set of core models
describing intra/intermolecular interactions, and require as user inputs molecular structure (solute
and solvent(s)) and reaction conditions of interest (temperature, pressure, etc.). SPARC solvation
models are described in this section. Results of these solvation models in estimating solute
activities and 'activity based' properties (solubilities, vapor pressures, distribution coefficients) are
given for a wide range of solutes and solvents. A prototypical set of solutes and solvents has been
selected that covers a wide range of interaction forces both in type and strength. Model extensions
to more complex molecules are described along with some calculated properties. Any chemical
model should be understood in terms of the purpose for which it is conceived and its prescribed
usage. The models described herein are intended for what might be characterized as engineering
applications in environmental assessments; the target user has minimal chemistry/computer skills;
the target computer a standard pc. The process modeling goal is to optimize physical and chemical
integrity yet achieve both the requisite range of prediction capability (physical and chemical
processes, 'environmental' conditions, and molecular structures) and 'accessibility' for the target
audience.
Intermolecular interactions are expressed as a summation over all the interaction forces
between molecules (i.e., dispersion, induction, dipole-dipole and hydrogen bonding). Each of
these interaction energies is expressed in terms of a limited set of molecular-level descriptors
(density-based volume, molecular polarizability, molecular dipole, and hydrogen bonding
parameters) that, in turn, are calculated strictly from molecular structure.
81
-------
4.2. Physical Properties Computational Approach
For all physical processes (e.g., vapor pressure, boiling point, activity coefficient, solubility,
partition coefficients, GC/LC chromatographic retention times, diffusion coefficients in air/water,
etc.), SPARC uses one master equation to calculate characteristic process parameters:
process = ^^Interaction Other
where AGinteraction describes the change in the intermolecular interactions accompanying the process
in question. For example, in liquid to gas vaporization, AGinteraction describes the difference in the
intermolecular interactions in the gaseous versus the liquid phase. The intermolecular interaction
forces between the molecules are assumed to be additive. The AGother lumps all non-interaction
components, such as excess entropy changes associated with mixing or expansion, and changes in
internal (vibrational, rotational) energies. At the present time, the intermolecular interactions in the
liquid phase are modeled explicitly, interactions in the gas phase are ignored, and molecular
interactions in the crystalline phase are extrapolated from the subcooled liquid state using the
melting point. The 'non-interaction' entropy components are process specific and will be described
later. The intermolecular interactions in the liquid phase are expressed as a summation over all the
mechanistic components:
^Interaction = ^Dispersion + ^^ Induction + ^^Dipole-dipole +
Each of these interaction mechanisms is expressed in terms of a limited set of pure component
descriptors (liquid density-based volume, molecular polarizability, microscopic bond dipole, and
hydrogen bonding parameters), which in turn are calculated strictly from molecular structure.
82
-------
4.3. SPARC Molecular Descriptors
The computational approach for molecular-level descriptors is constitutive with the molecule
in question being broken at each essential single bond and the property of interest being expressed
as a linear combination of fragment contributions as
Z°(molecule) = X(z°-A,)
where v° are intrinsic fragment contributions (which in most cases are tabulated in SPARC
A; j
databases) and A; are adjustments relating to steric or electrometric perturbations from contiguous
structural elements for the molecule in question. In some instances, the x° (molecule) is further
adjusted for a specific process model or medium involved. Both y° and A; are empirically trained,
either on direct measurements of the descriptor in question (e.g., liquid density based molecular
volume) or on a directly related property (e.g., index of refraction, which can be related to
polarizability) for which large reliable data sets exist. This partition of molecular descriptors into
intrinsic fragment contributions enables one to construct, for any given molecule-of-interest,
essentially any molecular array of appended units, and thereby to estimate the descriptors of
interest for any molecular structure.
4.3.1. Average Molecular Polarizability
Molecules are composed of positively charged nuclei and negatively charged electrons.
When molecules are subjected to an electric field, the electrons are attracted toward the positive
plate and the positive nuclei are displaced from their ordinary position toward the negative plate.
The result is an electric distortion or polarization of the molecules producing electric dipoles. As
83
-------
mentioned previously, molecular structures are broken at each essential single bond with known
intrinsic atomic polarizability. The molecular polarizability of any molecule-of-interest is
calculated as the linear combination of all the fragment polarizabilities, which in turn are estimated
from intrinsic atomic polarizability contributions, Xj,. The polarizability of fragment i is expressed
as
where the summation is over all the atoms in fragment i, Xj is the intrinsic atomic hybrid
polarizability contribution, and N; is the number of electrons in fragment i. The Xj are empirically
determined from measured polarizabilities and stored in the SPARC database (with the exception
of hydrogen, which is calculated from the measured polarizability of H2). The average molecular
polarizability, a° , is calculated as the sum over all i fragment:
where a, is the polarizability of fragment i and A; are adjustments for the molecule in question.
The only adjustment, A;, currently implemented in SPARC is a 10% reduction in a: for
hydrocarbon fragments with an attached polar group or atom. The partition of polarizability into
atomic contributions enables estimates to be made of molecular polarizabilities for any given
molecular structure. The molecular polarizability can be calculated to better than 1% of measured
values for a wide range of organic molecules.
4.3.1.1. Refractive Index
Many physical properties depend upon polarizability; the most familiar is the refraction of
light. The passage of a light wave is accompanied by an oscillating electric field at right angles to
84
-------
the direction of the light propagation producing a corresponding oscillating dipole in nearby
molecules. This interaction reduces the velocity of propagation of the light wave, which is to say
that the refractive index, n, of the material medium is greater than 1. Index of refraction is thus a
good way to check the polarizability density for a molecule. The molecular polarizability and
volume can be related to the index of refraction using the Lorentz-Lorenz equation. For our units
of cm3/mole for volume and A3/molecule for polarizability, the Lorentz-Lorenz equation can be
written as
n2 -I _ 4x(Q.6Q23P)
n2+2~ W
where n is the index of refraction, P is the molecular polarizability and V is the molecular volume.
The refractive index calculator was trained on 325 non-polar and polar organic compounds
at 25° C then validated on 578 organic compounds at 25° C as shown in Figure 16. The statistical
performance for the SPARC refractive index calculator is shown in Table 14. See reference 23 for
sample hand calculations.
0)
Is
3
_o
re
O
O
V)
1.32 1.37 1.42 1.47
O bserved
1.52
Figure 16. SPARC-calculated versus observed refractive index at 25° C. The RMS (Root Mean
Square) deviation error was 0.007 and R2 was 0.997
85
-------
Table 14. SPARC Physical/Chemical Properties Statistical Parameters
Property
Refractive Index
Volume
Vapor Pressure
Boiling Point
Heat
of Vaporization
Diffusion
Coefficient in Air
Activity Coefficient
Solubility
Distribution
Coefficient
Henry's Constant
GC Retention Time2
LC Retention Time
GaspKa
Non-aqueous pKa
pKa in water
Electron Affinity
Ester Carboxylic
Hydrolysis Rate
Tautomer Constant
Hydration Constant
Ey2 Chemical
Reduction
Units
g/cm3
logatm
°C
Kcal/mole
cm2/s
log MF3
logMF
N/A
Kovtas
N/A
Kcal
Kcal
Kcal/1.36
e.V.
M-V1
Kcal/1.36
Kcal/1.36
e.V
Total #
Molecule
578
1440
747
4000
1263
108
491
647
623
286
271
295
125
400
300
4338
260
1470
36
27
352
RMS
0.007
1.97
0.15
5.71
0.301
0.003
0.064
0.40
0.43
0.34
0.10
10
0.095
2.25
1.90
0.356
0.14
0.37
0.3
0.43
0.18
R2
0.997
0.999
0.994
0.999
0.993
0.994
0.998
0.987
0.983
0.990
0.997
0.998
0.992
0.999
0.960
0.994
0.98
0.968
0.950
0.744
0.95
Reaction Conditions
Temp/Solvent
25
25
25
0.1-1520 torr
25, Boiling Point
25
25, 41 solvents
25, 21 solvents
25 Octanol, Toluene CC14,
Benzene,
Cyclohexane, Ethyl Ether
25, Water
25, Hexadecane
25-190, Squalane,B18
25, Water/Methanol
25 , Alcohols, Aceteonitrile,
Acetic acid, DMF1, THF1,
pyridine
25-100, Water
Gas
25-130, Water, Acetone,
Alcohols, Dioxane,
Aceteonitrile
25 , Water
25, water
25, Water, Alcohols, DMF1
Aceteonitrile, DMSO1
2.
3.
DMF : N,N -dimethylforamide
DMSO: Dimethyl sulfoxide
THF: Tetrahydrofuran
GC retention times in SE-30 and PEG-20M liquid stationary liquid phase is not included in this
report.
MF: mole fraction
-------
4.3.2. Molecular Volume
The "zero order" liquid density -based molecular volume is expressed as
where V;frag is the volume of the ith fragment and A; is a correction to that volume based on both
the number and size of fragments attached to it. The V;frags are determined empirically from
measured volumes and then stored in the SPARC database. This zero order volume at 25° C is
further adjusted for shrinkage resulting from dipole-dipole and H-bonding interactions by the
following equation:
A J _ 1 _i_ A a iP i
T/ - T/ °
' — ' •>*
25 ^ -^ dipole - dipole T/ o ^ ^ H -bond
25 ' 25
-TT O n - OOna -TT o
* 25 *25
where D;is the bond dipole of the ith fragment, and a;, (3; is the H-bonding parameters of potential
proton donor and proton acceptor sites within the molecule, respectively. The product a; * (3; is the
largest H-bonding interaction contribution in the molecule. Adipoie_dipoie and AH-bond are adjustment
constants due to dipole-dipole and H-bonding, respectively. The final molecular volume at
temperature T is then expressed as a polynomial expansion in (T-25) corrected for H-bonding (HB),
dipole density (Dd) and polarizability density (Pd) interactions as
)I.an(T-25)n]
where an are trainable parameters. The SPARC molecular volume calculator was tested on more
than 1440 compounds at 25° C and the RMS deviation error between calculated and measured
values was 1 .97 volume units as shown in Figure 17.
SPARC calculates the density at 25° C directly from the molecular volume calculator result
using the simple equation Density = Molecular Weight/Volume. The accuracy of the SPARC
density calculation depends purely on the accuracy of the calculated molecular volume.
87
-------
_
o
0)
«
_0
TO
O
6
w
200
400
600
800
1000
Observed (cm /mole)
Figure 17. SP ARC-calculated vs. observed-liquid density based volume at 25° C for 1440 organic
molecules. The RMS deviation error was 1.97 cm3 mole"1 and R2 was 0.999.
4.3.3. Microscopic Bond Dipole
For induction and dipole-dipole interactions, the effective microscopic dipole (j, is computed
from the bond dipole contributions of individual substituents (j,;0. (j,;0 describes the effective
substituent bond dipole strength when it is attached to either a methyl, ethylenic or aromatic group.
The effective substituent bond dipoles, (j,;0, used in SPARC, are derived from tabulated substituent
dipoles. Specifically, for each substituent, S, an S-methyl, S-ethylenic and S-aromatic dipole value
is stored in the SPARC database based on the dipole moments reported by McClellan [46]. For any
given molecule, a simple algebraic summation over all i fragments is employed to calculate the
effective microscopic dipole for the molecule as
S; is a reduction factor for steric blockage due to an appended molecular structure to substituent i
given as
S , = P ,u ,
-------
where v; is the solid angle occluded by the appended molecular structure. The value of v; depends
on the size and number of the groups that are appended to i. p; is the steric 'susceptibility' of each
substituent dipole-in-question. The reductions factors, S;, are usually small (< 5% for most
molecules) and will be ignored in this report.
For cases where multiple substituents are 'clustered', i.e., are attached to one of the
following: (1) the same aromatic ring or ethylenic unit or; (2) the same (or adjacent) aliphatic
atom(s), a 'local' vector sum is invoked, wherein each individual dipole is reduced by a fraction of
the resultant vector field of the other dipoles in the cluster. Details of the vector models will not be
given here except for two cases where the corrections are large. When the atom to which multiple
substituents are bound is also an intrinsic component of the individual dipoles (e.g., halogen atoms
attached to the same carbon), 'vectorial' reduction is substantial. For the chlorinated methane series
CC1X (x=l,2...4), the reduced individual dipole components are 1.8, 0.9, 0.4, 0.05 Debye,
respectively.
4.3.4. Hydrogen Bonding
Hydrogen bonding interaction is a direct site coupling of a proton-donating site of one
molecule with a proton-accepting site of another molecule. The H-bond interaction (AGn-Bond) is
resolved into a proton donating site a and proton accepting site |3, which in our models are
presumed to be independently quantifiable. The a and |3 for both the solute and solvent are
calculated as pseudo pKa's, with the electrostatic component treated as a dipole transition. Details of
the pKa computational methods were given earlier in this report; only a brief description for the H-
bond calculation is given here. Molecular structures are broken into functional units called reaction
centers and perturbers. The potential sites for reactions-of-interest are designated as the reaction
89
-------
centers, C. These are the smallest subunits that have potential to be a proton donating (accepting)
site. The perturbers, P, are the molecular structures appended to the reaction centers and are
assumed to be unchanged in the reaction. The ac and |3C of the reaction center, C, are adjusted for
the molecule in question by
a = a« +Saf P = Pc + 6ft p
where ac and |3C denote the intrinsic behavior of the reaction center. Both ac and |3C of the reaction
center are inferred indirectly from physical process measurements. 5ap and 5(3P denote changes in
reactivity effected by the perturber structure, P. SPARC computes both 5ap and 5(3P, which are then
used to "correct" the donating (accepting) site behaviors of each reaction center for the molecule of
interest in terms of potential "mechanisms", by
So, = 8aAe + So^ 8% = S&. + <%,
where 5aeie and 5ares (5peie and 5pres) describe the differential electrostatic and resonance effects of
P on the initial state versus the final state for the proton donating (accepting) sites, respectively.
The H-bond calculations are exactly as described earlier in the pKa models except for the
electrostatic multipole terms. Whereas ionization pKa is a monopole process resulting in a net
change in charge in the reaction center; hydrogen bond formation on the other hand is dipolar,
resulting in the change in bond dipole(s) in the reaction center. As was the case in the ionization
pKa calculation, electrostatic interaction terms up through substituent dipole are retained, but the
radial and angular dependence of electrostatic conduction differs as previously described for the
multipole model. The net result is a faster die-off of field effects with intermolecular distance and
increased sensitivity to dipole-dipole alignment for dipolar substituents. Also, because H-bonding
does not involve forming or breaking of covalent bonds, differential resonance stabilization of the
reaction center is small, which means resonance effects are small.
90
-------
4.4. SPARC Interaction Models
The differences in strength of the intermolecular interaction forces are reflected in the
physical property behavior of the compounds. For example, the boiling points for ethane,
chloromethane and ethanol are -88, -24 and 65.4 C, respectively. Dispersion interactions are
present in all 3 molecules, but dipole-dipole and induction interactions elevate the boiling points of
the chloromethane and ethanol. In addition, H-bond interactions exist only for ethanol and raise its
boiling point even higher. In general, compounds whose molecules interact through H-bonding
have higher boiling points than molecules of the same molecular weight, volume and dipole
moment where hydrogen bonding is not present. SPARC's interaction models are built on a limited
set of molecular-level descriptors (volume, polarizability, molecular dipole and hydrogen bonding
parameters) as described earlier in this report. These interaction models are for dispersion,
induction, dipole-dipole and H-bonding. Dispersion interactions are present for all molecules,
including non-polar molecules. Induction interactions are present between two molecules when at
least one of them has a local dipole moment. Dipole-dipole interactions exist when both molecules
have local dipole moments. H-bonding interactions exist when a; |3j or a,j (3; products are non zero,
where a represents the proton donator strength and |3 represents the proton acceptor strength.
4.4.1. Dispersion Interactions
Dispersion interactions occur between all molecules as a result of very rapidly varying
dipoles formed between nuclei and electrons at zero-point motion of the molecules, acting upon the
polarizability of other molecules to produce an induced dipole in the phase. The free energy
associated with these self-interactions is expressed as
91
-------
AG,(Dispersion) = pdlsp(Pf}iVl where Ptd = a t+ A*°p
v i
P? is the effective polarizability density of molecule i; pdisp is the susceptibility to dispersion; V;
and a; are the molar volume and average molecular polarizability, respectively. Dispersion is a
short range interaction involving surface or near surface atoms, and Adisp is an adjustment that
subtracts from the total polarizability a portion of the contributions of sterically occluded atoms in
the molecular lattice. Presently, SPARC only corrects for access judged to be less than that afforded
by a linear array of atoms (i.e., for branched structures or rings small enough to prohibit intra
penetration of the solvent). Branched (ternary or quaternary) atoms in an alkane structure will lose
a small part of their intrinsic molecular polarizability depending on the size and number of
appended groups, and the proximity of other branched carbons. Similarly, carbons in rings may
lose their intrinsic polarizability contributions depending on ring size and the presence of a ring
appendage.
4.4.2. Induction Interactions
Induction or dipole-induced dipole interactions occur between molecules where one or both
contain a permanent dipole. The dipole moment of a polar molecule has the effect of polarizing a
second molecule. This induced dipole moment can then interact with the dipole moment of the first
molecule. The magnitude of this effect depends on both the strength of the dipole moment of the
first molecule and the polarizability of the second one. For self interactions, the free energy change
due induction effects may given as
^.(Induction} = PmdPfDfVl
92
-------
where pind is the 'susceptibility' to induction and p~ , £)f and Vt are polarizability density
(adjusted for induction), dipole density and molar volume of the molecule-in-question, respectively.
Dd =- and P."' = ' n
V, ' V,
where (j,; is the effective microscopic dipole described previously and A;nd is a polarizability
adjustment for induction. Induction describes molecular polarization effected by a point dipole on
the surface, averaged over all orientations of the molecule. Inductive polarization interactions
'propagate' deeply within conjugated systems, but only one or two atoms deep in a nonconjugated
array of atoms. SPARC adjusts the molecular polarizability algorithmically, utilizing electron
withdrawing/releasing substituent parameters derived from pKa models; these models will not be
presented here, but a few simple rules will be given that capture all major adjustments. No
significant adjustments need to be made for unsubstituted systems. A polar substituent attached to a
TT unit (aromatic ring or ethylene unit) reduces its intrinsic 'induction' polarizability by -25 percent;
this results in a polarizability reduction of -1.3 cubic angstroms for a singly substituted ethylene and
-2.75 cubic angstroms for a substituted benzene or condensed ring hydrocarbon. Heteroatoms
within a given TT unit reduce the intrinsic polarizability by -75 percent. Adjustments for multiple
substituents are additive. Aliphatic hydrocarbon units have a pf approximating that of ethane or
0.025 A3/cm3.
4.4.3. Dipole-Dipole Interaction
Dipole-dipole interactions occur between molecules containing permanent dipoles. The
dipole aligns itself with other dipoles in a head-to-tail fashion resulting in a dipole-dipole attraction
93
-------
between these molecules. Between like molecules the free energy change of the dipole-dipole
interaction is given by
A Gn (dipole- dipole) = pd_d
where pd-d is the susceptibility to dipolar interactions; Df and V; are the effective dipole density and
molar volume of the molecule-in-question respectively, which are calculated as described earlier in
this report.
Since dipole-dipole interactions depend on the position of the polar molecule with respect to
its neighbor, the interaction forces is not additive in nature. SPARC adjusts the dipole-dipole
interaction as a function of number and magnitude of the microscopic bond dipole moment in the
molecule. In addition, SPARC adjusts AG;; for the ability of one dipole to align the dipole in the
other molecule into a favorable arrangement, and if the two dipoles can interact with each other
through H-bond interactions.
4.4.4. Hydrogen Bonding Interactions
For single site interactions between like molecules, the free energy change of the interaction
is given by
where PHB is the susceptibility to hydrogen bonding interactions and S;; is a steric reduction factor
given by
1 > S -l -
p- Sizea. .+Pe. Size/3 -thresh
>-> >J
and psii, Sizea and SizeB are the susceptibility to steric effects, and the steric sizes of the molecules
looking back from the a and B sites, respectively. No steric reduction is applied if the sum of these
94
-------
sizes does not exceed a threshold value. The threshold value is inferred from physical properties
data. The H-bond parameters are data-fitted on measured physical properties and stored in SPARC
databases. It should be pointed out that these parameters are process independent; the pn-Bond and
steric threshold value are also molecule independent.
4.4.5. Solute-Solvent Interactions
For symmetrical interactions (dispersion) the differential energy for mixing solute i and
solvent] is given by
(dispersion) = pdlsp (pf - prf V,
where i and j designate the solute and solvent molecules, respectively. 'Symmetrical' connotes
independence of order-of-mixing (i.e., 'i' into 'j' versus 'j' into 'i') in differential energy density. For
interactions that involve molecular orientation (dipolar or H-bonding),
where AG;; is the solute self-energy described previously and AGy describes the differential mixing
of an 'isolated' solute molecule 'i' into solvent 'j' and
c s~i c s~^c i c s~mc
with£G,"c (8G\j) describing solvation with (without) solvent destructuring; these two components
might also be termed 'outer' and 'inner' sphere solvation, respectively. wc and wnc are given by
95
-------
where the summations are over all dipole and H-Bonding interactions.
For a single site H-bonding interaction,
= A
'HB
-s-t
Vj_
J )
8G™ (H-B) = pHB(-Sua,/3J
where
- f
m
The fkB gauges reduction in solute-solvent H-bonding for outer versus inner sphere solvation. The
solvent-solvent term is the cavity creation energy in the absence of solvent destructuring. For
multiple hydrogen bonds the algebra becomes much more complex; each aB product is of the
following form:
1
k !
Soct /- -• /
^ftl
solvent
where W;k is the probability that the akB; bond will form, and Z;k represents a statistical factor for
the interaction. W;k and Z& are defined as
O' i r\k
intra + Uintra
solute
L-t J
solute
where Ointra are any intramolecular hydrogen bonds that can compete with the interaction of interest.
96
-------
Similarly for dipole-dipole interactions,
SG"- (dipole-dipole) =
-2 D.Dj
; (dipole - dipole} = pd_d (- 2 D, D] + D] ) Vl
where the dipole DJ is adjusted as /).,= /" /)
The fd gauges reduction in solute-solvent dipole interaction for outer versus inner sphere solvation.
Likewise for induction interactions,
SG"j (induction) = p^
c. (induction) =
V, + V,
4.5.
Solvents
SPARC uses the same molecular descriptors to describe solvents as it does solutes. The only
solvent 'known' to SPARC at start up is water. All other solvents must be entered as SMILES
strings and processed by the system. The user may declare the molecule as a solvent to be
remembered for future calculations. SPARC then stores the molecular descriptors that it has
calculated in memory. SPARC has a small 'common name' to SMILES string database in memory
and will use the common name if it finds it. If the name is not found, the user must supply a name
by which the solvent will be recognized. Any molecule that SPARC can run as a solute may be
declared a solvent, so essentially any organic solvent may be specified.
97
-------
4.6. Physical Process Models
All physical process models are built directly from the molecular interaction models
described above.
4.6.1. Vapor Pressure Model:
The saturated vapor pressure is one of the most important physiochemical properties of pure
compounds. Actually, the vapor pressure is among the most frequently measured and reported
physical properties. According to Dykyj et al [47], by the end of 1970's, vapor pressure data (as a
function of temperature) were available for more than 7000 organic compounds. Despite the
frequency of reporting in the published literature, the number of compounds where the vapor
pressure was truly measured and not extrapolated to 25° C from higher temperature measurements,
is limited. Most of the measured 25° C vapor pressures are for compounds that are either pure
hydrocarbons or molecules that have relatively small dipole moments and/or weak hydrogen bonds.
There is a pressing need to predict the vapor pressures of those compounds that have not been
measured experimentally. In addition to being highly significant in evaluating a compound's
environmental fate, the vapor pressure at 25° C provides an excellent arena for developing and
testing the SPARC self interaction physical process models.
The vapor pressure vp°; of a pure solute, i, can be expressed as function of all the
intermolecular interaction mechanisms, A G;; (interaction), as
o - AG (Interactio n)
logv» = + LogT+C
^' 2.303RT
where log(T) + C describes the change in the entropy contribution [48] associated with the volume
change in going from the liquid to the gas phase. For molecules that are solids at 25° C, the crystal
98
-------
energy contribution becomes important, especially for rigid structures such as aromatic or ethylenic
molecules that have high melting points (greater than 50 ° C). Each intermolecular interaction
(dispersion, induction, dipole-dipole and H-bonding) is assumed to have a different, but constant,
ratio of enthalpic to entropic contributions to the free energy process at 25° C. SPARC estimates the
crystal energy contributions assuming that at the melting point AG;; = AH;; + TA S;; = 0. See the
Crystal Energy Model section for more details.
The vapor pressure computational algorithm output was initially verified by comparing the
SPARC prediction of the vapor pressure at 25° C to hand calculations for key molecules. For
sample hand calculations see Figure 18. Since the SPARC self interactions model, AG;;, was
developed initially on this property, the vapor pressure model undergoes the most frequent
validation tests. The calculator was trained on 315 non-polar and polar organic compounds at 25°
C. Figure 19 presents the SPARC-calculated vapor pressure at 25° C versus measured values for
747 compounds. The SPARC self-interactions model can predict the vapor pressure at 25° C
within experimental error over a wide range of molecular structures and measurements (over 8 log
units). For simple structures, SPARC can calculate the vapor pressure to better than a factor of 2.
For complex structures such as some of the pesticides and pharmaceutical drugs where dipole-
dipole and/or hydrogen bond interactions are strong, SPARC calculates the vapor pressure within a
factor of 3-4. The statistical performance for the vapor pressure calculator is shown in Table 14.
The vapor pressure model was also tested on the boiling point and heats of vaporization. See later
sections.
99
-------
Figure 18. Sample hand calculations of the vapor pressure at 25° C for hexane and 1-chlorohexane
Molecular Descriptors
Volume
Polarizability
Dipole
H-Bond
131.56
11.77
0
0
138
13.73
1.404
0
Dispersion
Interaction
Induction
Interaction
\\.iT
131.6 =-2.69
0.00
Dipole-Dipole
Interaction 0.00
H-Bond
Interaction
0.00
EntropicTerm log (298) - 0.457 =2.05
Total (log vp) -0.64 atm (observed: -0.7)
fl3.731
-2'56 W 138 =-3'49
l3.73
138 = -0.32
f 1.4041
~2'837 ll38~J 138 = ~
0.00
log (298)-0.457 = 2.05
-1.8 atm (observed: -1.9)
100
-------
E
13
o>
2
0)
re
O
6
Oi
<
0.
tn
1
-1
-3
-5
-7
-7
-5 -3 - 1
Observed (log atm)
Figure 19. SP ARC-calculated vs. observed log vapor pressure for 747 organic molecules at 25° C.
The figure includes all the vapor pressure measurements (real not extrapolated) we found in the
literature. The RMS deviation was 0.15 log atm and R2 was 0.994.
4.6.2. Activity Coefficient Model
For a solute, i, in a liquid phase, j, at infinite dilution, SPARC expresses the activity
coefficient as
-RT log 7^ =
y (Interaction) + RT (log— + —+
V j 2.505
)
where the last term is the Flory-Huggins [49, 50], excess entropy of mixing contribution in the
liquid phase for placing a solute molecule in the solvent. The Flory-Huggins term is damped out by
orientation interactions (e.g. hydrogen bonding) that reduce the randomness of placement. When
the solute and solvent have the same molecular volume, the Flory-Huggins term will go to zero. It
should be noted that the negative log activity coefficient for small alkanes in squalane is a
consequence of the large Flory-Huggins contributions [22]. The activity follows the Raoult's Law
convention (i.e. y;j -> 1 as ft -> 1). The crystal energy is calculated the same way as for vapor
pressure.
101
-------
The activity coefficient computational algorithm output was initially verified by comparing
the SPARC prediction to hand calculations for key molecules. The SPARC activity coefficient
calculator was trained on 211 activities for a wide range of organic molecules. Figure 20 presents
the validation for SPARC-calculated log activity coefficients versus measured values for 491
compounds at 25° C in 41 different solvents. The SPARC activity coefficient test statistical
parameters are shown in Table 14. The activity coefficients calculator was also tested on the
solubility in more than 20 different solvents and partition coefficients in more than 18 different
solvents.
o
'•§
o
0)
o
ra
ra
O
8
7
6
5
4
3
2
1
0
-1
-2
-2
0246
Observed (log mole fraction)
Figure 20. Calculated versus observed log activity coefficients at infinite dilution for 491
compounds in 41 solvents including water. Only 15% of these compounds have strong dipole-
dipole and/or H-bond interactions. The RMS deviation was 0.064 log mole fraction R2 was 0.998.
4.6.3.
Crystal Energy Model
For aromatic compounds, the SPARC entropy energy calculator is very similar to that used
by Yalkowski [5, 10]. The SPARC calculator first estimates what fraction of the molecule is
conjugated (aromatic) and designate that value as Fa. At the melting point, Tmp, the AG in going
from a crystal to liquid, AGxstai, is zero, and
102
-------
~~ AHxstai ~ Tmp ASxstai ~~ 0
If we also assume that AHxstai and ASxstai don't vary significantly with temperature, T, we can write
AGxstal (T) ~ AHxstai ~ T ASxstai
and
AGxstal (T) ~ Tmp ASxstai - T ASxstai = (Tmp ~ T) ASxstai
Based on the vapor pressure equation given previously in terms of total interaction energy
(AG = -RT 2.303 log vp°;), the contribution of the log vp°; from the crystal energy is given by
logv/7 =
-23Q3RT
SPARC uses the Yalkowski value for ASxstai= 13.5 eu. for aromatics. SPARC then
calculates a first order correction as
ASxstai = 13.5 Fa + (1 - Fa) Na + LC + AAR
where Fa is the aromatic fraction, Na is a derived non-aromatic contribution of the crystal entropy
(this value is small) and LC is the coiling entropy that SPARC uses to estimate entropy change
associated with alkane chains of length greater than Y (this is linear in length Y). AAR is the
aromatic-aromatic bond rotation entropy change that is associated with the very low frequency
internal rotation in molecules having aromatic-aromatic bonds (e.g. biphenyl). This model works
fine for compounds with melting points less than 300° C. When the melting point gets much above
300° C, the model breaks down. This could be due to several factors, including our assumption that
AHxstai are ASxstai are independent of temperature. To compensate for this, SPARC incorporates a
second order non-linear contribution to ASxstai. The second order correction term looks like
103
-------
(T - T)2
- TT 17 mP
112 = *-iXra —2
where IT is data fitted contribution. The corrected crystal component of the log vapor pressure term
then becomes
.. .
-2.3030/Z77 T
4.6.4. Enthalpy of Vaporization
The heat of vaporization, AHV is sometimes referred to as the enthalpy of vaporization. It is
the difference between the enthalpy of the saturated vapor and that of the saturated liquid at the
same temperature. AHV is related to the slope of the vapor pressure versus temperature curve by the
Clausius-Clapeyron equation. Many estimation methods for AHV are simply based on either the
Clausius-Clapeyron equation or the law of corresponding states. However, in SPARC the enthalpic
contribution for any physical process is estimated from the corresponding free energy process. For
example, since the heat of vaporization can be determined from the vapor pressure, the enthalpy
contribution for each intermolecular interaction that contributes to the free energy process can be
expressed as
T AZJ i \ ~^Gfi(Interaction) H
Log AH i (vap) = — - LogT + C
2.303RT
where £Ga is the free energy change of the self interactions modified for the enthalpic contribution
as explained in the following. Similar to the vapor pressure model the log(T) + CH term describes
the change in entropy associated with the change in volume going from the liquid to gas phase upon
104
-------
vaporization. However, unlike in the vapor pressure model CH is independent of temperature and
represents the Clausius-Clapeyron integration constant [51]. The crystal energy calculation is the
same as in the vapor pressure model. For the AH(vaporization) contribution, SPARC modifies the
susceptibility of each molecular interaction mechanism as:
Aff
)
Mechanism
Q
Mechanism
' Mechanise
where ^Mechanism is dependent on the interaction mechanism (dispersion, induction, dipole-dipole
and H-bond), is data-fitted at 25° C and stored in the SPARC database. Likewise, the susceptibility,
pMechanism, depends on the type of the interaction mechanism (dispersion, induction, etc) and is the
same as explained earlier. Figure 21 shows the performance of the SPARC calculator for heat of
vaporization at 25° C and at the boiling point. The test statistical parameters are shown in Table 14.
0)
re
O
6
Q_
V)
10 20
O bserved (kcal /m ole)
3 0
Figure 21. SPARC-calculated vs. observed heat of vaporization. The RMS deviation was 0.302.
4.6.5. Temperature Dependence of Physical Process Models
The temperature dependence of some physical process and molecular volume models was
included in the previous discussion. In addition to the normal free energy temperature dependence,
105
-------
SPARC includes explicit temperature dependence associated with the molecular orientation
requirements for dipole-dipole coupling and hydrogen bonding interactions. To accomplish this,
initially, SPARC modifies the susceptibilities for dipole-dipole and hydrogen bonding as follows:
25 ,298^ „ _ „ 298
rri
^dipole-dipole ^dipole-dipole L „ dipole-dipole V dipole-dipole /J
25 .298 Q + 298
/"^H-Bond /"^H-Bond L ,-y-, H-Bond V H-Bond / J /T-T
where On-Bond and Odipoie-dipoie are data fitted parameters stored in SPARC database. Both:
and ^induction are set to be equal to 1. The AH and AS temperature dependence are described by the
first and the second term, respectively. SPARC assumes that the AH/AS contribution to AG is
constant at 25° C. The multiplier of the (298/T) in both equations is the temperature dependence
factor associated with molecular orientation. That is why the dipole-dipole and H-bonding
interaction will drop out faster than either dispersion or induction as the temperature increases.
Further, the enthalpic term 298/T is then expanded as a polynomial function of all the interaction
forces. In general, for any "activity-driven" process in SPARC, the susceptibility, p, of a given
interaction at temperature T is modeled as function of the H-bonding (HB), dipole density (Dd) and
polarizability density(Pd) is given by
T
PM,
rechanism
25
r Mechan,
'ism
where pMechanism is dependent on the interaction mechanism (dispersion, induction, dipole-dipole and
H-bond). When T = 25° C, the two susceptibilities are equal to each other. an are trainable
parameters quantified from physical properties measurement, mainly on 4000 boiling points
measured at different pressures, 600 heat of vaporizations (at the boiling point) and on more than
600 GC chromatographic retention times at different temperatures ranging from 30 to 190° C.
106
-------
However, this is a small correction to any physical property such as AHV, vapor pressure and boiling
point. It describes the small temperature dependence of the enthalpic contribution. For example,
for non-polar molecules such as alkanes, alkenes and aromatics, this correction amounts to only a
few degrees in the boiling point calculator. For molecules that contain a small dipole moment, such
as aromatic or aliphatic halogens, this correction might be 3-6 degrees in boiling point estimation.
Molecules that have a large dipole moment, such as nitrobenzene, or the capability to H-bond with
each other, such as phenol, might produce a correction between 6-12 C in boiling point estimation.
4.6.6. Normal Boiling Point
If a liquid is heated in an open container, its temperature rises only until its vapor pressure
equals the external pressure. At this point, the liquid changes completely into vapor at constant
temperature. This temperature is known as the normal boiling point of the liquid. SPARC
estimates the boiling point for any molecular species by varying the temperature at which a vapor
pressure calculation is done. When the vapor pressure equals the desired pressure, then that
temperature is the boiling point at that pressure. The normal boiling point is calculated by setting
the desired pressure to 760 torr. Boiling points at a reduced pressure can be calculated by setting the
desired pressure to a different value. Since the same factors that affect the boiling point of a
compound affect the vapor pressure, the dipole-dipole and H-bond interactions become less
important and decrease significantly above the boiling point. The SPARC boiling point calculator
was tested against 4000 boiling points measured at different pressures ranging from 0.05 to 1520
torr spanning a range of over 800° C as shown Figure 22 while the statistical parameters are shown
in Table 14.
107
-------
700 -r
O
jo
D
co
O
oi
CL-
OT
100
100 300
Observed (° C )
500
700
Figure 22. SP ARC-calculation vs. observed 4000 boiling points for pressure ranging from 0.1 to at
1520 torr. The Total RMS deviation was 5.71° C. The RMS deviation for polar molecules was 8.2°
C and R2 was 0.9988, while for non-polar molecules the RMS was 2.6° C and R2 was 0.9995
4.6.7. Solubility (Activity Coefficient as a Function of Concentration)
Solubility is the maximum amount of a compound that will dissolve in pure solvents at a
given temperature. SPARC does not calculate the solubility from first principles, but from the
activity coefficient model described previously. SPARC estimates molecular solubility from a
calculation of the infinite dilution activity coefficient, y°°. When log y°° is greater than 2, the mole
fraction solubility can be reliably estimated as xso1 = 1/Y°°- However, when the log y°° is calculated
to be less than 2, this approximation fails. In these cases, y°° is greater than yso1 and SPARC would
underestimate the solubility. In order to overcome these limitations, SPARC employs an iterative
calculation. SPARC sets the initial guess of the solubility as Xguess = l/y°°- SPARC then 'prepares' a
mixed solvent that is XgueSS in the solute and (1-Xguess) in the solvent. SPARC recalculates y°° in the
'new' solvent. This process is continued until y°° converges or goes to 1 (miscible). Using this
technique, SPARC correctly calculates the solubilities of the aliphatic alcohol series and shows
propanol to be miscible and butanol to be very soluble in water. This technique also works for
108
-------
mixed solvent systems. For example, the log mole fraction solubility of toluene in a
water(30)/ethanol(70) mixture is observed to be -1.47. The initial "guess" from the SPARC
calculator was -1.87. This value converged to -1.50 after three iterations. Figure 23 shows display
a result of SPARC calculated log solubilites of 260 compounds versus observed values at 25° C.
The RMS deviation was 0.321 with an R2 of 0.991. The RMS deviation for 119 liquid compounds
was 0.135 with an R2 of 0.997, while for 141 solids log mole fraction solubilties, the RMS deviation
was 0.419 with an R2 of 0.985. The RMS deviation for the solids compounds is 3 times greater than
that for the liquid compounds due to the crystal energy contributions.
u
£
I
•o
13
3
15
6
-1 0 -5
Observed (log mole fraction)
Figure 23. A test results for SPARC calculated log mole fraction solubilites for 260 compounds at
25°C versus observed values. The RMS deviation is 0.321 and R2 is 0.991. The RMS for 119 liquid
solubilties is 0.135 and R2 is 0.997 while for the 141 solids the RMS is 0.419 and R2 is 0.985.
4.6.8.
Mixed Solvents
SPARC can handle solvent mixtures for a virtually unlimited number of components.
Speed and memory requirements usually limit the number of components to less than twenty on a
PC. The user specifies the name and volume fraction for each solvent component. Each of the
solvent components must have been previously initialized as a solvent. SPARC will allow the user
to specify a name for the mixture so that it can be used later as a 'known' solvent.
109
-------
Solvent descriptors that are essentially bulk in nature (e.g. polarizability) are volume fraction
averaged when employed in the interaction models described earlier. Solvent descriptors that are
site interactions (e.g. hydrogen bonding) are mole fraction weighted when used in the interaction
models, and the interactions summed over all solvent components. SPARC calculation of solubility
of organic molecules in binary solvent mixtures has been tested and appears to work well. Most of
the binary mixture data available is in the form of solubilities. Figure 24 shows SPARC-calculated
versus observed log activities in mixed methanol/water medium.
y = 0.9535X + 0.1422
0
1234
Observed (log mole fraction)
Figure 24. SPARC-calculated versus observed log activities for 120 compounds in water/methanol
mixed solvent at 25° C. The RMS deviation error was 0.18 and the R2 was 0.980.
4.6.9. Partitioning Constants
All partitioning (Liquid/Liquid, Liquid/Solid, Gas/Liquid and Gas/Solid) constants are
determined in SPARC by calculating the activity of the molecular species in each of the phases
without any modification to the activity models.
110
-------
4.6.9.1 Liquid/Liquid Partitioning
SPARC calculates the liquid-liquid partition constant such as the octanol/water distribution
coefficient, Kow, by simply calculating the activity at infinite dilution of the molecular species of
interest in each of the liquid phases as
where the y°°'s are the activities at infinite dilution of the compound of interest in the two phases and
Rm is the ratio of the molecularites of the two phases (Mi/M2). Although octanol-water partition
coefficients are widely used and measured, the SPARC system does not limit itself to only this
calculation. SPARC can calculate a compound's liquid-liquid partition coefficient for any two
immiscible phases. The phases may also be mixed solvents. In fact, when calculating an
octanol/water partition coefficient, SPARC calculates the activity in water and the activity in wetted
octanol, i.e., a 5% water 95% octanol (by volume) mixture. The water in the octanol phase makes
this a more cohesive solvent than pure octanol. The SPARC -calculated Kow's are not greatly
different than those calculated assuming dry or pure octanol when the molecule of interest is small
and/or has a large hydrogen-bonding interaction. However, the differences can be significant (-0.8
log units) when the molecule is large and hydrophobic, as in the case of large polynuclear aromatics
(PNA's) e.g., coronene. Figure 25 shows the current performance of SPARC for log Ksoivent/water,
where the solvents are carbon tetrachloride, benzene, cyclohexane, ethyl ether, octanol and toluene.
Figure 26 displays a comparison of the EPA Office of Water recommended observed octanol-water
distribution coefficients versus SPARC and C log P calculated values. The RMS deviation and R2
values were is 0.18 and 0.996 respectively for SPARC and 0.44 and 0.978 respectively for ClogP
calculated values.
Ill
-------
TO
_0
TJ
.8
_o
n
O
7
2
-3
-8
y = 0.9721x+0.0363
-8
-3
Observed log Kow
Figure 25. SPARC-calculated versus observed log distribution coefficients Ksoivent/water for 623
organic compounds in carbon tetrachloride, benzene, cyclohexane, ethyl ether, octanol and toluene
at 25° C. The RMS deviation was 0.38 and R2 was 0.983.
re
Q_
O)
O
C
re _
O re
Ct O
0.
(O
8
6 H
4
2 H
0
024
Observed log Kow
Figure 26. Test for EPA OWPP calculated Koctanoi/water versus recommended measured values.
Squares are SPARC calculate values, circles are ClogP calculate values. The RMS deviation and R2
values were is 0.18 and 0.996 respectively for SPARC and 0.44 and 0.978 respectively for ClogP
calculated values
4.6.9.2. Liquid/Solid Partitioning
SPARC calculates liquid/solid partitioning in a manner similar to liquid/liquid partitioning,
except that for the solid phase the self-self interactions, AGy, are dropped from the calculation.
Later in this report, the capability of mixed-solvent/solid partitioning is applied to the calculation of
112
-------
liquid chromatographic retention times. There the mobile phase is a water-methanol mixture and
the stationary phase is octadecane/surface-water.
4.6.9.3. Gas/liquid (Henry's constant) Partitioning
For solutions that are so dilute that each solute molecule is surrounded only by solvent
molecules, small changes in the solute concentration will not affect the composition of the nearest
neighbor molecules. In this case, the intermolecular interactions the solute molecule experiences
will not change with concentration, the vapor pressure will be proportional to the mole fraction of
the solute and Henry's constant may be expressed as
o co
H* = vPi y..
where vp;° is the vapor pressure of pure solute i (liquid or subcooled liquid) and Yy00 is the activity
coefficient of solute (i) in the liquid phase (j) at infinite dilution. SPARC vapor pressure and
activity coefficient models are used to calculate the Henry's constant for a any solute out of a given
solvent liquid phase as shown in Figure 27. An application of SPARC-calculated Henry's law
constants for the prediction of gas-liquid chromatography retention times in polar and non-polar
stationary liquid phases is presented later in this report.
4.6.9.4. Gas/Solid Partitioning
SPARC calculates gas/solid partitioning in a manner similar to gas/liquid partitioning. For
the solid phase, the solvent self-self interactions, AGy, are dropped from the calculation when one of
the phases is solid (i.e., surface interactions; no dissolution of the solute in the solid). This type of
modeling is useful for calculating retention times for capillary column gas chromatography.
113
-------
.
o
£
re
O
4 -,
2 -
0 -
-2 -
-4 -
-6 -
-8 -
-10 -
y = 0.998x - 0.006
-1 o
-8
-6
-4
-2
Observed (mole/L)
Figure 27. Observed vs. SPARC-calculated Henry's constants for 271 organic compounds in
hexadecane. The RMS deviation was 0.1 (mole/L)2 with an R2 was 0.997.
4.6.10. Gas Chromatography
Despite some limitations, the Kovats index has found much greater use than all other
specialized retention specification schemes. The Kovats index is the only retention value in gas-
liquid chromatography (GLC) in which two fundamental quantities, the relative retention and the
specific retention volume are united [52]. Moreover, a series of explicit relationships between
retention indices and a number of physicochemical quantities related to GLC have been developed.
Also, many different linear relationships between the Kovats index value for a molecule and other
fundamental molecular properties such as carbon number, boiling point and refractive index have
been derived [52, 53].
The Kovats [54] index expresses the retention of a compound of interest relative to a
homologous series of n-alkanes examined under the same isothermal conditions. The Kovats index
for a particular compound of interest is defined as the carbon number (CN) multiplied by 100 of a
hypothetical n-alkane having exactly the same net retention volume characteristics of the compound
of interest measured under the same conditions:
114
-------
KI = 100* ( °r NC 6 Nx + CN
where KI is the Kovats Index of the compound of interest, X, X is a compound with a retention
between that of the first n-alkane and second n-alkane standard, CN is the number of carbon atoms
in the first n-alkane standard, en +1 is the number of carbon atoms in the second n-alkane standard,
VNX is the net retention volume of the compound of interest X, VNC is the net retention volume of
the first n-alkane standard, and VN(C+I) is the net retention of the second n-alkane standard.
Numerous investigators have attempted to calculate or predict KI using physicochemical
descriptors like boiling point, density, dipole moment, etc. Unfortunately, all of the correlations of
retention indices and the various physicochemical properties are either relatively limited in scope or
their application is restricted to a particular chemical class. Other attempts to predict retention
indices for a wide range of molecular structures using molecular bond length, molecular bond angle,
topological indices [52, 53, 55], or other molecular characteristics have been only marginally
successful. Most of these studies also were restricted to a particular class of molecules on a specific
stationary liquid phase.
Despite all the attempts to predict Kovats index, no realistic scheme with widespread
application for different classes of compounds on different polarity stationary liquid phases is
available. The following is a discussion of SPARC models for the Henry's constant and Kovats
index applied to branched hydrocarbons on squalane. Our goal, however, is to develop general
mathematical models to calculate the Kovats index at any temperature for a wide range of different
classes of compounds on different polar and non-polar stationary liquid phases using the SPARC
Henry's constant calculator described earlier.
115
-------
SPARC vapor pressure and activity coefficient models are used to calculate the Henry's
constant for a solute in a squalane liquid phase. Henry's constant can be related to the net retention
volume, VN, by
RT
TT =
M VN
where M is the molecular weight of the solvent, and VL is the volume of the stationary phase.
Substituting in the previous equation (previous page), we get
KI = 100 x ( logHN* ~ lo§#^ + CN )
where HNX, HNZ, and HN(Z+I) are Henry's constant for the compound of interest X, first n-alkane
standard and the second n-alkane standard, respectively.
4.6.10.1. Calculation of Kovats Indices
Retention indices may be reproduced within a laboratory using modern instrumentation with
considerable precision over finite time periods. Reproducibility of 0.1 units was reported by
Schomburg and Dielmann [56] in 1973. However, squalane columns produced reproducible results
for only for a few hours and, therefore, need to be continually replaced. For routine operation a
reproducibility of about 1 Kovats index unit might be expected with a squalane liquid phase.
Unfortunately, inter-laboratory reproducibility remains unsatisfactory, except for a few cases. The
actual discrepancies between experimental values of retention indices for identical compounds
obtained at different laboratories in routine analysis is assumed to be up to ± 10 Kovats units or
even more [53].
116
-------
4.6.10.2. Unified Retention Index
The unified retention index developed by Dimov [57, 58] has been used to explain the
variations in the retention index of simple hydrocarbons on Squalane liquid phase. The temperature
dependence of the retention index is well known, the function d(KI)/dT being hyperbolic. A
statistical treatment using simple regression analysis of the data allows computation of the unified
retention index (UIx) as
UIT = UI0 +
dT
where UI0 is the Kovats index at 0° C and dUI/dT the temperature dependence where -dUI/dT is the
slope of the plotted data. The UIj is a statistically obtained value and, hence, it is more reliable than
any individual KIexp value. Also dUI/dT is a more reliable value than d(KI)/dt for estimation of the
temperature dependence of retention indices. The UIj and dUI/dt served as the observed values for
the optimization of SPARC dispersion parameters for prediction of the retention indices. Figure 28
shows the SPARC-calculated versus observed [57, 58] Kovats index at 25° C for 156 organic
compounds in a squalane liquid phase. The RMS deviation was less than 7 Kovats units, a value
that approximates interlaboratory experimental error. The SPARC physical properties and the
temperature dependence models were also tested on GC chromatographic retention times in non-
polar liquid phase such as squalane and B-18, and polar liquid phase such as SE-30, OV-101 and
PEG-20M at various temperatures. The RMS deviation for the Kovats index at 80° C in squalane
and at 130 °C in B-18 for!39 organic compounds was 9.3 and 12 Kovats units, respectively. See
Table 14 for some of these results.
117
-------
1200
"§ 900
re
O
600
300
0
200 400 600 800
O bserved (kovtas)
1000
1200
Figure 28 SPARC-calculated versus observed values for the GC chromatographic retention time in
squalane liquid phase25° C for 156 organic compounds. The RMS deviation was 7 kovats.
4.6.11. Liquid Chromatography
Just as the gas-liquid partition constant (Henry's constant) can be used above to model GLC
retention time, SPARC uses liquid-solid partition constant to model liquid chromatography. To
date we have only looked at one reversed phase separation. This work was a very precise
measurement of the k prime values for a broad range of solutes on a Cig reversed phase column
using several different mobile phase compositions. Our major observation in modeling the retention
times was that the data could not be modeled without including a polar, hydrogen bonding species
as part of the stationary phase. The activities of several of the molecules in the data set had been
measured in hexadecane (very close to Cig) and in mixed solvents. These measurements were not
consistent with the observed LC retention times whenever the molecule of interest had strong
hydrogen bonding sites. We modeled the LC retention times using a three phase model. The
mobile phase was modeled as a mixed solvent as described previously with no further refinements.
The stationary phase was modeled as two stationary phases. The Cig phase bonded to the silica was
treated as a Cig molecule. The second stationary phase was modeled as an unknown phase whose
polarizability density, dipole density and hydrogen bonding characteristics were inferred by SPARC
118
-------
from the observed retention times. The values for these later molecular descriptors were within a
few percent of what would be expected for water sitting on isolated sites on the surface.
OH ^u
r
,s\
Si
,Si. Sj.
\
Si
Si
Wetted Silica on a C\g Chromatographic Column
Using surface water as the third phase, SPARC models LC retention times (relative to one of the
molecules) as
!*-'& A rel !*-'& A stationary/mobile L- ref
where the stationary/mobile phase K is expressed as
A. stationary/mobile " A cM/mofe!'fe (•* ~ "/ A. surface water/mobile
where F is the fraction surface coverage of Cig. The two partition coefficients are calculated as
KCls/
Cls/mobtte
/V surface water/mobile
moMe
cls
Cig/mobile
/mobile ~ & /
surface water
^water/
water/mobile
where the log R values are the molecularity corrections to convert the activity based K values to
concentration based K's. Using this approach the best fit to the data was found with a stationary
phase composed of 95% Ci8 and 5% isolated surface water, both presumably bound to silica. The
following figure is a fit of the data.
119
-------
3 n
I 1
re
O
-1
-1 0 1
O bsrve d
Figure 29. SPARC-calculated versus observed log Ksurface for LC retention time in water methanol
mixture at 25° C. The RMS deviation was 0.095 and R2 is 0.992.
4.6.12. Diffusion Coefficient in Air
Several engineering equations exist that do a very respectable job of calculating molecular
diffusion coefficients in air over wide ranges of temperature and pressure. The equation most
compatible with the SPARC calculator is also the relationship that seems to perform the best for a
wide variety of molecules. This equation is that of Wilke and Lee [59], which for a general binary
diffusion coefficient is expressed as:
DAB = f3.03 - (0.98/M^)J(10~3)-
T3/2
where DAB is the binary diffusion coefficient in cm2/s, T is the temperature in K, MA and MB are
the molecular weights of A and B in g/mol, MAE is 2[(1/MA) + (I/Me)]"1 and P is the pressure in
bar. The QD is a complex function of T , and has been accurately determined by Neufeld [60]
where T = kT/SAB- The term o is determined from the liquid molar volume (calculated by SPARC
as a function of T) as,
-------
for the molecule. The coefficients of the Neufeld equation are stored in the SPARC database. The
Wilke-Lee approach predicts gas phase diffusion coefficients to better than 6%. Figure 30
compares observed to SPARC calculated gas phase diffusion coefficients at 25° C.
3
_O
TO
9
o
Q.
CO
0.05
0 . 1
0.15
0 .2
Observed (cm / s )
Figure 30. The SPARC-calculated versus observed diffusion coefficients in Air. The RMS
deviation was 0.0034 cm2/s.
4.6.13.
Diffusion Coefficient in Water
Several engineering equations exist that do a very respectable job of calculating molecular
diffusion coefficients in water. The equation most compatible with the SPARC calculator is
_ 1.4xlQ4
VlsLer VT
where V; is the molar volume and Viswater is the viscosity of water equal to 1.004 at 20° C.
121
-------
300 -n
•g 100
5 -100
-300
-300
100 100
O bserved
300
Figure 31.Observed vs. SPARC training set for 2400 calculations. The RMS is 0.29 and R2 is 0.997.
4.7.
Conclusion
A composite SPARC training set output is shown in Figure 31. The training set includes
vapor pressure (as a function of temperature), boiling point (as a function of pressure), diffusion
coefficients (as a function of pressure and temperature), heat of vaporization (as function of
temperature), activity coefficient (as a function of solvent), solubility (as a function of solvent and
temperature), GC retention times (as a function of stationary liquid phase and temperature) and
partition coefficients (as a function of solvent). This set includes more than 50 different pure
solvents (see Table 15) as well as 18 mixed solvent systems.
Table 15. Solvents that have been tested in SPARC
Chloroform
1-propanol
isobutanol
benzyl ether
cyclohexane
cyanohexane
heptane
methanol
nitroethane
nonanenitrile
quinoline
1-butanol
butanone
acetone
benzene
decane
ethanol
hexane
nonane
octane
squalene
phenol
1-chloro hexadecane
1-nitro propane
2-nitro propane
benzylchloride
bromobenzene
dioctyl ether
hexadecane
1 -butyl chloride
nitro cyclohexane
pentadecane nitrile
1,2,4 trichlorobenzene
1-dodecanol OV-101
2-dodecanone isopropanol
aceteonitrile PEG-20M
benzonitrile SE-30
butronitrile pyridine
cyano cyclohexane water
heptadecane squalane
nitrobenzene 1 -me naphthalene
nitro methane 2-me naphthalene
isoquinoline m-cresol
hexafluorobenzene p-xylene
122
-------
SPARC can reliably estimate numerous physical properties of compounds using the same
interaction models without modifications or additional parameterization. The SPARC physical
properties calculator predictions are as reliable as most of the experimental measurements for these
properties. For simple structures, SPARC can calculate a property of interest within a factor of 2, or
even better. For complex structures, where dipole-dipole and/or H-bond interactions are strong,
SPARC calculations are within a factor of 3-4.
5. PHYSICAL PROPERTIES COUPLED WITH CHEMICAL REACTIVITY
MODELS
SPARC models have been extended to ionic organic species by incorporating monopole
electrostatic interaction models into SPARC's physical properties toolbox. These ionic models play
a major role in modeling the activity and solubility of ionic species in any solvent system. These
capabilities (ionic activity), in turn, allow SPARC to calculate gas phase pKa. Likewise, the
calculation of gas phase pKa will allow SPARC to estimate ionization pKa, zwitterionic equilibria,
ionic partitioning and Ei/2 chemical reduction potential in any solvent system.
5.1. Henry's Constant (Gas/liquid Partition Coefficient) for Charged Compounds
Recently, experimentalists have been able to carry out reasonably accurate measurements of
proton transfer equilibria in the gas phase. These measurements provide a direct measure of relative
gas phase acidity. Several international meetings have been held with the purpose of developing a
coherent absolute scale for gas phase acidity. This scale is now relatively stable, and Professor Taft
at U.C. Irvine has kindly provided us with these screened datasets. The combination of absolute and
relative pKa's in both the gas phase and in water were used to develop the SPARC ionic interaction
models. The following thermodynamic cycles were used in this development.
123
-------
Arlgas ^
•A- gas ^
TJ+ v
11 gas ^
AHwater (soiv) — >
A gas ' 11 gas
•A- water (solv)
TT+
Jl water (solv)
Arlgas
Gas pKa
-Henry's Constant
- Henry's Constant
Henry's Constant
AHwater (solv) —> A water (solv) + n water (solv) pKa in Water (solv)
Steps 1, 4 and 5 are (or are related to) the gas phase pKa for AH, the Henry's constant of AH out of
water (solvent) and the pKa of AH in water (solvent), respectively. These values are either known
or can be calculated by SPARC. Step 3 represents the Henry's constant for a proton, and will be the
same (or a constant) for all molecules AH. This value, along with those for most counter ions in
water, have been estimated in the literature. AG for step 2 can be inferred from the other steps.
SPARC monopole interaction models were developed to calculate the inferred step 2 values.
5.1.1. Microscopic Monopole
The effective molecular monopole (microscopic monopole) was computed as the sum of
monopole contributions of individual substituents, modified for steric blockage by appended
molecular structures. Individual monopoles were summed algebraically. Monopole moments for
each charged substituent were estimated. For each charged substituent, S+/-, an S+/—methyl, S+/~
aromatic and S+/~ethylenic dipoles were inferred from data and stored in the SPARC database.
5.1.2. Induction-Monopole Interaction
Induction or monopole-induced dipole interactions will occur between molecules where one
(or both) contain a monopole. Between like molecules
A GU (monopole induction) = pind P\ nii Vi
124
-------
where pmd is the susceptibility to induction and P1;, ni; and V; are polarizability density (adjusted for
induction), monopole density and molar volume of the molecule-in-question, respectively.
p, a,-+ A™ , MS .
P. = • and rn =
v, m' V,
where MS; is the microscopic monopole strength described above, A^ci is a polarizability
adjustment for induction and a; is the average molecular polarizability. Induction describes
molecular polarization effected by a monopole on the surface, averaged over all orientations of the
molecule. Inductive polarization interactions 'propagate' effectively within conjugated systems, but
only one or two atoms deep in a nonconjugated array of atoms. SPARC adjusts the molecular
polarizability algorithmically, utilizing electron withdrawing/releasing substituent parameters
derived from pKa models as described previously.
5.1.3. Monopole-Monopole Interaction
Monopole-monopole interactions occur between molecules, each containing a monopole.
Between like molecules
A0r« (monoP°le ~ monopole) = n ^n, V,
where pm.m is the susceptibility to monopole-monopole interactions; ni; and V; are monopole density
and molar volume of the molecule-in-question, which are calculated by SPARC as described in the
previous section.
5.1.4. Dipole-Monopole Interaction
Dipole-monopole interactions occur between molecules containing a permanent dipole and a
monopole. Between like molecules
125
-------
/±Gu(dipole-monopole) = p ^ J). m. y.
where pd-m is the susceptibility to monopole-dipole interactions, d;, ni; and V; are dipole density,
monopole density and molar volume of the molecule-in-question which are calculated by SPARC
described previously.
51.5. Hydrogen Bonding Interaction
An a and a B associated with each of the charged substituents are calculated as described earlier in
this report and the same SPARC hydrogen bonding interaction models are used
5.2. Estimation of lonization pKa in the Gas Phase and in non-Aqueous Solution
The pKa in the gas phase was calculated after these monopole ion models were stable, using
the same thermodynamic loop above with water as the solvent. Likewise, the pKain any solvent
can be calculated by using the same thermodynamic loop except changing the solvent from water to
the solvent of interest. Figure 32 and 33 show the SPARC calculated versus observed values for the
pKa's in the gas phase and in 9 different non-aqueous solvents.
600 n
•o 500
o>
ra 400
o 300
re
O 200
100
100 200 300 400 500 600
Observed (Kcal)
Figure 32. SPARC-calculated versus calculated ionization pKa in the gas phase for 400 organic
compounds. The RMS deviation was 2.25 Kcal with an R2 of 0.998
126
-------
3
_O
re
O
45
35
25
15
5
-5
-5
15 25 35
Observed (kcal)
45
55
Figure 33. SPARC-calculated versus calculated ionization pKa in non-aqueous solvents. Solvents
were DMSO, THF, DMF, 3 alcohols, aceteonitrile, pyridine and acetic acid. The RMS deviation
was 1.9 Kcal with an R2 of 0.996.
5.3. Ei/2 Chemical Reduction Potential
The original electron affinity (EA) calculator models were first refined to better integrate
with the new models that were used to estimate one electron reduction in the condensed phase. As
was the case for estimating gas phase and non-aqueous pKa, SPARC uses the following
thermodynamic cycles:
Mgas
AG
EA
M. g
1V1 solvent ' 1V1 gas
2 solvent ^ gas
-» ~ ~
-AG?
gas
solvent
solvent £ solve
nt ~^ M solvent A GEI , 2
The sum of the first four steps leads to the fifth, the desired half reduction potential for a
compound of interest in an arbitrary solvent. The change in internal energy for the addition of an
electron (step 1) has already been modeled (electron affinity section). Steps 2, 3 and 4 are Henry's
127
-------
constant calculations for the three species. Steps 1 and 2 are already implemented in SPARC. Step
3 exists in the literature [64] for water, but is not modeled for arbitrary solvents or conditions in the
present SPARC calculator. Step 3 was taken as a constant term in the SPARC model and was data-
fit for several solvent systems. Step 4 represented a new model for the SPARC system.
This overall new SPARC model was tested against the compendia of measured one electron
reduction potentials for clean systems. In these comparisons, the differential sorption terms were
ignored and equal access to all the substituents assumed. Of all the solvent systems studied, water
was the most problematic. Non-aqueous data in the literature was readily available and reasonably
consistent across measurements from several laboratories, whereas reported water measurements
often varied by as much as one electron volt. These SPARC models were initially developed and
tested using the non-aqueous data. The solvent systems used were picked to represent a wide
variety of hydrogen bonding, dipolar and inductive environments. Once the ion-transfer (step 4)
models were in place and tested for a large number of molecules in a variety of solvents, our efforts
were focused on unraveling the problems with modeling the aqueous reduction system. Steps 1, 2
and 4 were in place and step 3 was well estimated in the literature. The major problem ultimately
encountered with the aqueous reduction measurements was the lack of consistency in data reporting,
in particular, the reporting of 'effective' reduction potentials that had incorporated into them the
effect of pH. pH-dependent SPARC models were then developed and implemented to calculates
aqueous reduction potentials as a function of pH. Once these aqueous models were in place, a final
refinement of the models was undertaken using both aqueous and non-aqueous reduction data. The
refined models are now in place for a limited number of molecular structures (i.e., for only electron-
withdrawing groups such as NO2, C=N, etc) and available for estimation of one-electron reduction
potentials with the SPARC calculator.
128
-------
_O
ra
O
Observed (e.V)
Figure 36. SPARC-calculate versus observed Biochemical reduction potential in water, 3 alcohol's,
DMF, THF, DMSO, aceteonitrile at 25° C. The RMS was 0.35 e.v.
5.4. Chemical Speciation
Complex chemical solutes may exist in solvents in multiple forms or species (ions,
zwitterions, tautomers, hydrates) that differ dramatically in their chemical and physical properties.
The distribution of a given chemical among its various speciates forms depends on the system
conditions (temperature, pH, ionic strength) and medium composition (gas, liquid, solid
components). Although much is known about the existence of such species, with the exception of
simple ionization, very little data exist for quantifying these speciation processes. This is
particularly true for complex (poly-functional) molecules and for aqueous systems. Another
difficulty in studying or modeling speciation processes is that they are frequently coupled (e.g.,
ionization may occur with synchronous tautomerization or hydration). As described herein,
SPARC speciation models for ionization are fully developed and tested and models for
tautomerization are operational, but only minimally trained and tested at this writing ongoing
129
-------
research will complete and integrate these existing models into SPARC and then develop, test, and
integrate hydration models.
5.5. Hydration
Hydration in the SPARC context applied herein, is the reversible addition of water across a
'pi-electron functional group'. The two structural units where this is known to occur are the
carbonyl and imine functional groups. In each case, a hydroxyl group attaches to the base carbon
and a hydrogen atom to the heteroatom.
H
H9O
OH
H
H
H
OH
In the SPARC modeling approach, these functional groups are reaction centers, C, and any
molecular structure(s) appended thereto are designated perturber, P, structure.
P-C; ^ > P-Cf
In the case of hydration, differential solvation of the two species involved will play a major
role. In this case we started with the following thermodynamic cycles to model the reaction.
P-C=0 (g)
P-C(OH)2 (g)
P-C=O (1)
P-C=O (1)
•P-C(OH)2(g) AGhydration(g)
(1) AGtransfer(O)
-AGtranSfer(=0)
P-C(OH)2 (1) AGhydratlon(l)
Henry's constant
Henry's constant
130
-------
The top reaction was modeled using the usual SPARC perturbation approach (see chemical
reactivity section) as
^^J hydration ~ ( ^^ hydration )c + 8 p ( ^J hydration )c
where the reaction center AG (in this case formaldehyde) is perturbed by appended molecular
structure. The perturbation was further factored into mechanistic components such as:
O p \ ^^ hydration )c ~ & ele ^^ hydration & res ^^ hydration & steric ^^ hydration "•
From structure theory of organic chemistry, it is known that nucleophilic addition reactions
across TT bonds are sensitive to inductive and steric effects from atoms contiguous to the n group.
Also, it is known that functional groups containing non-bonded electrons (-OH, -OCH3, -NR.2)
attached to the base carbon will prohibit hydration (via induction and resonance). With this model,
we can confirm the failure of esters, amides, urea, and carboxylic acids to hydrate, and can project
other structures that are readily hydrated. The biggest perturbations were found to be direct field
effects (increase), sigma induction (decrease), resonance (decrease) and steric (decrease).
The literature was scoured for measurements of carbonyl hydration. Surprisingly, hydration
data for only 37 molecules have been reported in the literature. However, the chemistry represented
in these 37 structures was considerably varied, and the data set displayed large effects for all the
SPARC mechanistic models. We feel that the basic SPARC models for resonance, field effects
(direct and indirect), differential electronegativity and steric environment were well represented by
these 37 molecules. The extensive work done in SPARC pKa and property modeling provided all
the needed parameters for the substituents. Although the hydration data set was very limited, only
four new parameters were needed to describe the hydration constants for these molecules. These
four were the hydration susceptibility of the reaction center to a) resonance effects, b) field effects,
131
-------
c) steric occlusion and d) the effective electronegativity of the reaction center (C=O) in gauging
sigma inductive effects. The available data were least squares fitted using these parameters, and the
log hydration constants were estimated to better than 0.3 log pKa units as shown in Figure 34. This
predictability is about as good as the experimental error.
For imines, several compounds are known to readily hydrate, but we have found no
measured hydration constants in the literature. Most aliphatic imines readily degrade in water. It is
reasonable to assume that hydration may be involved in these processes. Imines that are stable in
water are highly sterically hindered, which blocks any potential hydration. Imine-like structures
within aromatic rings are known to form stable hydrates. For example, quinazoline (and several
quinazoline derivatives) are known to form stable hydrates in the cationic form. This is readily
apparent from the increased observed basicities for these compounds. The 'observed pKa's' for
these compounds are really mixed constants, representing concurrent protonation and hydration.
H+
OH
The number of quinazoline and pteridine hydration constants found in the literature was much
greater than that for the carbonyl. For these aromatic molecules, several researchers employed
stopped-flow techniques and pH jump experiments to sort out the individual components in the
observed mixed constant pKa measurements. For example, the pKa constants for direct protonation
of quinazoline was measured to be 1.8, very close to the SPARC calculated value of 1.9. The same
approach used in SPARC to model the hydration of C=O was used to model quinazoline and
pteridine hydration. These models were further tested by calculating the pKa of the hydrated form
132
-------
and comparing them to the values inferred from pH jump experiments. For example, the pKa of the
hydrated form of quinazoline was measured to be -7.5 and the SPARC-calculated pKa was 7.0.
This agreement is good considering the difficulties and assumptions made in both measurement and
calculation. Both the measurement and calculation were complicated by the possibility of
tautomeric conversion. The observed quinazoline and pteridine hydration constants were compared
to the SPARC-calculated values using the models described above. The RMS deviation was 0.43 as
shown in Figure 35. Again, the prediction errors are on the order of the experimental error. The
hydration models are now fully integrated into the SPARC calculation system and available for use.
5.6. Process Integration
For chemicals that can speciate or exist in multiple forms (ions, zwitterions, tautomers,
hydrates), observed chemical behavior may reflect integration over several discrete chemical
species or processes. It is convenient to designate as 'macro' macroscopic/observed equilibrium or
kinetic constants, and designate as 'micro', a constant for a single species or speciation event (which
may or may not be resolved experimentally). As an example in ionization, a micro constant would
describe the loss or gain of a proton at a specific site whereas a macro constant may involve poly-
protonic events relating to (1) loss or gain of protons from different sites on separate molecules that
are integrated in the measurement, or (2) synchronous loss/gain of protons from different sites on
the same molecule resulting in one unit change in total charge (e.g., gain of one and loss of two
protons). These concepts will be utilized in the following discussion on tautomeric equilibria.
133
-------
3
2 H
T3 A
-------
tautomeric and ionization events in order to generate these synchronous processes. In the case of
synchronous ionization and tautomerization, the SPARC system first calculates all possible neutral
tautomers. This process is currently a rule-driven search for possible tautomeric flips. The current
system analyzes the molecule and generates all possible neutral tautomers starting with the
molecule entered by the user. Once all the possible tautomeric forms have been identified and their
SMILES representations generated, the system proceeds to estimate each form's abundance relative
to the form entered by the user. The system currently moves 'a bond at a time', and keeps track not
only of the final species but the molecular path to get to that form as sequential tautomeric flips
occur. The tautomeric calculator uses a combination of the pKa calculator and the physical property
calculator to generate an estimate of the tautomeric equilibrium constant.
The sum of the reactions in Figure 36 leads to the pKtaut for a particular bond flip. There
may be further tautomerization possible out of this state. An equation similar to that for sequential
ionization was developed for all possible neutral tautomeric forms. The k's in the equation for
sequential ionization were replaced with ktautomer and the relative abundances of each form
calculated. The assigned relative weight for the original starting structure is 1, and to each of the
tautomers a weight T;. In order to capture synchronous ionization and tautomeric events, the
system then does a full speciation calculation for each possible neutral tautomer similar to that for
ionization previously via the following equation:
£ kt [H+]L> £ ^ k< k,[H + JL« 2S- S *'• kV" **••* [H + ]L^
'' '*'• **"•-
T 0! I! 2! N!
where DT; is the sum of the relative concentrations of all ionized species with the molecular
structure of the T;111 tautomer. The fraction of any particular ionization state at a given pH is
135
-------
expressed as one of the individual terms in the above equation divided the weighted sum of all DT;
givinby:
#tautomers
U total ~ 2-1 'Ti
i=l
where T; is the fraction of the ith tautomer relative to starting structure. For example, the fraction of
the starting molecule at a given pH would be simply l/Dtotai and the fraction of the 1th tautomer
compound having only its fh state ionized would be T; • kj'[H+]L^/Dtotai etc. The need to develop
intelligent filters were extremely important since the number of calculations grows geometrically
with both the number of ionizable sites and the number of possible tautomeric forms. The neutral
tautomeric relative concentration cannot be the only factor. For the simple simultaneous
ionization/tautomerization scheme shown above, the neutral endo form is predominant (-100/1). In
this case, the basicity of the exo form (pKa -11) drives the equilibrium and stabilizes the tautomeric
form as a cation. The observed apparent Ka (pKa -8) is the product of the tautomeric Ktam and the
pKa of the tautomeric form. Incorporation of tautomeric re-arrangements is now fully implemented
in the SPARC system and is available for use.
5.8. Conclusion
SPARC estimates gas phase and non-aqueous ionization pKa and Ei/2 chemical
reduction potential (only for electron withdrawing group) in any solvent within experimental
error. Hydration and tautomeric constants also can be calculated using the same physical and
chemical models. Further testing and refining of the SPARC models for these properties is
needed. Integration of speciation, tautomer and hydration models are underway at this time.
136
-------
•NH , H
pKa in water
Liquid to gas (- Henry's constant)
N
Gas phase rearrangement (assumed
zero energy)
:NH
N
:NH
Gas to liquid (Henry's constant)
+
rf
N
-pKa in water
V
-NH,
N
H Final tautomeric transformation in water
N
Figure 36. The thermodynamic cycle for the tautomerization of methyl-H-Indol-2-amine
137
-------
6. MODEL VERIFICATION AND VALIDATION
In chemistry, as with all physical sciences, one can never determine the "validity" of
any predictive model with absolute certainty. This is a direct consequence of the empirical
nature of science. Because SPARC is expected to predict reaction parameters for processes for
which little data exists, "validity" must drive the efficiency of the models constructs in
"capturing" or reflecting the existing base of chemical reactivity. In every aspect of SPARC
development, from choosing the programming environment to building model algorithms or
rule bases, system validation and verification were important criteria. The basic mechanistic
models in SPARC were designed and parameterized to be portable to any type of chemistry or
organic chemical structure. This extrapolatability impacts system validation and verification in
several ways. First, as the diversity of structures and the chemistry that is addressable
increases, so does the opportunity for error. More importantly, however, in verifying against
the theoretical knowledge of reactivity, specific situations can be chosen that offer specific
challenges. This is important when verifying or validating performance in areas where existing
data are limited or where additional data collection may be required. Finally, this expanded
prediction capability allows one to choose, for exhaustive validating, the reaction parameters
for which large and reliable data sets do exist to validate against.
Hence, in SPARC, the experimental data for physicochemical properties (such as
boiling point) are not used to develop (or directly impact) the model that calculates that
particular property. Instead, physicochemical properties are predicted using a few models that
quantify the underlying phenomena that drive all types of chemical behavior (e.g., resonance,
electrostatic, induction, dispersion, H-bonding interactions, etc.). These mechanistic models
were parameterized using a very limited set of experimental data, but not data for the end-use
138
-------
properties that will subsequently be predicted. After verification, the mechanistic models were
used in (or ported to) the various software modules that calculate the various end-use properties
(such as boiling point). It is critical to recognize that the same mechanistic model (e.g., H-
bonding model) will appear in all of the software modules that predict the various end-use
properties (e.g., boiling point) for which that phenomenon is important. Thus, any comparison
of SPARC-calculated physicochemical properties to an adequate experimental data set is a true
model validation test - there is no training (or calibration) data set in the traditional sense for
that particular property. The SPARC models have been validated on more than 10,000 data
points as shown in Table 14.
7. TRAINING AND MODEL PARAMETER INPUT
All quantitative chemical models requires, at some point, calibration or parameterization.
The quality of computational output necessarily reflects the quality of the calibration parameters.
For this reason, a self-training complement (TRAIN) to SPARC was developed. Although a
detailed description of TRAIN will not be given at this time, the following is a general review. For
a given set of targeted model parameters, the program takes initial "guesstimates" (and the
appropriate boundary constraints) together with a set of designated training data and provides an
optimizes set of model parameters. TRAIN cycles once or iteratively through Jacobian optimization
procedure that is basically a non-linear, least square matrix method. TRAIN sets up and executes
the optimization specifics according to user prescription.
8. QUALITY ASSURANCE
A quality assurance (QA) plan was developed to recalculate all the aforementioned
physical and chemical properties and compare each calculation to an originally-calculated-value
stored in the SPARC databases. Every quarter, two batch files that contain more than 3000
139
-------
compounds (4200 calculations) recalculate various physical/chemical properties. QA software
compares every single "new" output to the SPARC originally-calculated-value dated back to 1993-
1999. In this way, we ensure that existing parameter models still work correctly after new
capabilities and improvements are added to SPARC. This also ensures that the computer code for
all property and mechanistic models are fully operational.
9. SUMMARY
SPARC estimates numerous physical and chemical properties for a wide range of organic
compounds strictly from molecular structure. SPARC physical property and chemical reactivity
models have been rigorously tested against all available measurement data found. These data
cover a wide range of reaction conditions to include solvent, temperature, pressure, pH and ionic
strength. The diversity and complexity of the molecules used in the tests during the last few years
were drastically increased in order to develop more robust models. For simple structures SPARC
can predict the properties of interest within a factor of 2 or even better. For complicated structures,
where hydrogen bond and/or dipole interactions are strong, SPARC can estimate a property of
interest within a factor of 3-4 depending on the type of property.
The strength of the SPARC calculator is its ability to estimate the property of interest for
almost any molecular structure within an acceptable error, especially for molecules that are
difficult to measure. However, the real test of SPARC does not lie in testing the predictive
capability for pKa's, vapor pressure or activity coefficient but is determined by, the extrapolatabi-
lity of these models to other types of chemistry.
140
-------
For chemical reactivity models:
The ionization pKa models in water have been extended to calculate many other properties
to include:
1. Estimation of the thermodynamic microscopic ionization constants of molecules with
multiple ionization sites, zwttierionic constants and the corresponding complex speciation
as a function of pH and the isoelectric points in water.
2. Estimation of gas phase electron affinity.
3. Estimation of ester hydrolysis rate constants as function of solvents and temperature.
For physical property models
The vapor pressure and the activity coefficient models have been extended to calculate
many other properties using the solute-solute and solute-sol vent models without any
modifications to any of these models or any extra parameterization to include:
4. The SPARC self-interactions (solute-solute) model can predict the vapor pressure within
experimental error for a wide range of molecular structures over a wide range of
measurements. This model has been extensively tested on boiling points, heat of
vaporization and diffusion coefficients.
5. The solute/solvent interactions model can predict the activity coefficient within
experimental error for a wide range of molecular structures in any solvent. This model was
extended and tested on solubilities, partition coefficients (liquid/liquid, liquid/solid,
gas/liquid) and GC/LC chromatographic retention times in any single or mixed solvent
systems at any temperature.
141
-------
For Coupled physical property and chemical reactivity models:
Henry's constant for charge and neutral molecules and chemical reactivity models were
coupled and extended to calculate many other properties:
6. lonization pKa in the gas phase and in non-aqueous solutions.
7. Thermodynamic microscopic ionization, zwitterionic, hydration, and tautomeric
equilibrium constant in water or any other solvent.
8. Ei/2 chemical reduction in water and in many other solvent systems.
SPARC is online and can be used at : //ibmlc2. .
142
-------
10. REFERENCES
1. S. W. Karickhoff, V. K. McDaniel, C. M. Melton, A. N. Vellino, D. E. Nute, and L. A.
Carreira. US. EPA, Athens, GA, EPA/600/M-89/017.
2. M. M. Miller, S. P. Wasik, G. L Huang, W. Y. Shiu and D. Mackay. Environ. Sci. &
Technol. 19 522 1985
3. W. J. Doucette and A.W. Andres. Environ. Sci. Technol. 21 821 1987
4. R. F. Rekker, The Hydrophobic Fragment Constant, Elsevier, Amsterdam, Netherlands
1977.
5. S. Banerjee, S. H. Yalkowski and S. C. Valvani. Environ. Sci. andToxicol. 14 1227 1980.
6. M. J. Kamlet, R. M. Dougherty, V. M. Abboud, M. H. Abraham and R.W. Taft.
J. Pharm. Sci. 75(4) 338 1986
7. W. J. Lyman, W. E. Reehl and D. H. Rosenblatt. Handbook of Chemical Property
Estimation Methods: Environmental Behavior of Organic Chemicals. McGraw-Elill,
NewYork,NY. 1982.
8. G. Shuurmann. Quant. Struct. ActRelat. 9 326 1990.
9. A. J. Leo. Structure Activity Correlations in Studies ofToxicity and Bio-concentrations with
Aquatic Organisms. (Veith,G.D.,ed.), International Joint Commission, Windsor, Ontario
1975.
10. D. MacKay, A. Bobra, W. Y. Shiu and S. H. Yalkowski. Chemosphere, 9 701 1980.
11. R. G. Zepp. Handbook of Environmental Chemistry (Hutzinger,O.,ed.), vol.2(B)
Springer-Verlag, New York, NY, 1982.
12. R. G. Zepp and D. M. Cline. Environ Sci & Techno 11 359 1977.
143
-------
13. N. L. Wolfe, R.G. Zepp, J. A. Gordon, G. L. Baughman and D. M. Cline. Environ Sci &
Tecnoll 88 1977.
14. J. L. Smith, W. R. Mabey, N. Bohanes, B. B. Hold, S.S. Lee, T.W. Chou, D.C. Bomberger
and T. Mill. Environmental Pathways of Selected Chemicals in Fresh Water Systems: Part
11, U.S. Environmental Protection Agency, Athens, GA 1978.
15. H. Drossman, H. Johnson and T. Mill. Chemoshpere 17(8) 1509 1987.
16. S. W. Karickhoff, V. K. McDaniel, C. M. Melton, A. N. Vellino, D. E. Nute, and L. A.
Carreira., Environ. Toxicol. Chem. 10 1405 1991.
17. S. H. Hilal, L. A. Carreira, C. M. Melton and S. W. Karickhoff., Quant. Struct. Act. Relat.
123891993.
18. S. H. Hilal, L. A. Carreira, C. M. Melton, G. L. Baughman and S. W. Karickhoff, J. Phys.
Org. Chem. 7, 122 1994.
19. S. H. Hilal, L. A. Carreira and S. W. Karickhoff, Quant. Struct. Act. Relat, 14
348 1995.
20. S. H. Hilal, L. A. Carreira, S. W. Karickhoff, M. Rizk, Y. El-Shabrawy and N. A. Zakhari,
Talanta 43 607 1996.
21. S.H Hilal, L.A. Carreira and S. W. Karickhoff, "Theoretical and Computational
Chemistry, Quantitative Treatment of Solute/Solvent Interactions", Eds. P. Politzer and J. S.
Murray, Elsevier Publishers, 1994.
22. S. H. Hilal, L. A. Carreira and S. W. Karickhoff, Talanta, 50 , 827 1999.
23. S.H. Hilal, L. A. Carreira, C. M. Melton and S. W. Karickhoff, J. Chromatogr., 269 662
1994.
24. S. H. Hilal, L. A. Carreira and S. W. Karickhoff, Quant. Struct. Act. Relat., In Press.
144
-------
25. S. H. Hilal, J.M Brewer, L. Lebioda and L.A. Carreira, Biochem. Biophys. Res. Com., 607
211 1995
26. S. H. Hilal, L. A. Carreira, and S. W. Karickhoff., To be submitted
27. J. E. Lemer and E. Grunwald. Rates of Equilibria of Organic Reactions., John Wiley &
Sons, New York, NY., 1965
28. Thomas H. Lowry and Kathleen S. Richardson, Mechanism and Theory in Organic
Chemistry. 3ed ed. Harper & Row, New York, NY, 1987
29. R. W. Taft, Progress in Organic Chemistry, Vol.16, John Wiley & Sons, New York, NY,
1987.
30. L. P. Hammett, Physical Organic Chemistry, 2nd ed. McGraw Hill, New York, NY., 1970.
31. M. J. S. Dewar and R. C. Doughetry, The PMO Theory of Organic Chemistry. Plenum
Press, New York, NY, 1975.
32. M. J. S. Dewar, The Molecular Orbital Theory of Organic Chemistry, McGraw
Hill, New York, 1969.
33. C. Lim, D. Bashford, and M. Karplus, J. Phys. Chem., 95 5610 1991.
34 W. L. Jorgensen and J. M. Briggs, J. Am. Chem Soc., 111 4190 1989.
35. C. Griiber, and V. Buss, Chemosphere, 19 1595 1989.
36. K. Ohta., Bull. Chem. Soc.jpn., 65 2543 1992.
37. G. Klopman, Quant. Struct. Act. Relat., 11 176 1993.
38. G. Klopman, D. Fercu, J. Comp Chem., 15 1041 1994.
39. A. E. Martell and R. J. Motekaities, The determination and use of stability constants,
Weinhiem, New York, VCH publisher, 1988.
40. R. E. Benesch and R. Benesch, J. Am. Chem. Soc., 5877 77 1955.
41. M. A. Grafius and J. B. Neilands , J. Am. Chem. Soc., 3389 77 1955.
145
-------
42. R. B. Martin, J. Phys. Chem., 2657 75 1971.
43. R. B. Martin, J. T. Edsall, D. B. Wetlaufer and B.R. Hollingworth, J. Biol. Chem., 1429
233 1958.
44. T. L. Hill, J. Phys. Chem., 101 48 1944.
45. J. T. Edsall and J. Wyman, J., Biophysical Chemistry, Vol. 1, Academic Press Inc. New
York, 1958.
46. A. L. McClellan, Tables of Experimental Dipole Moments. W.H. Freeman and Co.,
London, 1963.
47. J. Dykyj, M. Repas and J. Anmd Svoboda. Vapor Pressure of Organic Substances. VEDA,
Vydavatel' stvo, Slovenskej Akademie Vied, Bratislava, 1984.
48. K. A. Sharp, A. Nicholls, R. Friedman and B. Honig, Biochemistry, 30 9686 1991.
49. P. J. Flory, Chem. Phys., 10 51 1942.
50. M. L. Huggins, J. Am. Chem. Soc., 64 1712 (1942)
51. R. C. Reid, J. M. Prausnitz and J. K. Sherwood, The Properties of Gasses and
Liquids, 3ed ., McGraw-hill book Co., 1977
52. G. Tarjan, I. Timar, J. M. Takacs, S. Y. Meszaros, Sz. Nyiredy, M. V. Budahegyl, E. R.
Lombosi and T. S. Lombosi., J. Chromatogr., 271 213 (1982)
53. J. K. Haken and M. B. Evans., J. Chromatogr., 472 93 (1989).
54. E.sz. Kovats, Adv. Chrommatogr., 1 31A(1965).
55. L. S. Anker and P. C. Jurs., Anal. Chem., 62 2676 (1990).
56. G. Schomburg and G. Dielmann., J. Chromatogr. Sci. 11151 (1973).
57. N. Dimov, J. Chromatogr., 347 366 (1985).
58. D. Papazova and N. Dimov, J. Chromatogr., 356 320 (1986).
146
-------
59. C. R. Wilke and C. Y. Lee, Ind Eng. Chem., 47 1253 (1955)
60 P. D. Neufeld, A.R. Janzen, and R. A. Aziz, J. Chem. Phys. 57 1100 (1972)
61. R. A. Larson and EJ. Weber, Reaction mechanisms in Environmental Organic Chemistry".,
Chelsea, MI, Lewis Publishers 1994.
62. E. J. Weber and R. L Adams, Environ. Sci. Technol, 29 1163 1995.
63. T. M. Vogel, C. S. Criddle and P.L. McCarty, Environ. Sci. Technol. 21 (8) 722 1987.
64. J. Saveant, Adv. Phys. Org., 26 1 1990.
65. S. W. Karickhoff and J. MacArthur long, US EPA, Internal Report, Athens, GA.
66. Haim Shalev and Dennis Evans, J. Am. Chem. Soc., 1117 1989.
147
-------
11. GLOSSARY
1. SPARC = SPARC Performs Automated Reasoning in Chemistry
2. EA= Electron Affinity
3. SAR= Structure Activity Relationships
4. QSAR = Quantitative structure Activity Relationship
5. LFER= Linear Free Energy Relationships
6. PMO = Perturbed Molecular Orbital Theory
7. IUPAC = International Union of Pure and Applied Chemistry
8. MO = Molecular Orbital
9. LUMO = Lowest Unoccupied Molecular Orbital
10. HOMO = Highest Occupied Molecular Orbital
11. NBMO = Non Bonded Molecular Orbital
12. C = Reaction Center
13. P= Perturber
14. S = Substituent
15. R = Molecular conductor connecting S to C
16. RK= A rigid fully conjugated n structure (such as benzene)
17. C;= Initial state
18. Cf = Final state
19. Aqc = Fraction of NBMO charge
20. v = Solid angle occluded by P
21. S; = Reduction factor for steric blockage
22. Ac, Bc = Entropic and the enthalpic van't Hoff coefficients of C, respectively.
148
-------
23. (pKa)c, EAC = pKa and EA of the reaction center (reference point), respectively.
24. 5p(pKa)c, 5P(EA)C = Change in the pKa and EA due to P, respectively.
25. kzw = Zwitterionic ionization constant
26. K, k = Macroscopic and microscopic equilibrium constants, respectively
27. D; = Fraction of the 1th microscopic species
28. DT= Sum of the relative concentration of all the ionizes species
29. T; = Fraction of the 1th tautomer species
30. N = Number of the ionizable sites in a molecule
31. N; = Number of the electrons in fragment i
32. NI = Total number of the microconstants
33. IS = Number (
-------
44. Er = Substituent resonance strength
45. a = Proton donating site
46. |3 = Proton accepting site
47. NB = Data-fitted parameter that depends on number of the substituents that are bonded
directly to the reaction center for sigma induction
48. 0,1= Average molecular polarizability
49. P;d = Effective polarizability density of molecule i
50. Dd; = Effective dipole density of molecule i
51. P; = Susceptibility to a mechanistic mechanism
52. V; = Molar volume
53. (j,; = Effective microscopic dipole
54. AdiSp = Polarizability adjustment for dispersion
55. Aind = Polarizability adjustment for induction
56. CN = Carbon number
57. KI= Kovats index
58. UI0 = Kovats index at 0° C
59. ni; = Monopole density
60. Rm = Ratio of the molecularites of the two phases
61. RMS = Root Mean Square
150
-------
12.
APPENDIX
Summary of usage of the SPARC-web version
Two months back-to-back report, which represents the usage of the SPARC calculator in October
and November, 2002. November was the highest while October was the lowest usage to date.
Summary of Activity for Report
October 2002
November 2002
Hits Entire Site (Successful) 56,875
Average Number of Hits per day on Weekdays
2,153
Average Number of Hits for the entire Weekend
1,297
Most Active Day of the Week Thu
Least Active Day of the Week Sat
Most Active Day Ever October 24, 2002
Number of Hits on Most Active Day 4,963
Least Active Day Ever October 05, 2002
Number of Hits on Least Active Day 7
URL's of most active users
207.168.147.52463
pl20xl83.tnrcc.state.tx.us 3,986
141.189.251.71,720
198.137.21.14455
57.67.16.50327
gateway.huntingdon.com 6,823
aries.chemie.uni-erlangen.de 1,487
pl20x226.tnrcc.state.tx.us 67
thompson.rtp.epa.gov 413
webcache.crd.GE.COM 143
Hits Entire Site (Successful) 95,447
Average Number of Hits per day on Weekdays
4,146
Average Number of Hits for the entire Weekend
842
Most Active Day of the Week Wed
Least Active Day of the Week Sun
Most Active Day Ever November 13, 2002
Number of Hits on Most Active Day 15,450
Least Active Day Ever November 02, 2002
Number of Hits on Least Active Day 7
URL's of most active users
141.189.251.71,223
gw.bas.roche.com 1,821
gateway.huntingdon.com 3,729
pl20xl83.tnrcc.state.tx.us 737
hwcgate.hc-sc.gc.ca 660
pl20x226.tnrcc.state.tx.us 379
thompson.rtp.epa.gov 563
chen.rice.edu 966
151
------- |