                                       EPA/600/R-16/058

              https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test
User's Guide for T.E.S.T. (version 4.2)

(Toxicity Estimation Software Tool)
A Program to Estimate Toxicity from Molecular Structure
   ©2016 U.S. Environmental Protection Agency

-------
           User's Guide for T.E.S.T.

    (Toxicity Estimation Software Tool)


                        by
                     T. Martin
      U.S. EPA/National Risk Management Research
Laboratory/Sustainable Technology Division, Cincinnati, OH
                       45268
            Sustainable Technology Division
     National Risk Management Research Laboratory
               Cincinnati, Ohio, 45268

-------
The U.S. Environmental Protection Agency, through its Office of Research and
Development, funded and conducted the research described herein under an approved
Quality Assurance Project Plan (Quality Assurance Identification Number S-14987-QP-1-0).
It has been subjected to the Agency's peer and administrative review and has been
approved for publication as an EPA document. Mention of trade names or commercial
products does not constitute endorsement or recommendation for use.

-------
The U.S. Environmental Protection Agency (US EPA) is charged by Congress with
protecting the Nation's land, air, and water resources. Under a mandate of national
environmental laws, the Agency strives to formulate and implement actions leading to a
compatible balance between human activities and the ability of natural systems to support
and nurture life. To meet this mandate, US EPA's research program is providing data and
technical support for solving environmental problems today and building a science
knowledge base necessary to manage our ecological resources wisely, understand how
pollutants affect our health, and prevent or reduce environmental risks in the future.

The National Risk Management Research Laboratory (NRMRL) within the Office of
Research and Development (ORD) is the Agency's center for investigation of
technological and management approaches for preventing and reducing risks from
pollution that threaten human health and the environment.  The focus of the Laboratory's
research program is on methods and their cost-effectiveness for prevention and control of
pollution to air, land, water, and subsurface resources; protection of water quality in
public water systems; remediation of contaminated sites, sediments and ground water;
prevention and control of indoor air pollution; and restoration of ecosystems. NRMRL
collaborates with both public and  private sector partners to foster technologies that reduce
the cost of compliance and to anticipate emerging problems. NRMRL's research provides
solutions to environmental problems by: developing and promoting technologies that
protect and improve the environment; advancing scientific and engineering information
to support regulatory and policy decisions; and providing the technical support and
information transfer to ensure implementation of environmental regulations and strategies
at the national, state, and community levels.

-------
This guide provides an introduction to QSAR (Quantitative Structure Activity
Relationship) models, a detailed description of the QSAR methodologies in TEST, a
description of the experimental datasets, a detailed analysis of the validation results for
the external test sets, and step-by-step instructions for using the software.

-------
Table of Contents
Notice/Disclaimer
Foreword
Abstract
1.   Introduction
  1.1.               Toxicity Endpoints
  1.2.               QSAR Methodologies
2.   THEORY
  2.1.               Molecular Descriptors
  2.2.               QSAR Methodologies
    2.2.1.           Hierarchical Clustering
    2.2.2.           FDA Method
    2.2.3.           Single model
    2.2.4.           Group contribution
    2.2.5.           Nearest neighbor
    2.2.6.           Mode of action
    2.2.7.           Consensus
  2.3.               Validation Methods
    2.3.1.           Statistical external validation
3.   EXPERIMENTAL DATA SETS
  3.1.               96 hour fathead minnow LC50 data set
  3.2.               48 hour Daphnia magna LC50 data set
  3.3.               40 hour Tetrahymena pyriformis IGC50 data set
  3.4.               Oral rat LD50 data set
  3.5.               Bioconcentration factor data set
  3.6.               Developmental toxicity data set
  3.7.               Ames mutagenicity data set
  3.8.               Normal boiling point
  3.9.               Density
  3.10.              Flashpoint
  3.11.              Thermal conductivity
  3.12.              Viscosity
  3.13.              Surface tension
  3.14.              Water solubility
  3.15.              Vapor pressure
  3.16.              Melting point
4.   VALIDATION RESULTS
  4.1.               96 hour fathead minnow LC50
    4.1.1.           Statistical External Validation
    4.1.2.           Statistical External Validation for mode of action method
  4.2.               48 hour Daphnia magna LC50
    4.2.1.           Statistical External Validation
  4.3.               Tetrahymena pyriformis 50% growth inhibitory concentration (IGC50)
    4.3.1.           Statistical External Validation
  4.4.               Oral rat LD50 dataset
    4.4.1.           Statistical External Validation
  4.5.               Bioaccumulation factor (BCF)
    4.5.1.           Statistical External Validation
  4.6.               Developmental toxicity
    4.6.1.           Statistical External Validation
  4.7.               Ames mutagenicity
    4.7.1.           Statistical External Validation
  4.8.               Normal boiling point
    4.8.1.           Statistical External Validation
  4.9.               Density
    4.9.1.           Statistical External Validation
  4.10.              Flashpoint
    4.10.1.          Statistical External Validation
  4.11.              Thermal conductivity
    4.11.1.          Statistical External Validation
  4.12.              Viscosity
    4.12.1.          Statistical External Validation
  4.13.              Surface tension
    4.13.1.          Statistical External Validation
  4.14.              Water solubility
    4.14.1.          Statistical External Validation
  4.15.              Vapor pressure
    4.15.1.          Statistical External Validation
  4.16.              Melting point
    4.16.1.          Statistical External Validation
5.   USING THE SOFTWARE
  5.1.               Importing a single compound
    5.1.1.           Drawing a molecule using the structure drawing tool
    5.1.2.           Importing a molecule from an MDL molfile
    5.1.3.           Import a molecule from a SMILES string
    5.1.4.           Import from the structure database
  5.2.               Importing multiple compounds (batch import)
    5.2.1.           Importing from an MDL SDfile
    5.2.2.           Importing from a list of CAS numbers
    5.2.3.           Importing from a list of SMILES strings
    5.2.4.           Editing a chemical in the batch list
    5.2.5.           Adding chemicals to the batch list
    5.2.6.           Deleting chemicals from the batch list
    5.2.7.           Saving the batch list
    5.2.8.           Closing the batch list
  5.3.               Performing toxicity predictions
  5.4.               Interpretation of results
Bibliography

-------
1.   Introduction

Quantitative Structure Activity Relationships (QSARs) are mathematical models that are used to
predict measures of toxicity from physical characteristics of the structure of chemicals (known as
molecular descriptors). Acute toxicities (such as the concentration that causes half of the fish to
die) are one example of toxicity measures that may be predicted from QSARs. Simple QSAR
models calculate the toxicity of chemicals using a simple linear function of molecular
descriptors:

\[ \text{Toxicity} = a x_1 + b x_2 + c \]

where $x_1$ and $x_2$ are the independent descriptor variables and $a$, $b$, and $c$ are fitted parameters. The
molecular weight and the octanol-water partition coefficient are examples of molecular
descriptors.
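
As a minimal sketch of the simple linear QSAR form described above, the following Python snippet evaluates a two-descriptor model. The function name, the parameter values, and the descriptor values are hypothetical and chosen only for illustration; they are not taken from any T.E.S.T. model.

```python
def linear_qsar(x1: float, x2: float, a: float, b: float, c: float) -> float:
    """Evaluate Toxicity = a*x1 + b*x2 + c for one chemical."""
    return a * x1 + b * x2 + c


# Hypothetical fitted parameters (illustration only, not from T.E.S.T.).
a, b, c = 0.01, 0.8, -1.5

# Hypothetical descriptor values for one chemical.
molecular_weight = 100.0  # x1
log_kow = 2.0             # x2: octanol-water partition coefficient (log scale assumed)

predicted = linear_qsar(molecular_weight, log_kow, a, b, c)
print(f"Predicted toxicity: {predicted:.2f}")
```

In practice the parameters would be obtained by fitting the model to experimental toxicity values for a training set.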

QSAR toxicity predictions may be used to screen untested compounds in order to establish
priorities for expensive and time-consuming traditional bioassays designed to establish toxicity
levels. When conditions do not permit traditional bioassays, QSARs are an alternative to
bioassays for estimating toxicity.  In addition, QSAR models are useful for estimating toxicities
needed for green process design algorithms such as the Waste Reduction Algorithm 1.

The Toxicity Estimation Software Tool (T.E.S.T.) has been developed to allow users to easily
estimate toxicity using a variety of QSAR methodologies. T.E.S.T. allows a user to estimate
toxicity without requiring any external programs. Users can input a chemical to be evaluated by
drawing it in an included chemical sketcher window, entering a structure text file, or importing it
from an included database of structures. Once a chemical has been entered, its toxicity can be
estimated using one of several advanced QSAR methodologies. The program does not require
molecular descriptors from external software packages (the required descriptors are calculated
within T.E.S.T.).
1.1.   Toxicity Endpoints

          T.E.S.T. allows you to estimate the value for several toxicity endpoints:
          1.   96 hour fathead minnow LC50 (concentration of the test chemical in water in
              mg/L that causes 50% of fathead minnow to die after 96 hours)
          2.   48 hour Daphnia magna LC50 (concentration of the test chemical in water in
              mg/L that causes 50% of Daphnia magna to die after 48 hours)
          3.   48 hour Tetrahymena pyriformis IGC50 (concentration of the test chemical in
              water in mg/L that causes 50% growth inhibition to Tetrahymena pyriformis after
              48 hours)
          4.   Oral rat LD50 (amount of chemical in mg/kg body weight that causes 50% of rats
              to die after oral ingestion)
          5.   Bioaccumulation factor (ratio of the chemical concentration in fish as a result of
              absorption via the respiratory surface to that in water at steady state)
          6.   Developmental toxicity (whether or not a chemical causes developmental toxicity
              effects to humans or animals)
          7.   Ames mutagenicity (a compound is positive for mutagenicity if it induces
              revertant colony growth in any strain of Salmonella typhimurium)

          T.E.S.T. allows you to estimate several physical properties:
          1.   Normal boiling point (the temperature in °C at which a chemical boils at
              atmospheric pressure)
          2.   Density (the density in g/cm³)
          3.   Flash point (the lowest temperature in °C at which a chemical can vaporize to
              form an ignitable mixture in air)
          4.   Thermal conductivity (a property of a material, in mW/(m·K), that reflects its
              ability to conduct heat)
          5.   Viscosity (a measure, in cP, of the resistance of a fluid to flow, defined as the
              proportionality constant between shear rate and shear stress)
          6.   Surface tension (a property of the surface of a liquid, in dyn/cm, that allows it to
              resist an external force)
          7.   Water solubility (the amount of a chemical in mg/L that will dissolve in liquid
              water to form a homogeneous solution)
          8.   Vapor pressure (the pressure of a vapor in mmHg in thermodynamic equilibrium
              with its condensed phases in a closed system)
          9.   Melting point (the temperature in °C at which a chemical in the solid state
              changes to a liquid state)

1.2.   QSAR Methodologies

         T.E.S.T. allows you to estimate toxicity values using several different advanced
         QSAR methodologies 2:

         •   Hierarchical method: The toxicity for a given query compound  is estimated
             using the weighted average of the predictions from several different models. The
             different models are obtained by using Ward's method to divide the training set
             into a series of structurally similar clusters. A genetic algorithm based technique
             is used to generate models for each cluster. The models are generated prior to
             runtime.
         •   FDA method: The prediction for each test chemical is made using a new model
             that is fit to the chemicals that are most similar to the test compound. Each model
             is generated at runtime.
         •   Single model method: Predictions are made using a multilinear regression model
             that is fit to the training set  (using molecular descriptors as independent variables)
             using a genetic algorithm based approach. The regression model  is generated prior
             to runtime.
         •   Group contribution method: Predictions are made using a multilinear regression
             model that is fit to the training set (using molecular fragment counts as
             independent variables). The regression model is generated prior to runtime.
         •   Nearest neighbor method: The predicted toxicity is estimated by taking an
             average of the toxicities of the 3 chemicals in the training set that are most
             similar to the test chemical.
•  Consensus method: The predicted toxicity is estimated by taking an average of
   the predicted toxicities from the above QSAR methods (provided the predictions
   are within the respective applicability domains).
•  Random forest method: The predicted toxicity is estimated using a decision tree
   which bins a chemical into a certain toxicity score (i.e. positive or negative
   developmental toxicity) using a set of molecular descriptors as decision variables.
   The random forest method is currently only available for the developmental
   toxicity endpoint. The random forest models for the developmental toxicity
   endpoint were developed by researchers at Mario Negri Institute for
   Pharmacological Research as part of the CAESAR project 3.
•  Mode of action method: The predicted toxicity is estimated using a two-step
   process. In the first step the  mode of action is determined from the linear
   discriminant  analysis model with the highest score. In the second step the toxicity
   is estimated using the multilinear regression model corresponding to the predicted
   mode of action. The mode of action method is currently only available for the 96
   hour fathead minnow LC50 endpoint.

T.E.S.T. provides multiple prediction methodologies so one can have greater
confidence in the predicted toxicities (assuming the predicted toxicities are similar
from different methods). In addition, some researchers may have more confidence in
particular QSAR approaches based on personal experience. The QSAR
methodologies above are described in more detail in the Theory section. The
advantages and disadvantages of the different QSAR methods are given in  Table 1.2.
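
The consensus idea described above (averaging the predictions of the individual methods that fall within their applicability domains) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the convention of using `None` for a method that could not make a valid prediction are assumptions, and any minimum number of valid predictions that the software may require is not modeled here.

```python
from statistics import mean
from typing import Dict, Optional


def consensus_prediction(predictions: Dict[str, Optional[float]]) -> Optional[float]:
    """Average the predictions of all methods that returned a valid value.

    A value of None marks a method whose prediction was outside its
    applicability domain (an assumed convention for this sketch).
    """
    valid = [value for value in predictions.values() if value is not None]
    return mean(valid) if valid else None


# Hypothetical predictions for one chemical (illustration only).
example = {
    "hierarchical": 4.2,
    "fda": 4.0,
    "single_model": None,  # outside the applicability domain
    "group_contribution": 4.4,
    "nearest_neighbor": 3.8,
}
consensus = consensus_prediction(example)
```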

-------
Table 1.2. Advantages and disadvantages of the QSAR methods in T.E.S.T.

Method: Hierarchical
  Advantages
    • Can produce more reliable predictions since predictions are made from multiple models
  Disadvantages
    • Cannot provide external estimates of toxicity for compounds in the training set

Method: Single model
  Advantages
    • Single transparent model can be easily viewed/exported
    • The model does not need to rely on clustering the chemicals correctly
  Disadvantages
    • Since the model is fit to the entire dataset it may incorrectly predict the trends in
      toxicity for certain chemical classes
    • Cannot provide external estimates of toxicity for compounds in the training set

Method: Group contribution
  Advantages
    • Single transparent model can be easily viewed/exported
    • Estimates of toxicity can be made without using a computer program
  Disadvantages
    • The model doesn't correct for the interactions of adjacent fragments
    • Since the model is fit to the entire dataset it may incorrectly predict the trends in
      toxicity for certain chemical classes
    • Cannot provide external estimates of toxicity for compounds in the training set

Method: FDA
  Advantages
    • Can generate a new model based on the closest analogs to the test compound
    • Always provides an external prediction of toxicity
  Disadvantages
    • Predictions sometimes take longer since it has to generate a new model each time

Method: Nearest neighbor
  Advantages
    • Provides a quick estimate of toxicity
    • Allows one to determine structural analogs for a given test compound
    • Always provides an external prediction of toxicity
  Disadvantages
    • It does not use a QSAR model to correlate the differences between the test compound
      and the nearest neighbors
    • Was shown to achieve the worst prediction results during external validation

Method: MOA
  Advantages
    • Provides a more biologically relevant estimate of acute aquatic toxicity which provides
      greater confidence in the prediction for toxicologists
  Disadvantages
    • Size of the training set is reduced
    • Prediction error may be compounded by the fact that the mode of action must be
      predicted correctly

Method: Consensus
  Advantages
    • Was shown to achieve the best prediction results during external validation
  Disadvantages
    • Cannot provide external estimates of toxicity for compounds in the training set

-------
2.   THEORY

2.1.   Molecular Descriptors

         Molecular descriptors are physical characteristics of the structure of chemicals such
         as the molecular weight or the number of benzene rings. The overall pool of
         descriptors in the software contains 797 2-dimensional descriptors. The descriptors
         include the following classes of descriptors: E-state values and E-state counts,
         constitutional descriptors, topological descriptors, walk and path counts, connectivity,
         information content, 2D autocorrelation, Burden eigenvalue, molecular property (such
         as the octanol-water partition coefficient), Kappa, hydrogen bond acceptor/donor
         counts, molecular distance edge, and molecular fragment counts. The complete list of
         descriptors and their sources from the literature are described in the Molecular
         Descriptors Guide.

         The descriptors were calculated using computer code written in Java. The basis of the
         molecular calculations was the Chemistry Development Kit 4. The Chemistry
         Development Kit (CDK) is a Java library for structural chemo- and bioinformatics 5.
         The descriptor values were validated using MDL QSAR 6, Dragon 7, and Molconn-Z 8.
         The descriptor values were generally in good agreement (aside from small
         differences in the descriptor definitions for descriptors such as the number of
         hydrogen bond acceptors).

2.2.   QSAR Methodologies

     2.2.1.   Hierarchical Clustering

          The hierarchical clustering method utilizes a variation of Ward's Method 9 to
          produce a series of clusters from the training set. Clusters are subsets of chemicals
          from the overall set that possess similar properties. An example of a hierarchical
          clustering for a hypothetical training set with five chemicals is given in Figure 2.2.1.

-------
                Figure 2.2.1. Hierarchical clustering with five chemicals

For a training set of $n$ chemicals, initially there will be $n$ clusters (each cluster
contains one chemical). The overall variance in the system at a given step $l$ is defined
to be the sum of the variances of the individual clusters:

\[ V(l) = \sum_{k=1}^{\#\text{clusters}} v(k, l) \tag{1} \]

where $v(k, l)$ is the variance (in terms of the molecular descriptors) for cluster $k$ at
step $l$:

\[ v(k, l) = \sum_{i=1}^{n_k} \sum_{j=1}^{J} \left( x_{ij} - C_j \right)^2 \tag{2} \]

where $n_k$ is the number of chemicals in the $k$th cluster, $J$ is the number of descriptors
in the overall descriptor pool, $x_{ij}$ is the normalized descriptor $j$ for chemical $i$, and $C_j$
is the centroid or average value for descriptor $j$ for cluster $k$:

\[ C_j = \frac{1}{n_k} \sum_{i=1}^{n_k} x_{ij} \tag{3} \]

Each step of the method adds two of the clusters together into one cluster so that the
increase in variance over all clusters in the system is minimized:

\[ \min \Delta V(l+1) = V(l+1) - V(l) = v(k', l+1) - v(k_1, l) - v(k_2, l) \tag{4} \]

where clusters $k_1$ and $k_2$ join together at step $l$ to make cluster $k'$ at step $l + 1$. The
process of combining clusters continues until all of the chemicals are lumped into a
single cluster.
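
The merging procedure described above can be sketched in plain Python. This is a minimal illustration, not the T.E.S.T. implementation (which uses a variation of Ward's method): it assumes the cluster variance is the sum of squared deviations of the normalized descriptors from the cluster centroid, and the function names and example data are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

Point = Sequence[float]  # normalized descriptor values for one chemical


def cluster_variance(cluster: List[Point]) -> float:
    """Sum of squared deviations from the cluster centroid (assumed definition)."""
    n_desc = len(cluster[0])
    centroid = [sum(p[j] for p in cluster) / len(cluster) for j in range(n_desc)]
    return sum((p[j] - centroid[j]) ** 2 for p in cluster for j in range(n_desc))


def ward_merge_step(clusters: List[List[Point]]) -> List[List[Point]]:
    """Merge the two clusters whose union gives the smallest variance increase."""
    best = None
    for a, b in combinations(range(len(clusters)), 2):
        delta = (cluster_variance(clusters[a] + clusters[b])
                 - cluster_variance(clusters[a])
                 - cluster_variance(clusters[b]))
        if best is None or delta < best[0]:
            best = (delta, a, b)
    _, a, b = best
    merged = clusters[a] + clusters[b]
    return [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]


def ward_clustering(points: List[Point]) -> List[List[List[Point]]]:
    """Return the list of clusters at every step, from n clusters down to one."""
    clusters = [[p] for p in points]
    history = [clusters]
    while len(clusters) > 1:
        clusters = ward_merge_step(clusters)
        history.append(clusters)
    return history


# Five hypothetical chemicals described by a single normalized descriptor.
example_points = [(0.0,), (0.1,), (5.0,), (5.2,), (10.0,)]
history = ward_clustering(example_points)
```

Each entry of `history` corresponds to one step of the clustering, so a training set of five chemicals yields five steps (five clusters, then four, and so on down to one).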
After the clustering is complete, each cluster is analyzed to determine if an

-------
acceptable QSAR can be developed. Each cluster undergoes evaluation using a
genetic algorithm technique to determine an optimal descriptor set for characterizing
the toxicity values of the chemicals within that cluster. The maximum number of
descriptors allowed for a given cluster will be $n_k/5$ because the recommended ratio
of compounds to variables should be at least five 10, 11 for reasonably small probability
for chance correlations. The genetic algorithm used in this study was taken from the
Weka statistical package, version 3.5.1 12, 13.

The genetic algorithm is used to maximize the adjusted fivefold leave many out cross
validation coefficient ($q^2_{adj,LMO}$):

\[ q^2_{adj,LMO} = 1 - \frac{\sum_{i=1}^{n_k} \left( \hat{y}_i - y_{exp,i} \right)^2 / (n_k - p - 1)}{\sum_{i=1}^{n_k} \left( y_{exp,i} - \bar{y}_{exp} \right)^2 / (n_k - 1)} \tag{5} \]

where $\hat{y}_i$ and $y_{exp,i}$ are the predicted and experimental toxicity values for chemical $i$,
$\bar{y}_{exp}$ is the average experimental toxicity for the chemicals in the cluster, and $p$ is the
number of parameters in the model. The predicted toxicity values are calculated by
dividing the dataset into five folds (a fold is a subset of the training set). The
toxicities of the chemicals in each fold ($\hat{y}_i$) are predicted using a multiple linear
regression model fit to the chemicals in the other folds. The five fold q2 was used
instead of the traditional q2 LOO (leave-one-out) inside the genetic algorithm
because it yields a significant degree of computational savings for large cluster sizes.
The $n_k - p - 1$ term penalizes models that include extra parameters that do not
significantly increase the predictive power of the model (by decreasing the value of
$q^2_{adj,LMO}$).

During the optimization process the models are checked for outliers. A chemical is
determined to be an outlier if at least two statistical tests (e.g., DFFITS, leverage,
Cook's distance, and covariance ratio) indicate that the chemical represents an
influential data point and if the chemical represents an outlier in terms of the
studentized deleted residual 14. If a chemical is determined to be an outlier, the
chemical is deleted from the cluster and the genetic algorithm descriptor selection is
repeated. The process of model building via the genetic algorithm and outlier
removal is repeated until no outliers are detected in the optimized model. For binary
endpoints such as Ames mutagenicity, outliers were not removed because this had
the potential to produce clusters with all positive or all negative chemicals. In
addition the outlier statistical tests described above may not apply to binary
endpoints.
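
The adjusted five-fold cross-validation coefficient that the genetic algorithm maximizes can be sketched as follows. This is an illustrative sketch under stated assumptions: the adjusted form used here (prediction error divided by n - p - 1, total variation divided by n - 1) is one common definition, the fold assignment by index is arbitrary, and the function names are hypothetical. The cross-validated predictions are assumed to have been produced by regression models fit without the fold being predicted.

```python
from typing import List, Sequence


def five_fold_indices(n_chemicals: int) -> List[List[int]]:
    """Assign chemical i to fold i % 5 (an assumed, deterministic split)."""
    return [list(range(fold, n_chemicals, 5)) for fold in range(5)]


def q2_adjusted(y_exp: Sequence[float], y_cv: Sequence[float], n_params: int) -> float:
    """Adjusted cross-validated q2 from experimental and cross-validated values.

    y_cv[i] must be the prediction for chemical i from a model that was fit
    without the fold containing chemical i.
    """
    n = len(y_exp)
    if n - n_params - 1 <= 0:
        raise ValueError("too few chemicals for the number of parameters")
    mean_exp = sum(y_exp) / n
    press = sum((pred - exp) ** 2 for pred, exp in zip(y_cv, y_exp))
    ss_total = sum((exp - mean_exp) ** 2 for exp in y_exp)
    return 1.0 - (press / (n - n_params - 1)) / (ss_total / (n - 1))
```

With this form, adding a parameter that does not reduce the cross-validated error lowers the coefficient, which is the penalty described in the text.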

Once the iteration for the optimum model has been completed, the q2 LOO value for
the model is calculated.  If the q2 LOO is greater than or equal to 0.5, the model is
considered to be valid (see p. 67 of Eriksson et al. 15). If the q2 LOO is less than 0.5,
the model from the cluster is not used to make predictions for test compounds. For

-------
binary endpoints, the validity of a model is determined from the concordance LOO
instead of q2 LOO. Concordance is the fraction of all compounds that are predicted
correctly (i.e., experimentally active compounds that are predicted to be active and
experimentally inactive compounds that are predicted to be inactive).  If the
concordance LOO is greater than or equal to 0.8, the model is considered to be valid.
In addition both the leave-one-out sensitivity and specificity must be at least 0.5 to
avoid using models which are heavily biased to predict either active or inactive
scores. Sensitivity is the fraction of experimentally active compounds that are
predicted to be active. Specificity is the fraction of experimentally inactive
compounds that are predicted to be inactive.
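
The validity check for binary endpoints can be sketched directly from the definitions above. The function names are hypothetical, activity is encoded as 1 (active) and 0 (inactive) for this illustration, and a cluster with no active or no inactive chemicals is treated as invalid, which is an assumption of this sketch.

```python
from typing import Dict, Sequence


def binary_statistics(experimental: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Concordance, sensitivity, and specificity for 0/1 activity labels."""
    pairs = list(zip(experimental, predicted))
    tp = sum(1 for e, p in pairs if e == 1 and p == 1)
    tn = sum(1 for e, p in pairs if e == 0 and p == 0)
    n_active = sum(1 for e, _ in pairs if e == 1)
    n_inactive = len(pairs) - n_active
    return {
        "concordance": (tp + tn) / len(pairs),
        # Undefined rates are reported as 0.0 so the model fails the check.
        "sensitivity": tp / n_active if n_active else 0.0,
        "specificity": tn / n_inactive if n_inactive else 0.0,
    }


def binary_model_is_valid(experimental: Sequence[int], predicted: Sequence[int]) -> bool:
    """Apply the thresholds from the text: concordance >= 0.8, both rates >= 0.5."""
    stats = binary_statistics(experimental, predicted)
    return (stats["concordance"] >= 0.8
            and stats["sensitivity"] >= 0.5
            and stats["specificity"] >= 0.5)
```

The inputs would be the leave-one-out predictions for the chemicals in the cluster.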

The predicted toxicity ($\hat{y}$) for a test chemical is given by the weighted average for
all the valid predictions 16:

\[ \hat{y} = \frac{\sum_{j=1}^{n_{vc}} w_j \hat{y}_j}{\sum_{j=1}^{n_{vc}} w_j} \tag{6} \]

where $\hat{y}_j$ and $w_j$ are the prediction and weight for the $j$th model and $n_{vc}$ is the number
of valid cluster model predictions. If the mean toxicity is given by the maximum
likelihood estimator of the mean of the probability distributions, the weight values
are given by 16

\[ w_j = \frac{1}{se_j^2} \tag{7} \]

where $se_j$ is the standard error for the $j$th prediction given by

\[ se_j = \sqrt{\sigma_j^2 \left( 1 + h_{00} \right)} \tag{8} \]

where $\sigma_j^2$ is given by

\[ \sigma_j^2 = \frac{\sum_{i=1}^{n_j} \left( y_{exp,i} - \hat{y}_i \right)^2}{n_j - p_j - 1} \tag{9} \]

where $n_j$ is the number of chemicals in cluster model $j$ and $p_j$ is the number of model
parameters for model $j$. $h_{00}$, the leverage for the test chemical, is given by

\[ h_{00} = x_0^T \left( X^T X \right)^{-1} x_0 \tag{10} \]

where $x_0$ is the vector of model descriptor values for the test compound and $X$ is the
matrix of model descriptor values for the chemicals in the cluster. For binary
endpoints such as Ames mutagenicity, the predictions were made using equal
weighting of the individual predictions (i.e. $w_j = 1$ in equation 6) because weighting
by the standard error (see equation 7) did not improve the external prediction
accuracy.
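
The weighted average over the valid cluster model predictions can be sketched as follows. This is a minimal illustration that assumes inverse-variance weights (each weight is the reciprocal of the squared standard error), the usual maximum likelihood choice; the function names and the numbers in the usage lines are hypothetical.

```python
import math
from typing import Optional, Sequence


def standard_error(sigma_squared: float, leverage: float) -> float:
    """Standard error of one prediction: sqrt(sigma^2 * (1 + h00))."""
    return math.sqrt(sigma_squared * (1.0 + leverage))


def weighted_prediction(predictions: Sequence[float],
                        standard_errors: Optional[Sequence[float]] = None) -> float:
    """Weighted average of the valid cluster model predictions.

    With standard errors, each weight is 1 / se^2 (assumed form of the weights).
    Without them, all weights are 1, as described for binary endpoints.
    """
    if standard_errors is None:
        weights = [1.0] * len(predictions)
    else:
        weights = [1.0 / se ** 2 for se in standard_errors]
    return sum(w * y for w, y in zip(weights, predictions)) / sum(weights)


# Two hypothetical cluster models: the second is less certain, so it counts less.
y_hat = weighted_prediction([2.0, 4.0], [1.0, 2.0])
```

A prediction with twice the standard error receives one quarter of the weight, so the result is pulled toward the more certain model.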

The square of the standard deviation for the prediction from multiple models ($\sigma_{\hat{y}}^2$)
is given by

\[ \sigma_{\hat{y}}^2 = \frac{1}{\sum_{j=1}^{n_{vc}} 1 / se_j^2} \tag{11} \]

The uncertainty ($u$) in the overall prediction for the test chemical is given by

\[ u = t \sqrt{\frac{1}{\sum_{j=1}^{n_{vc}} 1 / se_j^2}} \tag{12} \]

where $t$ is the t-statistic, $\alpha = 0.1$ (90% confidence interval), and $se_j$ is the standard
error for the $j$th prediction. The prediction interval is obtained by adding and
subtracting the uncertainty from the predicted toxicity:

\[ \hat{y} - u < \text{Toxicity} < \hat{y} + u \tag{13} \]
-------
avoid situations where a chemical might have a similar backbone structure to the
chemicals in a given cluster but has a different functional group attached. For
example if a given cluster contained only short-chained aliphatic amines one would
not want to use it to predict the toxicity of ethanol. If a chemical contains a fragment
that is not present in the training set, the toxicity cannot be predicted. The fragment
constraint can be removed by checking the Relax fragment constraint checkbox.
For binary endpoints such as Ames mutagenicity,  the fragment constraint was not
employed since it did not improve the external prediction accuracy and decreased
the prediction coverage.

 In the current version of the software, the predictions are made using the closest
cluster from each step in the hierarchical clustering (in terms of the distance of the
chemical to the centroid of the cluster defined above). The rationale behind this
approach is that one would like to follow the hierarchical clustering process,
selecting the best model from each step. In order for the prediction from the model to
be used it must be statistically valid and meet the constraints defined above. If the
closest cluster for a given step does not have a statistically valid model (or violates
any of the constraints), no prediction is used from that step. If the closest cluster for a
given step in the clustering process is the same as the closest cluster from a previous
step, it is not used again in the prediction of toxicity.

2.2.2.   FDA Method

The Food and Drug Administration (FDA) method is based on the work of Contrera
and coworkers 18. In this method, predictions for each test chemical are made using a
unique cluster (constructed at runtime) which contains structurally similar chemicals
selected from the overall training set. This is in contrast to the Hierarchical method,
where the predictions are made using one or more clusters that were constructed a
priori using Ward's method.

Contrera and coworkers constructed the training cluster by selecting 15-20
chemicals that had a cosine similarity coefficient of at least 75% with the test
chemical. The cosine similarity coefficient, $SC_{i,k}$, is given by

\[ SC_{i,k} = \frac{\sum_{j=1}^{\#\text{descriptors}} x_{ij} x_{kj}}{\sqrt{\sum_{j=1}^{\#\text{descriptors}} x_{ij}^2 \cdot \sum_{j=1}^{\#\text{descriptors}} x_{kj}^2}} \tag{16} \]

where $x_{ij}$ is the value of the $j$th normalized descriptor for chemical $i$ (normalized with
respect to all the chemicals in the original training set) and $x_{kj}$ is the value of the $j$th
descriptor for chemical $k$. A multiple linear regression model is then built for the new
cluster using a genetic algorithm and the toxicity is predicted. The advantage of this
method is that the training cluster is tailored to fit the test chemical. In addition, the
test chemical is never present in the cluster model, which allows one to make
external predictions for training set chemicals. The disadvantage of this method is

-------
     that a new model has to be generated at runtime (which takes somewhat longer than
     computing the toxicity from preexisting models).
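
The cosine similarity coefficient and the selection of the most similar training chemicals can be sketched as follows. The function names and the tiny two-descriptor training set are hypothetical and serve only to illustrate the calculation; returning 0.0 for an all-zero descriptor vector is an assumption of this sketch.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(x_i: Sequence[float], x_k: Sequence[float]) -> float:
    """Cosine similarity coefficient between two normalized descriptor vectors."""
    dot = sum(a * b for a, b in zip(x_i, x_k))
    norm = math.sqrt(sum(a * a for a in x_i) * sum(b * b for b in x_k))
    return dot / norm if norm > 0.0 else 0.0  # assumed handling of zero vectors


def most_similar(test: Sequence[float],
                 training: Dict[str, Sequence[float]],
                 n: int = 30) -> List[str]:
    """Names of the n training chemicals most similar to the test chemical."""
    ranked = sorted(training,
                    key=lambda name: cosine_similarity(test, training[name]),
                    reverse=True)
    return ranked[:n]


# Hypothetical normalized descriptors for three training chemicals.
training_set = {"A": [1.0, 0.1], "B": [0.0, 1.0], "C": [0.5, 0.5]}
closest = most_similar([1.0, 0.0], training_set, n=2)
```

The default of 30 follows the cluster size that the text gives for the current version of the software.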

     In this version of the software, clusters are constructed using the 30 most similar
     chemicals from the training set in terms of the cosine similarity coefficient.
     However, a minimum similarity coefficient of 75% is not required for membership in
     the training cluster. Previously, it was determined that this constraint did not increase
     the predictive performance of the methodology 2. For a prediction to be valid, the
     cluster must not violate the model ellipsoid  and fragment constraints described
     above. In addition, the predicted toxicity value must be within the range of
     experimental toxicity values for the chemicals used to build the model. This
     additional constraint was added to avoid potentially erroneous predictions. However
     this constraint was not utilized for binary toxicity endpoints such as Ames
     mutagenicity since predicted values less than 0 or greater than 1 do not invalidate
     the prediction result.

     Again, for a cluster to have a valid predictive model, the LOO q2 must be at least 0.5.
     If the model for the cluster is invalid or the prediction violates one of the constraints,
     the cluster size is increased incrementally (up to a maximum of 75 chemicals) until a
     valid prediction can be made. If a prediction cannot be made using a cluster with 75
     chemicals, no prediction is made.



2.2.3.   Single model

     In the single model approach, a single multiple linear regression model is fit to the
     entire training set. The model is generated using techniques and constraints similar to
     those for the hierarchical method (except that the training cluster contains the entire
     training set). The advantage of this approach is that a simple transparent model can
     be developed which does not rely on clustering the chemicals correctly. The
     disadvantage of this approach is that sometimes an overall model cannot correctly
     correlate the toxicity for every chemical class 19. For example the single model might
     be able to correctly describe the trend of linearly increasing toxicity for a series of
     normal alcohols (e.g., 1-propanol, 1-butanol, 1-pentanol, ...), but it may incorrectly
     describe the trend for a series of normal acids (e.g., propanoic acid, butanoic acid,
     pentanoic acid, ...) that does not increase linearly.

     The group contribution approach is based on the group contribution approach of
     Martin and Young 20. Fragment counts (such as the number of methyl and hydroxyl
     groups in a compound) are used to fit a multiple linear regression model to the entire
     data set.  A genetic algorithm approach is not used to reduce the number of
     parameters in the model because the approach tries to characterize the contribution
     from all the fragments  appearing in the training set. The only constraint on the
     fragments appearing in the final model is that there must be at least three molecules
     in the training set that contain each fragment. If a fragment appears in fewer than three
     molecules in the training set, it is deleted from the list of fragments and all the chemicals
containing this fragment are removed from the training set. After the multiple linear
regression is performed, the model is checked for outliers. If outliers are detected,
they are removed and the regression is performed again. The process is repeated until
no more outliers are found. Similar to the hierarchical methodology, predictions are
made using the model ellipsoid and fragment constraints.

The advantage of this approach is that a single transparent model can be developed
whose descriptors can be determined from visual inspection of the molecular
structure of the test compound. The disadvantage of this approach is that it assumes
that the contribution of each fragment does not depend on the presence of nearby
fragments in the molecule.
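
The fragment filtering step can be sketched as below. This is a minimal illustration, not the T.E.S.T. implementation: the input layout (a dictionary of per-chemical fragment counts) and the choice to repeat the check after chemicals are removed are assumptions. The threshold of three molecules per fragment comes from the text.

```python
from typing import Dict, List, Tuple

FragmentCounts = Dict[str, Dict[str, int]]  # chemical name -> {fragment: count}


def filter_rare_fragments(
    training_set: FragmentCounts, min_molecules: int = 3
) -> Tuple[FragmentCounts, List[str]]:
    """Drop fragments found in fewer than min_molecules chemicals.

    Chemicals containing a dropped fragment are removed as well. The check is
    repeated (an assumption) because removing chemicals can make other
    fragments rare.
    """
    chemicals = dict(training_set)
    while True:
        # Count how many chemicals contain each fragment at least once.
        support: Dict[str, int] = {}
        for counts in chemicals.values():
            for fragment, n in counts.items():
                if n > 0:
                    support[fragment] = support.get(fragment, 0) + 1
        rare = {f for f, s in support.items() if s < min_molecules}
        if not rare:
            return chemicals, sorted(support)
        chemicals = {
            name: counts
            for name, counts in chemicals.items()
            if not any(counts.get(f, 0) > 0 for f in rare)
        }
```

The surviving fragment list defines the columns of the regression matrix; each remaining chemical contributes one row of fragment counts.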

In the nearest neighbor approach, the predicted toxicity is simply the average of the
toxicities of the three most similar chemicals (structural analogs) in the training set.
In order to make a prediction, each of the structural analogs must exceed a certain
minimum cosine similarity coefficient (SCmin). SCmin was set at 0.5 so that the
prediction coverage was similar to the other QSAR methods 2. The nearest neighbor
method provides a quick external estimate of toxicity (the test chemical is never
present in the selected set of analogs). The disadvantage of the nearest neighbor
method is that the structural differences between the test chemical and its structural
analogs are not accounted for.
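
A minimal sketch of the nearest neighbor calculation follows. It assumes each chemical is represented by a plain list of descriptor values and that the training set is a list of (descriptors, toxicity) pairs; both are illustrative choices. The three analogs and the minimum cosine similarity of 0.5 come from the text.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor_prediction(
    test_descriptors: Sequence[float],
    training_set: List[Tuple[Sequence[float], float]],
    n_analogs: int = 3,
    sc_min: float = 0.5,
) -> Optional[float]:
    """Average the toxicities of the n_analogs most similar training chemicals.

    Returns None when any of the selected analogs falls below sc_min.
    """
    scored = sorted(
        ((cosine_similarity(test_descriptors, d), tox) for d, tox in training_set),
        key=lambda pair: pair[0],
        reverse=True,
    )
    analogs = scored[:n_analogs]
    if len(analogs) < n_analogs or any(sc < sc_min for sc, _ in analogs):
        return None
    return sum(tox for _, tox in analogs) / n_analogs
```

For an external estimate, the test chemical itself must be excluded from `training_set` before the call.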

In the mode of action (MOA) method, the toxicity is predicted using a two-step
process 21, 22. In the first step, the MOA is predicted using a series of linear
discriminant analysis (LDA) models. The predicted MOA is given by the LDA
model that yields the highest score. In order for a predicted MOA to be valid, the
maximum score must be at least 0.5. In addition, the model ellipsoid and Rmax
constraints must be satisfied. In the second step, the toxicity is predicted using the
multilinear regression model that corresponds to the predicted MOA. Again, the
model ellipsoid and Rmax constraints must be satisfied for the toxicity model for a
prediction to be within the domain of applicability. The fragment constraint is not
employed for the MOA method. The advantage of the MOA method is that it
provides a more biologically relevant estimate of acute aquatic toxicity, which can
provide greater confidence in the prediction for toxicologists. The disadvantages of this
method are that the size of the training set is reduced (which reduces the chemical
space covered by the model) and that the prediction error may be compounded by the
fact that the mode of action must be predicted correctly.
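
The two-step logic can be sketched as follows. Everything about the data layout is an assumption made for illustration: the LDA scores are passed in as a dictionary, the regression models are plain callables, and the model ellipsoid and Rmax checks for both steps are collapsed into one hypothetical `in_domain` callback. Only the 0.5 minimum score comes from the text.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

MIN_LDA_SCORE = 0.5  # minimum LDA score stated in the text


def predict_by_mode_of_action(
    descriptors: Sequence[float],
    lda_scores: Dict[str, float],
    toxicity_models: Dict[str, Callable[[Sequence[float]], float]],
    in_domain: Callable[[str, Sequence[float]], bool],
) -> Optional[Tuple[str, float]]:
    """Return (predicted MOA, predicted toxicity), or None if no valid prediction.

    in_domain is a hypothetical stand-in for the model ellipsoid and Rmax
    constraints of the LDA model and the toxicity model.
    """
    if not lda_scores:
        return None
    # Step 1: the MOA whose LDA model gives the highest score.
    moa = max(lda_scores, key=lda_scores.get)
    if lda_scores[moa] < MIN_LDA_SCORE:
        return None
    if not in_domain(moa, descriptors):
        return None
    # Step 2: the regression model that corresponds to the predicted MOA.
    return moa, toxicity_models[moa](descriptors)
```

A wrong MOA in step 1 routes the chemical to the wrong regression model, which is the error compounding noted above.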

In the consensus method, the predicted toxicity is simply the average of the predicted
toxicities from the other QSAR methodologies (taking into account the applicability
domain of each method) 23. If only a single QSAR methodology can make a
prediction, the predicted value is deemed unreliable and not used. This method
typically provides the highest prediction accuracy since errant predictions are
dampened by the predictions from the other methods. In addition, this method
provides the highest prediction coverage because several methods with slightly
different applicability domains are used to make a prediction.
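
A minimal sketch of the consensus rule is shown below. It assumes each method reports either a number or `None` (used here to mean that the chemical is outside that method's applicability domain); the dictionary layout and method names are illustrative.

```python
from typing import Dict, Optional


def consensus_prediction(method_predictions: Dict[str, Optional[float]]) -> Optional[float]:
    """Average the predictions of all methods that could make one.

    A value of None marks a method that could not make a prediction. At least
    two valid predictions are required, since a single prediction is deemed
    unreliable.
    """
    valid = [value for value in method_predictions.values() if value is not None]
    if len(valid) < 2:
        return None
    return sum(valid) / len(valid)
```

For example, `consensus_prediction({"hierarchical": 4.0, "FDA": 5.0, "nearest neighbor": None})` averages the two available values.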

          The predictive ability of each of the QSAR methodologies was evaluated using
          statistical external validation 24. In version 2.0 of the TEST software, the data set was
          divided into training and test sets using the Kennard-Stone rational design algorithm
          25-28. Starting in version 3.0, random selection was used to develop the training and
          test sets because the Kennard-Stone method was judged to yield an overly
          optimistic estimate of predictive ability (the test compounds are always
          within the model calibration domain). For the developmental toxicity endpoint,
          however, the training and test sets were taken from the datasets used in CAESAR 3.
          This was done so that the CAESAR random forest model could be incorporated into
          the TEST software.
          A QSAR model has acceptable predictive power if the following conditions are
          satisfied 29:

                    q^2 > 0.5                                                             (17)
                    R^2 > 0.6                                                             (18)
                    \frac{R^2 - R_0^2}{R^2} < 0.1 \quad \text{and} \quad 0.85 \le k \le 1.15   (19)
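
The statistics behind conditions 18 and 19 can be computed as in the sketch below. The definitions used here are assumptions, since the accompanying definitions are not available in this passage: R2 is taken as the squared correlation between observed and predicted values, k as the slope of the regression of observed on predicted values through the origin, and R0^2 as the coefficient of determination of that through-origin line. Other conventions exist, so treat the numbers as illustrative.

```python
import math
from typing import Dict, Optional, Sequence


def external_validation_stats(
    observed: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Compute R2, R0^2, k, RMSE and MAE for a test set."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    s_op = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_obs) ** 2 for o in observed)
    s_pp = sum((p - mean_pred) ** 2 for p in predicted)
    r2 = s_op ** 2 / (s_oo * s_pp)
    # Assumption: observed values regressed on predicted values through the origin.
    k = sum(o * p for o, p in zip(observed, predicted)) / sum(p * p for p in predicted)
    r0_2 = 1.0 - sum((o - k * p) ** 2 for o, p in zip(observed, predicted)) / s_oo
    errors = [o - p for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    return {"r2": r2, "r0_2": r0_2, "k": k, "rmse": rmse, "mae": mae}


def has_acceptable_power(stats: Dict[str, float], q2: Optional[float] = None) -> bool:
    """Apply conditions 18 and 19, plus condition 17 when a q2 value is supplied."""
    ok = (
        stats["r2"] > 0.6
        and (stats["r2"] - stats["r0_2"]) / stats["r2"] < 0.1
        and 0.85 <= stats["k"] <= 1.15
    )
    if q2 is not None:
        ok = ok and q2 > 0.5
    return ok
```

The q2 value is passed in separately because it is obtained from the training set, not from the external test set.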
-------
          Concordance is the fraction of all compounds that are predicted correctly (i.e.
          experimentally active compounds that are predicted to be active and experimentally
          inactive compounds that are predicted to be inactive). Sensitivity is the fraction of
          experimentally active compounds that are predicted to be active. Specificity is the
          fraction of experimentally inactive compounds that are predicted to be inactive.
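
The three binary statistics can be computed as in this short sketch. The representation of activity as booleans (True for active) is an illustrative assumption.

```python
from typing import Dict, Sequence


def binary_prediction_stats(
    experimental: Sequence[bool], predicted: Sequence[bool]
) -> Dict[str, float]:
    """Concordance, sensitivity and specificity for a binary endpoint."""
    pairs = list(zip(experimental, predicted))
    true_pos = sum(1 for e, p in pairs if e and p)
    true_neg = sum(1 for e, p in pairs if not e and not p)
    n_active = sum(1 for e, _ in pairs if e)
    n_inactive = len(pairs) - n_active
    return {
        "concordance": (true_pos + true_neg) / len(pairs),
        "sensitivity": true_pos / n_active,      # needs at least one active compound
        "specificity": true_neg / n_inactive,    # needs at least one inactive compound
    }
```

With three actives (two predicted active) and two inactives (one predicted inactive), the function gives a concordance of 3/5, a sensitivity of 2/3 and a specificity of 1/2.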
3.1.  96 hour fathead minnow LC50 data set

          The fathead minnow LC50 endpoint represents the concentration in water that kills
          half of fathead minnows (Pimephales promelas) in 4 days (96 hours). The data set for
          this endpoint was obtained by downloading the ECOTOX aquatic toxicity
          database 31.

         The database was then filtered using the following criteria:
         •  The ECOTOX "Media Type" field = "FW" (fresh water)
         •  The ECOTOX "Test Location" field = "Lab" (laboratory)
         •  The ECOTOX "Conc 1 Op (ug/L)" field cannot be <, >, or ~ (i.e. use only
            discrete LC50 values)
         •  The ECOTOX "Effect" field = "Mor" (mortality)
         •  The ECOTOX "Effect Measurement" field = "MORT" (mortality)
         •  The ECOTOX "Exposure Duration" field = "4" (4 days or 96 hours)
         •  Compounds can only contain the following element symbols: C, H, O, N, F, Cl,
            Br, I, S, P, Si, As
         •  Compounds must represent a single pure component (i.e. salts, undefined
            isomeric mixtures, polymers, or mixtures were removed)

         The LC50 values were taken from the "Conc 1 (ug/L)" field in ECOTOX. For
         chemicals with multiple LC50 values, the median value was used.
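
The record-level criteria and the median rule can be sketched as below. The field names follow the ECOTOX criteria listed above (reading the concentration fields as "Conc 1 Op (ug/L)" and "Conc 1 (ug/L)"); representing each record as a dictionary of strings and the `CAS` key used to group records are assumptions for illustration. The element and single-component filters require structure information and are not shown.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List


def passes_record_filters(record: Dict[str, str]) -> bool:
    """Apply the ECOTOX field criteria to one record."""
    return (
        record["Media Type"] == "FW"
        and record["Test Location"] == "Lab"
        and record["Conc 1 Op (ug/L)"] not in ("<", ">", "~")
        and record["Effect"] == "Mor"
        and record["Effect Measurement"] == "MORT"
        and record["Exposure Duration"] == "4"
    )


def median_lc50_by_chemical(records: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """Median LC50 (ug/L) per chemical over the records that pass the filters."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        if passes_record_filters(record):
            # "CAS" is a hypothetical grouping key for this sketch.
            grouped[record["CAS"]].append(float(record["Conc 1 (ug/L)"]))
    return {cas: median(values) for cas, values in grouped.items()}
```

Using the median keeps a single extreme measurement from dominating the value assigned to a chemical.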

         In version 2.0 of T.E.S.T., 10 compounds in this dataset possessed 2D isomers (the
         structures were equivalent in terms of their molecular connectivity). In version 3.0,
         only one isomer was kept, using the average toxicity value. In version 4.0, all isomers
         were kept since the presence of the isomers had negligible impact on the external
         prediction statistics. The final fathead minnow LC50 data set contained 823 chemicals.
         For use in QSAR modeling, the experimental values in µg/L were converted to
         -Log10(LC50 mol/L).
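
The unit conversion is a one-line calculation, sketched here for clarity. The function name and the molecular weight argument (in g/mol) are illustrative.

```python
import math


def neg_log10_molar_from_ug_per_l(conc_ug_per_l: float, mol_weight: float) -> float:
    """Convert a concentration in ug/L to -log10 of the concentration in mol/L."""
    grams_per_l = conc_ug_per_l * 1e-6         # 1 ug = 1e-6 g
    mol_per_l = grams_per_l / mol_weight       # mol_weight in g/mol
    return -math.log10(mol_per_l)
```

On this scale a larger value means a lower LC50, that is, a more toxic chemical.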

         For the hierarchical, single model, group contribution, FDA, nearest neighbor, and
         consensus methods, the data set was divided randomly into a training set (80% of
         the overall set) and a test set (20% of the overall set). For the mode of action method,
         chemicals with a known MOA (372 chemicals) were placed in the training set while
         the remaining chemicals (440 chemicals) were placed in the test set 22. Thus, the
         results for the mode of action method must be considered separately.
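
A random 80/20 split can be sketched as follows. The seed, the rounding of the test set size, and the function name are assumptions; the actual training and test set membership used by T.E.S.T. is not reproduced by this code.

```python
import random
from typing import Iterable, List, Tuple


def random_split(
    chemicals: Iterable, test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List, List]:
    """Randomly divide chemicals into (training set, test set)."""
    shuffled = list(chemicals)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Applied to 823 chemicals, this gives 165 test chemicals and 658 training chemicals.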

-------
3.2.  48 hour Daphnia magna LC50 data set

         The Daphnia magna LC50 endpoint represents the concentration in water that kills
         half of D. magna (a water flea) in 48 hours. The data set for this endpoint was
         obtained from the ECOTOX aquatic toxicity database 31. The database was filtered
         using the same criteria as those for the 96 hour fathead minnow LC50. The final D.
         magna LC50 data set contained 353 chemicals. The modeled endpoint was
         -Log10(LC50 mol/L).

3.3.  40 hour Tetrahymena pyriformis IGC50 data set

         The Tetrahymena pyriformis IGC50 endpoint represents the 50% growth inhibitory
         concentration of the T. pyriformis organism (a protozoan ciliate) after 40 hours. The
         IGC50 training set was obtained from Schultz and coworkers 23, 32-69. The final T.
         pyriformis IGC50 data set contained 1792 chemicals. The modeled endpoint was
         -Log10(IGC50 mol/L).

3.4.  Oral rat LD50 data set

         The oral rat LD50 endpoint represents the amount of the chemical (mass of the
         chemical per body weight of the rat) that, when orally ingested, kills half of the rats.
         The dataset for this endpoint was obtained by downloading records from the ChemIDplus
         database 70. A total of 13548 records were obtained by using the following search criteria:
         •  "Test" = LD50
         •  "Species" = rat
         •  "Route" = oral

         The list of chemicals was filtered using the following criteria:
         •  Only chemicals with discrete LD50 values were used (i.e. chemicals with LD50
            values with ">" or "<" were removed)
         •  Compounds can only contain the following element symbols: C, H, O, N, F, Cl,
            Br, I,  S, P, Si, or As
         •  Compounds must represent a single pure component (i.e. salts, undefined
            isomeric mixtures, polymers, or mixtures were removed)

         In version 2.0 of T.E.S.T., the final dataset consisted of 7392 chemicals. 87
         compounds in this dataset possessed 106 2D isomers. In version 3.0, only one isomer
         was kept, using the average toxicity value. In version 4.0 and greater, all isomers
         were kept because the presence of the isomers had negligible impact on the external
         prediction statistics. The final oral rat LD50 data set contained 7413 chemicals. The
         modeled endpoint was -Log10(LD50 mol/kg).

-------
3.5.  Bioconcentration factor (BCF) data set

         The bioconcentration factor (BCF) is defined as the ratio of the chemical
         concentration in biota as a result of absorption via the respiratory surface to that in
         water at steady state 71. Data were compiled from several different databases 72-75. The
         final dataset consists of 676 chemicals (after removing salts, mixtures, and ambiguous
         compounds). The modeled endpoint was Log10(BCF).

3.6.  Developmental toxicity data set

         The developmental toxicity endpoint indicates whether or not a chemical causes
         developmental toxicity effects in humans and animals. Developmental toxicity
         includes any effect interfering with normal development, both before and after birth.
         A dataset of 293 chemicals was created by Arena and coworkers 76, 77 by combining
         data from the Teratogen Information System (TERIS) 78 and FDA guidelines 79. The
         developmental toxicity values were taken from the revised binary toxicity values
         developed for the CAESAR project 3. One chemical, Azatguiorube, was removed
         because structural information could not be found for this chemical. The final dataset
         consists of 285 chemicals (after removing salts, mixtures, and ambiguous
         compounds).

3.7.  Ames mutagenicity data set

         In the Ames test, frame-shift mutations or base-pair substitutions can be detected by
         exposure of histidine-dependent strains of Salmonella typhimurium to a test
         compound. When these strains are exposed to a mutagen, reverse mutations that
         restore the functional capability of the bacteria to synthesize histidine enable bacterial
         colony growth on a medium deficient in histidine (revertants). A compound is
         classified Ames positive if it significantly induces revertant colony growth in at least
         one out of five strains. A dataset of 6512 chemicals was compiled by Hansen and
         coworkers from several different sources 80, 81. The final dataset consists of 5743
         chemicals (after removing salts, mixtures, ambiguous compounds, and compounds
         without CAS numbers).

3.8.  Normal boiling point

         The normal boiling point is defined as the temperature at which a chemical boils at
         atmospheric pressure. The data set for this endpoint was obtained from the boiling
         point data contained in EPI Suite 82. Forty-one chemicals were removed from the data
         set because they were previously shown to be badly predicted and had experimental
         values which were significantly different (>50K) from other sources such as NIST 83
         and LookChem 84. The final data set contained 5759 chemicals. The modeled
         property was the boiling point in °C.

-------
3.9.  Density

The density is defined as mass per unit volume. The data set for this endpoint was
obtained from the density data contained in LookChem 84. The data set was restricted
to chemicals with boiling points greater than 25°C (or for which the boiling point was
unavailable). The data set was further restricted to chemicals with densities > 0.5 and
< 5 g/cm3. The final dataset consisted of 8909 chemicals. Data from LookChem are
not peer reviewed, but the set is very large and thus provides a large degree of
structural diversity. The modeled property was density in g/cm3.

3.10. Flash point

The flash point is defined as the lowest temperature at which a chemical can vaporize
to form an ignitable mixture in air. A dataset of 8362 chemicals was compiled from
lookchem.com 84. Chemicals with flash points greater than 1000°C were omitted from
the data set. The modeled property was the flash point in °C.

3.11. Thermal conductivity

Thermal conductivity is defined as a material's ability to conduct heat. The thermal
conductivity at 25°C for 442 chemicals was obtained from Jamieson and Vargaftik 85,
86. Thermal conductivity values were obtained from Jamieson and Vargaftik as
follows:
•  If a value is available at 25°C, this value is used
•  If an experimental value is not available at 25°C, a value is extrapolated to 25°C (as
   long as the closest data point is within 10°C of 25°C)
•  If the temperature coefficient is not available (or only a single data point is
   available), the thermal conductivity of the nearest data point is used (as long as
   the closest data point is within 10°C of 25°C)
•  Only data with a quality grade of A or B (preferably grade A) in Jamieson were
   used.

The thermal conductivities for the chemicals in common between Jamieson
and Vargaftik agreed well (R2 = 0.95 for 381 compounds). The modeled
property was the thermal conductivity in mW/mK.
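
The selection rules for the 25°C value can be sketched as follows. The data layout (a list of (temperature in °C, value) pairs plus an optional temperature coefficient) and the linear form of the extrapolation are assumptions made for illustration; the 10°C window comes from the text.

```python
from typing import List, Optional, Tuple


def conductivity_at_25c(
    points: List[Tuple[float, float]],
    temp_coefficient: Optional[float] = None,
    target_c: float = 25.0,
    max_gap_c: float = 10.0,
) -> Optional[float]:
    """Pick or estimate the thermal conductivity at 25 °C.

    points: (temperature in °C, thermal conductivity) pairs.
    temp_coefficient: change in conductivity per °C, if known (assumed linear).
    """
    if not points:
        return None
    temperature, value = min(points, key=lambda p: abs(p[0] - target_c))
    if temperature == target_c:
        return value                      # measured value at 25 °C
    if abs(temperature - target_c) > max_gap_c:
        return None                       # nearest point is too far from 25 °C
    if temp_coefficient is None:
        return value                      # fall back to the nearest data point
    return value + temp_coefficient * (target_c - temperature)
```

The quality grade filter (A or B) would be applied to `points` before this function is called.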
3.12. Viscosity

Viscosity is a measure of the resistance of a fluid to flow, in cP (defined as the
proportionality constant between shear rate and shear stress). The viscosity at 25°C
for 557 chemicals was obtained from Viswanath and Riddick 87, 88. The viscosity
values were obtained from Viswanath and Riddick as follows:
1.     If a value is available at 25°C, this value is used
2.     If an experimental value is not available at 25°C, a value is extrapolated to 25°C (as
      long as the closest data point is within 10°C of 25°C) using the following
      empirical correlation:

                                  log10(viscosity) = A + B/T

Extrapolation was used in order to expand the size of the overall dataset. The modeled
property was log10(viscosity cP).
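
A minimal sketch of the extrapolation follows. It assumes that T in the correlation is the absolute temperature in kelvin and that A and B are obtained by a least-squares fit of log10(viscosity) against 1/T; neither detail is stated in the text.

```python
import math
from typing import List, Optional, Tuple

KELVIN_OFFSET = 273.15


def log_viscosity_at_25c(
    points: List[Tuple[float, float]], max_gap_c: float = 10.0
) -> Optional[float]:
    """Return log10(viscosity in cP) at 25 °C from (temperature °C, viscosity cP) pairs."""
    target_c = 25.0
    for temperature, viscosity in points:
        if temperature == target_c:
            return math.log10(viscosity)          # measured value at 25 °C
    if len(points) < 2:
        return None
    if min(abs(t - target_c) for t, _ in points) > max_gap_c:
        return None                               # closest point is too far away
    xs = [1.0 / (t + KELVIN_OFFSET) for t, _ in points]
    ys = [math.log10(v) for _, v in points]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0.0:
        return None                               # all points at one temperature
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / s_xx
    a = y_mean - b * x_mean
    return a + b / (target_c + KELVIN_OFFSET)     # log10(viscosity) = A + B/T
```

The function returns the logarithm directly because the modeled property is log10(viscosity cP).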
 3.13. Surface tension
          Surface tension is a property of the surface of a liquid that allows it to resist an
          external force. The surface tension at 25°C for 1416 chemicals was obtained from the
          data compilation of Jaspar 89. The experimental values (at 25°C) are estimated using
          an empirical correlation, which is fit to experimental data from Jaspar:
                                   surface tension = A - BT
          The estimated experimental surface tension value is only used if the closest
          experimental data point is within 10°C of 25°C. The modeled property was the
          surface tension in dyn/cm.
 3.14. Water solubility
          Water solubility is defined as the amount of chemical that will dissolve in liquid
          water to form a homogeneous solution. A dataset of 5020 chemicals was compiled
          from the database in EPI Suite 82. Chemicals with water solubilities exceeding
          1,000,000 mg/L were omitted from the overall dataset. In addition, data were limited
          to data points that are within 10°C of 25°C. The water solubility is an important
          property because sometimes the predicted LC50 values for aquatic species can exceed
          the water solubility. The modeled property was -Log10(water solubility mol/L).
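
The comparison between a predicted LC50 and the water solubility can be sketched as below. Both quantities are assumed to be expressed on the -Log10(mol/L) scale used for the modeled endpoints; the function names are illustrative.

```python
import math


def neg_log10_molar_from_mg_per_l(conc_mg_per_l: float, mol_weight: float) -> float:
    """Convert a concentration in mg/L to -log10 of the concentration in mol/L."""
    return -math.log10(conc_mg_per_l * 1e-3 / mol_weight)


def lc50_exceeds_solubility(neg_log_lc50: float, neg_log_solubility: float) -> bool:
    """True when the predicted LC50 concentration is above the water solubility.

    On a -log10 scale, a smaller value means a higher concentration.
    """
    return neg_log_lc50 < neg_log_solubility
```

A `True` result flags a predicted LC50 that could not be reached in solution, so the prediction should be interpreted with care.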
 3.15. Vapor pressure
          Vapor pressure is defined as the pressure of a vapor in mmHg in thermodynamic
          equilibrium with its condensed phases in a closed system. The vapor pressure at 25°C
          for 2511 chemicals was obtained from the database in EPI Suite 82. The modeled
          property was Log10(vapor pressure mmHg).

 3.16. Melting point

          Melting point is the temperature, in °C, at which a chemical in the solid state changes
          to a liquid state. The melting point for 9385 chemicals was obtained from the
          database in EPI Suite 82. The modeled property was the melting point in °C.

4. VALIDATION RESULTS

 4.1. 96 hour fathead minnow LC50

-------
4.1.1. Statistical External Validation

      The consensus approach achieved the best results in terms of all the prediction
      statistics (see Table 4.1.1). The hierarchical method achieved the best results of any
      of the individual QSAR methods. Statistics highlighted in pink represent predictions
      where a condition in equation 18 or 19 was not met. Models that do not meet
      these conditions are not invalid, per se, but should be used with caution. The
      predicted values for the test set for the fathead minnow LC50 endpoint for the
      consensus method are given in Figure 4.1.1.

           Table 4.1.1. Prediction results for the fathead minnow LC50 test set
     Method              R2      (R2-R20)/R2   k       RMSE    MAE     Coverage
     Hierarchical        0.710   0.075         0.966   0.801   0.574   0.951
     Single Model        0.704   0.134         0.960   0.803   0.605   0.945
     FDA                 0.626   0.113         0.985   0.915   0.656   0.945
     Group contribution  0.686   0.123         0.949   0.810   0.578   0.872
     Nearest neighbor    0.667   0.080         1.001   0.876   0.649   0.939
     Consensus           0.728   0.121         0.969   0.768   0.545   0.951

-------
[Plot: external prediction results; x-axis: Exp. Fathead minnow LC50 (96 hr) -Log10(mol/L)]
Figure 4.1.1. Experimental vs predicted values for the fathead minnow LC50 test set

-------
    4.1.2. Statistical External Validation for mode of action method

          The mode of action method yields slightly worse results than the hierarchical and
          single model methods (see Table 4.1.2). The results for the hierarchical and single
          model methods are worse than those from section 4.1.1 because the training  set used
          to fit the models was smaller.

     Table 4.1.2. Prediction results for the fathead minnow LC50 test set using the MOA method
     Method              R2      (R2-R20)/R2   k       RMSE    MAE     Coverage
     Hierarchical        0.612   0.242         0.990   0.847   0.611   0.907
     Single model        0.575   0.141         0.993   0.920   0.640   0.902
     Mode of action      0.543   0.049         0.949   0.978   0.678   0.834

4.2.  48 hour Daphnia magna LC50

    4.2.1. Statistical External Validation

          The consensus method achieved the best results in terms of both prediction accuracy
          and coverage (see Table 4.2.1). The prediction results for the consensus method are
          given in Figure 4.2.1.

                 Table 4.2.1. Prediction results for the D. magna LC50 test set
     Method              R2      (R2-R20)/R2   k       RMSE    MAE     Coverage
     Hierarchical        0.695   0.151         0.981   0.979   0.757   0.886
     Single Model        0.697   0.152         1.002   0.993   0.772   0.871
     FDA                 0.565   0.257         0.987   1.190   0.909   0.900
     Group contribution  0.671   0.049         0.999   0.803   0.620   0.657
     Nearest neighbor    0.733   0.014         1.015   0.975   0.745   0.871
     Consensus           0.739   0.118         1.001   0.911   0.727   0.900

-------
        [Plot: external prediction results; x-axis: Exp. Daphnia magna LC50 (48 hr) -Log10(mol/L)]
        Figure 4.2.1. Experimental vs predicted values for the D. magna LC50 test set

4.3.  Tetrahymena pyriformis 50% growth inhibitory concentration (IGC50)

    4.3.1. Statistical External Validation

          Again, the consensus method achieved the best results (see Table 4.3.1). The R2
          value for the consensus method in version 4.1 of TEST was slightly lower than the
          value for version 4.0. This is because the data set has been expanded to include a
          wider variety of chemical classes. The prediction results for the consensus method
          are given in Figure 4.3.1.

-------
                 Table 4.3.1. Prediction results for the T. pyriformis IGC50 test set
     Method              R2      (R2-R20)/R2   k       RMSE    MAE     Coverage
     Hierarchical        0.719   0.023         0.978   0.539   0.358   0.933
     FDA                 0.747   0.056         0.988   0.489   0.337   0.978
     Group contribution  0.682   0.065         0.994   0.575   0.411   0.955
     Nearest neighbor    0.600   0.170         0.976   0.638   0.451   0.986
     Consensus           0.764   0.065         0.983   0.475   0.332   0.983

          [Plot: external prediction results; x-axis: Exp. T. pyriformis IGC50 (48 hr) -Log10(mol/L)]
          Figure 4.3.1. Experimental vs predicted values for the T. pyriformis IGC50 test set

4.4.  Oral rat LD50 dataset

    4.4.1. Statistical External Validation

          It was not possible to develop a single model or a group contribution model that fit
          the entire training set (see Table 4.4.1). The consensus method achieved the best
          results in terms of both prediction accuracy and prediction coverage. The prediction
          statistics for this endpoint were not as good as those for the other endpoints. This is
          not surprising because this endpoint has a higher degree of experimental uncertainty
          and has been shown to be more difficult to model than other endpoints 90. The
          prediction results for the consensus method are given in Figure 4.4.1.

-------
        Table 4.4.1. Prediction results for the oral rat LD50 test set
     Method              R2      (R2-R20)/R2   k       RMSE    MAE     Coverage
     Hierarchical        0.578   0.184         0.969   0.650   0.460   0.876
     FDA                 0.557   0.238         0.953   0.657   0.474   0.984
     Nearest neighbor    0.557   0.243         0.961   0.656   0.477   0.993
     Consensus           0.626   0.235         0.959   0.594   0.431   0.984

[Plot: external prediction results; x-axis: Exp. Oral rat LD50 -Log10(mol/kg); legend: Exp., Y=X line]
Figure 4.4.1.  Experimental vs predicted values for the oral rat LD50 test set

-------
4.5.  Bioconcentration factor (BCF)

    4.5.1. Statistical External Validation

          Again, the consensus method yielded the best statistics if one considers both
          prediction accuracy and coverage (see Table 4.5.1.). The prediction results for the
          consensus method are given in Figure 4.5.1.


                     Table 4.5.1. Prediction results for the BCF test set
     Method              R2      (R2-R20)/R2   k       RMSE    MAE     Coverage
     Hierarchical        0.734   0.019         0.888   0.712   0.541   0.926
     Single Model        0.742   0.083         0.901   0.684   0.543   0.926
     FDA                 0.705   0.036         0.905   0.746   0.571   0.911
     Group Contribution  0.675   0.187         0.888   0.760   0.622   0.874
     Nearest neighbor    0.609   0.100         0.931   0.884   0.604   0.948
     Consensus           0.760   0.066         0.900   0.661   0.513   0.926

-------
               [Plot: external prediction results; x-axis: Exp. Bioaccumulation factor Log10]
               Figure 4.5.1. Experimental vs predicted values for the BCF test set

          The BCFBAF (bioconcentration factor bioaccumulation factor) module (v. 3.00) of
          US EPA's EPI Suite software package 82 yielded an R2 value of 0.766 and MAE of
          0.50 (for the same chemicals that were able to be predicted by the consensus
          method). Thus, the predictions for the consensus method are comparable to those
          from EPI Suite. However, this may not be a fair comparison because some of the
          chemicals in the prediction set may have appeared in the training set for the BCF
          model in EPI Suite.

4.6.  Developmental toxicity
    4.6.1. Statistical External Validation

          The consensus method achieved the best results for the EPA developed QSAR
          methods (in terms of prediction accuracy and coverage) (see Table 4.6.1). The
          CAESAR random forest method achieved similar results to the EPA Consensus
          model (the concordance was higher but the coverage was lower). All of the methods
          achieved appreciably higher prediction sensitivities than specificities. This is
          acceptable for regulatory applications because it is desired to minimize the number
          of false negatives.

-------
                Table 4.6.1. Prediction results for the developmental toxicity test set
     Method              Concordance   Sensitivity   Specificity   Coverage
     Hierarchical        0.724         0.829         0.471         1.000
     Single Model        0.732         0.850         0.438         0.966
     FDA                 0.724         0.780         0.588         1.000
     Nearest neighbor    0.795         0.844         0.667         0.759
     Consensus           0.793         0.902         0.529         1.000
     Random Forest       0.852         0.949         0.600         0.931

4.7.  Ames mutagenicity

    4.7.1. Statistical External Validation

          Again, the consensus method achieved the best prediction accuracy (concordance)
          and prediction coverage (see Table 4.7.1). The single model and group contribution
          methods could not be applied to this endpoint. All of the methods achieved a nice
          balance of prediction sensitivity and specificity.

                 Table 4.7.1. Prediction results for the Ames mutagenicity test set
     Method              Concordance   Sensitivity   Specificity   Coverage
     Hierarchical        0.763         0.776         0.746         0.956
     FDA                 0.775         0.766         0.787         0.961
     Nearest neighbor    0.770         0.783         0.752         0.990
     Consensus           0.790         0.789         0.791         0.995

4.8.  Normal boiling point
     4.8.1. Statistical External Validation

          The consensus method achieved the best statistics in terms of both prediction
          accuracy and coverage (see Table 4.8.1). In general, the prediction statistics for the
          physical properties were excellent. The prediction results for the consensus method
          are given in Figure 4.8.1.

-------
                 Table 4.8.1. Prediction results for the normal boiling point test set
     Method              R2      (R2-R20)/R2   k       RMSE     MAE      Coverage
     Hierarchical        0.949   0.001         0.991   18.700   10.613   0.935
     FDA                 0.936   0.002         0.991   21.431   12.214   0.988
     Group contribution  0.897   0.002         0.997   27.554   17.000   0.977
     Nearest neighbor    0.877   0.005         0.968   29.967   19.754   0.988
     Consensus           0.947   0.002         0.987   19.403   11.460   0.986

          [Plot: external prediction results; x-axis: Exp. Normal boiling point °C]
          Figure 4.8.1. Experimental vs predicted values for the normal boiling point test set
4.9.  Density
     4.9.1. Statistical External Validation

          For this property, the hierarchical and FDA methods gave a slightly higher R2 value
          than the consensus method (see Table 4.9.1.). However, the consensus method
          yielded a near 100% prediction coverage. The prediction results for the consensus
          method are given in Figure 4.9.1.

-------
          Table 4.9.1. Prediction results for the density test set
Method              R2       (R2-R20)/R2  k       RMSE    MAE       Coverage
Hierarchical        0.972    0.001        0.997   0.052   0.026     0.942
FDA                 0.968    0.001        0.993   0.057   0.031     0.992
Group contribution  0.872    0.005        0.997   0.116   0.071     0.992
Nearest neighbor    0.859    0.021        0.978   0.121   0.073     0.997
Consensus           0.956    0.005        0.991   0.068   0.038     0.996

   [Plot: external prediction results; x-axis: Exp. Density g/cm3]
   Figure 4.9.1. Experimental vs predicted values for the density test set

-------
4.10. Flash point

     4.10.1. Statistical External Validation

          For this property, the consensus method gives the best results in terms of prediction
          accuracy and coverage (see Table 4.10.1). The prediction results for the consensus
          method are given in Figure 4.10.1.

                    Table 4.10.1. Prediction results for the flash point test set
            Method              R2       (R2-R20)/R2  k      RMSE    MAE       Coverage
            Hierarchical        0.871    0.008        0.962  28.898  16.749    0.924
            FDA                 0.853    0.010        0.960  31.481  19.227    0.989
            Group contribution  0.834    0.009        0.968  33.630  20.426    0.987
            Nearest neighbor    0.801    0.018        0.925  36.833  23.832    0.993
            Consensus           0.879    0.011        0.953  28.503  16.908    0.992

                [Plot: external prediction results; x-axis: Exp. Flash point °C]
                Figure 4.10.1. Experimental vs predicted values for the flash point test set

-------
4.11. Thermal conductivity

     4.11.1.  Statistical External Validation

          For this property, the hierarchical method gives similar results to the consensus
          method (see Table 4.11.1). The prediction results for the consensus method are given
          in Figure 4.11.1.

                Table 4.11.1. Prediction results for the thermal conductivity test set
     Method              R2      (R2-R20)/R2   k       RMSE     MAE     Coverage
     Hierarchical        0.906   0.025         0.996   11.024   6.731   0.956
     Single Model        0.890   0.031         0.992   11.864   8.524   0.956
     FDA                 0.845   0.000         1.018   16.406   9.008   0.967
     Group contribution  0.803   0.088         0.979   15.898   9.825   0.911
     Nearest neighbor    0.884   0.021         1.004   12.832   8.449   0.978
     Consensus           0.892   0.010         1.005   12.413   7.046   0.967

         [Plot: external prediction results; x-axis: Exp. Thermal conductivity at 25°C mW/mK; y-axis: Pred. Thermal conductivity at 25°C mW/mK; legend: Exp., Y=X line]
         Figure 4.11.1. Experimental vs predicted values for the thermal conductivity test set

-------
4.12. Viscosity
     4.12.1. Statistical External Validation

          For this property, the consensus method gives the best results if you consider both
          prediction accuracy and coverage (see Table 4.12.1). The low k values for this
          endpoint can be attributed to the two possible outliers in the test set that fall below
          the Y=X line. The prediction results for the consensus method are given in Figure
          4.12.1.

                     Table 4.12.1. Prediction results for the viscosity test set
            Method              R²      (R²-R₀²)/R²   k       RMSE    MAE     Coverage
            Hierarchical        0.868   0.001         0.809   0.214   0.131   0.929
            Single Model        0.644   0.010         0.625   0.346   0.217   0.929
            FDA                 0.868   0.003         0.875   0.207   0.142   0.929
            Group contribution  0.888   0.001         0.831   0.200   0.113   0.814
            Nearest neighbor    0.757   0.009         0.726   0.289   0.194   0.920
            Consensus           0.876   0.004         0.778   0.215   0.125   0.929

                                  External prediction results
            (plot; y-axis: Pred. Viscosity at 25°C Log10(cP); legend: Exp. points, Y=X line)
            Figure 4.12.1. Experimental vs predicted values for the viscosity test set

-------
4.13. Surface tension

     4.13.1. Statistical External Validation

          For this property, the consensus method gives the best results in terms of prediction
          accuracy and coverage (see Table 4.13.1). The prediction results for the consensus
          method are given in Figure 4.13.1.

                  Table 4.13.1. Prediction results for the surface tension test set
            Method              R²      (R²-R₀²)/R²   k       RMSE    MAE     Coverage
            Hierarchical        0.929   0.016         0.989   1.792   1.037   0.919
            FDA                 0.890   0.015         0.992   2.219   1.297   0.979
            Group contribution  0.794   0.044         0.986   2.933   2.114   0.926
            Nearest neighbor    0.759   0.068         0.973   3.317   1.923   0.936
            Consensus           0.903   0.027         0.987   2.112   1.317   0.968

                                  External prediction results
                          (plot; x-axis: Exp. Surface tension at 25°C dyn/cm)

           Figure 4.13.1. Experimental vs predicted values for the surface tension test set

-------
4.14. Water solubility

     4.14.1. Statistical External Validation

          For this property, the consensus method gives the best statistics in terms of
          prediction accuracy and coverage (see Table 4.14.1). The prediction results for the
          consensus method are given in Figure 4.14.1.

                  Table 4.14.1. Prediction results for the water solubility test set
            Method              R²      (R²-R₀²)/R²   k       RMSE    MAE     Coverage
            Hierarchical        0.834   0.015         0.943   0.903   0.601   0.935
            FDA                 0.809   0.014         0.950   0.953   0.639   0.984
            Group contribution  0.766   0.039         0.933   1.074   0.798   0.982
            Nearest neighbor    0.791   0.022         0.950   1.023   0.735   0.985
            Consensus           0.857   0.021         0.943   0.835   0.578   0.987

                                  External prediction results
                        (plot; x-axis: Exp. Water solubility at 25°C Log10(mol/L))
           Figure 4.14.1. Experimental vs predicted values for the water solubility test set

-------
4.15. Vapor pressure

     4.15.1. Statistical External Validation

          The prediction statistics were excellent and again the consensus method achieved the
          best results (see Table 4.15.1). The prediction results for the consensus method are
          given in Figure 4.15.1.

                  Table 4.15.1. Prediction results for the vapor pressure test set
            Method              R²      (R²-R₀²)/R²   k       RMSE    MAE     Coverage
            Hierarchical        0.956   0.001         0.977   0.745   0.455   0.940
            FDA                 0.946   0.001         0.985   0.827   0.494   0.982
            Group contribution  0.929   0.001         1.020   0.998   0.608   0.968
            Nearest neighbor    0.878   0.001         0.937   1.251   0.823   0.980
            Consensus           0.954   0.001         0.980   0.769   0.466   0.980

                                  External prediction results
                        (plot; x-axis: Exp. Vapor pressure at 25°C Log10(mmHg))
           Figure 4.15.1. Experimental vs predicted values for the vapor pressure test set

-------
4.16. Melting point

     4.16.1. Statistical External Validation

          The prediction statistics were very good and again the consensus method
          achieved the best results (see Table 4.16.1). The prediction results for the consensus
          method are given in Figure 4.16.1.

                  Table 4.16.1. Prediction results for the melting point test set
            Method              R²      (R²-R₀²)/R²   k       RMSE     MAE      Coverage
            Hierarchical        0.811   0.011         0.892   44.355   31.433   0.932
            FDA                 0.801   0.011         0.879   45.095   32.920   0.993
            Group contribution  0.704   0.065         0.837   54.947   41.274   0.997
            Nearest neighbor    0.738   0.017         0.850   52.095   37.837   0.998
            Consensus           0.834   0.021         0.863   41.464   30.207   0.998

                                  External prediction results
                                  (plot; x-axis: Exp. Melting point °C)
            Figure 4.16.1. Experimental vs predicted values for the melting point test set

-------
5.           THE

   5.1. Importing a single compound
             A compound can be imported into the software in several different ways:
          •  Drawn using the provided molecular structure drawing tool
          •  Imported from an MDL molfile
          •  Imported from a SMILES string
          •  Imported from the included structure data base


         5.1.1. Drawing a structure using the drawing tool

          •  First, add any rings present in the molecule using the ring template buttons
             (click on a button and then click somewhere in the document).
          •  Next, add any chains using the / button.
          •  Next, add double or triple bonds by using / again and clicking on the bonds to
             make them double or triple bonds. You can use the wedge bond buttons to make
             existing bonds wedge bonds, or you can draw wedge bonds directly.
          •  Then, set any hetero atoms (non-carbon atoms). Either use one of the element
             symbol buttons and click on an atom to change it to that element, or use the
             periodic table button to choose an element.
          •  Finally, with the element-cycling button (labeled with symbols such as C, O, N,
             and H) you can step through some common elements by clicking on an atom
             repeatedly. With +1 and -1 you can change the charge.

         5.1.2. Importing a structure from an MDL molfile

              The structure for a test compound can be imported from an MDL molfile
              (https://en.wikipedia.org/wiki/Chemical_table_file).

              To import a structure using an MDL molfile, select Import from MDL molfile from
              the File menu.


         5.1.3. Importing a structure from a SMILES string

              The structure for a test compound can be imported from a SMILES string
              (http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html).

              To import a structure using a SMILES string, select Generate from SMILES string
              from the File menu.

              Enter the desired SMILES string in the dialog box provided and press OK.

-------
         For example, to import benzene, enter c1ccccc1 as the SMILES string. A SMILES
         string can also be pasted from the clipboard by selecting Generate from SMILES on
         clipboard.

    5.1.4. Import from the structure database

         To import a structure from the structure database, first select Import from structure
         database from the File menu.

         One can then search for a structure by CAS number, molecular weight, or
         formula:
           Search structure database
              (•) CAS # (e.g. 71-43-2):
              ( ) Molecular weight:
              ( ) Formula (e.g. C6H6):
              ( ) Currently drawn structure
         One can enter the CAS number with or without dashes (e.g., 71-43-2 or 71432). The
         Currently drawn structure option allows you to retrieve the CAS number for a
         given drawn structure (assuming it is available in the database included with the
         software).

         You can import a chemical by its CAS number by entering a CAS number in the
         Molecule ID field and pressing enter.
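
Because CAS numbers may be typed with or without dashes, it can help to normalize them before building an input list. The sketch below is a hypothetical helper, not part of T.E.S.T.: `normalize_cas` rewrites either form into the dashed form, and `cas_checksum_ok` applies the standard CAS check-digit rule (the last digit equals the weighted sum of the preceding digits, counted from the right, modulo 10). Whether T.E.S.T. itself performs such a check is not stated in this guide.

```python
def normalize_cas(text):
    """Return a CAS number in dashed form, e.g. '71432' -> '71-43-2'."""
    digits = text.strip().replace("-", "")
    # A CAS number has 2 to 7 digits, then 2 digits, then 1 check digit
    if not digits.isdigit() or not 5 <= len(digits) <= 10:
        raise ValueError(f"not a CAS number: {text!r}")
    return f"{digits[:-3]}-{digits[-3:-1]}-{digits[-1]}"

def cas_checksum_ok(text):
    """Return True when the final digit matches the CAS check-digit rule."""
    digits = normalize_cas(text).replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    # Weight the digits 1, 2, 3, ... starting from the right-most body digit
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check

for entry in ("71-43-2", "71432", "50-00-0"):
    print(entry, "->", normalize_cas(entry), cas_checksum_ok(entry))
```

A failed check digit usually points to a typing error, which is cheaper to catch before a batch run than after.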

5.2.  Importing multiple compounds (batch  import)

         Multiple compounds can be imported simultaneously in several different ways:
         •   Importing from an MDL SDfile
         •   Importing from a list of CAS numbers
         •   Importing from a list of SMILES strings

Sample files in each of these formats are available in a zip file at the following link:
https://www.epa.gov/sites/production/files/2015-07/samplefiles.zip

5.2.1. Importing from an MDL SDfile

 To import multiple structures from an MDL SDfile select Batch import from MDL
 SDfile from the Import Chemical menu option.

 For best results, one should use SDfiles with either a "CAS" or a "Name" field
 included to uniquely identify each chemical in the file. The program first looks for a
 "CAS" field and then looks for "Name" field when assigning identifiers. For
 example, a sample from an SDfile including formaldehyde would be as follows:

 Formaldehyde
   csChFndS0/07260508122D

   2  1  0  0  0  0  0  0  0  0999 V2000
     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   1  2  2  0  0  0  0
 M  END

 >  <CAS>
 50-00-0

 >  <Name>
 Formaldehyde

 $$$$
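
To check in advance which identifier each record of an SDfile would receive, the lookup order described above ("CAS" first, then "Name") can be sketched as follows. This is a minimal illustration and not the parser used by T.E.S.T.: the function name `sdf_identifiers` is hypothetical, and the code only reads the data fields and ignores the connection table.

```python
def sdf_identifiers(sdf_text):
    """Return one identifier per SDfile record: the CAS field if present,
    otherwise the Name field, otherwise None."""
    identifiers = []
    for record in sdf_text.split("$$$$"):
        lines = record.strip("\n").splitlines()
        if not any(line.strip() for line in lines):
            continue  # skip the empty piece after the last $$$$
        fields = {}
        for i, line in enumerate(lines):
            # A data header looks like ">  <FieldName>"
            if line.startswith(">") and "<" in line and ">" in line[1:]:
                name = line[line.index("<") + 1 : line.rindex(">")]
                value = lines[i + 1].strip() if i + 1 < len(lines) else ""
                fields[name] = value
        identifiers.append(fields.get("CAS") or fields.get("Name"))
    return identifiers

SAMPLE = """Formaldehyde
  example

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
M  END

>  <CAS>
50-00-0

>  <Name>
Formaldehyde

$$$$
"""

print(sdf_identifiers(SAMPLE))
```

A record that yields None has neither field, so it would be worth adding a "CAS" or "Name" field to it before importing.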
5.2.2. Importing from a list of CAS numbers

 To import multiple structures from a list of CAS numbers (in a text file), select
 Batch import from list of CAS numbers from the Import Chemical menu option.

 For example, to import benzene and formaldehyde, the contents of the text file should
 be as follows:

 71-43-2
 50-00-0

-------
5.2.3. Importing from a list of SMILES strings

To import multiple structures from a list of SMILES strings (in a text file), select
Batch import from list of SMILES strings from the Import Chemical menu
option.

The text file should contain a SMILES string and a unique identifier on each line.
A comma, tab, or space can separate the SMILES string and the identifier. The text
file should not contain a header line.

For example, to import benzene and formaldehyde, the contents of the text file should
be as follows:

c1ccccc1                        71-43-2
C=O                            50-00-0
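
The file format described above (one SMILES string and one unique identifier per line, separated by a comma, tab, or space, with no header line) can be checked before import with a short script. The sketch below is an assumption-laden illustration, not the T.E.S.T. reader: the function name `parse_smiles_list` is hypothetical, and the duplicate-identifier check is added here only because the identifiers are required to be unique.

```python
import re

def parse_smiles_list(text):
    """Parse lines of 'SMILES <sep> identifier' into (smiles, identifier) pairs."""
    entries = []
    seen = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split once on the first run of commas, tabs, or spaces
        parts = re.split(r"[,\t ]+", line, maxsplit=1)
        if len(parts) != 2 or not parts[1]:
            raise ValueError(f"expected a SMILES string and an identifier: {line!r}")
        smiles, identifier = parts[0], parts[1].strip()
        if identifier in seen:
            raise ValueError(f"duplicate identifier: {identifier!r}")
        seen.add(identifier)
        entries.append((smiles, identifier))
    return entries

example = "c1ccccc1                        71-43-2\nC=O                            50-00-0\n"
for smiles, identifier in parse_smiles_list(example):
    print(identifier, smiles)
```

Splitting only once keeps identifiers that contain spaces (for example a chemical name) intact.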

5.2.4. Editing a chemical in the batch list

After importing the desired set of chemicals, you can edit an individual chemical in the
list by double clicking on its row in the list. An example of an imported batch list is
given in Figure 5.2.4.

-------
                    Figure 5.2.4. Batch mode screen in T.E.S.T.
(Screenshot of the T.E.S.T. batch list window: a table with ID, Formula, and Error columns
listing 31 chemicals by CAS number, from 50-06-6 (C12H12N2O3) to 88-85-7 (C10H12N2O5); the
note "double click to edit a chemical"; Add and Delete buttons; Endpoint (Fathead minnow
LC50 (96 hr)) and Method (Consensus) drop-down lists; and Save list as SDF, Close batch
list, Options..., and Calculate! buttons.)

5.2.5. Adding chemicals to the batch list

     To add chemicals to the list, click the Add button. Double click on the new chemical
     to add the molecular structure for the new chemical.

5.2.6. Deleting chemicals  from the batch list

     To delete chemicals from the list, select one or more rows in the batch list and click
     the Delete button (or press the Delete key on the keyboard).

5.2.7. Saving the batch list

     To save the batch list as an MDL SDfile, click on the Save list as SDF button. This
     feature allows you to save changes to your list.

5.2.8. Closing the batch list

     To close the batch list, click on the Close batch list button. One can also close the batch
     list by deleting all the chemicals in the list.

-------

5.3.  Performing toxicity predictions

         If the Molecule ID is blank, enter a unique identifier for the compound. It is
         recommended that the CAS number be used for the Molecule ID but the name can be
         used as well. The software needs the Molecule ID in order to generate the output web
         pages. Warning: if two molecules have the same Molecule ID, the results files will get
         overwritten.
         Select a toxicity endpoint using the drop down list provided (the fathead minnow
         LC50 is selected by default).
         Select a QSAR toxicity estimation method using the drop down list provided (the
         hierarchical clustering method is chosen by default). The methodologies are described
         in detail in the Theory section.
         Sometimes predictions for a given chemical cannot be made because the model(s)
         violate the fragment constraint. The fragment constraint says that in order for a
         prediction to be made using a given model, the chemicals used in the construction of
         the model must possess at least one example of each molecular fragment present in
         the test compound. One can relax this constraint by checking the Relax fragment
         constraint checkbox (now accessed by clicking the Options button on the bottom of
         the screen). The fragment constraint is described in the Theory section.
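
The fragment constraint described above amounts to a set comparison: every fragment in the test compound must occur in at least one chemical used to build the model. The sketch below illustrates that idea only; the function name `check_fragment_constraint` and the fragment labels are hypothetical and are not the fragment definitions used by T.E.S.T.

```python
def check_fragment_constraint(test_fragments, training_fragment_sets):
    """Return (ok, missing).

    ok is True when every fragment of the test compound appears in at
    least one training-set chemical; missing lists the uncovered fragments.
    """
    covered = set().union(*training_fragment_sets)
    missing = sorted(set(test_fragments) - covered)
    return not missing, missing

# Hypothetical fragment labels, for illustration only
training = [
    {"aromatic CH", "aromatic C-Cl"},
    {"aromatic CH", "aromatic C-NH2"},
]
print(check_fragment_constraint({"aromatic CH", "aromatic C-Cl", "aromatic C-NH2"}, training))
print(check_fragment_constraint({"aromatic CH", "aromatic C-NO2"}, training))
```

Relaxing the constraint corresponds to making a prediction even when the list of missing fragments is not empty, so such predictions deserve extra caution.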
         Once the desired options have been selected, one can start the toxicity estimation
         calculations by clicking Calculate!.
         Before the calculations can proceed, one must first select the location where the
         output files will be stored:
          Select folder to store the output files from this software:
            C:\Documents and Settings\UserID\My Documents\MyToxicity        [Browse...]
                                                                  [OK]  [Cancel]
         The output folder can be changed at any time by choosing Select output folder from
         the Options screen. The software will remember the selected output folder the next
         time the software is loaded.
         If one wishes to abort the currently running calculations, click on the red Stop button.

-------
 5.4.  Interpretation of results
          After performing the toxicity estimation calculations, a web page is generated which
          displays the results. The results for 87-60-5 (for the Tetrahymena pyriformis IGC50
          endpoint and the Consensus method) are given in Table 5.4.1. The predicted toxicity
          is 69.12 mg/L and the experimental value is 59.03 mg/L. The prediction is flagged in
          this example because the chemical was part of the external test set. The predicted
          toxicity from the consensus method represents the average of the predicted toxicities
          from all the different QSAR methods incorporated into the TEST software. The
          individual predictions are given in Table 5.4.2. The average of the values from all the
          different QSAR methods is 3.31, which is close to the experimental value of 3.38 (in
          units of -Log10(mol/L)).
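
The two calculations in this paragraph, averaging the individual predictions and converting between -Log10(mol/L) and mg/L, can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the molecular weight of 141.60 g/mol for 87-60-5 (formula C7H8ClN) is an assumed value that is not given in this guide.

```python
import math

# Assumed molecular weight for CAS 87-60-5 (C7H8ClN); not stated in the guide
MOL_WEIGHT = 141.60  # g/mol

def neg_log_molar_to_mg_per_l(neg_log_molar, mol_weight):
    """Convert -Log10(mol/L) to mg/L."""
    return 10 ** (-neg_log_molar) * mol_weight * 1000.0

def mg_per_l_to_neg_log_molar(mg_per_l, mol_weight):
    """Convert mg/L to -Log10(mol/L)."""
    return -math.log10(mg_per_l / 1000.0 / mol_weight)

# Individual predictions from Table 5.4.2, in -Log10(mol/L)
individual = [3.37, 3.36, 3.37, 3.15]
consensus = sum(individual) / len(individual)
print(f"consensus prediction: {consensus:.2f}")

experimental_mg_l = neg_log_molar_to_mg_per_l(3.38, MOL_WEIGHT)
print(f"experimental value: {experimental_mg_l:.2f} mg/L")
```

Converting the rounded consensus value of 3.31 in the same way gives roughly 69 mg/L; the small difference from the reported 69.12 mg/L presumably reflects rounding of the values shown in the tables.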

               Table 5.4.1. Prediction results from the consensus method for 87-60-5
Prediction results
Endpoint                                    Experimental value (CAS= 87-60-5)   Predicted valueᵃ
                                            Source: TETRATOX
T. pyriformis IGC50 (48 hr) -Log10(mol/L)   3.38                                3.31
T. pyriformis IGC50 (48 hr) mg/L            59.03                               69.12
ᵃ Note: the test chemical was present in the external test set.
                        Table 5.4.2. Individual predictions for 87-60-5
Individual Predictions
Method                     Predicted value -Log10(mol/L)
Hierarchical clustering    3.37
Group contribution         3.36
FDA                        3.37
Nearest neighbor           3.15

-------
      The software provides predictions for similar chemicals from the test set (see Figure
      5.4.1). The colors of the data points are defined in Table 5.4.3. The MAE (mean
      absolute error) for similar chemicals (0.25) was slightly lower than the value for the
      entire test set (0.33). This increases one's confidence in the predicted value. The
      structures for the similar chemicals in the test set are given in Table 5.4.3.
      Prediction results (colors defined in table below)
      (plot of predicted vs Exp. T. pyriformis IGC50 (48 hr) -Log10(mol/L) for similar test set chemicals)

      Test set chemicals              MAE*
      Entire set                      0.33
      Similarity coefficient ≥ 0.5    0.25
      *Mean absolute error in -Log10(mol/L)
               Figure 5.4.1. Predictions for similar chemicals from the test set
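
The comparison between the MAE for similar chemicals and the MAE for the entire test set can be sketched as a filtered average. The code below is illustrative only: the function name `mean_absolute_error`, the record layout, and the example values are hypothetical, and the similarity cutoff of 0.5 is taken from the figure legend.

```python
def mean_absolute_error(records, min_similarity=None):
    """Mean absolute error over records, optionally restricted to those
    whose similarity coefficient is at least min_similarity."""
    chosen = [r for r in records
              if min_similarity is None or r["similarity"] >= min_similarity]
    if not chosen:
        return None  # no chemicals pass the similarity cutoff
    return sum(abs(r["exp"] - r["pred"]) for r in chosen) / len(chosen)

# Made-up test set entries in -Log10(mol/L), for illustration only
test_set = [
    {"similarity": 0.90, "exp": 3.40, "pred": 3.30},
    {"similarity": 0.70, "exp": 2.90, "pred": 3.10},
    {"similarity": 0.20, "exp": 4.50, "pred": 3.90},
]

print("entire set:", round(mean_absolute_error(test_set), 2))
print("similar chemicals:", round(mean_absolute_error(test_set, min_similarity=0.5), 2))
```

A lower MAE for the similar chemicals than for the entire set suggests that the models perform well in the neighborhood of the test chemical.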

-------
          Table 5.4.3. Structures for the similar chemicals in the test set
CAS                       Structure   Similarity    Experimental value   Predicted value
                                      Coefficient   -Log10(mol/L)        -Log10(mol/L)
87-60-5 (test chemical)
  The most similar chemicals are very similar to the test chemical (benzenes substituted
  with chloro and amino groups) and were accurately predicted. This increases one's
  confidence in the predicted value. The program also lists the similar chemicals in the
  training set (see Table 5.4.4). As shown by the fairly large similarity coefficients,
  there are very similar chemicals in the training set (the only difference is the
  substitution pattern). This increases one's confidence in the predicted value because
  similar chemicals were used to build the QSAR models.

  One can view the details of the predictions for the different QSAR methods by
  clicking on the predicted value for each method. For example, for the Hierarchical
  clustering method, the main prediction table is given in Table 5.4.5. The prediction
  interval is 48.78 < Tox < 75.30 mg/L (one is 90% confident that the actual value lies
  between 48.78 and 75.30 mg/L). The experimental value (59.03 mg/L) falls within the
  prediction interval.

-------

       Table 5.4.4. Structures for the similar chemicals in the training set
CAS                       Structure   Similarity    Experimental value
                                      Coefficient   -Log10(mol/L)
87-60-5 (test chemical)                             3.38
                                                    3.39
                                                    2.57
                                                    3.50

-------
                  Table 5.4.5. Prediction from the hierarchical clustering method.
Prediction results
Endpoint                                    Experimental value (CAS= 87-60-5)   Predicted valueᵃ   Prediction interval
                                            Source: TETRATOX
T. pyriformis IGC50 (48 hr) -Log10(mol/L)   3.38                                3.37               3.27
T. pyriformis IGC50 (48 hr) mg/L            59.03
-------
 Table 5.4.6. Regression statistics for model 2481
Parameter     Value
Endpoint      T. pyriformis IGC50 (48 hr)
r2            0.926
q2            0.861
#chemicals    10
Model         Model # 2481

                 Model fit results
(plot; y-axis: Pred. T. pyriformis IGC50 (48 hr) -Log10(mol/L))
-------
                        Table 5.4.6. Model parameters for model 2481
Model coefficients
Coefficient   Definition                                                            Value    Uncertainty*
Intercept     Model intercept                                                       2.5043   0.2689
MATS4e        Moran autocorrelation - lag 4 / weighted by atomic Sanderson          0.7092   0.1648
              electronegativities
GATS3p        Geary autocorrelation - lag 3 / weighted by atomic polarizabilities   0.6683   0.2168
* value for 90% confidence interval

          Table 5.4.6 indicates that the equation for the model is as follows:
          Model equation:
          T. pyriformis IGC50 (48 hr) = 0.7092×(MATS4e) + 0.6683×(GATS3p) + 2.5043

          The fit results (and structures) for each chemical in the model's training set can be
          obtained by clicking on Model 2481 fit results by  chemical.
          The descriptor values (in a "|" delimited text file) can be obtained by clicking on
          Model 2481 training set descriptors.
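
To show how the model equation for model 2481 and a "|" delimited descriptor file fit together, the sketch below applies the published coefficients to descriptor values read from such a file. It is a minimal sketch under stated assumptions: the column names (CAS, MATS4e, GATS3p), the identifier, and the descriptor values in the example are hypothetical, and the actual layout of the descriptor file produced by T.E.S.T. may differ.

```python
import csv
import io

# Coefficients from the model parameters table for model 2481
INTERCEPT = 2.5043
COEFFICIENTS = {"MATS4e": 0.7092, "GATS3p": 0.6683}

def predict(descriptors):
    """Evaluate the model 2481 equation; result is in -Log10(mol/L)."""
    return INTERCEPT + sum(coef * float(descriptors[name])
                           for name, coef in COEFFICIENTS.items())

def predict_from_pipe_file(handle):
    """Read a '|' delimited file with a header row and predict each row.

    Assumes hypothetical columns named CAS, MATS4e and GATS3p.
    """
    reader = csv.DictReader(handle, delimiter="|")
    return {row["CAS"]: predict(row) for row in reader}

# Made-up descriptor values, for illustration only
example_file = io.StringIO("CAS|MATS4e|GATS3p\nexample-1|0.50|1.00\n")
for cas, value in predict_from_pipe_file(example_file).items():
    print(cas, round(value, 2))
```

Reading the file with csv.DictReader keeps the code independent of column order, which is convenient when a file contains many more descriptors than the model uses.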

-------
Bibliography
(1)     US EPA. Environmental Optimization Using the Waste Reduction Algorithm.
       nepis.epa.gov/Exe/ZyPURL.cgi?Dockey=P100DZKT.TXT (accessed 4/18/16).
(2)     Martin, T. M.; Harten, P.; Venkatapathy, R.; Das, S.; Young, D. M. A Hierarchical Clustering
       Methodology for the Estimation of Toxicity. Toxicol. Mech. Method. 2008, 18, 251-266.
(3)     CAESAR. Developmental Toxicity Model.
       http://www.caesar-project.eu/index.php?page=results&section=endpoint&ne=5 (accessed 9/21/09).
(4)     Steinbeck, C.; Han, Y.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry
       Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J. Chem.
       Inf. Comp. Sci. 2003, 43, 493-500.
(5)     Sourceforge.net. Chemistry Development Kit (CDK). https://sourceforge.net/projects/cdk/
       (accessed 4/14/2016).
(6)     Elsevier MDL. MDL QSAR Version 2.2. http://www.mdl.com/products/predictive/qsar/index.jsp
       (accessed 8/17/2006).
(7)     Talete. Dragon Version 5.4. http://www.talete.mi.it/ (accessed 5/26/09).
(8)     Edusoft-LC. Molconn-z Version 4.0. http://www.edusoft-lc.com/molconn/ (accessed 5/26/09).
(9)     Romesburg, H. C., Cluster Analysis for Researchers. Lifetime Learning Publications: Belmont, CA,
       1984.
(10)    Eriksson, L.; Jaworska, J. S.; Worth, A. P.; Cronin, M. T. D.; McDowell, R. M.; Gramatica, P.
       Methods for Reliability and Uncertainty Assessment and for Applicability Evaluations of
       Classification- and Regression-Based QSARs. Environ. Health Persp. 2003, 111, 1361-1375.
(11)    Topliss, J. G.; Edwards, R. P. Chance factors in Studies of Quantitative Structure-Activity
       Relationships. J. Med. Chem. 1979, 22, 1238-1244.
(12)    The University of Waikato. WEKA - The Waikato Environment for Knowledge Analysis.
       http://www.cs.waikato.ac.nz/~ml/weka/ (accessed 5/26/09).
(13)    Witten, I. H., Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann:
       San Francisco, 2005.
(14)    Kutner, M. H., Nachtsheim, C. J., Neter, J., and Li, W., Applied Linear Statistical Models.
       McGraw-Hill: New York, 2004.
(15)    Eriksson, L.; Johannson, E.; Kettaneh-Wold, N.; Wold, S., Multi- and Megavariate Data Analysis -
       Principles and Applications. Umetrics AB: Umea, Sweden, 2001.
(16)    Wikipedia.org. Weighted mean. http://en.wikipedia.org/wiki/Weighted_mean (accessed
       4/14/16).
(17)    Montgomery, D. C., Introduction to linear regression analysis. John Wiley and Sons: New York,
       1982; p 141.
(18)    Contrera, J. F.; Matthews, E. J.; Benz, R. D. Predicting the carcinogenic potential of
       pharmaceuticals in rodents using molecular structural similarity and E-state indices. Regul.
       Toxicol. Pharm. 2003, 38, 243-259.
(19)    Benigni, R.; Richard,  A. M. QSARS of mutagens and carcinogens: Two case studies illustrating
       problems in the construction of models for noncongeneric chemicals. Mutat. Res. 1996, 371, 29-
       46.
(20)    Martin, T. M.; Young, D. M. Prediction of the Acute Toxicity (96-h LC50) of Organic Compounds to
       the Fathead Minnow (Pimephales promelas) Using a Group Contribution Method. Chem. Res.
       Toxicol. 2001, 14, 1378-1385.

-------
(21)    Martin, T. M.; Grulke, C. M.; Young, D. M.; Russom, C. L.; Wang, N. Y.; Jackson, C. R.; Barron, M.
       G. Prediction of Aquatic Toxicity Mode of Action Using Linear Discriminant and Random Forest
       Models. J. Chem. Inf. Model. 2013, 53, 2229-2239.
(22)    Martin, T. M.; Young, D. M.; Lilavois, C. R.; Barron, M. G. Comparison of global and mode of
       action-based models for aquatic toxicity. SAR QSAR Environ. Res. 2015, 26, 245-262.
(23)    Zhu, H.; Tropsha, A.; Fourches, D.; Varnek, A.; Papa, E.; Gramatica, P.; Oberg, T.; Dao, P.;
       Cherkasov, A.; Tetko, I. V. Combinatorial QSAR Modeling of Chemical Toxicants Tested against
       Tetrahymena pyriformis. J. Chem. Inf. Model. 2008, 48, 766-784.
(24)    Gramatica, P.; Pilutti, P. Evaluation of different statistical approaches for the validation of
       quantitative structure-activity relationships; The European Commission -Joint Research Centre,
       Institute for Health & Consumer Protection - ECVAM: Ispra,  Italy, 2004.
(25)    Bourguignon, B.; Deaguiar, P. F.; Khots, M. S.; Massart, D. L.  Optimization in Irregularly Shaped
       Regions: pH and Solvent Strength in Reversed-Phase High-Performance Liquid Chromatography
       Separations. Analytical Chemistry 1994, 66, 893-904.
(26)    Bourguignon, B.; Deaguiar, P. F.;Thorre, K.; Massart, D. L. Journal of Chromatography Science
       1994, 32, 144-152.
(27)    Kennard, R. W.; Stone, L. A. Technometrics 1969, 11, 137-148.
(28)    Snarey, M.; Terrett, N. K.; Willett, P.; Wilton, D. J. Comparison of Algorithms for Dissimilarity-
       Based Compound Selection. J. Mol. Graph. Model. 1997, 15, 372-385.
(29)    Golbraikh, A.; Shen, M.; Xiao, Z.; Xiao, Y.-D.; Lee, K.-H.; Tropsha, A. Rational Selection of Training
       and Test sets for the Development of Validated QSAR Models. J. Comput. Aid. Mol. Des. 2003,
       17, 241-253.
(30)    Golbraikh, A.; Tropsha,  A. Beware of q2! J. Mol. Graph. Model. 2002, 20, 269-276.
(31)    US EPA. ECOTOX Database.                            (accessed 4/14/2016).
(32)    Akers, K. S.; Sinks, G. D.; Schultz, T. W. Structure-toxicity relationships for selected halogenated
       aliphatic chemicals. Environmental Toxicology and Pharmacology 1999, 7, 33-39.
(33)    Aptula, A. O.; Roberts, D. W.; Cronin, M. T. D.; Schultz, T. W. Chemistry-Toxicity Relationships for
       the Effects of Di- and Trihydroxybenzenes to Tetrahymena pyriformis. Chem. Res. Toxicol. 2005,
       18, 844-854.
(34)    Bearden, A. P.; Schultz, T. W. Structure-Activity Relationships For Pimephales And Tetrahymena:
       A Mechanism Of Action Approach. Environmental Toxicology and Chemistry 1997, 16,
       1311-1317.
(35)    Bohme, A.; Thaens, D.;  Schramm, F.; Paschke, A.; Schuurmann, G. Thiol Reactivity and Its Impact
       on the Ciliate Toxicity of Unsaturated Aldehydes, Ketones, and Esters. Chem. Res. Toxicol. 2010,
       23, 1905-1912.
(36)    Cottrell, M. B.; Schultz,  T. W. Structure-Toxicity Relationships for Methyl  Esters of Cyanoacetic
       Acids to Tetrahymena pyriformis. Bull. Environ. Contam. Toxicol. 2003, 70, 549-556.
(37)    Cronin, M. T. D.; Bowers, G. S.; Sinks, G. D.; Schultz, T. W. Structure-Toxicity Relationships for
       Aliphatic Compounds Encompassing a Variety of Mechanisms of Toxic Action to Vibrio fischeri.
       SAR QSAR Environ. Res. 2000, 11, 301-312.
(38)    Cronin, M. T. D.; Manga, N.; Seward, J. R.; Sinks, G. D.; Schultz, T. W. Parametrization of
       Electrophilicity for the Prediction of the Toxicity of Aromatic Compounds. Chem. Res. Toxicol.
       2001, 14, 1498-1505.
(39)    Cronin, M. T. D.; Aptula, A. O.; Duffy, J. C.; Netzeva, T. I.; Rowe, P. H.; Valkova, I. V.; Schultz, T. W.
       Comparative assessment of methods to develop QSARs for the prediction of the toxicity of
       phenols to Tetrahymena pyriformis. Chemosphere 2002, 49, 1201-1221.
(40)    DeWeese, A. D.; Schultz, T. W. Structure-Activity Relationships for Aquatic Toxicity to
       Tetrahymena: Halogen-Substituted Aliphatic Esters. Environ. Toxicol. 2001, 16, 54-60.

-------
(41)    Dimitrov, S.; Koleva, Y.; Schultz, T. W.; Walker, J. D.; Mekenyan, O. Interspecies Quantitative
       Structure-Activity Relationship Model For Aldehydes: Aquatic Toxicity. Environmental
       Toxicology and Chemistry 2004, 23, 463-470.
(42)    Ellison, C. M.; Cronin, M. T. D.; Madden, J. C.; Schultz, T. W. Definition of the structural domain
       of the baseline non-polar narcosis model for Tetrahymena pyriformis. SAR QSAR Environ. Res.
       2008, 19, 751-783.
(43)    Gagliardi, S. R.; Schultz, T. W. Regression Comparisons of Aquatic Toxicity of Benzene
       Derivatives: Tetrahymena pyriformis and Rana japonica. Bull. Environ. Contam. Toxicol. 2005, 74,
       256-262.
(44)    Muccini, M.; Layton, A. C.; Sayler, G. S.; Schultz, T. W. Aquatic Toxicities of Halogenated Benzoic
       Acids to Tetrahymena pyriformis. Bull. Environ. Contam. Toxicol. 1999, 62, 616-622.
(45)    Netzeva, T. I.; Schultz, T. W.; Aptula, A. O.; Cronin, M. T. D. Partial least squares modelling of the
       acute toxicity of aliphatic compounds to Tetrahymena pyriformis. SAR QSAR Environ. Res. 2003,
       14, 265-83.
(46)    Netzeva, T. I.; Schultz, T. W. QSARs for the aquatic toxicity of aromatic aldehydes from
       Tetrahymena data. Chemosphere 2005, 61, 1632-1643.
(47)    Ren, S.; Frymier, P. D.; Schultz, T. W. An exploratory study of the use of multivariate techniques
       to determine mechanisms of toxic action. Ecotoxicology and Environmental Safety 2003, 55, 86-
       97.
(48)    Roberts, D. W.; Schultz, T. W.; Wolf, E. M.; Aptula, A. O. Experimental Reactivity Parameters for
       Toxicity Modeling: Application to the Acute Aquatic Toxicity of SN2 Electrophiles to
       Tetrahymena pyriformis. Chem. Res. Toxicol. 2010, 23,  228-234.
(49)    Schultz, T. W.; Kier, L. B.; Hall, L. H. Structure-Toxicity Relationships of Selected Nitrogenous
       Heterocyclic Compounds. III. Relations Using Molecular Connectivity. Bull. Environ. Contam.
       Toxicol. 1982, 28, 373-378.
(50)    Schultz, T. W.; Wesley, S. K.; Baker,  L. L. Structure-Activity Relationships for Di and Tri Alkyl
       and/or Halogen Substituted Phenol. Bull. Environ. Contam. Toxicol. 1989, 43, 192-198.
(51)    Schultz, T. W.; Tichy, M. Structure-Toxicity Relationships for Unsaturated Alcohols to
       Tetrahymena pyriformis: C5 and C6 analogs and Primary Propargylic Alcohols. Bull. Environ.
       Contam. Toxicol. 1993, 51, 681-688.
(52)    Schultz, T. W.; Comeaux, J. L. Structure-Toxicity Relationships for Aliphatic Isothiocyanates to
       Tetrahymena pyriformis. Bull. Environ. Contam. Toxicol. 1996, 56, 638-642.
(53)    Schultz, T. W.; Bearden, A. P. Structure-Toxicity Relationships for Selected Naphthoquinones to
       Tetrahymena pyriformis. Bull. Environ.  Contam. Toxicol. 1998, 61, 405-410.
(54)    Schultz, T. W. Structure-Toxicity Relationships for Benzenes Evaluated with Tetrahymena
       pyriformis. Chem. Res. Toxicol. 1999, 12, 1262-1267.
(55)    Schultz, T. W.; Sinks, G. D.; Miller, L. A. Population growth impairment of sulfur-containing
       compounds to Tetrahymena pyriformis. Environ. Toxicol. 2001, 16, 543-549.
(56)    Schultz, T. W.; Netzeva, T. I.; Cronin, M. T. D. Selection of data sets for QSARs: Analyses of
       Tetrahymena toxicity from aromatic compounds. SAR QSAR Environ. Res. 2003, 14, 59-81.
(57)    Schultz, T. W.; Tucker, V. A. Structure-Toxicity Relationships for the Effects of N- and N,N-Alkyl
       Thioureas to Tetrahymena pyriformis. Bull. Environ. Contam. Toxicol. 2003, 70, 1251-1258.
(58)    Schultz, T. W.; Burgan, J. T. pH-Stress and Toxicity of Nitrophenols to Tetrahymena pyriformis.
       Bull. Environ. Contam. Toxicol. 2003, 71, 1069-1076.
(59)    Schultz, T. W.; Seward-Nagel, J.; Foster, K. A.; Tucker, V. A. Population Growth Impairment of
       Aliphatic Alcohols to Tetrahymena. Environ. Toxicol. 2004, 19, 1-10.

-------
(60)    Schultz, T. W.; Yarbrough, J. W.; Woldemeskel, M. Toxicity to Tetrahymena and abiotic thiol
       reactivity of aromatic isothiocyanates. Cell Biol. Toxicol. 2005, 21, 181-189.
(61)    Schultz, T. W.; Netzeva, T. I.; Roberts, D. W.; Cronin, M. T. D. Structure-Toxicity Relationships for
       the Effects to Tetrahymena pyriformis of Aliphatic, Carbonyl-Containing, α,β-Unsaturated
       Chemicals. Chem. Res. Toxicol. 2005, 18, 330-341.
(62)    Schultz, T. W.; Yarbrough, J. W.; Koss, S. K. Identification of reactive toxicants: Structure-activity
       relationships for amides. Cell Biol. Toxicol. 2006, 22, 339-349.
(63)    Schultz, T. W. Tetratox. http://www.vet.utk.edu/TETRATOX/ (accessed 5/26/09).
(64)    Schultz, T. W.; Hewitt, M.; Netzeva, T. I.; Cronin, M. T. D. Assessing Applicability Domains of
       Toxicological QSARs: Definition, Confidence in Predicted Values, and the Role of Mechanisms of
       Action. QSAR Comb. Sci. 2007, 26, 238-254.
(65)    Schultz, T. W.; Ralston, K. E.; Roberts, D. W.; Veith, G. D.; Aptula, A. O. Structure-activity
       relationships for abiotic thiol reactivity and aquatic toxicity of halo-substituted carbonyl
       compounds. SAR QSAR Environ. Res. 2007, 18, 21-29.
(66)    Schultz, T. W.; Sparfkin, C. L.; Aptula, A. O. Reactivity-based toxicity modelling of five-membered
       heterocyclic compounds: Application to Tetrahymena pyriformis. SAR QSAR Environ. Res. 2010,
       21, 681-691.
(67)    Schwöbel, J. A. H.; Madden, J. C.; Cronin, M. T. D. Application of a computational model for
       Michael addition reactivity in the prediction of toxicity to Tetrahymena pyriformis. Chemosphere
       2011, 85, 1066-1074.
(68)    Seward, J. R.; Hamblen, E. L.; Schultz, T. W. Regression comparisons of Tetrahymena pyriformis
       and Poecilia reticulata toxicity. Chemosphere 2002, 47, 93-101.
(69)    Sinks, G. D.; Schultz, T. W. Correlation of Tetrahymena and Pimephales Toxicity: Evaluation of
       100 Additional Compounds. Environ. Toxicol. Chem. 2001, 20, 917-921.
(70)    U.S. National Library of Medicine. ChemIDplus.
       http://chem.sis.nlm.nih.gov/chemidplus/chemidheavy.jsp (accessed 4/14/16).
(71)    Hamelink, J. L., Current bioconcentration test methods and theory. In Aquatic Toxicology and
       Hazard Evaluation, Mayer, F. L.; Hamelink, J. L., Eds.; ASTM STP: West Conshohocken, PA, 1977;
       Vol. 634, pp 149-161.
(72)    Dimitrov, S.; Dimitrova, N.; Parkerton, T.; Comber, M.; Bonnell, M.; Mekenyan, O. Base-line
       model for identifying the bioaccumulation potential of chemicals. SAR QSAR Environ. Res. 2005,
       16, 531-554.
(73)    Arnot, J. A.; Gobas, F. A. P. C. A review of bioconcentration factor (BCF) and bioaccumulation
       factor (BAF) assessments for organic chemicals in aquatic organisms. Environ. Rev. 2006, 14,
       257-297.
(74)    EURAS. Establishing a bioconcentration factor (BCF) Gold Standard Database.
       http://www.euras.be/eng/project.asp?ProjectId=92 (accessed 5/20/09).
(75)    Zhao, C.; Boriani, E.; Chana, A.; Roncaglioni, A.; Benfenati, E. A new hybrid system of QSAR models for
       predicting bioconcentration factors (BCF). Chemosphere 2008, 73, 1701-1707.
(76)    Arena, V. C.; Sussman, N. B.; Mazumdar, S.; Yu, S.; Macina, O. T. The Utility of Structure-Activity
       Relationship (SAR) Models for Prediction and Covariate Selection in Developmental Toxicity:
       Comparative Analysis of Logistic Regression and Decision Tree Models. SAR QSAR Environ. Res.
       2004, 15, 1-18.
(77)    Sussman, N. B.; Arena, V. C.; Yu, S.; Mazumdar, S.; Thampatty, B. P. Decision Tree SAR Models
       for Developmental Toxicity Based on an FDA/TERIS Database. SAR QSAR Environ. Res. 2003, 14,
       83-96.
(78)    Briggs, G. G.; Freeman, R. K.; Yaffe, S. J., Drugs in Pregnancy and Lactation, 3rd ed. Williams and
       Wilkins: Baltimore, MD, 1990.

-------
(79)    Shepard, T. H., Catalog of Teratologic Agents, 5th ed. Johns Hopkins University Press: Baltimore,
       MD, 1992.
(80)    Hansen, K.; Mika, S.; Schroeter, T.; Sutter, A.; ter Laak, A.; Steger-Hartmann, T.; Heinrich, N.;
       Müller, K.-R. Benchmark Data Set for in Silico Prediction of Ames Mutagenicity. J. Chem. Inf.
       Model. 2009, 49, 2077-2081.
(81)    Benchmark, T. http://ml.cs.tu-berlin.de/toxbenchmark/ (accessed 4/30/10).
(82)    US EPA. EPI Suite, Version 4.0. http://www.epa.gov/oppt/exposure/pubs/episuitedl.htm
       (accessed 5/21/09).
(83)    NIST. NIST Chemistry WebBook. http://webbook.nist.gov/chemistry/ (accessed
(84)    Lookchem.com. http://www.lookchem.com (accessed
(85)    Jamieson, D. T.; Irving, J. B.; Tudhope, J. S., Liquid Thermal Conductivity. A Data Survey to 1973. H. M.
       Stationery Office: Edinburgh, 1975.
(86)    Vargaftik, N. B.; Filippov, L. P.; Tarzimanov, A. A.; Totskii, E. E., Handbook of thermal conductivity
       of liquids and gases. CRC Press: Boca Raton, 1994; p 358.
(87)    Viswanath, D. S.; Natarajan, G., Data Book on the Viscosity of Liquids. Hemisphere Pub. Co.: New York,
       1989.
(88)    Riddick, J. A.; Bunger, W. B.; Sakano, T. K., Organic Solvents Physical Properties and Methods of
       Purification, 4th ed. Wiley: New York, 1986.
(89)    Jasper, J. J. The Surface Tension of Pure Liquid Compounds. J. Phys. Chem. Ref. Data 1972, 1,
       841-1009.
(90)    Zhu, H.; Martin, T. M.; Ye, L.; Sedykh, A.; Young, D. M.; Tropsha, A. Quantitative Structure-
       Activity Relationship Modeling of Rat Acute Toxicity by Oral Exposure. Chem. Res. Toxicol. 2009,
       22, 1913-1921.

-------

-------