EPA
United States
Environmental Protection
Agency
Revisiting and updating chemical groupings with new
approach methodologies
US EPA in collaboration with Health Canada and Environment and Climate Change Canada
Office of Research and Development
Center for Computational Toxicology and Exposure
ACS Fall 2020 Virtual Meeting & Expo
Computational Strategies in Modern Agrochemical Discovery and De-risking Symposium
August 17-20, 2020

-------
Disclaimer: The views expressed in this presentation are
those of the authors and do not necessarily reflect the
views or policies of the U.S. Environmental Protection
Agency. This presentation has not been reviewed for
policy and is not for distribution.

-------
Team members
Accelerating the Pace of Chemical Risk
Assessment (APCRA)
US EPA
•	Dan Chang
•	Kellie Fay
•	Kristan Markey
•	Martin Phillips
•	Grace Patlewicz
•	Ann Richard
•	Gino Scarano
•	Mahmoud Shobair
•	Ryan Lougee
•	Ellery Saluck (summer intern)
Environment & Climate
Change Canada (ECCC)
•	John Prindiville
•	Cristina Inglis
Health Canada
•	Mark Lewis
ILS
•	Kamel Mansouri

-------
Overview
A chemical category is a group of chemicals whose physicochemical and human
health and/or ecotoxicological properties and/or environmental fate properties are
likely to be similar or follow a regular pattern, usually as a result of structural
similarity. - OECD
Applications of chemical categorization include first-tier assessment efforts and read-across from structurally similar analogs:
-Toxic Substances Control Act (TSCA) New Chemical Program Chemical Categories
(NCC; US EPA)
-ECOSAR (focus of presented work)

-------
US EPA ECOSAR chemical classifications
[Figure: ECOSAR software interface]
Class-based SAR to predict aquatic toxicity
Classification scheme identifies excess toxicity
Estimates acute and chronic toxicity based on accumulated
data and past decisional precedents
Acute Effects:
Fish 96-hr LC50
Daphnid 48-hr EC50
Algae 72/96-hr EC50
Chronic Effects:
Fish ChV
Daphnid ChV
Algae ChV
Implemented as a profiler in the OECD QSAR Toolbox

-------
Narcosis vs. specific-acting toxicity MOA
Regulators (ECCC) consider MOA information to determine the size of assessment factors.
[Figure: aquatic toxicity (log scale, about 1 to -10; less toxic at top, more toxic at bottom) plotted against log P (1 to 6), with a "baseline toxicity" line for narcosis]
The "baseline" or "narcosis" mechanism is shown by all organic toxicants lacking a more specific mechanism.
Points falling below the "baseline toxicity" line are presumed to have a more specific mechanism.
Legend: Narcosis; AChE inhibitors; Reactive; Unknown; Uncouplers; Neurotoxicants
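The idea of points "below the baseline" can be made concrete with a toxic-ratio calculation: predict the narcosis (baseline) toxicity from log P, then compare it with the observed value. This is a minimal sketch, not ECOSAR or ECCC methodology; the slope, intercept, and the cutoff of 10 are hypothetical placeholders (a cutoff of 10 is a common convention in the excess-toxicity literature, assumed here), and the function names are illustrative.

```python
# Minimal sketch of "excess toxicity" relative to a baseline narcosis line.
# ASSUMPTIONS: SLOPE, INTERCEPT and TR_CUTOFF are hypothetical placeholders,
# not ECOSAR or ECCC parameters. Toxicity values are log10 LC50 (mol/L),
# so lower values mean more toxic (points "below" the baseline).

SLOPE = -0.9      # hypothetical baseline slope vs. log P
INTERCEPT = -1.0  # hypothetical baseline intercept
TR_CUTOFF = 10.0  # hypothetical excess-toxicity cutoff


def baseline_log_lc50(log_p: float) -> float:
    """Predicted log10 LC50 for a chemical acting only by narcosis."""
    return SLOPE * log_p + INTERCEPT


def toxic_ratio(log_p: float, observed_log_lc50: float) -> float:
    """Baseline LC50 divided by observed LC50 (>1 means more toxic than narcosis)."""
    return 10 ** (baseline_log_lc50(log_p) - observed_log_lc50)


def flag_moa(log_p: float, observed_log_lc50: float) -> str:
    """Label a chemical by whether it exceeds the baseline by more than TR_CUTOFF."""
    if toxic_ratio(log_p, observed_log_lc50) > TR_CUTOFF:
        return "specific-acting (excess toxicity)"
    return "consistent with narcosis"


if __name__ == "__main__":
    # Two made-up chemicals with the same log P but different observed LC50s.
    for obs in (-3.8, -5.5):
        print(obs, round(toxic_ratio(3.0, obs), 1), flag_moa(3.0, obs))
```

Because toxicity is on a log scale, a toxic ratio of 10 corresponds to a point one log unit below the baseline line in the figure.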

-------
Potential approach for updating chemical categories
[Figure: pie chart of New Chemical inventories: single category 45%, not categorized 45%, multiple categories 10%]
Almost half of the chemicals in New Chemical inventories across regulatory jurisdictions cannot be categorized using NCC or ECOSAR
Some fall into multiple categories
How do we update?
•	Incorporate New Approach Methodologies (NAMs), i.e., ToxCast and Tox21 biological activity information
•	Apply cheminformatic approaches

-------
General approach
1.	Training set chemicals
•	Well-defined MOA (narcosis vs. specific-acting)
•	NAM data (in vitro toxicity data)
•	In vivo toxicity data
•	Representative of chemicals of interest for prediction
2.	Characterize training set
•	ECOSAR classes
•	NCC
•	Chemotype fingerprints (ToxPrints)
3.	Model
•	NAM data, chemotypes, and a combination of both
•	Evaluate different machine learning algorithms

-------
EnviroTox training set chemicals
[Figure: workflow from the EnviroTox database (4016 chemicals) to chemicals with NAM data, then consensus MOA N (880) or S (350), giving the training set chemicals]
1.	Chemicals with in vivo eco-data from the EnviroTox database¹: 4016
2.	Sub-selection for chemicals with NAM data (ToxCast and Tox21): 1904
3.	MOA predictions based on four publicly available classification models
•	VERHAAR, ASTER, OASIS, TEST
•	Each predicts Narcotic, Specific-acting, or Unclassified
Consensus MOA with confidence scores:
Examples (one call per model; N = narcotic, S = specific-acting, U = unclassified):
NNNN = N, score = 3
NNSN = N, score = 2
SUSS = S, score = 2
NUNS = U, score = 0
Results:
880 Narcotic
350 Specific-acting
674 Unclassified
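The four examples above can be reproduced by a simple vote-margin rule. This is a hypothetical reconstruction chosen only to match those examples, not the published consensus procedure of Kienzler et al.; the function name `consensus_moa` and the margin-of-two rule are assumptions.

```python
from collections import Counter

# Hypothetical consensus rule, reverse-engineered from the four examples on
# this slide only (NOT the published EnviroTox/Kienzler et al. procedure).
# Each input character is one model's call: N (narcotic), S (specific-acting)
# or U (unclassified), e.g. "NNSN" for VERHAAR, ASTER, OASIS and TEST.


def consensus_moa(calls: str) -> tuple:
    """Return (consensus class, confidence score) for a string of model calls."""
    if any(c not in "NSU" for c in calls):
        raise ValueError(f"unexpected call in {calls!r}")
    counts = Counter(calls)
    n, s = counts["N"], counts["S"]
    # Candidate class is whichever of N/S has more votes.
    winner, votes, margin = ("N", n, n - s) if n >= s else ("S", s, s - n)
    # Assumed rule: the winner must lead the opposing class by at least 2 votes;
    # the score is then the number of agreeing models minus one.
    if margin >= 2:
        return winner, votes - 1
    return "U", 0


if __name__ == "__main__":
    for example in ("NNNN", "NNSN", "SUSS", "NUNS"):
        print(example, consensus_moa(example))
```

Under this assumed rule, two agreeing models with the other two unclassified (e.g., NNUU) would score 1, which fits the 1-3 score range referred to later in this deck, but that case is not shown on the slide.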
¹ Health and Environmental Sciences Institute (HESI). 2019. EnviroTox Database & Tools. Version 1.1.0. Available: http://www.envirotoxdatabase.org/
² Kienzler et al., Environ. Toxicol. Chem. 2019, 38(10), 2294-2304

-------
Characterize EnviroTox training set chemicals: ECOSAR classes
[Figure: EnviroTox training set chemicals by ECOSAR classification, colored by consensus MOA (narcotic, specific-acting, unclassified, not classified)]
•	Neutral organics: "enriched" in narcotics
•	Non-neutral organics: include narcotics (e.g., esters)

-------
Expanding the chemical space of the EnviroTox dataset
Added 6215 chemicals with NAM data (invitrodb v3.2)
Applied the same consensus MOA methodology

-------
Characterize training set chemicals: ToxPrints
•	Pull in chemotype information for our chemicals via ToxPrints (TxPs)
•	Publicly available tool
•	EPA CompTox Chemicals Dashboard
ToxPrints:
•	729 chemical features
•	Chemically interpretable
•	Coverage of diverse chemistry
•	Hierarchical: includes scaffolds, functional groups, chains, rings, bonding patterns, atom types
[Figure: example ToxPrint chemotypes with their structures, e.g., ring:aromatic_biphenyl, ring:fused_steroid_generic_[5_6_6_6], ring:fused_[5_7]_azulene, chain:alkaneLinear_octyl_C8, chain:alkaneLinear_hexadecyl_C16, bond:quatN_alkyl_acyclic, and several bond:CX_halide_alkyl chemotypes (trihalo, perfluoro hexyl, aromatic alkane)]
Yang et al., J. Chem. Inf. Model. 2015; Richard et al., Chem. Res. Toxicol. 2016, 29(8), 1225-1251; Strickland et al., Arch. Toxicol. 2018, 92(1), 487-500; Wang et al., Environ. Int. 2019, 126, 377-386

-------
Classification model development
[Figure adapted from Katherine Phillips: schematic of random forest training and prediction]
Train the model
•	Features: ToxPrints, NAM data, or both
•	Training chemicals (e.g., Chemicals 1-4) carry a cMOA label (N or S) and binary features (e.g., Has Benzene, Has Halide)
•	Each classifier tree splits the chemicals on features (e.g., Has Benzene? Has Halide?) to answer "Has MOA S?"
•	Repeated many times with different samples to build a "forest" of classifier trees
Predict with the model
•	Check whether each new chemical (e.g., Chemicals 5-7) is in the applicability domain of the model
•	Predict the target with valid models using features; repeated with each "tree" to give a probability for N or S
Valid models must:
•	accurately predict the training set
•	predict beyond the training set
•	be more predictive than a model built on randomized data
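The train/predict loop in the figure can be sketched as a small random forest. To stay short and dependency-free, each "tree" here is a single-split decision stump over binary features (e.g., a hypothetical `has_halide` flag), trained on a bootstrap sample with a random feature subset; the forest's S probability is the fraction of trees voting S. This is a simplified illustration under those assumptions, not the model or library used in this work.

```python
import math
import random
from collections import Counter


def majority(labels, default="N"):
    """Most common label, or `default` for an empty leaf."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, labels, features):
    """One-split tree: choose the binary feature that best separates N from S."""
    fallback = majority(labels)
    best = None
    for f in features:
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        pred0, pred1 = majority(left, fallback), majority(right, fallback)
        correct = sum(y == pred0 for y in left) + sum(y == pred1 for y in right)
        if best is None or correct > best[0]:
            best = (correct, f, pred0, pred1)
    _, feat, leaf0, leaf1 = best
    return lambda x: leaf1 if x[feat] == 1 else leaf0


def fit_forest(rows, labels, n_trees=25, max_features=None, seed=0):
    """Bagged stumps; max_features defaults to sqrt(number of features)."""
    rng = random.Random(seed)
    features = sorted(rows[0])
    k = max_features or max(1, int(math.sqrt(len(features))))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        subset = rng.sample(features, k)                 # random feature subset
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], subset))
    return trees


def predict_proba_s(trees, x):
    """Probability of a specific-acting MOA = fraction of trees voting 'S'."""
    return sum(tree(x) == "S" for tree in trees) / len(trees)


def predict(trees, x):
    """Majority vote across trees."""
    return "S" if predict_proba_s(trees, x) >= 0.5 else "N"


if __name__ == "__main__":
    # Made-up toy data: every chemical has a benzene ring; S exactly when a halide is present.
    train = [{"has_benzene": 1, "has_halide": h} for h in (1, 1, 1, 1, 0, 0, 0, 0)]
    moa = ["S"] * 4 + ["N"] * 4
    forest = fit_forest(train, moa, n_trees=25, max_features=2, seed=42)
    new_chem = {"has_benzene": 1, "has_halide": 1}
    print(predict(forest, new_chem), predict_proba_s(forest, new_chem))
```

Real random forests grow deeper trees; stumps are used here only so that the bootstrap, feature-subsampling, and voting steps stay visible.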

-------
Classification model details
•	Random Forest (Boosted Gradient Method) provided the best model results:
-Split data into 80% training and 20% hold-out (test) sets
-Hyperparameter tuning with 5-fold cross-validation, square-root feature sampling, etc.
•	Training set: "balanced" down-sampled subset (2104 chemicals with cMOA = N or S)
•	High accuracy in both training and test sets (training = 99.7%; test = 95.8%)
•	Total accuracy on the full N + S data set = 97.6% (4356 chemicals with cMOA = N or S)
•	Across all N + S chemicals, 105 chemicals were misclassified:
-24 false positives (predicted S)
-81 false negatives (predicted N)
[Figure: Random forest simplified. An instance is passed to each tree (Tree-1, Tree-2, ...), each tree votes for a class (Class-A or Class-B), and majority voting gives the final class.]
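The data-handling steps listed above (class balancing by down-sampling, an 80/20 hold-out split, 5-fold cross-validation folds) and the accuracy arithmetic can be sketched in plain Python. The helper names are hypothetical, and the sketch assumes simple random (unstratified) splitting; the actual pipeline may differ.

```python
import random


def balanced_downsample(labels, seed=0):
    """Indices of a class-balanced subset: each class cut to the smallest class size."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(ix) for ix in by_class.values())
    return sorted(i for ix in by_class.values() for i in rng.sample(ix, n_min))


def holdout_split(indices, test_frac=0.2, seed=0):
    """Shuffle, then return (train, test) with `test_frac` held out."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(indices, k=5):
    """Split indices into k interleaved folds for cross-validation."""
    idx = list(indices)
    return [idx[i::k] for i in range(k)]


def accuracy_from_errors(n_total, false_pos, false_neg):
    """Accuracy = correctly classified / total."""
    return (n_total - false_pos - false_neg) / n_total


if __name__ == "__main__":
    # Counts from the slide: 4356 N + S chemicals, 24 Fpos and 81 Fneg.
    print(f"{accuracy_from_errors(4356, 24, 81):.1%}")  # prints 97.6%
```

The last line confirms that 105 misclassifications out of 4356 chemicals is consistent with the reported 97.6% total accuracy.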
https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d

-------
Distribution of prediction confidence [0,1] by (N,S) class
[Figure: distribution of prediction confidence by predicted class (S, N; x-axis: Prediction (Consensus_MOA)) for the training, test, and unclassified sets; training set median: 0.99]
-------
Prediction confidence across cMOA = N or S chemicals
The distribution of prediction confidence (PC) tends to be > 0.8 for the classified data (cMOA = N or S)
The model makes fewer misclassifications in S
-Of the misclassifications, 93 have a cMOA confidence score of 2 and 12 have scores of 1 or 3 (recall that confidence ranks 3 > 2 > 1)
-~46% of the misclassifications can be attributed to chemicals with PC < 0.8
-~67% of the misclassifications can be attributed to chemicals with PC < 0.88
Distribution of prediction confidence:
•	4065 chemicals with PC > 0.9 (93.3% of data)
•	4225 chemicals with PC > 0.8 (97.0% of data)
•	131 chemicals with PC < 0.8 (3.0% of data)
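The threshold summaries above are simple counts over per-chemical prediction confidences. The sketch assumes PC is the larger of the forest's two class probabilities, max(p_S, 1 - p_S); that definition and the helper names are assumptions for illustration. The main block recomputes the slide's percentages from its own counts.

```python
def prediction_confidence(p_s: float) -> float:
    """Assumed definition: probability of the predicted (winning) class."""
    return max(p_s, 1.0 - p_s)


def percent(count: int, total: int) -> float:
    """Share of chemicals as a percentage, rounded to one decimal."""
    return round(100 * count / total, 1)


def confidence_summary(confidences, thresholds=(0.9, 0.8)):
    """Count chemicals with PC above each threshold, plus their percentage."""
    n = len(confidences)
    summary = {}
    for t in thresholds:
        count = sum(pc > t for pc in confidences)
        summary[t] = (count, percent(count, n))
    return summary


def share_of_errors_below(confidences, is_wrong, threshold):
    """Fraction of misclassified chemicals whose PC falls below `threshold`."""
    errors = [pc for pc, wrong in zip(confidences, is_wrong) if wrong]
    return sum(pc < threshold for pc in errors) / len(errors)


if __name__ == "__main__":
    # Slide counts out of 4356 N + S chemicals.
    for label, count in (("PC > 0.9", 4065), ("PC > 0.8", 4225), ("PC < 0.8", 131)):
        print(label, count, percent(count, 4356))
```

`share_of_errors_below` is the kind of calculation behind statements such as "~46% of the misclassifications can be attributed to chemicals with PC < 0.8".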

-------
Characterization of ToxPrint coverage across different classes
[Figure: heatmap representation of ToxPrints across the full dataset and the Unclassified (U: 540 ToxPrints), Specific-acting (S: 435 ToxPrints), and Narcotic sets]
Number of ToxPrints: Dataset > Unclassified > Specific-acting > Narcotic
Missing structural domains across the full dataset:
•	Metal & metalloid bonds
•	Propylene oxide chains
•	Bicyclo ring structures

-------
What are these 75 unique ToxPrints in the Unclassified set?
•	~7x more unique features in U than in N or S
•	This could explain the lower prediction confidence in N/S classification of the U set
•	Potential for additional categories based on structures:
-2 atom TxPs (metal groups)
-38 bond TxPs (metalloids: silanes and siloxanes...)
-8 chain TxPs (ethylene oxide, C10-C20 alkanes)
-19 group TxPs (amino acids, polydentate ligands)
-8 ring TxPs
[Figure: frequency of ToxPrints per consensus MOA class (Narcotic & Specific-acting vs. Unclassified) across the ToxPrint hierarchy (atom, bond, chain, group, ring)]
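Finding ToxPrints that occur only in the Unclassified set, and tallying them by hierarchy level (atom, bond, chain, group, ring), is a set-difference operation. In the sketch below the ToxPrint names are made-up placeholders in the `level:name` style, not real ToxPrint identifiers; `uncovered_features` is a hypothetical fragment-coverage check, shown as one simple way to flag chemicals carrying features the model never saw.

```python
from collections import Counter


def unique_features(by_class):
    """For each class, the ToxPrints present in that class and in no other class."""
    result = {}
    for cls, feats in by_class.items():
        others = set().union(*(f for c, f in by_class.items() if c != cls))
        result[cls] = set(feats) - others
    return result


def count_by_level(toxprints):
    """Tally ToxPrint names by hierarchy level, i.e. the prefix before ':'."""
    return Counter(name.split(":", 1)[0] for name in toxprints)


def uncovered_features(chemical, training_features):
    """ToxPrints of a chemical never seen in training (hypothetical domain check)."""
    return set(chemical) - set(training_features)


if __name__ == "__main__":
    # Made-up placeholder names, not actual ToxPrint identifiers.
    by_class = {
        "N": {"ring:r1", "bond:b1"},
        "S": {"bond:b1", "bond:b2"},
        "U": {"ring:r1", "atom:m1", "bond:b3", "chain:c1"},
    }
    unique_u = unique_features(by_class)["U"]
    print(sorted(unique_u), count_by_level(unique_u))
```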

-------
Example: differences in model prediction vs. cMOA
Triasulfuron
N-sulfonylurea herbicide
Model prediction: Specific-acting
EnviroTox consensus MOA: Narcotic
ECOSAR classification: Sulfonyl Urea
The S(=O)_sulfonyl ToxPrint is enriched in the specific-acting MOA space and in 47 assays
[Figure: structure of triasulfuron (CASRN 82097-50-5; DTXSID0024345) and its bioassay results (active, inactive, not tested)]

-------
Preliminary predicted MOAs of the EnviroTox Unclassified set
•	674 chemicals in the EnviroTox dataset had a low-confidence or ambiguous consensus MOA
•	Applied the model to the Unclassified set and compared predictions to the ECOSAR classification
•	Currently extending this analysis to the additional 3089 unclassified chemicals
[Figure: 361 chemicals predicted as Narcotic and 313 predicted as Specific-acting, each split into ECOSAR classified vs. ECOSAR not classified]

-------
Unclassified chemicals predicted Specific-Acting: enriched ToxPrints
Criteria:
•	> 3 chemicals per chemotype
•	Ratio of S:N > 3, or no N
Results:
[Figure: bar chart of enriched ToxPrints (grouped by bond, chain, and ring) for unclassified chemicals predicted specific-acting; highlighted groups include ketones, alkyl tri-halo, sulfide/sulfonate/sulfonic acids, and benzopyran/benzopyrone]
These features might be useful for refining chemical categories to capture more of the chemicals that are currently unclassified.
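The enrichment criteria on this slide translate into a short filter. The sketch assumes that the "> 3 chemicals per chemotype" rule counts all chemicals (predicted S plus predicted N) containing the ToxPrint, and that the ratio compares predicted S with predicted N; both readings, the function name, and the example counts are assumptions.

```python
def enriched_in_s(counts, min_chemicals=3, min_ratio=3.0):
    """Return ToxPrints enriched in the specific-acting (S) predictions.

    counts maps ToxPrint -> (n_S, n_N): chemicals with that ToxPrint
    predicted S and predicted N. A ToxPrint passes when more than
    `min_chemicals` chemicals contain it AND either no chemical with it is
    predicted N or the S:N ratio exceeds `min_ratio`.
    """
    hits = []
    for toxprint, (n_s, n_n) in counts.items():
        if n_s + n_n <= min_chemicals:
            continue  # too few chemicals to judge
        if n_n == 0 or n_s / n_n > min_ratio:
            hits.append(toxprint)
    return sorted(hits)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    example = {"ketone": (8, 2), "ester": (3, 5), "sulfonate": (5, 0),
               "rare": (2, 0), "borderline": (6, 2)}
    print(enriched_in_s(example))
```

With these made-up counts, `ketone` (8:2) and `sulfonate` (no N) pass, while `borderline` (6:2, a ratio of exactly 3) does not, because the criterion is a strict > 3.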

-------
Summary
•	Identified relevant NAM information to develop a classification model for specific-acting MOAs
-Increased the available chemical space of EnviroTox
•	Explored differences in predicted and consensus MOA via chemotype enrichments
•	Used the model to inform ECOSAR classification of a preliminary set of unclassified chemicals
-The majority of unclassified chemicals were predicted to have a specific-acting MOA
-Identified primary chemotypes for specific-acting MOAs
•	Use these methods to inform classification models for the TSCA New Chemical Categories
•	Use chemotype enrichments to identify potential bioassays with relevant bioactivity, supporting the use of NAM data in category development

-------
Thank you!

-------