Revisiting and updating chemical groupings with new approach methodologies
US EPA, in collaboration with Health Canada and Environment and Climate Change Canada
*The views expressed in this presentation do not represent US EPA policy or endorsement. Mention of
trade names of commercial products should not be interpreted as an endorsement.

-------
Team Members
Accelerating the Pace of Chemical Risk Assessment (APCRA)
US EPA
•	Dan Chang
•	Kellie Fay
•	Kristan Markey
•	Martin Phillips	*
•	Grace Patlewicz
•	Ann Richard
•	Gino Scarano
•	Mahmoud Shobair
•	Haseeb Bah a (summer intern)
•	Ellery Saluck (summer intern)
Environment & Climate Change Canada (ECCC)
•	John Prindiville
Health Canada
•	Mark Lewis
ILS
•	Kamel Mansouri

-------
Overview
"A chemical category is a group of chemicals whose physicochemical and human health and/or ecotoxicological properties and/or environmental fate properties are likely to be similar or follow a regular pattern, usually as a result of structural similarity." (OECD)
• Applications of chemical categorization include first-tier assessments and read-across from structurally similar analogs:
•	Toxic Substances Control Act (TSCA) New Chemical Program Chemical
Categories (NCC; US EPA)
•	ECOSAR (focus of presented work)

-------
US EPA ECOSAR Chemical Classifications
• Class-based SAR to predict aquatic toxicity
• Classification scheme identifies excess toxicity
• Estimates acute and chronic toxicity based on accumulated data and past decisional precedents
Acute effects:	Chronic effects (ChV = chronic value):
Fish 96-hr LC50	Fish ChV
Daphnid 48-hr EC50	Daphnid ChV
Algae 72/96-hr EC50	Algae ChV
• Profiler in OECD QSAR Toolbox
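The class-based SAR idea can be illustrated with a short sketch. It assumes, as is common for log Kow-based aquatic SARs, that each class equation predicts log10 of the effect concentration in mmol/L from log Kow; the class name and coefficients below are hypothetical placeholders, not ECOSAR values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassEquation:
    """Hypothetical class-based SAR: log10(effect conc. in mmol/L) = slope * log_kow + intercept."""

    chemical_class: str
    endpoint: str
    slope: float
    intercept: float

    def predict_mmol_per_l(self, log_kow: float) -> float:
        return 10 ** (self.slope * log_kow + self.intercept)

    def predict_mg_per_l(self, log_kow: float, mol_weight: float) -> float:
        # mmol/L multiplied by molecular weight (g/mol) gives mg/L
        return self.predict_mmol_per_l(log_kow) * mol_weight


# Illustrative coefficients only; real class equations differ by class and endpoint.
fish_lc50 = ClassEquation(
    "hypothetical neutral organic class", "fish 96-hr LC50", slope=-0.9, intercept=1.5
)

if __name__ == "__main__":
    print(round(fish_lc50.predict_mg_per_l(log_kow=2.0, mol_weight=100.0), 2))  # prints 50.12
```

Because the hypothetical slope is negative, a higher log Kow gives a lower predicted effect concentration, mirroring the rise in narcotic toxicity with hydrophobicity.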

-------
Narcosis vs. specific-acting toxicity MOA
Regulators (ECCC) consider MOA information to determine the size of assessment factors
[Figure: toxicity plotted against Log P (-2 to 6), running from less toxic (top) to more toxic (bottom); legend: Narcosis, AChE Inhibitors, Reactive, Unknown, Uncouplers, Neurotoxicants]
• "Baseline" or "narcosis" mechanism: shown by all organic toxicants lacking a more specific mechanism
• Points falling below the "baseline toxicity" are presumed to have a more specific mechanism

-------
Potential approach for updating chemical categories
•	Almost half of the chemicals in New Chemical inventories across regulatory jurisdictions cannot be categorized using NCC or ECOSAR
•	Some fall into multiple categories
How to update?
•	Incorporate New Approach Methodologies (NAMs), i.e., ToxCast and Tox21 biological activity information
•	Apply cheminformatic approaches

-------
General approach
Training set chemicals
• Well-defined MOA (narcosis vs. specific-acting)
• NAM (in vitro) toxicity data
• In vivo toxicity data
• Representative of chemicals of interest for prediction
Characterize training set
1.	ECOSAR classes
2.	NCC
3.	Chemotype fingerprints (ToxPrints)
Model
• NAM data, chemotypes and combination of both
• Evaluate different machine learning algorithms
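The three feature options (NAM data, chemotypes, or both) amount to building one feature dictionary per chemical and aligning the dictionaries into a matrix. This sketch assumes binary ToxPrint bits and binary assay hit calls; the assay name and the choice to treat an untested assay as 0 are hypothetical simplifications.

```python
def build_features(toxprints, nam_hits, feature_set):
    """Return one chemical's feature dict for the chosen feature set.

    toxprints: {chemotype_name: 0/1}; nam_hits: {assay_name: 0/1} (hypothetical inputs).
    Keys are prefixed so chemotype and assay names cannot collide in the combined set.
    """
    tp = {f"toxprint:{name}": bit for name, bit in toxprints.items()}
    nam = {f"nam:{name}": hit for name, hit in nam_hits.items()}
    if feature_set == "toxprints":
        return tp
    if feature_set == "nam":
        return nam
    if feature_set == "both":
        return {**tp, **nam}
    raise ValueError(f"unknown feature_set: {feature_set!r}")


def to_matrix(feature_dicts):
    """Align per-chemical dicts into a 0/1 matrix with a shared, sorted column order.

    Missing features (e.g., an assay a chemical was not tested in) become 0 here,
    which is a simplifying assumption: untested is not the same as inactive.
    """
    columns = sorted(set().union(*feature_dicts))
    return columns, [[d.get(col, 0) for col in columns] for d in feature_dicts]


if __name__ == "__main__":
    chem_a = build_features({"ring:aromatic_biphenyl": 1}, {"assay_A": 1}, "both")
    chem_b = build_features({"ring:aromatic_biphenyl": 0}, {}, "both")
    print(to_matrix([chem_a, chem_b]))
```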

-------
Training set chemicals
1.	Chemicals with in vivo ecotoxicity data from the EnviroTox¹ database: 4016
2.	Sub-selection for chemicals with NAM data (ToxCast and Tox21): 1904
3.	MOA predictions based on 4 publicly available classification models
•	VERHAAR, ASTER, OASIS, TEST
•	Each predicts Narcotic, Specific-acting, or Unclassified
Consensus MOA with confidence scores²
Examples:
NNNN = N, score = 3
NNSN = N, score = 2
SUSS = S, score = 2
NUNS = U, score = 0
Results:
880 Narcotic
350 Specific-acting
674 Unclassified
¹ Health and Environmental Sciences Institute (HESI). 2019. EnviroTox Database & Tools, Version 1.1.0. Available: http://www.envirotoxdatabase.org/
² Kienzler et al., Environ. Toxicol. Chem. 2019, 38(10), 2294-2304
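The consensus step can be sketched as a vote over the four model calls. The rule below is hypothetical, chosen only because it reproduces the four examples on this slide (a label wins only with at least two more votes than the opposing label, and its score is its vote count minus one); the actual rules are those published by Kienzler et al. (2019).

```python
from collections import Counter

VALID_CALLS = set("NSU")


def consensus_moa(calls: str) -> tuple[str, int]:
    """Hypothetical consensus rule over four model calls (e.g., VERHAAR, ASTER, OASIS, TEST).

    Each call is N (narcotic), S (specific-acting) or U (unclassified).
    A label wins only if it has at least two more votes than the opposing label;
    its confidence score is its vote count minus one. Otherwise the result is U, score 0.
    """
    calls = calls.upper()
    if len(calls) != 4 or not set(calls) <= VALID_CALLS:
        raise ValueError(f"expected four calls from N/S/U, got {calls!r}")
    votes = Counter(calls)
    n, s = votes["N"], votes["S"]
    if n - s >= 2:
        return "N", n - 1
    if s - n >= 2:
        return "S", s - 1
    return "U", 0


if __name__ == "__main__":
    for example in ("NNNN", "NNSN", "SUSS", "NUNS"):
        print(example, consensus_moa(example))
```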

-------
Characterize training set chemicals: ECOSAR classes
[Figure: training set chemicals by ECOSAR class, colored by consensus MOA (Narcotic, Specific-acting, Unclassified); Not classified: 265]
• Neutral Organics: "enriched" in narcotics
• Non-Neutral Organics: includes narcotics (e.g., esters)
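Statements such as "Neutral Organics are enriched in narcotics" come from cross-tabulating ECOSAR class against consensus MOA. A minimal sketch using toy records (not the training-set data):

```python
from collections import Counter, defaultdict


def moa_fractions(records):
    """Fraction of each consensus MOA within each ECOSAR class.

    records: iterable of (ecosar_class, consensus_moa) pairs, with moa in {"N", "S", "U"}.
    """
    counts = defaultdict(Counter)
    for ecosar_class, moa in records:
        counts[ecosar_class][moa] += 1
    return {
        cls: {moa: n / sum(c.values()) for moa, n in sorted(c.items())}
        for cls, c in counts.items()
    }


# Toy records for illustration only
TOY_RECORDS = [("Neutral Organics", "N")] * 3 + [
    ("Neutral Organics", "S"),
    ("Esters", "N"),
    ("Esters", "U"),
]

if __name__ == "__main__":
    for cls, fractions in moa_fractions(TOY_RECORDS).items():
        print(cls, fractions)
```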

-------
Characterize training set chemicals: ToxPrints
• Pull in chemotype information for our chemicals via ToxPrints
•	Publicly available tool
•	EPA CompTox Chemistry Dashboard
ToxPrints:
•	729 chemical features
•	Chemically interpretable
•	Coverage of diverse chemistry
•	Includes scaffolds, functional groups, chains, rings, bonding patterns, atom types
[Figure: example ToxPrint chemotypes, e.g., ring:aromatic_biphenyl, ring:fused_steroid_generic_[5_6_6_6], chain:alkaneLinear_octyl, ring:fused_[5_7]_azulene, chain:alkaneLinear_hexadecyl_C16, bond:quatN_alkyl_acyclic, bond:CX_halide_alkyl-X (trihalo), bond:CX_halide_alkyl-F (perfluorohexyl), plus phosphite, disulfide and bicyclic polycycle features]
Yang et al., J. Chem. Inf. Model. 2015; Richard et al., Chem. Res. Toxicol. 2016, 29(8), 1225-1251; Strickland et al., Arch. Toxicol. 2018, 92(1), 487-500; Wang et al., Environ. Int. 2019, 126, 377-386
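Because ToxPrints encode each chemical as the presence or absence of chemotypes, structural similarity between two chemicals can be scored with the Tanimoto coefficient over their "on" bits. The sketch below also shows one common way (an assumption here, not necessarily the approach used in this work) to approximate an applicability-domain check: require a minimum similarity to the nearest training chemical. The 0.5 threshold is hypothetical.

```python
def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto (Jaccard) similarity of two sets of ToxPrint on-bits."""
    union = a | b
    # Convention chosen here: two chemicals with no ToxPrints set get similarity 0.0
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbour_similarity(query: set[str], training: list[set[str]]) -> float:
    return max((tanimoto(query, t) for t in training), default=0.0)


def in_domain(query: set[str], training: list[set[str]], threshold: float = 0.5) -> bool:
    """Hypothetical applicability-domain rule: close enough to at least one training chemical."""
    return nearest_neighbour_similarity(query, training) >= threshold


TRAINING = [
    {"ring:aromatic_biphenyl", "bond:CX_halide_alkyl-X"},
    {"chain:alkaneLinear_octyl", "bond:quatN_alkyl_acyclic"},
]

if __name__ == "__main__":
    query = {"ring:aromatic_biphenyl", "chain:alkaneLinear_octyl"}
    print(nearest_neighbour_similarity(query, TRAINING), in_domain(query, TRAINING))
```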

-------
Classification model development
Train the model
• Features: ToxPrints, NAM data, or both, encoded as a 0/1 matrix with one row per chemical
• Target: consensus MOA (cMOA), e.g., Chemicals 1 and 2 = N, Chemicals 3 and 4 = S
• Repeated many times with different samples to build a "forest" of classifier trees, each ending in a YES/NO answer to "Has MOA S?"
Figure adapted from Katherine Phillips
Valid models must:
• accurately predict the training set
• predict beyond the training set
• be more predictive than a model built on randomized data
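The last validity criterion is commonly checked with y-randomization (label scrambling): refit the model after shuffling the MOA labels and confirm that the real model clearly outperforms the scrambled ones. In this standard-library-only sketch, a leave-one-out 1-nearest-neighbour classifier on 0/1 features stands in for the actual model, and the toy data are synthetic.

```python
import random


def hamming(a, b):
    """Number of positions where two 0/1 feature vectors differ."""
    return sum(u != v for u, v in zip(a, b))


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (stand-in for the real model)."""
    correct = 0
    for i, row in enumerate(X):
        others = [k for k in range(len(X)) if k != i]
        nearest = min(others, key=lambda k: hamming(row, X[k]))  # ties go to the lowest index
        correct += y[nearest] == y[i]
    return correct / len(X)


def y_randomization(X, y, n_perm=200, seed=0):
    """Compare real accuracy with accuracies obtained after shuffling the MOA labels."""
    rng = random.Random(seed)
    real = loo_1nn_accuracy(X, y)
    scrambled = []
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)
        scrambled.append(loo_1nn_accuracy(X, shuffled))
    # Empirical p-value: share of scrambled runs at least as accurate as the real model
    p_value = (1 + sum(acc >= real for acc in scrambled)) / (n_perm + 1)
    return real, sum(scrambled) / n_perm, p_value


# Toy data: features 0 and 1 track the label, features 2 and 3 are noise
X_TOY = [[i % 2, i % 2, (i // 2) % 2, (i // 4) % 2] for i in range(40)]
Y_TOY = ["S" if i % 2 else "N" for i in range(40)]

if __name__ == "__main__":
    real, scrambled_mean, p = y_randomization(X_TOY, Y_TOY)
    print(f"real={real:.2f} scrambled mean={scrambled_mean:.2f} p={p:.3f}")
```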
Predict with the model
• New chemicals (e.g., Chemicals 5, 6 and 7) are described with the same features
• Check: is the chemical in the applicability domain of the model?
• Each tree routes the chemical through feature tests (e.g., Has Benzene / No Benzene, Has Halide / No Halide); repeated with each "tree"
• Predict the target with valid models using the features, giving a probability for N or S
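The forest-building and prediction steps above can be illustrated with a deliberately simplified, standard-library-only sketch: each "tree" is a depth-1 decision stump fit on a bootstrap sample using a random subset of features, predictions are made by majority vote (the vote fraction serves as a confidence), and the out-of-bag (OOB) error is estimated only from trees that did not see a given chemical during training. The toy data are synthetic. A library implementation such as scikit-learn's `RandomForestClassifier` (which supports `oob_score=True`) grows full trees and samples features at every split.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Depth-1 tree on binary features: split on the feature with the fewest training errors."""
    fallback = Counter(labels).most_common(1)[0][0]
    best = None
    for j in features:
        sides = {0: Counter(), 1: Counter()}
        for row, label in zip(rows, labels):
            sides[row[j]][label] += 1
        # Each branch predicts its majority label (or the overall majority if the branch is empty)
        pred = {v: (sides[v].most_common(1)[0][0] if sides[v] else fallback) for v in (0, 1)}
        errors = sum(pred[row[j]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, j, pred)
    _, j, pred = best
    return lambda row: pred[row[j]]


def fit_forest(X, y, n_trees=25, max_features=3, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        feats = rng.sample(range(p), max_features)  # random feature subset for this tree
        tree = fit_stump([X[i] for i in in_bag], [y[i] for i in in_bag], feats)
        forest.append((tree, set(in_bag)))
    return forest


def predict(forest, row):
    """Majority vote across trees, plus the vote fraction of the winning class."""
    votes = Counter(tree(row) for tree, _ in forest)
    label, count = votes.most_common(1)[0]
    return label, count / len(forest)


def oob_error(forest, X, y):
    """Each chemical is scored only by trees that did not include it in their bootstrap sample."""
    wrong = scored = 0
    for i, (row, label) in enumerate(zip(X, y)):
        votes = Counter(tree(row) for tree, in_bag in forest if i not in in_bag)
        if votes:
            scored += 1
            wrong += votes.most_common(1)[0][0] != label
    return wrong / scored


# Toy data: features 0 and 1 track the label, features 2 and 3 are noise
X_TOY = [[i % 2, i % 2, (i // 2) % 2, (i // 4) % 2] for i in range(40)]
Y_TOY = ["S" if i % 2 else "N" for i in range(40)]

if __name__ == "__main__":
    forest = fit_forest(X_TOY, Y_TOY)
    print("OOB error:", oob_error(forest, X_TOY, Y_TOY))
    print("prediction for chemical 1:", predict(forest, X_TOY[1]))
```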

-------
Preliminary results
Random Forest provided the best model results:
•	Trained on a "balanced" down-sampled subset (675 cMOA N+S)
•	Training Out-of-Bag (OOB) error rate = 10.2%
•	Total Accuracy on the full N+S data set = 94.5% (1230 cMOA N+S)
•	68 chemicals misclassified: 11 false positives (predicted S) and 57 false negatives (predicted N)
[Figure: Distribution of prediction confidence by class (1230 cMOA) for predicted cMOA N and S; predicted N: median = 0.87, mean = 0.83; predicted S: median = 0.86, mean = 0.82]
[Figure: Random Forest simplified. Tree-1 → Class-A, Tree-2 → Class-B, ..., Tree-n → Class-B; majority voting → final class. Source: https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d]
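The reported accuracy can be checked from the counts. Treating S (specific-acting) as the positive class, and taking the 880 Narcotic and 350 Specific-acting chemicals from the training set slide as the class totals (880 + 350 = 1230, matching the cMOA N+S set), gives TP = 350 - 57 and TN = 880 - 11. Sensitivity and specificity are not reported on the slide; they are derived here under that assumption.

```python
def confusion_from_counts(total_n, total_s, false_pos, false_neg):
    """Derive TP/TN from class totals, treating S (specific-acting) as the positive class.

    false_pos: cMOA N predicted S; false_neg: cMOA S predicted N.
    """
    tp = total_s - false_neg
    tn = total_n - false_pos
    return tp, tn, false_pos, false_neg


def metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of cMOA S chemicals predicted S
        "specificity": tn / (tn + fp),  # share of cMOA N chemicals predicted N
    }


if __name__ == "__main__":
    # Counts from the slides: 880 N, 350 S, 11 false positives, 57 false negatives
    tp, tn, fp, fn = confusion_from_counts(880, 350, 11, 57)
    for name, value in metrics(tp, tn, fp, fn).items():
        print(f"{name}: {value:.3f}")
```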

-------
Example: Differences in model prediction vs. cMOA: Triasulfuron
[Structure: Triasulfuron]
N-sulfonylurea herbicide
Model prediction: Specific-acting
EnviroTox consensus MOA: Narcotic
ECOSAR classification: Sulfonyl Urea
The S(=O) sulfonyl ToxPrint is enriched in the specific-acting MOA space and in 47 assays
[Figure: assay activity heatmap (Active / Inactive / Not tested)]

-------
Predicted MOAs of the Unclassified set
674 chemicals in the EnviroTox dataset that had low confidence or
ambiguous consensus
Applied model to the Unclassified set and compared predictions to ECOSAR
classification

-------
Unclassified chemicals, predicted Specific-Acting: Enriched ToxPrints
[Figure: bar chart of ToxPrint chemotypes enriched among unclassified chemicals predicted to be specific-acting]
Criteria:
•	> 3 chemicals per chemotype
•	Ratio of S:N > 3, or no N
Results:
•	Ketones
•	Alkyl-Tri-halo
•	Sulfide, sulfonate, sulfonic acids
•	Benzopyran, benzopyrone
These features might be useful for refining chemical categories to capture more of the currently unclassified chemicals
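The selection criteria on this slide can be written as a small filter. This is a sketch under one reading of the criteria: "> 3 chemicals per chemotype" is taken as the total number of predicted S plus predicted N chemicals carrying the chemotype, and the chemotype names in the toy data are simplified placeholders rather than exact ToxPrint labels.

```python
from collections import Counter


def enriched_chemotypes(records, min_chemicals=3, min_ratio=3.0):
    """Chemotypes enriched among predicted specific-acting (S) vs. narcotic (N) chemicals.

    records: iterable of (toxprint_set, predicted_moa) pairs, predicted_moa in {"S", "N"}.
    Returns {chemotype: (n_S, n_N)} for chemotypes carried by more than `min_chemicals`
    chemicals and with an S:N ratio above `min_ratio` (or no N chemicals at all).
    """
    s_counts, n_counts = Counter(), Counter()
    for toxprints, moa in records:
        if moa == "S":
            s_counts.update(toxprints)
        elif moa == "N":
            n_counts.update(toxprints)
    enriched = {}
    for chemotype in set(s_counts) | set(n_counts):
        s, n = s_counts[chemotype], n_counts[chemotype]
        if s + n <= min_chemicals:
            continue  # too few chemicals carry this chemotype
        if n == 0 or s / n > min_ratio:
            enriched[chemotype] = (s, n)
    return enriched


# Toy predictions for illustration only
TOY = (
    [({"ketone"}, "S")] * 4
    + [({"ketone", "ester"}, "N")]
    + [({"ester"}, "S")] * 2
    + [({"ester"}, "N")]
    + [({"sulfonate"}, "S")] * 4
    + [({"rare"}, "S")] * 2
)

if __name__ == "__main__":
    print(enriched_chemotypes(TOY))
```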

-------
Summary
Identified relevant NAM information to develop a classification model for
specific-acting MOAs
Explored differences in predicted and consensus MOA via chemotype
enrichments
Used model to inform ECOSAR unclassified chemicals
•	The majority of unclassified chemicals were predicted to have a specific-acting MOA
•	Identified primary chemotypes for specific-acting MOAs

-------
Next steps/ongoing work
Leverage more invitroDB chemicals beyond the 1904 EnviroTox chemicals
•	Generated KNIME workflow for the consensus MOA calls
•	Greater coverage of the NAM assay space
•	>7000 chemicals with MOA calls
•	Integration of HTS and transcription assay data
Use methods to inform classification models for TSCA (New Chemical
Categories)
Use chemotype enrichments to identify potential bioassays with bioactivity
Thank you!

-------