PFAS Toxprints:
A Hierarchical Structure-Based Categorization
Method for Characterization of Per- and
Polyfluoroalkyl Substances
Ryan Lougee
ORISE / United States Environmental Protection Agency
ORD/CCTE/CCCB
The views expressed in this presentation are those of the presenter and do not necessarily reflect the views or policies of the U.S. EPA
-------
Global PFOS river discharge
[Figure: world maps of PFOS river discharge. Panels: 2010 river discharge (kg yr^-1) and 1958-2015 river discharge (Mg yr^-1). Insets: structures of perfluorooctane sulfonate (PFOS) and perfluorooctanoic acid (PFOA).]
Source: https://scholar.harvard.edu/files/styles/os_files_xlarge/public/ccwagner/files/global_pfos_riverdischarge_webpage.png?m=1549744187&itok=UITzT_qf
-------
[Screenshot: United States Environmental Protection Agency chemicals dashboard, list view for PFASSTRUCT]
PFAS|EPA: PFAS structures in DSSTox
List definition: RCF2CFR'R" (R cannot be H)
Example entries:
* 1,1,2-Trichloro-1,2,2-trifluoroethane (DTXSID6021377)
* Fulvestrant (DTXSID4022369)
* 1,2-Dichloro-1,1,2,2-tetrafluoroethane (DTXSID3026434)
-------
How Can We Form a Greater Understanding of this Broad Chemical Category?
PFAS
  Nonpolymers
    Perfluoroalkyl substances
      Perfluoroalkyl acids (PFAAs)
        Perfluoroalkyl carboxylic acids / perfluoroalkyl carboxylates (PFCAs)
        Perfluoroalkane sulfonic acids / perfluoroalkane sulfonates (PFSAs)
      Perfluoroalkane sulfonamides (FASAs)
    Polyfluoroalkyl substances
      Fluorotelomer-based substances
      Perfluoroalkane sulfonamide substances
      Polyfluoroalkyl ether carboxylic acids
  Polymers
    Fluoropolymers
    Side-chain fluorinated polymers
    Perfluoropolyethers (PFPE)
-------
Why Categories?
* "Use categories" (for example, surfactants) could show how these chemicals affect the environment.
* Specific groupings of structures appear to exhibit specific adverse effects; C6-C8 chains, for instance, are reported in the literature.
* The presence of certain functional groups, such as sulfonyls and phosphates, also affects adverse outcomes.
* We can build categories that cover the breadth of byproducts, breakdown products, alternatives, and scaffold structures.
-------
What Do We Need From Categories?
* Structure-based
* Useful categories that reflect:
  * Adverse outcomes
  * Environmental outcomes
  * Byproducts
  * Others
* Reproducible
* Easy to use
* Enables automation and cheminformatics applications
-------
Molecular Fingerprints
[Figure: one molecule encoded two ways, as ECFP features and as Toxprints substructure features]
ECFP
Toxprints
Output: a binary (0/1) bit string, one bit per feature
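The idea of a fragment-based fingerprint can be sketched in a few lines of Python. This is a minimal illustration, not the ChemoTyper implementation: the four-entry vocabulary stands in for the full 729-fragment ToxPrint set, the fragment hits are supplied by hand rather than computed from a structure, and the Tanimoto coefficient is one common similarity measure chosen here as an assumption.

```python
from typing import Iterable, List, Sequence

# Hypothetical mini-vocabulary; the real ToxPrint set has 729 fragments.
VOCAB: List[str] = [
    "bond:CX_halide_alkyl-F_perfluoro_ethyl",
    "bond:CX_halide_alkyl-F_perfluoro_butyl",
    "bond:COH_alcohol_generic",
    "bond:S(=O)N_sulfonylamide",
]


def to_bits(present: Iterable[str], vocab: Sequence[str] = VOCAB) -> List[int]:
    """Return one 0/1 bit per vocabulary fragment, in vocabulary order."""
    found = set(present)
    return [1 if name in found else 0 for name in vocab]


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Shared on-bits divided by on-bits in either fingerprint."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


# Hand-assigned fragment hits for two made-up structures.
fp_a = to_bits([VOCAB[0], VOCAB[1]])
fp_b = to_bits([VOCAB[0], VOCAB[2]])
similarity = tanimoto(fp_a, fp_b)
```

Because every structure is reduced to the same fixed-length bit string, fingerprints can be compared, clustered, or fed to a model without revisiting the structures themselves.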
-------
Toxprints
* 729 fragments
* PFAS substructures
* Good functional groups
* Some scaffolds

Chemotype                                        Index   Count
bond:CX_halide_alkyl-F_perfluoro_butyl           147     211
bond:CX_halide_alkyl-F_perfluoro_ethyl           148     354
bond:CX_halide_alkyl-F_perfluoro_hexyl           149     135
bond:CX_halide_alkyl-F_perfluoro_octyl           150     59
bond:CX_halide_alkyl-F_tetrafluoro_(1_1_1_2-)    151     283
bond:CX_halide_alkyl-F_trifluoro_(1_1_1-)        152     336
-------
WHAT PFAS CONCEPTS ARE MISSING?
* FLUORINATED RINGS
* BRANCHING
* MULTIPLE R GROUPS
* POLYFLUORINATION NOT CAPTURED WELL
* ALTERNATIVE HALOGENATION
* MANY FUNCTIONAL GROUPS
* SPECIFIC CHAIN LENGTHS
Table 5. Selected cases outside the current scope of splitPFAS.

Case: Branched or cyclic perfluoroalkyl chains
Example structure: 99324-96-6; other examples: 28788-68-3
Explanation: This structure contains a branched perfluoroalkyl chain with two terminal CF3 groups. To capture these, the default "pacs" SMARTS may need adjusting in future studies. It is likely that results for scenarios (iii) to (v) would be similar to those already observed.

Case: Polyfluoroalkyl (not perfluoroalkyl) chain
Explanation: The default "pacs" SMARTS in splitPFAS currently searches for C-C or C-F bonds, thus any structures with a non-C or F atom in the fluoroalkyl chain will not fulfil the pattern, like here where the pattern is H-(CnF2n)-X-R, where here X = C(=O). Other members followed e.g. a Cl-(CnF2n)-X-R pattern. These can be captured by adjusting the "pacs" option.

Case: The functional group R is F only
Explanation: These substances likewise failed the SMARTS pattern encoded into splitPFAS, which currently excludes compounds with a generic formula CnF2n+1-X-F. This could be addressed by adjusting the "pacs" option as well in future studies.

Case: Multiple R groups
Explanation: These examples were outside the scope defined for this article; examples of the form R1-X-(CnF2n)-X-R2 are split correctly, but result in two PFAS chain results, which we did not consider further here.

Case: Multiple X groups
Explanation: For compounds in the form of (CnF2n+1)X-R-X(CmF2m+1), the main issue is how to define C-X-R. There are built-in options to try various splitPFAS options in future studies.
-------
Time to Build Something New
[Screenshot: spreadsheet defining each new PFAS category as a combination of existing ToxPrint chemotypes. Columns: Condition, Primary Level, Secondary Level, Tertiary Level. A second sheet lists DSSTox substance IDs (DTXSID values, truncated in the screenshot) against the chemotype columns.]

Primary: TxP_PFAS_acrylate
    bond:C(=O)O_carboxylicEster_alkenyl
    AND bond:C(=O)O_carboxylicEster_acyclic
    AND chain:alkeneLinear_mono-ene_ethylene_generic
    AND bond:C=O_carbonyl_generic

Primary: TxP_PFAS_alcohol
    AND bond:COH_alcohol_aliphatic_generic
    AND bond:COH_alcohol_generic

  Secondary: TxP_PFAS_alcohol_primary
      AND bond:COH_alcohol_pri-alkyl

    Tertiary: TxP_PFAS_alcohol_primary_FT_diol
        AND bond:COH_alcohol_sec-alkyl

    Tertiary: TxP_PFAS_alcohol_primary_FTn1
        AND bond:C(~Z)~C~Q_a-haloalcohol

    Tertiary: TxP_PFAS_alcohol_primary_FTn2
        AND chain:alkaneLinear_propyl_C3
        AND chain:alkaneLinear_ethyl_C2(H_gt_1)

  Secondary: TxP_PFAS_alcohol_polyF
      AND bond:C(~Z)~C~Q_a-haloalcohol
      NOT bond:COH_alcohol_pri-alkyl

  Secondary: TxP_PFAS_alcohol_sulfonylamide
      AND bond:S(=O)N_sulfonylamide
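The AND / NOT conditions in the spreadsheet amount to a small rule engine over ToxPrint hits. The sketch below shows one way such rules could be evaluated; it is an illustration only, not the method used to build the final chemotypes. The rule contents are simplified from the slide, and `RULES` and `assign_categories` are hypothetical names.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Each rule is (chemotypes that must be present, chemotypes that must be absent).
# Rule contents are simplified from the slide and are illustrative only.
Rule = Tuple[FrozenSet[str], FrozenSet[str]]

RULES: Dict[str, Rule] = {
    "TxP_PFAS_alcohol_primary": (
        frozenset({"bond:COH_alcohol_generic", "bond:COH_alcohol_pri-alkyl"}),
        frozenset(),
    ),
    "TxP_PFAS_alcohol_polyF": (
        frozenset({"bond:C(~Z)~C~Q_a-haloalcohol"}),
        frozenset({"bond:COH_alcohol_pri-alkyl"}),
    ),
}


def assign_categories(hits: Set[str], rules: Dict[str, Rule] = RULES) -> List[str]:
    """Return the categories whose AND terms all match and whose NOT terms do not."""
    return sorted(
        name
        for name, (required, excluded) in rules.items()
        if required <= hits and not (excluded & hits)
    )


# A structure hitting both the a-haloalcohol and primary-alcohol chemotypes
# is kept out of the polyF category by the NOT condition.
example = assign_categories(
    {
        "bond:COH_alcohol_generic",
        "bond:COH_alcohol_pri-alkyl",
        "bond:C(~Z)~C~Q_a-haloalcohol",
    }
)
```

Prototyping categories this way only needs an existing ToxPrint fingerprint table, which makes it a quick check before encoding a concept as its own chemotype.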
-------
[Screenshot: CSRML file open in an editor; tabs NEW_TXP_PFAS.xml, toxprint_V2.0_r711.xml, PFAS_TXP_v1.xml]
File header: ToxPrint PFAS/PFOA Categories Version 1.0
$Id: PFASCategories.xml Lougee $
$Author: Lougee $
First entry shown: TXP_PFAS_acrylate

CSRML: Chemical Subgraphs and Reactions Markup Language
* XML-based language
* Supports connectivity and topology, but also properties of atoms, bonds, and electronic systems
-------
INTERESTING THINGS ABOUT CSRML:
* HIERARCHY
* MULTIPLE DISTINCT SUB-STRUCTURES
* INTERESTING ATOM AND BOND TYPES (Ex: Ring & Chain Atom)
* CUSTOMIZABLE ATOM AND BOND TYPES
* SPECIFIC CHAIN LENGTHS
* RANGES OF CHAIN LENGTHS

[Screenshot: ChemoTyper showing structure DTXCID701033479 matched by chemotype TxP_PFAS_C6toC8_exc (name truncated); file PFAS_TXP_v1.xml]

Chemotype Sets
  TxP_PFAS_Categories
    TxP_PFAS_COOR
    TXP_PFAS_acrylate
    TxP_PFAS_acylhalide
    TxP_PFAS_alcohol
      TxP_PFAS_alcohol_polyF
      TxP_PFAS_alcohol_primary
        TxP_PFAS_alcohol_primary_FT_d..
        TxP_PFAS_alcohol_primary_FTn1
        TxP_PFAS_alcohol_primary_FTn2
      TxP_PFAS_alcohol_sulfonylamide
    TxP_PFAS_aldehydeanhydride
    TxP_PFAS_alkylXprimary
    TxP_PFAS_alkylXtertiaryxCO
    TxP_PFAS_amine
      TxP_PFAS_amine_ether
      TxP_PFAS_amine_primary
    TxP_PFAS_carboxamide
    TxP_PFAS_ether
    TxP_PFAS_ethylene_xCO
    TXP_PFAS_ketone
    TxP_PFAS_oxidehydroxy
    TxP_PFAS_perFhexyl
    TxP_PFAS_perFoctyl
    TxP_PFAS_silane
    TxP_PFAS_sulfonyl
      TxP_PFAS_sulfonamide
        TxP_PFAS_sulfonamide_alcohol
      TxP_PFAS_sulfonate
        TxP_PFAS_sulfonate_FTn2
      TxP_PFAS_sulfonylhalide
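Many of the category names above carry their hierarchy in the name itself (for example, TxP_PFAS_alcohol_primary_FTn1 sits under TxP_PFAS_alcohol_primary). The sketch below recovers parent links from names alone. It rests on the assumption that a child name is its parent's name plus an underscore-separated suffix, which does not hold for every branch in the tree (TxP_PFAS_sulfonamide under TxP_PFAS_sulfonyl is a counterexample); `parent_of` and `ancestors` are hypothetical helpers.

```python
from typing import List, Optional

NAMES: List[str] = [
    "TxP_PFAS_alcohol",
    "TxP_PFAS_alcohol_primary",
    "TxP_PFAS_alcohol_primary_FTn1",
    "TxP_PFAS_amine",
    "TxP_PFAS_amine_primary",
]


def parent_of(name: str, known: List[str]) -> Optional[str]:
    """Longest other known name that is a prefix of `name` followed by '_'."""
    candidates = [k for k in known if k != name and name.startswith(k + "_")]
    return max(candidates, key=len) if candidates else None


def ancestors(name: str, known: List[str]) -> List[str]:
    """Walk parent links upward, nearest ancestor first."""
    chain: List[str] = []
    parent = parent_of(name, known)
    while parent is not None:
        chain.append(parent)
        parent = parent_of(parent, known)
    return chain


lineage = ancestors("TxP_PFAS_alcohol_primary_FTn1", NAMES)
# lineage == ["TxP_PFAS_alcohol_primary", "TxP_PFAS_alcohol"]
```

With ancestor chains available, a hit on a tertiary category can be rolled up to its secondary and primary levels when summarizing a dataset.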
-------
How These Were Built:
Structure Aggregation
* Searched the literature for interesting byproducts and structures
* Structures related to adverse outcomes
* Buck et al. expert categories
* OECD category structures
* Missing OECD categories
Once these were built, I could filter out functional groups and structural groups to see what might still be missing.
Finally, generalized groups were added to capture broader categories.
-------
How Were These Built:
Programming Process
* CSRML syntax is similar to XML
* Loaded the file into an IDE
* Looked for an existing structure similar to the one I was interested in
* Examined the code
* Repurposed it
* Tested it in the ChemoTyper
* Resolved loading errors
* Once structures loaded correctly, checked them against a dataset of PFAS to confirm that they captured the intended chemical concept and that the chemotype looked correct
* Eventually understood CSRML well enough to construct new concepts
* Lastly, encoded the hierarchy
[Screenshot: ChemoTyper with PFAS_test2.sdf loaded, alongside TXP_PFAS_v1.6.xml open in an editor]
ChemoTyper error dialog:
The specified XML file is not a valid CSRML file:
C:\Users\Administrator\OneDrive\Profile\Desktop\TXP_PFAS_v1.6.xml
Error at file "C:\Users\Administrator\OneDrive\Profile\Desktop\TXP_PFAS_v1.6.xml", line 31976, column 41
Message: no character data is allowed by content model

Chemotype sets visible in the tree: TxP_PFAS_Q6_ring, TxP_PFAS_Q7_ring, TxP_PFAS_Q8_ring, TxP_PFAS_Q9_ring, Bicyclo Rings, Chain Triple, Misc Functional Groups (TXP_PFAS_alkyne, TxP_PFAS_amine, TxP_PFAS_imino, TxP_PFAS_nitrile, TxP_PFAS_nitro, TXP_PFAS_phosphate, TXP_PFAS_sulfonyl)
Chemotypes shown: TxP_PFAS_nitro, TxP_PFAS_Bicyclo_Q6-Q5_ring, TxP_PFAS_nitrile
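Loading errors like the one in the dialog point at a line and column in a file that can run to tens of thousands of lines. A quick pre-check outside the ChemoTyper can narrow things down. The sketch below uses Python's standard-library XML parser and only tests well-formedness; it does not validate against the CSRML schema, so a content-model error such as "no character data is allowed" would still need the ChemoTyper or a schema validator to detect. `first_xml_error` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def first_xml_error(text: str) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) for the first parse error, or None if well formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# A well-formed fragment and one with an unclosed element.
ok = first_xml_error("<set><item/></set>")
bad = first_xml_error("<set>\n<item>\n</set>")
```

For a file on disk, read its contents and pass the text to the helper; the reported line number can then be opened directly in the editor.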
-------
Some Things I Like
[Screenshot: ChemoTyper with PFAS_test2.sdf and TXP_PFAS_v1.6.xml loaded]
Structures panel: Structures Loaded: 147 | Total Coverage: 147 | Selected: 0 | Matched: 26
Chemotype sets:
* Perfluoro Chain Length Exclusive (selected: TxP_PFAS_C4_excl)
* Perfluoro Chain Length Exclusive Un... (selected: TxP_PFAS_C4_nocap_excl)
Structure filter: Containing Any Selected Chemotype (OR)
Chemotypes panel: Chemotypes Loaded: 143 | Total Coverage: 86 | Selected: 2 (110 hidden)
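The "exclusive" chain-length chemotypes assign a perfluoro chain to a single length bin rather than to every shorter length as well. The naming scheme can be sketched as a simple lookup. The sketch assumes that "excl" means an exact carbon count and that chains of 15 or more carbons fall into the TxP_PFAS_C15_plus bin named on the profile slide; the chain length itself is taken as given, not computed from a structure, and `chain_length_category` is a hypothetical helper.

```python
def chain_length_category(n_carbons: int, max_exclusive: int = 14) -> str:
    """Map a perfluoro chain length to one exclusive chain-length category name."""
    if n_carbons < 1:
        raise ValueError("chain length must be at least 1")
    if n_carbons > max_exclusive:
        # Everything above the last exclusive bin shares one open-ended bin.
        return f"TxP_PFAS_C{max_exclusive + 1}_plus"
    return f"TxP_PFAS_C{n_carbons}_excl"


c4 = chain_length_category(4)    # "TxP_PFAS_C4_excl"
long_chain = chain_length_category(20)  # "TxP_PFAS_C15_plus"
```

Because each chain lands in exactly one bin, counts per bin can be compared directly across chain lengths.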
-------
How to use these now?
[Screenshot: ChemoTyper with PFAS_test2.sdf and TXP_PFAS_v1.6.xml loaded]
Structures panel: Structures Loaded: 147 | Total Coverage: 147 | Selected: 0

Chemotype Sets
  PFAS Toxprint Categories
    TxP_PFAS_generic_CF2_CF
      TxP_PFAS_generic_CF_chain
        TxP_PFAS_generic_C2F4
        Perfluoro Chain Length Exclusive
        Perfluoro Chain Length Exclusive U..
      TxP_PFAS_generic_CF_ring
        Bicyclo Rings
        Carbon Rings
        Fluorinated Carbon Rings
        General Rings
    TxP_PFAS_polyF_generic
    Branching
    Chain Double
    Chain Quads and Above
    Chain Triple
    Functionalization Categories
      Carbon Bonds
        Fluorotelomer-type
        TxP_PFAS_alkene
        TxP_PFAS_alkene_ether
        TXP_PFAS_alkyne
      Nitrogen-Based Functionalization
      Oxygen-Based Functionalization
      Phosphate-Based Functionalization
      Silicon-Based Functionalization
      Sulfur-Based Functionalization
    TxP_PFAS_inorganic_F
    TxP_PFAS_other_halogens

Chemotype                    Count
TxP_PFAS_generic_CF2_CF      136
TxP_PFAS_generic_CF_chain    144
TxP_PFAS_generic_CF_ring     7
TxP_PFAS_C1_excl             32
TxP_PFAS_C2_excl             9
TxP_PFAS_C3_excl             20
TxP_PFAS_C4_excl             17
TxP_PFAS_C5_excl             4

Chemotypes panel: Chemotypes Loaded: 143 | Total Coverage: 86 | Selected: 143
-------
How to use these now?
Eventually here: https://toxprint.org/#ToxPrintChemotypes
ToxPrint Chemotypes
The ChemoTyper organizes the current version ToxPrint chemotypes into three functional areas:
1. Generic Structural Fragments
2. Structural Rules and Alerts
3. Category Classifiers

Generic Structural Fragments
Generic structural fragments are organized by atom, bond, chain, and ring types, as well as chemical groups including amino acids, carbohydrates, ligands, and nucleobases, based on the 729 essential chemotypes of the current ToxPrint_v2.0_r1520.xml (whatever the file name). These chemotypes can be generated as chemical fingerprints, either as binary (0/1) or count data. They can be used to calculate similarity measures or structural feature descriptors for building models. (Yang 2015)

Structural Rules and Alerts
These can be developed using ToxPrint chemotypes as building blocks. The chemotypes defined in the ToxPrint set can be further refined or coded with properties (atom, bond, molecular, or physicochemical) to constrain the matches in order to enhance the signal-to-noise ratio of ToxPrint chemotypes when profiling the biological observations. To this end, we are developing ChemoType Editor to empower the users with the ability to fluently manipulate the CSRML query definitions graphically in a molecular editor. Please contact MN-AM if you are interested.
* Ashby-Tennant Genotoxic Carcinogen Alerts
* DNA binders
* Protein binders
* General Liver Alerts
Lougee.Ryan@epa.gov
-------
OECD PFAS PROFILE
[Figure: horizontal bar charts, one bar per PFAS ToxPrint category]

Panel 1 (x-axis ticks 100 to 500):
* TxP_PFAS_C1_nocap_excl through TxP_PFAS_C13_nocap_excl
* TxP_PFAS_C2_excl through TxP_PFAS_C14_excl
* TxP_PFAS_C15_plus
* TxP_PFAS_alternative_halogen_I, TxP_PFAS_alternative_halogen_Br, TxP_PFAS_alternative_halogen_Cl, TxP_PFAS_alternative_halogens
* TxP_PFAS_inorganic_F_P, TxP_PFAS_inorganic_F_S, TxP_PFAS_inorganic_F

Panel 2 (x-axis ticks 20 to 120):
* TxP_PFAS_C4_ring, TxP_PFAS_C5_ring, TxP_PFAS_C6_ring

Panel 3 (x-axis ticks 100 to 600):
* Carbon bonds: TXP_PFAS_alkyne, TxP_PFAS_alkene_ether, TxP_PFAS_alkene
* Sulfur: TxP_PFAS_sulfonic_acid, TxP_PFAS_sulfide, TxP_PFAS_disulfide, TxP_PFAS_sulfonylhalide, TxP_PFAS_sulfonate, TxP_PFAS_sulfonamide, TxP_PFAS_sulfonylamide, TxP_PFAS_sulfonyl_y2, TxP_PFAS_sulfonyl
* Nitrogen: TxP_PFAS_urethane, TxP_PFAS_urea, TxP_PFAS_nitroso, TxP_PFAS_azo, TxP_PFAS_imino, TxP_PFAS_nitrile, TxP_PFAS_nitro, TxP_PFAS_amine_quaternary, TxP_PFAS_amine_tertiary, TxP_PFAS_amine_secondary, TxP_PFAS_amine_primary, TxP_PFAS_amine
* Oxygen: TxP_PFAS_diol, TxP_PFAS_alcohol, TxP_PFAS_oxidehydroxy, TxP_PFAS_acylhalide, TxP_PFAS_aldehyde, TxP_PFAS_ketone, TxP_PFAS_acrylate, TxP_PFAS_ester, TxP_PFAS_carboxylic_acid, TxP_PFAS_epoxide, TxP_PFAS_ether, TxP_PFAS_carboxamide, TxP_PFAS_carbonyl_thio, TxP_PFAS_carboxamidine
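A profile like this one boils down to counting, for each category, how many structures in a list hit it. The sketch below shows that tally in Python. It assumes the category hits per structure are already available (for instance from a ChemoTyper export); the structure IDs and hits are made up for illustration and are not taken from the OECD list, and `category_profile` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def category_profile(hits_by_structure: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Count structures per category, most frequent first (ties broken by name)."""
    counts: Counter = Counter()
    for categories in hits_by_structure.values():
        counts.update(categories)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up structure IDs and category hits, for illustration only.
demo_hits = {
    "S1": {"TxP_PFAS_ether", "TxP_PFAS_alcohol"},
    "S2": {"TxP_PFAS_ether"},
    "S3": set(),
}
profile = category_profile(demo_hits)
# profile == [("TxP_PFAS_ether", 2), ("TxP_PFAS_alcohol", 1)]
```

The sorted list maps directly onto a horizontal bar chart, and structures with an empty hit set (S3 here) flag gaps in category coverage.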