EPA
United States
Environmental
Protection Agency
EPA Publication Number 601B24001 | September 2024
Office of Research and Development
Center for Computational Toxicology and Exposure
Toxicity Forecaster
(ToxCast™)
Assay Description
Documentation
-------
EPA Publication Number 601B24001
TOXICITY FORECASTER (TOXCAST)
ASSAY DESCRIPTION DOCUMENTATION
September 2024
Madison Feshuk¹, Ashley Ko¹,², Manasvinee Mayil Vahanan¹,³,
Kelly Cartsens¹, Alison Harrill¹, Katie Paul Friedman¹
¹ Center for Computational Toxicology and Exposure, Office of Research and Development, US EPA, Research Triangle Park, NC
² Oak Ridge Associated Universities (ORAU) National Student Services Contract at US EPA
³ Oak Ridge Institute for Science and Education (ORISE) at US EPA
-------
Overview
The Toxicity Forecaster (ToxCast™) program at the US Environmental Protection Agency (US EPA)
makes in vitro medium- and high-throughput screening assay data publicly available for prioritization and
hazard characterization of thousands of chemicals. Please review the vignette for the ToxCast Data Analysis
Pipeline (tcpl) R package for comprehensive documentation describing ToxCast data processing, retrieval,
and interpretation. Because ToxCast includes a heterogeneous set of assays across a diverse biological space,
annotations in the database help users flexibly aggregate and differentiate processed data, while assay
documentation aligned with international standardization efforts can make ToxCast data more useful and
interpretable for decision-making.
This documentation for the ToxCast assay endpoints is in a format outlined by the OECD Guidance
Document 211 (GD211) for describing non-guideline in vitro test methods and their interpretation. The
intent of GD211 is to harmonize non-guideline in vitro method descriptions to allow assessment of the
relevance of the test method for biological responses of interest and the quality of the data produced.
This document contains reports for 809 assay endpoints accompanying the invitrodb v4.2 release
(September 2024). Please utilize the Table of Contents, Ctrl+F, or the Bookmarks panel to navigate to
specific assay sources and endpoints of interest. These reports are a work in progress and will be
iteratively updated as more information becomes available.
For additional questions or concerns, please contact Madison Feshuk (feshuk.madison@epa.gov).
Disclaimer
This report does not reflect the views or policies of the US Environmental Protection Agency. Company
or product names do not constitute endorsement by US EPA.
Curation Summary
For this effort, existing database information from ToxCast's invitrodb was first reviewed to populate as
many GD211-stipulated fields as possible. Assay element and auxiliary annotations were leveraged, though
this information is often brief, in a standardized format, or expressed using controlled vocabulary. Missing
fields were identified and selected for curation. This curated information has no character limit and can
therefore provide users with the most robust description of the assay technology and its relevance. All
curated information has also been stored in the updated "assay descriptions" table of invitrodb. Fields and
their descriptions, modeled after their corresponding GD211 sections, are provided in the table below and
are also available in invitrodb's data dictionary:
Field
    Description

aeid
    Assay component endpoint ID

assay_title
    1.1 Assay Name (title): Short and descriptive title for the assay

assay_objectives
    2.1 Purpose of the test method: Inserted after assay_component_target_desc; The claimed purpose
    and rationale for intended use of the method (e.g. alternative to an existing method, screening,
    provision of novel information in regulatory decision-making, mechanistic information, adjunct
    test, replacement, etc.) should be explicitly described and documented. The response measured in
    the assay should be put in the context of the biology/physiology leading to the in vivo response
    or effect.
    • If the biological activity or response refers to a key event or molecular initiating event
      (MIE), provide a short description indicating firstly what key event within an existing or
      developing AOP, or in relation to a mechanism or mode of action, the assay is aiming to
      characterize (i.e., to which level of biological organization the assay may be attributed,
      e.g., sub-cellular, cellular, tissue, organ or individual), and secondly where the assay might
      fit in the context of an existing regulatory hazard (i.e., adverse outcome).
    • In the absence of any AOP, provide an indication of the plausible linkage between the
      mechanism(s) the assay is measuring and the resulting hazard endpoint.

assay_throughput
    1.10 Information about the throughput of the assay: indicate the throughput of the assay to
    provide an indication of likely resource intensity, e.g. low (manual assay, one chemical tested
    at a time), low-moderate, moderate, moderate-high, high throughput (e.g. in 96 well-plate and
    higher), and qualify with e.g. approximate number of chemicals/concentrations per run. If
    appropriate, indicate whether a manual assay could be run in a higher throughput mode.

scientific_principles
    2.2 Scientific principle of the method: provide the scientific rationale, supported by
    bibliographic references to articles, for the development of the assay. A summary description of
    the scientific principle including the biological/physiological basis and relevance (e.g.
    modeling of a specific organ) and/or mechanistic basis (e.g. modeling a particular mechanism by
    biochemical parameters) should be described. If possible, indicate what the anchor point is
    within an AOP.

biological_responses
    2.6 Response and Response Measurement: response here refers to any biological effect, process or
    activity that can be measured. Specify precisely and describe the response and its measurement.

analytical_description
    3.2 Data analysis: Comment on the response value in terms of a boundary or range to provide a
    context for interpretation, e.g. putting into context what a negative value or >100% value might
    represent in a binding inhibition assay.

basic_procedures
    2.5 Description of the experimental system exposure regime: provide a summary description of the
    essential information pertaining to the exposure regime (dosage and exposure time including
    observation frequency) of the test compounds to the experimental system including information on
    metabolic competence if appropriate; number of doses/concentrations tested or testing range,
    number of replicates, the use of control(s) and vehicle. Also, describe any specialized equipment
    needed to perform the assay and measure the response. Indicate whether there might be potential
    solubility issues with the test system, and solutions proposed to address the issue.

experimental_system
    2.3 Tissue, cells or extracts utilized in the assay and the species source: indicate the
    experimental system for the activity or response being measured. Provide information on whether
    materials are readily available commercially or whether materials are developed in the laboratory
    (e.g. cell suspensions from tissue). Indicate source/manufacturer of biological material used.
    Indicate whether cryopreserved biological material can be used or only freshly prepared.

xenobiotic_biotransformation
    2.4 Metabolic competence of the test system: describe and discuss the extent to which the test
    system can be considered metabolically competent, either by itself, or with the addition of an
    enzymatic fraction, if appropriate. Provide reference if available.

proprietary_elements
    1.9 Availability of information about the assay in relation to proprietary elements: indicate
    whether the assay is proprietary or non-proprietary (to what extent is the assay method
    transferable or contains proprietary elements) and specify (if possible) what kind of information
    about the assay cannot be disclosed or is not available (e.g., chemical reference sets (training
    or test sets), prediction model).

* Key information has been emboldened.
-------
Interpretation of Robustness Metrics
To assess test method performance, quantitative metrics were derived from the processed
multi-concentration response data to examine each assay's performance relative to its controls. This
function is described in the tcpl vignette, under Data Retrieval in invitrodb > Review MC assay quality.
A summary of each of the metrics and their interpretation is provided below:

NEUTRAL CONTROL (well type = "n")

Neutral control well median response value, by plate: nmed
    This is a robust measure of central tendency in the neutral control wells (i.e., vehicle control
    or wells not expected to cause biological change). It often serves as the baseline or background
    response observed without chemical treatment. The value should be considered in the context of
    the endpoint's response units and is calculated as the median of responses in neutral control
    wells.

Neutral control median absolute deviation, by plate: nmad
    This is a robust measure of the variability in neutral control wells. The value should be
    considered with response units and is calculated as the median of the absolute deviations (from
    the median), multiplied by the scaling factor constant of 1.4826:
    $1.4826 \times \mathrm{median}(|y_i - \tilde{y}|)$
    where $y_i$ is the ith observation among the neutral control wells and $\tilde{y}$ is the median
    across all $y_i$.

Coefficient of variation (CV%) in neutral control wells: $(\mathrm{nmad}/\mathrm{nmed}) \times 100$
    The coefficient of variation (CV), expressed as a percentage, compares the variability of neutral
    control responses against the median of the neutral control wells. CV% > 20% may indicate high
    variability; however, interpretation is assay dependent.

POSITIVE CONTROL (well type = "p")

Positive control well median response value, by plate: pmed
    This is a measure of central tendency in the positive control wells. The value should be
    considered in the context of the endpoint's normalized response units.

Positive control well median absolute deviation, by plate: pmad
    This is a measure of the variability in positive control wells. The value should be considered
    with response units.

Z-prime factor for median positive and neutral control across all plates:
$1 - \frac{3 \times (\mathrm{pmad} + \mathrm{nmad})}{|\mathrm{pmed} - \mathrm{nmed}|}$
    The Z-prime factor (Z') is a robust measure of signal-to-background difference (Zhang et al,
    1999). Measuring the degree of separation between neutral and positive controls, each with their
    own variability, can be indicative of the likelihood of false positives or negatives. The
    Z'-factor has the range of −∞ to 1, and is traditionally interpreted as follows:
    • Z' = 1: Ideal. This is approached when the assay has a wide dynamic range with a small median
      absolute deviation across controls. In this situation, the separation band is almost as long
      as the dynamic range.
    • 1.0 > Z' ≥ 0.5: Excellent. Assay shows good separation between controls.
    • 0.5 > Z' > 0: Marginal. Assay shows an acceptable degree of separation between controls.
    • Z' = 0: Nominal. Good only for a yes/no response.
    • Z' < 0: Unacceptable. Use caution given the overlap in response between controls.
    Note that these categories are not imposed in presented metrics.
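As an illustration only, the neutral and positive control metrics described above can be sketched in Python. This is a minimal sketch and not the tcpl implementation: the function names (`scaled_mad`, `control_metrics`) and the well responses are hypothetical, and a single plate is assumed, whereas the text reports the medians and MADs by plate and the Z-prime factor across all plates.

```python
from statistics import median

MAD_SCALE = 1.4826  # scaling factor constant quoted in the text


def scaled_mad(values):
    """Median of absolute deviations from the median, times 1.4826."""
    center = median(values)
    return MAD_SCALE * median(abs(v - center) for v in values)


def control_metrics(neutral, positive):
    """Compute nmed, nmad, CV%, pmed, pmad and Z' for one set of wells.

    Assumes nmed != 0 and pmed != nmed; otherwise CV% or Z' is undefined.
    """
    nmed = median(neutral)
    nmad = scaled_mad(neutral)
    pmed = median(positive)
    pmad = scaled_mad(positive)
    return {
        "nmed": nmed,
        "nmad": nmad,
        "cv_percent": (nmad / nmed) * 100,
        "pmed": pmed,
        "pmad": pmad,
        "z_prime": 1 - (3 * (pmad + nmad)) / abs(pmed - nmed),
    }


# Hypothetical well responses, invented purely for illustration
neutral_wells = [98.0, 100.0, 102.0, 100.0, 99.0, 101.0]
positive_wells = [10.0, 12.0, 8.0, 11.0, 9.0, 10.0]

metrics = control_metrics(neutral_wells, positive_wells)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these made-up wells the controls are far apart relative to their spread, so the Z-prime factor comes out close to 1. Real plates would typically give lower values.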
-------
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells:
$\frac{\mathrm{pmed} - \mathrm{nmed}}{\sqrt{\mathrm{pmad}^2 + \mathrm{nmad}^2}}$
    Strictly standardized mean difference (SSMD, often denoted as β) is a robust measure of effect
    size and was developed to address limitations in the Z'-factor for experiments using controls of
    moderate strength. Acceptable screening values for SSMD depend on the strength of the positive
    controls used. A higher SSMD may correspond to stronger controls. The table below gives suggested
    interpretations of values by control strength, from Advanced Assay Development Guidelines for
    Image-based High Content Screening and Analysis:

    Quality Type                  Excellent    Good         Inferior       Poor
    1 Moderate Control            β ≥ 2        2 > β ≥ 1    1 > β ≥ 0.5    β < 0.5
    2 Strong Control              β ≥ 3        3 > β ≥ 2    2 > β ≥ 1      β < 1
    3 Very Strong Control         β ≥ 5        5 > β ≥ 3    3 > β ≥ 2      β < 2
    4 Extremely Strong Control    β ≥ 7        7 > β ≥ 5    5 > β ≥ 3      β < 3
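The SSMD formula can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the tcpl implementation: the function name `ssmd`, its parameter names and the input values are hypothetical. The same function applies to any control compared against the neutral controls, and the sign of the result reflects whether the control median lies above or below the neutral control median.

```python
import math


def ssmd(control_med, control_mad, nmed, nmad):
    """SSMD: (control median - neutral median) / sqrt(control MAD^2 + neutral MAD^2)."""
    return (control_med - nmed) / math.sqrt(control_mad ** 2 + nmad ** 2)


# Hypothetical medians and MADs, invented for illustration
ssmd_gain = ssmd(control_med=80.0, control_mad=3.0, nmed=0.0, nmad=4.0)
ssmd_loss = ssmd(control_med=-50.0, control_mad=6.0, nmed=0.0, nmad=8.0)

print(ssmd_gain)  # control responds above the neutral baseline
print(ssmd_loss)  # control responds below the neutral baseline
```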
SSMD for negative control wells (mmed, mmad) compared to neutral control wells:
$\frac{\mathrm{mmed} - \mathrm{nmed}}{\sqrt{\mathrm{mmad}^2 + \mathrm{nmad}^2}}$
    Strictly standardized mean difference (SSMD, often denoted as β) is a robust measure of effect
    size and was developed to address limitations in the Z'-factor for experiments using controls of
    moderate strength. Acceptable screening values for SSMD depend on the strength of the positive
    controls used. A table of suggested interpretations of values by control strength is given in
    Advanced Assay Development Guidelines for Image-based High Content Screening and Analysis.

Signal-to-noise (median across all plates, using negative control wells):
$(\mathrm{mmed} - \mathrm{nmed})/\mathrm{nmad}$
    The signal-to-noise ratio (S/N) gives a measure of the degree of confidence that the difference
    between the negative control signal and the background response is real, relative to the noise
    in the neutral controls.

Signal-to-background (median across all plates, using negative control wells):
$\mathrm{mmed}/\mathrm{nmed}$
    The signal-to-background ratio (S/B) is a simple comparison of the median negative control signal
    to the median of the neutral controls, i.e. the background response. It does not contain any
    information about variability of the data.
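The signal-to-noise and signal-to-background ratios above can be sketched as follows. This is a minimal illustration, not the tcpl implementation. It assumes that "median across all plates" means taking the median of the per-plate ratios; the function names and the per-plate values are hypothetical.

```python
from statistics import median


def signal_to_noise(mmed, nmed, nmad):
    """(mmed - nmed) / nmad for one plate; assumes nmad != 0."""
    return (mmed - nmed) / nmad


def signal_to_background(mmed, nmed):
    """mmed / nmed for one plate; assumes nmed != 0."""
    return mmed / nmed


# Hypothetical per-plate summaries, invented for illustration
plates = [
    {"mmed": 50.0, "nmed": 100.0, "nmad": 2.0},
    {"mmed": 40.0, "nmed": 100.0, "nmad": 4.0},
    {"mmed": 60.0, "nmed": 120.0, "nmad": 3.0},
]

# Assumption: summarize across plates with the median of per-plate values
sn = median(signal_to_noise(p["mmed"], p["nmed"], p["nmad"]) for p in plates)
sb = median(signal_to_background(p["mmed"], p["nmed"]) for p in plates)

print(sn, sb)
```

Because S/B ignores nmad entirely, two assays with the same S/B can have very different S/N, which is why the two ratios are reported together.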
-------
Table of Contents
The following endpoints are included in this iteration of the ToxCast Assay Description Documentation.
Selecting a hyperlinked endpoint from this Table of Contents will direct users to the individual endpoint-
specific PDF. Please utilize Ctrl+F or the Bookmarks panel to navigate to specific assay sources and
endpoints of interest within this document.
AEID8 APR HepG2 MicrotubuleCSK 1hr
AEID26 APR HepG2 CellLoss 24hr
AEID12 APR HepG2 MitoMembPot 1hr
AEID24 APR HepG2 CellCycleArrest 24hr
AEID6 APR HepG2 CellLoss 1hr
AEID30 APR HepG2 MitoMass 24hr
AEID16 APR HepG2 NuclearSize 1hr
AEID14 APR HepG2 MitoticArrest 1hr
AEID20 APR HepG2 p53Act 1hr
AEID32 APR HepG2 MitoMembPot 24hr
AEID22 APR HepG2 StressKinase 1hr
AEID10 APR HepG2 MitoMass 1hr
AEID28 APR HepG2 MicrotubuleCSK 24hr
AEID18 APR HepG2 P-H2AX 1hr
AEID52 APR HepG2 MitoMembPot 72hr
AEID2 ACEA ER 80hr
AEID46 APR HepG2 CellLoss 72hr
AEID56 APR HepG2 NuclearSize 72hr
AEID4 APR HepG2 CellCycleArrest 1hr
AEID40 APR HepG2 p53Act 24hr
AEID34 APR HepG2 MitoticArrest 24hr
AEID36 APR HepG2 NuclearSize 24hr
AEID67 ATG C EBP CIS
AEID63 ATG Ahr CIS
AEID48 APR HepG2 MicrotubuleCSK 72hr
AEID65 ATG AP 2 CIS
AEID64 ATG AP 1 CIS
AEID44 APR HepG2 CellCycleArrest 72hr
AEID54 APR HepG2 MitoticArrest 72hr
AEID58 APR HepG2 P-H2AX 72hr
AEID50 APR HepG2 MitoMass 72hr
AEID42 APR HepG2 StressKinase 24hr
AEID60 APR HepG2 p53Act 72hr
AEID69 ATG CRE CIS
AEID62 APR HepG2 StressKinase 72hr
AEID72 ATG E Box CIS
AEID66 ATG BRE CIS
-------
AEID[illegible] APR HepG2 P-H2AX 24hr
[illegible]5 RAR [illegible]
AEID75 ATG ERE CIS
AEID74 ATG EGR CIS
AEID68 ATG CMV CIS
AEID82 ATG HIF1a CIS
AEID79 ATG GATA CIS
AEID76 ATG Ets CIS
AEID80 ATG GLI CIS
AEID78 ATG FoxO CIS
AEID77 ATG FoxA2 CIS
AEID70 ATG DR4 LXR CIS
AEID73 ATG E2F CIS
AEID86 ATG ISRE CIS
AEID85 ATG IR1 CIS
AEID83 ATG HNF6 CIS
AEID81 ATG GRE CIS
AEID84 ATG HSE CIS
AEID88 ATG M 19 CIS
AEID91 ATG MRE CIS
AEID90 ATG M 61 CIS
AEID89 ATG M 32 CIS
AEID87 ATG M 06 CIS
AEID92 ATG Myb CIS
AEID94 ATG NF kB CIS
AEID97 ATG NRF2 ARE CIS
AEID98 ATG Oct MLP CIS
AEID93 ATG Myc CIS
AEID100 ATG Pax6 CIS
AEID96 ATG NRF1 CIS
AEID95 ATG NFI CIS
AEID101 ATG PBREM CIS
AEID103 ATG PXRE CIS
AEID99 ATG p53 CIS
AEID106 ATG Sp1 CIS
AEID108 ATG STAT3 CIS
AEID104 ATG RORE CIS
AEID109 ATG TA CIS
AEID107 ATG SREBP CIS
AEID105 ATG Sox CIS
AEID111 ATG TCF b cat CIS
AEID110 ATG TAL CIS
AEID102 ATG PPRE CIS
AEID113 ATG VDRE CIS
-------
AEID116 ATG [illegible] TRANS
AEID118 ATG ERRa TRANS
AEID114 ATG Xbp1 CIS
AEID117 ATG ERa TRANS
AEID112 ATG TGFb CIS
AEID115 ATG AR TRANS
AEID119 ATG ERRg TRANS
AEID123 ATG HNF4a TRANS
AEID120 ATG FXR TRANS
AEID122 ATG GR TRANS
AEID126 ATG LXRb TRANS
AEID124 ATG Hpa5 TRANS
AEID125 ATG LXRa TRANS
AEID127 ATG M 06 TRANS
AEID130 ATG M 61 TRANS
AEID129 ATG M 32 TRANS
AEID128 ATG M 19 TRANS
AEID121 ATG GAL4 TRANS
AEID131 ATG NURR1 TRANS
AEID133 ATG PPARd TRANS
AEID134 ATG PPARg TRANS
AEID132 ATG PPARa TRANS
AEID137 ATG RARb TRANS
AEID135 ATG PXR TRANS
AEID136 ATG RARa TRANS
AEID143 ATG THRa1 TRANS
AEID139 ATG RORb TRANS
AEID141 ATG RXRa TRANS
AEID144 ATG VDR TRANS
AEID140 ATG RORg TRANS
AEID138 ATG RARg TRANS
AEID142 ATG RXRb TRANS
AEID150 BSK 3C ICAM1
AEID154 BSK 3C MCP1
AEID146 BSK 3C Eselectin
AEID148 BSK 3C HLADR
AEID152 BSK 3C IL8
AEID160 BSK 3C SRB
AEID156 BSK 3C MIG
AEID162 BSK 3C Thrombomodulin
AEID166 BSK 3C uPAR
AEID164 BSK 3C TissueFactor
AEID158 BSK 3C Proliferation
AEID168 BSK 3C VCAM1
-------
AEID170 BSK 3C Vis
AEID172 BSK 4H Eotaxin3
AEID174 BSK 4H MCP1
AEID180 BSK 4H uPAR
AEID182 BSK 4H VCAM1
AEID184 BSK 4H VEGFRII
AEID176 BSK 4H Pselectin
AEID186 BSK BE3C HLADR
AEID178 BSK 4H SRB
AEID188 BSK BE3C IL1a
AEID194 BSK BE3C MMP1
AEID196 BSK BE3C PAI1
AEID198 BSK BE3C SRB
AEID190 BSK BE3C IP10
AEID218 BSK CASM3C MCSF
AEID214 BSK CASM3C LDLR
AEID204 BSK BE3C uPA
AEID200 BSK BE3C TGFb1
AEID206 BSK BE3C uPAR
AEID192 BSK BE3C MIG
AEID212 BSK CASM3C IL8
AEID210 BSK CASM3C IL6
AEID202 BSK BE3C tPA
AEID224 BSK CASM3C SAA
AEID230 BSK CASM3C TissueFactor
AEID220 BSK CASM3C MIG
AEID234 BSK CASM3C VCAM1
AEID232 BSK CASM3C uPAR
AEID246 BSK hDFCGF MIG
AEID250 BSK hDFCGF PAI1
AEID222 BSK CASM3C Proliferation
AEID238 BSK hDFCGF EGFR
AEID264 BSK KF3CT IP10
AEID228 BSK CASM3C Thrombomodulin
AEID244 BSK hDFCGF MCSF
AEID266 BSK KF3CT MCP1
AEID276 BSK KF3CT uPA
AEID280 BSK LPS Eselectin
AEID282 BSK LPS IL1a
AEID270 BSK KF3CT SRB
AEID272 BSK KF3CT TGFb1
AEID268 BSK KF3CT MMP9
AEID262 BSK KF3CT IL1a
AEID292 BSK LPS SRB
-------
AEID286 BSK LPS MCP1
AEID290 BSK LPS PGE2
AEID304 BSK SAg CD69
AEID298 BSK LPS VCAM1
AEID300 BSK SAg CD38
AEID739 OT AR ARELUC AG 1440
AEID308 BSK SAg IL8
AEID302 BSK SAg CD40
AEID1913 ATG chAR XSP1
AEID751 OT ERa GFPERaERE 0480
AEID306 BSK SAg Eselectin
AEID296 BSK LPS TNFa
AEID756 OT NURR1 NURR1RX[illegible]
AEID316 BSK SAg Proliferation
AEID310 BSK SAg MCP1
AEID744 OT ER ERaERb 0480
AEID795 TOX21 GR BLA Antagonist viability
AEID746 OT ER ERbERb 0480
AEID314 BSK SAg PBMCCytotoxicity
AEID740 OT AR ARSRC1 0480
AEID318 BSK SAg SRB
AEID743 OT ER ERaERa 1440
AEID312 BSK SAg MIG
AEID755 OT NURR1 NURR1RXRa 0480
AEID784 TOX21 ERa BLA Agonist ch2
AEID750 OT ERa GFPERaERE 0120
AEID753 OT FXR FXRSRC1 0480
AEID745 OT ER ERaERb 1440
AEID782 TOX21 ELG1 LUC Agonist viability
AEID742 OT ER ERaERa 0480
AEID783 TOX21 ERa BLA Agonist ch1
AEID800 TOX21 PPARg BLA Agonist ch1
AEID794 TOX21 GR BLA Antagonist ratio
AEID786 TOX21 ERa BLA Antagonist ratio
AEID899 CEETOX H295R CORTIC
AEID909 CEETOX H295R ESTRONE
AEID790 TOX21 ERa LUC VM7 Antagonist 0.5nM E2 viability
AEID897 CEETOX H295R ANDR
AEID801 TOX21 PPARg BLA Agonist ch2
AEID240 BSK hDFCGF IL8
AEID242 BSK hDFCGF IP10
AEID260 BSK KF3CT ICAM1
AEID256 BSK hDFCGF TIMP1
-------
AEID[illegible] TOX21 AhR LUC Agonist viability
AEID252 BSK hDFCGF Proliferation
AEID236 BSK hDFCGF CollagenIII
AEID893 CEETOX H295R OHPREG
AEID248 BSK hDFCGF MMP1
AEID907 CEETOX H295R ESTRADIOL
AEID785 TOX21 ERa BLA Agonist ratio
AEID901 CEETOX H295R CORTISOL
AEID254 BSK hDFCGF SRB
AEID913 CEETOX H295R PROG
AEID1117 TOX21 FXR BLA Agonist ch1
AEID915 CEETOX H295R TESTO
AEID891 CEETOX H295R 11DCORT
AEID905 CEETOX H295R DOC
AEID1109 TOX21 ARE BLA Agonist ch2
AEID1108 TOX21 ARE BLA Agonist ch1
AEID1122 TOX21 PPARd BLA Agonist ch1
AEID895 CEETOX H295R OHPROG
AEID1120 TOX21 FXR BLA Antagonist ratio
AEID1125 TOX21 PPARd BLA Antagonist ratio
AEID1123 TOX21 PPARd BLA Agonist ch2
AEID1110 TOX21 ARE BLA Agonist ratio
AEID1118 TOX21 FXR BLA Agonist ch2
AEID1119 TOX21 FXR BLA Agonist ratio
AEID1124 TOX21 PPARd BLA Agonist ratio
AEID1193 TOX21 GR BLA Antagonist ch1
AEID1121 TOX21 FXR BLA Antagonist viability
AEID1188 TOX21 FXR BLA agonist viability
AEID1189 TOX21 ERa BLA Antagonist ch1
AEID1196 TOX21 PPARd BLA Antagonist ch1
AEID1128 TOX21 PPARg BLA Antagonist viability
AEID1126 TOX21 PPARd BLA Antagonist viability
AEID1191 TOX21 FXR BLA Antagonist ch1
AEID1192 TOX21 FXR BLA Antagonist ch2
AEID1190 TOX21 ERa BLA Antagonist ch2
AEID1343 TOX21 ESRE BLA Agonist viability
AEID1194 TOX21 GR BLA Antagonist ch2
AEID1195 TOX21 PPARd BLA Agonist viability
AEID1198 TOX21 PPARg BLA Antagonist ch1
AEID[illegible] [illegible]toxicity
AEID1199 TOX21 PPARg BLA Antagonist ch2
AEID1185 TOX21 ARE BLA agonist viability
AEID1197 TOX21 PPARd BLA Antagonist ch2
AEID1354 ATG HNF4g TRANS2
-------
AEID1359 ATG TR4 TRANS2
AEID1350 ATG COUP TF2 TRANS2
AEID1348 ATG NUR77 TRANS2
AEID1202 TOX21 AR BLA Antagonist ch1
AEID1341 TOX21 ESRE BLA Agonist ch2
AEID1370 ATG EAR2 TRANS2
AEID1361 ATG Rev ERB B TRANS2
AEID1203 TOX21 AR BLA Antagonist ch2
AEID1365 ATG SF 1 TRANS2
AEID1356 ATG MR TRANS2
AEID1351 ATG PNR TRANS2
AEID1357 ATG COUP TF1 TRANS2
AEID1352 ATG LRH1 TRANS2
AEID1349 ATG GCNF TRANS2
AEID1355 ATG ERRb TRANS2
AEID1360 ATG DAX1 TRANS2
AEID1358 ATG NOR1 TRANS2
AEID1374 Tanguay ZF 120hpf AXIS legacy
AEID1369 ATG THRb TRANS2
AEID1366 ATG SHP TRANS2
AEID1368 ATG TLX TRANS2
AEID1363 ATG PR TRANS2
AEID1383 Tanguay ZF 120hpf CFIN legacy
AEID1389 Tanguay ZF 120hpf TR legacy
AEID1372 Tanguay ZF 120hpf MORT legacy
AEID1385 Tanguay ZF 120hpf CIRC legacy
AEID1375 Tanguay ZF 120hpf EYE legacy
AEID1376 Tanguay ZF 120hpf SNOU legacy
AEID[illegible] Tanguay ZF 120hpf TERATOSCORE
AEID1626 CLP UGT1A1 6hr
AEID1630 CLP ACTIN 24hr
AEID274 BSK KF3CT TIMP2
AEID1613 CLP ABCG2 6hr
AEID1614 CLP ACTIN 6hr
AEID1623 CLP HMGCS2 6hr
AEID1628 CLP ABCB11 24hr
AEID1624 CLP SLCO1B1 6hr
AEID1629 CLP ABCG2 24hr
AEID1388 Tanguay ZF 120hpf NC legacy
AEID284 BSK LPS IL8
AEID1611 CLP ABCB1 6hr
AEID1621 CLP GAPDH 6hr
AEID1618 CLP CYP2C19 6hr
-------
AEID1616 CLP CYP1A2 6hr
AEID1617 CLP CYP2B6 6hr
AEID1643 CLP ABCB1 48hr
AEID1640 CLP SLCO1B1 24hr
AEID294 BSK LPS TissueFactor
[illegible] 6hr
AEID1635 CLP CYP2C9 24hr
AEID288 BSK LPS MCSF
AEID278 BSK LPS CD40
AEID1652 CLP CYP3A4 48hr
AEID1644 CLP ABCB11 48hr
AEID1642 CLP UGT1A1 24hr
AEID1653 CLP GAPDH 48hr
AEID1631 CLP CYP1A1 24hr
AEID1647 CLP CYP1A1 48hr
AEID1634 CLP CYP2C19 24hr
AEID1639 CLP HMGCS2 24hr
AEID1650 CLP CYP2C19 48hr
AEID1651 CLP CYP2C9 48hr
AEID1654 CLP GSTA2 48hr
AEID1636 CLP CYP3A4 24hr
AEID1648 CLP CYP1A2 48hr
AEID1646 CLP ACTIN 48hr
AEID1638 CLP GSTA2 24hr
AEID1633 CLP CYP2B6 24hr
AEID1641 CLP SULT2A 24hr
AEID1746 ATG GPCR ADORA2A TRANS
AEID1752 ATG GPCR ADRA2B TRANS
AEID1748 ATG GPCR ADORA2B TRANS
AEID1649 CLP CYP2B6 48hr
AEID1664 CEETOX H295[illegible] viability
[illegible]3H 24hr
AEID1750 ATG GPCR ADRA1A TRANS
AEID1660 TOX21 RAR LUC Agonist viability
AEID1756 ATG GPCR ADRB3 TRANS
AEID1764 ATG GPCR EDNRA TRANS
AEID1659 TOX21 RAR LUC Agonist
AEID1658 CLP UGT1A1 48hr
AEID1682 STM H9 CystineISnorm perc
AEID1655 CLP HMGCS2 48hr
AEID1688 STM H9 OrnithineISnorm perc
AEID1657 CLP SULT2A 48hr
AEID1796 ATG GPCR PTGIR TRANS
AEID1758 ATG GPCR CHRM3 TRANS
-------
AEID1844 TOX21 AP1 BLA Agonist ch1
AEID1827 ArunA Migration hNP
AEID1835 ArunA NOG NeuritesPerNeuron
AEID1846 TOX21 AP1 BLA Agonist ratio
AEID1831 ArunA NOG NucleusCount
AEID1826 ArunA CellTiter hNC
AEID1847 TOX21 AP1 BLA Agonist viability
AEID1823 TOX21 AR LUC MDAKB2 Agonist 3uM Nilutamide viability
AEID[illegible] TOX21 AP1 BLA Agonist [illegible]
AEID1840 TOX21 RAR [illegible] Antagonist viability
AEID1933 ATG chERa XSP1
AEID1855 ACEA AR agonist 80hr
AEID1858 STM H9 NormalizedViability
AEID1925 ATG zfER1 XSP1
AEID1921 ATG zfAR XSP1
AEID1856 AC[illegible] antagonist 80hr
AEID1923 ATG frER1 XSP1
AEID1852 ACEA ER AUC viability
AEID1939 ATG hERb XSP1
AEID741 OT AR ARSRC1 0960
AEID1937 ATG trERa XSP1
AEID1957 ATG mPXR XSP1
AEID1955 ATG zfPPARg XSP1
AEID1951 ATG hPPARg XSP1
AEID1941 ATG GAL4 XSP1
AEID1963 ATG trTRa XSP1
AEID1943 ATG M 06 XSP1
AEID1953 ATG mPPARg XSP1
AEID1947 ATG M 32 XSP1
AEID1949 ATG M 61 XSP1
AEID1945 ATG M 19 XSP1
AEID1959 ATG frTRa XSP1
AEID1961 ATG hTRa XSP1
AEID754 OT FXR FXRSRC1 1440
AEID747 OT ER ERbERb 1440
AEID759 TOX21 AR BLA Agonist ch1
AEID757 OT PPARg PPARgSRC1 0480
AEID760 TOX21 AR BLA Agonist ch2
AEID758 OT PPARg PPARgSRC1 1440
AEID791 TOX21 GR BLA Agonist ch1
AEID762 TOX21 AR BLA Antagonist ratio
AEID765 TOX21 AR LUC MDAKB2 Antagonist 10nM R1881
AEID1971 ATG XTT Cytotoxicity XSP1
AEID761 TOX21 AR BLA Agonist ratio
-------
AEID1969 ATG zfTRb XSP1
AEID788 TOX21 ERa LUC VM7 Agonist
AEID763 TOX21 AR BLA Antagonist viability
AEID766 TOX21 AR LUC MDAKB2 Antagonist 10nM R1881 viability
AEID1965 ATG zfTRa XSP1
AEID208 BSK CASM3C HLADR
AEID1967 ATG hTRb XSP1
AEID1981 ATG GAL4 XSP2
AEID764 TOX21 AR LUC MDAKB2 Agonist
AEID1973 ATG M 06 XSP2
AEID1987 ATG zfER2a XSP2
AEID1977 ATG hERb XSP2
AEID1989 ATG hAR XSP2
AEID1991 ATG chAR XSP2
AEID1979 ATG trAR XSP2
AEID1975 ATG trERa XSP2
AEID1985 ATG chERa XSP2
AEID2001 ATG hERa XSP2
AEID1993 ATG frER1 XSP2
AEID2011 ATG zfPPARg XSP2
AEID2005 ATG M 32 XSP2
AEID1997 ATG zfAR XSP2
AEID781 TOX21 ELG1 LUC Agonist
AEID2003 ATG M 19 XSP2
AEID1995 ATG frAR XSP2
AEID216 BSK CASM3C MCP1
AEID[illegible] TOX21 GR BLA Agonist ratio
AEID[illegible] TOX21 ERa BLA Antagonist viability
AEID789 TOX21 ERa LUC VM7 Antagonist 0.5nM E2
AEID2023 ATG M 61 XSP2
AEID2025 ATG frTRa XSP2
AEID2029 ATG hTRb XSP2
AEID2017 ATG trTRa XSP2
AEID2031 ATG XTT Cytotoxicity XSP2
AEID226 BSK CASM3C SRB
AEID258 BSK hDFCGF VCAM1
AEID806 TOX21 AhR LUC Agonist
AEID2055 TOX21 ERR LUC Agonist
AEID2118 TOX21 ERb BLA Antagonist ch2
AEID2059 TOX21 ERR LUC viability
AEID2116 TOX21 ERb BLA Agonist viability
AEID2117 TOX21 ERb BLA Antagonist ch1
AEID2120 TOX21 ERb BLA Antagonist viability
AEID2053 TOX21 ERa LUC VM7 Antagonist 0.1nM E2
-------
AEID2114 TOX21 ERb BLA Agonist ch2
AEID2113 TOX21 ERb BLA Agonist ch1
AEID2126 TOX21 PR BLA Antagonist ch2
AEID2128 TOX21 PR BLA Antagonist viability
AEID2221 TOX21 PR BLA Followup Antagonist viability
AEID2222 TOX21 PR LUC Followup Agonist
AEID2225 TOX21 PR LUC Followup Antagonist viability
AEID2224 TOX21 PR LUC Followup Agonist viability
AEID2125 TOX21 PR BLA Antagonist ch1
AEID2123 TOX21 PR BLA Agonist ratio
AEID2223 TOX21 PR LUC Followup Antagonist
AEID2124 TOX21 PR BLA Agonist viability
AEID2470 CCTE Shafer MEA acute per network burst spike number mean
AEID2472 CCTE Shafer MEA acute per network burst spike number std
AEID2309 CCTE GLTED hDIO1
AEID2220 TOX21 PR BLA Followup Agonist viability
AEID2127 TOX21 PR BLA Antagonist ratio
AEID2151 CEETOX H295R CORTIC noMTC
AEID2121 TOX21 PR BLA Agonist ch1
AEID2167 CEETOX H295R TESTO noMTC
AEID2165 CEETOX H295R PROG noMTC
AEID2161 CEETOX H295R ESTRONE noMTC
AEID2153 CEETOX H295R CORTISOL noMTC
AEID2157 CEETOX H295R DOC noMTC
AEID2218 TOX21 PR BLA Followup Agonist ratio
AEID2219 TOX21 PR BLA Followup Antagonist ratio
AEID2159 CEETOX H295R ESTRADIOL noMTC
AEID1127 TOX21 PPARg BLA Antagonist ratio
AEID2149 CEETOX H295R ANDR noMTC
AEID2122 TOX21 PR BLA Agonist ch2
AEID2468 CCTE Shafer MEA acute burst percentage std
AEID2464 CCTE Shafer MEA acute interburst interval mean
AEID2460 CCTE Shafer MEA acute burst duration mean
AEID2462 CCTE Shafer MEA acute per burst spike number mean
AEID2466 CCTE Shafer MEA acute burst percentage mean
AEID2363 TOX21 PXR LUC Agonist
AEID[illegible] CCTE Shafer MEA acute per network burst electrodes number mean
AEID2458 CCTE Shafer MEA acute burst number
AEID2476 CCTE Shafer MEA acute network burst percentage
AEID2480 CCTE Shafer MEA acute cross correlation HWHM
AEID2478 CCTE Shafer MEA acute cross correlation area
AEID2486 CCTE Deisenroth AIME 96WELL LUC Inactive
AEID2484 CCTE Deisenroth AIME 96WELL LUC Active
AEID2496 CCTE Shafer MEA dev burst rate
-------
AEID2490 CCTE Deisenroth AIME 384WELL LUC Inactive
AEID2494 CCTE Shafer MEA dev firing rate mean
AEID2491 CCTE Deisenroth AIME 384WELL CTox Inactive
AEID2492 CCTE Deisenroth AIME 384WELL LUC Shift
AEID2488 CCTE Deisenroth AIME 384WELL LUC Active
AEID2487 CCTE Deisenroth AIME 96WELL CTox Inactive
AEID2489 CCTE Deisenroth AIME 384WELL CTox Active
AEID2500 CCTE Shafer MEA dev bursting electrodes number
AEID2504 CCTE Shafer MEA dev per burst spike percent
AEID2502 CCTE Shafer MEA dev per burst interspike interval
AEID2498 CCTE Shafer MEA dev active electrodes number
AEID2506 CCTE Shafer MEA dev burst duration mean
AEID2508 CCTE Shafer MEA dev interburst interval mean
AEID2510 CCTE Shafer MEA dev network spike number
AEID2512 CCTE Shafer MEA dev network spike peak
AEID2520 CCTE Shafer MEA dev per network spike spike number mean
AEID2518 CCTE Shafer MEA dev inter network spike interval mean
AEID2516 CCTE Shafer MEA dev network spike duration std
AEID2532 CCTE GLTED hDIO2
AEID2540 CCTE Shafer MEA acute LDH
AEID2514 CCTE Shafer MEA dev spike duration mean
AEID2530 CCTE Shafer MEA dev AB
AEID2529 CCTE Shafer MEA dev LDH
AEID2533 CCTE GLTED hDIO3
AEID2526 CCTE Shafer MEA dev mutual information norm
AEID2541 CCTE Shafer MEA acute AB
AEID2697 UKN2 HCS IMR90 neural migration
AEID2522 CCTE Shafer MEA dev per network spike spike percent
AEID2524 CCTE Shafer MEA dev correlation coefficient mean
AEID2779 CCTE Mundy HCI Cortical NOG NeuriteLength
AEID2782 CCTE Mundy HCI Cortical Synap Neur Matur CellBodySpotCount
AEID2778 CCTE Mundy HCI Cortical NOG NeuriteCount
AEID2699 UKN2 HCS IMR90 cell viability
AEID2547 UKN5 HCS SBAD2 cell viability
AEID2777 CCTE Mundy HCI Cortical NOG BPCount
AEID2545 UKN5 HCS SBAD2 neurite outgrowth
AEID2783 CCTE Mundy HCI Cortical Synap Neur Matur NeuriteCount
AEID2791 CCTE Mundy HCI hN2 NOG NeuriteLength
AEID2773 IUF NPC1a proliferation area 72hr
AEID2703 UKN4 HCS LUHMES cell viability
AEID2786 CCTE Mundy HCI Cortical Synap Neur Matur NeuriteSpotCountPerNeu[illegible]
AEID2789 CCTE Mundy HCI hN2 NOG BPCount
AEID2701 UKN4 HCS LUHMES neurite outgrowth
AEID2797 CCTE Mundy HCI hNP1 Pro ResponderAvgInten
-------
AEID2788 CCTE Mundy HCI Cortical Synap Neur Matur SynapseCount
AEID2790 CCTE Mundy HCI hN2 NOG NeuriteCount
AEID2841 BSK MyoF MMP1
AEID2837 BSK MyoF IL8
AEID2847 BSK MyoF TIMP1
AEID2817 BSK BT [illegible]
AEID2821 BSK BT xIL6
AEID2831 BSK MyoF CollagenI
AEID2771 IUF NPC1b proliferation BrdU 72hr
AEID2835 BSK MyoF CollagenIV
AEID2855 BSK BF4T ICAM1
AEID2873 BSK BF4T SRB
AEID2839 BSK MyoF Decorin
AEID2775 IUF NPC1 viability 72hr
AEID2853 BSK BF4T VCAM1
AEID2865 BSK BF4T MMP1
AEID2877 BSK BF4T uPA
AEID2863 BSK BF4T Keratin818
AEID2849 BSK BF4T MCP1
AEID2843 BSK MyoF PAI1
AEID2871 BSK BF4T PAI1
AEID2897 BSK hDFCGF CollagenI
AEID2851 BSK BF4T Eotaxin3
AEID2913 BSK IMphg VCAM1
AEID2879 BSK BE3C [illegible]
AEID2869 BSK BF4T MMP9
AEID2891 BSK CASM3C PAI1
AEID2867 BSK BF4T MMP3
AEID2883 BSK BE3C IL8
AEID2885 BSK BE3C EGFR
AEID2889 BSK BE3C MMP9
AEID2861 BSK BF4T IL1a
AEID2893 BSK hDFCGF MCP1
AEID2857 BSK BF4T CD90
AEID2911 BSK IMphg MIP1a
AEID2887 BSK BE3C Keratin818
AEID2859 BSK BF4T IL8
AEID2925 BSK IMphg MCSF
AEID2921 BSK IMphg IL8
AEID2919 BSK IMphg CD69
AEID2895 BSK hDFCGF ICAM1
AEID2935 BSK LPS CD69
AEID3068 CCTE Mundy HCI iCellGluta NOG NeuriteCount
AEID2899 BSK hDFCGF ITAC
-------
AEID2907 BSK KF3CT PAI1 ~
AEID2903 BSK KF3CT IL8 ~
AEID2942 IUF NPC2b neuronal migration 120hr ~
AEID2944 IUF NPC2c oligodendrocyte migration 120hr ~
AEID2915 BSK IMphg CD40 ~
AEID2901 BSK hDFCGF TIMP2 ~
AEID2917 BSK IMphg ESelectin ~
AEID2931 BSK IMphg SRB.Mphg ~
AEID3025 VALA TUBIPS Antagonist CellCount ~
AEID3022 VALA TUBHUV Antagonist TubuleLength ~
AEID3069 CCTE Mundy HCI iCellGluta NOG NeuriteLength ~
AEIDE 3HUV1 ScratchOnly CellCount ~
AEID3070 CCTE Mundy HCI iCellGluta NOG NeuronCount ~
AEID2946 IUF NPC3 neuronal differentiation 120hr ~
AEID2954 IUF NPC2-5 cytotoxicity 72hr ~
AEID2960 IUF NPC2-5 viability 120hr ~
AEID2956 IUF NPC2-5 cytotoxicity 120hr ~
AEID2958 IUF NPC2-5 cell number 120hr ~
AEID3074 CCTE Deisenroth 5AR NBTE donor ~
AEID2948 IUF NPC4 neurite length 120hr ~
AEID3067 CCTE Mundy HCI iCellGluta NOG BPCount ~
AEID3031 VALA MIGHUV2 WoundArea ~
AEID3030 VALA MIGHUV2 Bcatenin ~
AEID3028 VALA MIGHUV1 ScratchOnly WoundArea ~
AEID3029 VALA MIGHUV2 CellCount ~
AEID3032 CCTE GLTED hIYD ~
AEID3078 CCTE Deisenroth 5AR NBTE ratio ~
AEID3087 IUF NPC1 cytotoxicity 72hr ~
AEID3090 CCTE GLTED hTPO ~
AEID3095 CCTE Deisenroth DEVTOX-GLR legacy Sox2 ~
AEID3101 ATG rtGR EcoTox2 ~
AEID3096 CCTE Deisenroth DEVTOX-GLR legacy Bra ~
AEID3091 CCTE GLTED xDIO3 ~
AEID3103 ATG imGR EcoTox2 ~
AEID3098 CCTE Deisenroth DEVTOX-GLR legacy CellCount ~
AEID3105 ATG zfGR EcoTox2 ~
AEID3111 ATG rtPPARa EcoTox2 ~
AEID3099 ATG frGR EcoTox2 ~
AEID3092 CCTE GLTED xIYD ~
Aj : onist r;
AEID3115 ATG zfPPARa EcoTox2 ~
AEID3109 ATG frPPARa EcoTox2 ~
AEID3119 ATG frPPARg EcoTox2 ~
Aj mPPARa EcoTo
-------
AEID3129 ATG frRXRb EcoTox2 ~
AEID3131 ATG rtRXRb EcoTox2 ~
AEID3127 ATG hRXRb EcoTox2 ~
AEID3135 ATG zfRXRb EcoTox2 ~
AEID3117 ATG hPPARg EcoTox2 ~
AEID3141 ATG frER1 EcoTox2 ~
AEID3123 ATG imPPARg EcoTox2 ~
AEID3145 ATG zfAR EcoTox2 ~
AEID3133 ATG imRXRb EcoTox2 ~
AEID3137 ATG hERa EcoTox2 ~
AEID3125 ATG zfPPARg EcoTox2 ~
AEID1367 ATG ERb TRANS2 ~
AEID3139 ATG zfER1 EcoTox2 ~
AEID1371 ATG TR2 TRANS2 ~
AEID1362 ATG RORa TRANS2 ~
AEID3147 ATG M 61 EcoTox2 ~
AEID1384 Tanguay ZF 120hpf PIG legacy ~
AEID1373 Tanguay ZF 120hpf YSE legacy ~
AEID1378 Tanguay ZF 120hpf OTIC legacy ~
AEID1377 Tanguay ZF 120hpf JAW legacy ~
AEID1380 Tanguay ZF 120hpf BRAI legacy ~
AEID1381 Tanguay ZF 120hpf SOMI legacy ~
AEID1387 Tanguay ZF 120hpf SWIM legacy ~
AEID1379 Tanguay ZF 120hpf PE legacy ~
AEID1619 CLD CYP2C9 8hr ~
AEID3155 ATG hGR EcoTox2 ~
AEID3151 ATG M 32 EcoTox2 ~
AEID1386 Tanguay ZF 120hpf TRUN legacy ~
AEID3167 CCTE Mundy HCI iCellGABA NOG NeuronCount ~
AEID1627 CLD ABCB1 24hr ~
AEID3163 CCTE Deisenroth H295R-HTRF 384WELL CTOX ~
AEID3162 CCTE Deisenroth H295R-HTRF 384WELL TESTOSTERONE ~
AEID3161 CCTE Deisenroth H295R-HTRF 384WELL ESTRADIOL ~
AEID3196 Tanguay ZF 120hpf SM24 ~
AEID3153 ATG M 19 EcoTox2 ~
AEID3165 CCTE Mundy HCI iCellGABA NOG NeuriteCount ~
AEID3164 CCTE Mundy HCI iCellGABA NOG BPCount ~
AEID3166 CCTE Mundy HCI iCellGABA NOG NeuriteLength ~
AEID3194 Tanguay ZF 120hpf MO24 ~
AEID3202 Tanguay ZF 120hpf LTRK ~
AEID3149 ATG M 06 EcoTox2 ~
AEID3197 Tanguay ZF 120hpf MORT ~
AEID3198 Tanguay ZF 120hpf CRAN ~
-------
AEIDE iieuav ZF 120hpf ANY ~
AEID3204 Tanguay ZF 120hpf SKIN ~
AEID3195 Tanguay ZF 120hpf DP24 ~
AEID1632 CLD CYP1A2 24hr ~
AEID3203 Tanguay ZF 120hpf BRN ~
AEID3211 CCTE Padilla ZF Score.Living ~
AEID3216 CCTE Padilla ZF Score.Edema ~
AEID3200 Tanguay ZF 120hpf EDEM ~
AEID3215 CCTE Padilla ZF Score.Craniofacial ~
AEID3206 Tanguay ZF 120hpf TCHR ~
AEID1625 CLD SULT2A 6hr ~
AEID3228 CCTE Deisenroth DEVTOX-GLR Meso SOX2 ~
AEID1766 ATG GPCR GCGR TRANS ~
AEID3222 CCTE Padilla ZF Score.Any ~
AEID3219 CCTE Padilla ZF Score.Position ~
AEID3214 CCTE Padilla ZF Score.Swim bladder ~
AEID3229 CCTE Deisenroth DEVTOX-GLR Meso BRA ~
AEID1768 ATG GPCR GPBAR1 TRANS ~
AEID3220 CCTE Padilla ZF Score.Tail ~
AEID3217 CCTE Padilla ZF Score.Spine ~
AEID3224 CCTE Deisenroth DEVTOX-GLR Endo SOX2 ~
AEID1790 ATG GPCR MC3R TRANS ~
AEID3226 CCTE Deisenroth DEVTOX-GLR Endo CellCount ~
AEID3199 Tanguay ZF 120hpf AXIS ~
AEID1776 ATG GPCR GS TRANS ~
AEID1772 ATG GPCR GQ TRANS ~
AEID1770 ATG GPCR GPR40 TRANS ~
AEID3218 CCTE Padilla ZF Score.Pigmentation ~
AEID3264 CCTE GLTED hTTR 0.125uM ~
AEID3227 CCTE Deisenroth DEVTOX-GLR Meso SOX17 ~
AEID1762 ATG GPCR DRD5 TRANS ~
AEID3223 CCTE Deisenroth DEVTOX-GLR Endo SOX17 ~
AEID1786 ATG GPCR MC1R TRANS ~
AEID1780 ATG GPCR HTR6 TRANS ~
AEID1784 ATG GPCR LPAR4 TRANS ~
AEID1792 ATG GPCR MC4R TRANS ~
AEID3237 CCTE Deisenroth DEVTOX-GLR Pluri BRA ~
AEID1760 ATG GPCR DRD1 TRANS ~
AEID3236 CCTE Deisenroth DEVTOX-GLR Pluri SOX2 ~
AEID1816 TOX21 AR LUC MDAKB2 Antagonist 0.5nM R1881 ~
AEID3231 CCTE Deisenroth DEVTOX-GLR Ecto SOX17 ~
)X21 AR LUC MDAKB2 Antagonist 0.5nM R1881 viabili
AEID3230 CCTE Deisenroth DEVTOX-GLR Meso CellCount ~
AEID1690 STM H9 OrnCyssISnorm ratio ~
-------
AEID1645 CLD ABCG2 48hr ~
AEID3233 CCTE Deisenroth DEVTOX-GLR Ecto B
AEID1656 CLD SLCO1B1 48hr ~
AEID3238 CCTE Deisenroth DEVTOX-GLR Pluri CellCount ~
AEID1794 ATG GPCR PTGDR TRANS ~
AEID3234 CCTE Deisenroth DEVTOX-GLR Ecto CellCount ~
AEID1778 ATG GPCR HRH1 TRANS ~
AEID1774 ATG GPCR GS1 TRANS ~
AEID1782 ATG GPCR HTR7 TRANS ~
AEID1825 ArunA CellTiter hNP ~
AEID1838 ArunA NOG BranchPointsPerNeurite ~
AEID1931 ATG frER2 XSP1 ~
AEID1822 TOX21 AR LUC MDAKB2 Agonist 3uM Nilutamide ~
AEID1839 TC Antagonist ~
AEID1829 ArunA Migration hNC ~
AEID1915 ATG frAR XSP1 ~
AEID1927 ATG zfER2a XSP1 ~
AEID1917 ATG hAR XSP1 ~
AEID1919 ATG trAR XSP1 ~
AEID1850 ACEA AR agonist AUC viability ~
AEID1929 ATG zfER2b XSP1 ~
AEID1999 ATG zfER2b XSP2 ~
AEID2007 ATG frER2 XSP2 ~
AEID1983 ATG zfER1 XSP2 ~
AEID1857 ACEA AR antagonist AUC viability ~
AEID2015 ATG mPXR XSP2 ~
AEID2009 ATG mPPARg XSP2 ~
AEID2013 ATG hPPARg XSP2 ~
AEID2019 ATG zfTRa XSP2 ~
AEID2021 ATG zfTRb XSP2 ~
AEID2027 ATG hTRa XSP2 ~
AEID2057 TOX21 ERR LUC Antagonist ~
AEID2054 TOX21 ERa LUC VM7 Antagonist 0.1nM E2 viability ~
AEID2115 TOX21 ERb BLA Agonist ratio ~
AEID2143 CEETOX H295R 11DCORT noMTC ~
AEID2214 TOX21 PR BLA Followup Agonist ch1 ~
AEID2119 TOX21 ERb BLA Antagonist ratio ~
AEID2216 TOX21 PR BLA Followup Agonist ch2 ~
AEID2212 TOX21 ERa LUC VM7 Agoni VI ICI182780 viabil
AEID2456 CCTE Shafer MEA acute firing rate mean ~
AEID2454 CCTE Shafer MEA acute spike number ~
AEID2215 TOX21 PR BLA Followup Antagonist ch1 ~
AEID2147 CEETOX H295R OHPROG noMTC ~
AEID1340 TOX21 ES onis
-------
AEID2211 TOX21 ERa LUC VM7 Agonist 10nM ICI182780 ~
AEID2217 TOX21 PR BLA Followup Antagonist ch2 ~
AEID2482 CCTE Shafer MEA acute synchrony index ~
AEID2485 CCTE Deisenroth AIME 96WELL CTox Active ~
AEID2793 CCTE Mundy HCI hNP1 Casp3 7 ~
AEID2780 CCTE Mundy HCI Cortical NOG NeuronCount ~
AEID2792 CCTE Mundy HCI hN2 NOG NeuronCount ~
AEID2362 TOX21 PXR LUC Agonist viability ~
AEID2781 CCTE Mundy HCI Cortical Synap Neur Matur BPCount ~
AEID2794 CCTE Mundy HCI hNP1 CellTiter ~
A > CCTE Mundy HCI Cortical Synap Neur Matur NeuriteSpotCountPerNeuriteLength ~
AEID2813 BSK BT sIgG ~
AEID2784 CCTE Mundy HCI Cortical Synap Neur Matur NeuriteLength ~
AEID2823 BSK BT xTNFa ~
AEID2809 BSK BT Bcell Proliferation ~
AEID2811 BSK BT PBMCCytotoxicity ~
AEID2827 BSK MyoF bFGF ~
AEID2819 BSK BT xIL2 ~
AEID2825 BSK MyoF ACTA1 ~
AEID2815 BSK BT xfLUAj»
AEID2845 BSK MyoF SRB ~
AEID2833 BSK MyoF CollagenIII ~
AEID2829 BSK MyoF VCAM1 ~
AEID2875 BSK BF4T tPA ~
AEID2905 BSK KF3CT MIG ~
AEID2929 BSK IMphg SRB ~
AEID2938 IUF NPC2a radial glia migration 72hr ~
AEID2909 BSK IMphg MCP1 ~
AEID2927 BSK IMphg IL10 ~
AEID2933 BSK LPS Thrombomodulin ~
AEID2923 BSK IMphg IL1a ~
AEID3019 VALA TUBHUV Agonist CellCount ~
AEID3021 VALA TUBHUV Antagonist CellCount ~
AEID3023 VALA TUBIPS Agonist CellCount ~
AEID3088 CCTE GLTED hTBG ~
AEID2950 IUF NPC4 neurite area 120hr ~
AEID2940 IUF NPC2a radial glia migration 120hr ~
AEID3024 VALA TUBIPS Agonist TubuleLength ~
AEID3020 VALA TUBHUV Agonist TubuleLength ~
AEID3107 ATG hPPARa EcoTox2 ~
AEID2952 IUF NPC5 oligodendrocyte differentiation 120hr ~
AEID3089 CCTE GLTED hTTR 0.5uM ~
AEID3143 ATG hAR EcoTox2 ~
-------
AEID3076 CCTE Deisenroth 5AR NBTE acceptor ~
AEID1612 CLD ABCB11 6hr ~
AEID3072 CCTE Deisenroth 5AR NBTE autofluor ~
AEID1364 ATG RXRg TRANS2 ~
AEID3094 CCTE Deisenroth DEVTOX-GLR legacy Sox17 ~
AEID1382 Tanguay ZF 120hpf PFIN legacy ~
AEID1620 CLD CYP3A4 6hr ~
AEID3205 Tanguay ZF 120hpf NC ~
AEID3235 CCTE Deisenroth DEVTOX-GLR Pluri SOX17 ~
AEID3221 CCTE Padilla ZF Score.Blood pooling ~
AEID3213 CCTE Padilla ZF Score.General ~
AEID3225 CCTE Deisenroth DEVTOX-GLR Endo BRA ~
AEID1754 ATG GPCR ADRB2 TRANS ~
AEID3232 CCTE Deisenroth DEVTOX-GLR Ecto SOX2 ~
AEID1788 ATG GPCR MC2R TRANS ~
AEID1833 ArunA NOG NeuriteLength ~
AEID1797 Tanguay ZF 120hpf ActivityScore legacy ~
At H iO<21 GR BLA Agonist ch2. ~
AEID2145 CEETOX H295R OHPREG noMTC ~
AEID1935 ATG hERa XSP1 ~
AEID2787 CCTE Mundy HCI Cortical Synap Neur Matur NeuronCount ~
AEID1353 ATG Rev ERB A TRANS2 ~
AEID2881 BSK BE3C ITAC ~
AEID3212 CCTE Padilla ZF Score.Hatched ~
AEID3201 Tanguay ZF 120hpf MUSC ~
-------
Assay Endpoint ID: 2
ACEA_ER_80hr
1. General Information
1.1 Assay Title: ACEA Biosciences xCELLigence Real-Time Cell Analysis on Estrogen Receptor Agonism for
Proliferation
1.2 Assay Summary: ACEA_ER is a cell-based, single-readout assay that uses T47D, a human breast cell line, with
measurements taken at 80 hours after chemical dosing in a 96-well plate, although T02 (mc0.srcf) used a 384-well
plate. Differences in plate size can be ignored given data normalization. ACEA_ER_80hr is one of two assay
component(s) measured or calculated from the ACEA_ER assay. It is designed to make measurements of real-
time cell-growth kinetics, a form of growth reporter, as detected with electrical impedance signals by Real-Time
Cell Electrode Sensor (RT-CES) technology. Data from the assay component ACEA_ER_80hr was analyzed into 1
assay endpoint. This assay endpoint, ACEA_ER_80hr_Positive, was analyzed in the positive analysis fitting
direction relative to DMSO as the negative control and baseline of activity. Using a type of growth reporter,
measures of the cells for gain-of-signal activity can be used to understand the signaling at the pathway-level as
they relate to the gene ESR1. Furthermore, this assay endpoint can be referred to as a primary readout, because
this assay has produced multiple assay endpoints where this one serves a signaling function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ACEA Biosciences, Inc. (ACEA) is a privately owned biotechnology company that developed a
real-time, label-free cell growth assay system called xCELLigence based on a microelectronic impedance readout.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; xCELLigence RTCA software and biosensor technology are
available from ACEA Biosciences, Inc. and T-47D cells are commercially available from American Type Culture
Collection (ATCC HTB-133) with signed Material Transfer Agreement (MTA).
1.9 Assay Throughput: 96-well plate. The assay is conducted on 96-well plates with each plate containing positive
controls for proliferation (17 beta-estradiol) and cytotoxicity (MG132), negative controls (assay media, RPMI
1640), and two concentrations (0.5 percent and 0.125 percent) of DMSO solvent controls. Following a 24-hour
incubation period, the cells are exposed to test chemicals for 80 hours and response is monitored no less than
once per hour.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Electrical impedance is used to quantify changes in cell growth, where increased
impedance is positively correlated with increased cell growth.
The ACEA_ER assay exposed human breast carcinoma cell (T-47D) cultures to the ToxCast library of diverse
environmental chemicals using an eight-point, 1:4 dilution series concentration-response format (starting at a
maximum final concentration of 100 uM), using MG132 (cytotoxicity) and Estradiol (E2) (proliferation) as positive
controls and assay media and DMSO as a negative control and solvent control, respectively. All control chemicals
were tested in quadruplicate on each plate. The ACEA_ER assay analyzed changes in cell adhesion and
morphology at the electrode-solution interface (located on the bottom of culture wells) using electronic
microsensors. Changes in electrical impedance were monitored in real-time at the plate surface to investigate
the potential activation of the estrogen signaling pathway and subsequent increases in growth or changes in
cell structure following 80-hour incubation with the test chemicals. The electrical signal produced by the
experimental system can be used to detect changes in cell number, morphology and adhesion which occur in
response to xenoestrogenic activation of ER-mediated pathways, and concentration-response curves were
modeled for each chemical to determine half-maximal activity levels.
2.2 Scientific Principles: Endocrine disrupting chemicals (EDCs) interfere with normal hormone biosynthesis,
signaling or metabolism and impact regulatory pathways in humans and wildlife. Many EDCs interfere with
normal steroidal activity by impacting estrogenic signaling pathways. The estrogen receptor mediates gene
expression in response to estrogen exposure, and modulates the activity for a wide variety of physiological
processes. The activity of estrogenic chemicals is generally probed in vitro by monitoring ligand-binding in
experimental systems; however, estrogenic potency is also a function of interaction with transcriptional
machinery and other signaling pathways. This assay was designed to identify chemical perturbagens which can
affect a cell proliferation response in human breast carcinoma cells by acting as xenoestrogenic compounds
which impact estrogen signaling pathways. While cell proliferation rates can be altered via multiple pathways,
growth responses in T47D cells are considered to be particularly reliable indicators of estrogenic activation. This
assay is intended for use as a part of an integrated testing strategy, to screen a large structurally diverse chemical
library for compounds which potentially affect endocrine systems in exposed populations by interacting with
estrogen receptor mediated signaling pathways. There is strong evidence that estrogen receptor activity in early
life is a molecular initiating event (MIE) in a developing Adverse Outcome Pathway (AOP) leading to breast
cancer in both animal and human models and to endometrial carcinoma in the mouse. ER agonism is also the
MIE for an AOP leading to reproductive dysfunction in oviparous vertebrates, and there is some evidence that
estrogen receptor activation is the MIE for putative AOPs leading to reduced survival due to renal failure
and to skewed sex ratios due to altered sexual differentiation in males. ER antagonism has strong
evidence as the MIE for an AOP describing reduction of vitellogenin synthesis in liver, which can lead to reduced
cumulative fecundity in repeat-spawning fish species. Chemical-activity profiles derived from this assay can
inform prioritization decisions for compound selection in more resource intensive in vivo studies to further
investigate the involvement of ER interference in pathways leading to hazardous outcomes in biological systems.
2.3 Experimental System: An adherent T-47D cell line was used. T-47D is a human breast ductal carcinoma cell line,
originally derived in 1974 from the pleural effusion of a 57-year-old patient, that exhibits epithelial-like morphology
(Horwitz et al. 1978, Keydar et al. 1979).
2.4 Metabolic Competence: T-47D cells contain specific high affinity receptors for estradiol, progesterone,
glucocorticoid and androgen (Horwitz et al. 1978). Some potential for P450 mediated metabolism is present,
e.g. CYP1A1, CYP1A2, CYP1B1 (Angus et al. 1999, Hevir et al. 2011, MacPherson and Matthews 2010, Spink et
al. 2002, Spink et al. 1998), CYP2B6 (Lo et al. 2010), CYP3A4 (Nagaoka et al. 2006) and CYP2C8 (Mitra et al. 2011),
as well as some experimental evidence for the capacity to retain expression of some phase II metabolizing
enzymes, e.g., UGTs (Harrington et al. 2006, Hevir et al. 2011), GSTs (Hevir et al. 2011) and sulphotransferases,
e.g., SULT1A3 (Miki et al. 2006), SULT1E1, SULT2B1 (Hevir et al. 2011).
2.5 Exposure Regime: The xCELLigence system Multi-E-Plate stations were used to measure the time-dependent
response to chemicals. Each compound was tested in an eight-point, 1:4 serial dilution series starting at a
maximum final concentration of 100 uM. A maximum starting concentration of 0.5% DMSO was present in the
100 uM chemical samples and was diluted along with the test article dilution series. The screen was performed
in biological duplicate using two separate, 96-well, E-Plates 96 for each dilution series (n = 2). Positive controls
(MG132 and E2) and a negative control (assay media) were tested in quadruplicate on each testing plate. Then,
0.5% and 0.125% DMSO were tested in duplicate on each plate to serve as solvent controls for the 2 highest
concentrations of testing compounds: 100 uM and 25 uM. Reference compounds were tested with 8
concentrations with 1:5 serial dilutions. All screening was carried out by ACEA Biosciences, Inc. (San Diego, CA).
T-47D cells purchased from ATCC were maintained in RPMI1640 media supplemented with 10% characterized
fetal bovine serum (FBS). Before screening, T-47D cells were preconditioned in assay medium: phenol red-free
RPMI1640 supplemented with 10% charcoal-stripped FBS. Cells were then detached and seeded in E-Plates 96
in assay medium. After overnight monitoring of growth once every hour, compounds were added to T-47D cells
and remained in the medium until the end of the experiment. Cellular responses were then recorded once every
5 min for the first 5 h, and once every hour for an additional 100 h.
Baseline median absolute deviation for the assay (bmad): 8.497
Response cutoff threshold used to determine hit calls: 25.492
Detection technology used: RT-CES (Label Free Technology)
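The dilution scheme in this section can be illustrated with a short Python sketch. It assumes only what the text states (eight points, 1:4 steps, a 100 uM top concentration, and 0.5% DMSO co-diluted with the test article); the function and variable names are illustrative and are not part of tcpl or any vendor software.

```python
# Sketch of the eight-point, 1:4 serial dilution described in Section 2.5.
# Names are hypothetical; only the top values and dilution factor come from the text.

def dilution_series(top, factor, n_points):
    """Return concentrations from highest to lowest for a serial dilution."""
    return [top / factor ** i for i in range(n_points)]

conc_um = dilution_series(top=100.0, factor=4, n_points=8)   # test chemical, uM
dmso_pct = dilution_series(top=0.5, factor=4, n_points=8)    # co-diluted DMSO, percent

for conc, dmso in zip(conc_um, dmso_pct):
    print(f"{conc:12.6f} uM   {dmso:.6f} % DMSO")
```

Under these assumptions the two highest test concentrations (100 uM and 25 uM) carry 0.5% and 0.125% DMSO, which matches the two solvent-control levels run on each plate.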
2.6 Response: Increased cell proliferation in response to xenoestrogenic interference with ER-mediated pathways
as measured by monitoring electrical impedance at the cell-plate interface. One possible effect of endocrine
disrupting chemicals is increased cell growth through perturbation of pathways linked to cell cycle regulation.
Activation of the estrogen receptor (ER) signaling pathway, for example, is one possible mechanism that
underlies cell proliferation in hormonally sensitive tissues such as mammary and endometrial tissue. The role of
steroid hormones in the regulation of some mammary tumors has been well established (Russo and Russo 2006,
Yager and Davidson 2006) and has motivated the development of estrogen pathway-based chemotherapeutics.
This assay was designed to identify those chemicals with the potential to affect cell growth by activating the
estrogen receptor-mediated cell proliferation pathway. The assay uses electronic microsensors located at the
bottom of the cell culture well to detect changes in cell number, morphology, and adhesion through electrical
impedance measurement at the electrode-solution interface following 80-hour incubation with test chemicals.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
ToxCast ER Pathway Model: Estrogen receptor assays used in ToxCast ER Pathway model
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 8
Standard minimum concentration tested: 25 nM
Key positive control: 17b-Estradiol
Target (nominal) number of replicates: 3
Standard maximum concentration tested: 250 nM
Neutral vehicle control: DMSO
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Data were collected from the xCELLigence system, which converts raw impedance values into the
Cell Index (CI) value; this is a measure of adhesion where CI = (impedance at time point n - impedance in the
absence of cells)/nominal impedance value. These data were then converted to a Normalized Cell Index
according to the equation NCI(Ti) = CI(Ti)/CI(Tk), for i = 1, 2, 3, ..., N, where CI(Tk) is the cell index at the last
time point before chemical addition, CI(Ti) is the cell index at the i-th measured time point, and N is the total
number of time points. Data were grouped by chemical and smoothed to combine replicates using a simple moving average (as
the replicates were assessed in duplicate on separate plates so the time points were not identical). DMSO
controls were considered as baseline for activity, and 17-beta-Estradiol was used as a positive control and 100
percent activity for all the test chemicals on that plate. For cell loss, the NCI value at the time of compound
administration was considered to represent complete (100%) viability. MG132 (2 uM), a proteasome inhibitor
and known cytotoxic agent, was used as the positive control for cell loss and was tested in quadruplicate on
each plate. The minimum average response on each plate was used as a positive control for cell loss for all the
test chemicals on the corresponding plate. If a chemical sample was run on two different plates, then the
minimum NCI values for MG132 were averaged. If an NCI value for MG132 fell below zero, the response was
considered to be below the limit of detection and was replaced with the minimum value greater than zero across
all plates. All smoothed NCI values were then converted to a percentage of positive control, which was
considered to represent no (0%) viability. Concentration response curves were generated using smoothed NCI
values and all statistical analyses were conducted using the R programming language, employing the tcpl package to
generate model parameters and confidence intervals.
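The normalization and smoothing steps above can be sketched in Python as follows. This is a minimal illustration, not the vendor or EPA code: the centered window of 3 points and the example CI values are assumptions made here, since the text specifies only that a simple moving average was used.

```python
# Sketch of NCI(Ti) = CI(Ti) / CI(Tk) followed by a simple moving average.
# The window size and the example readings are hypothetical.

def normalized_cell_index(ci, k):
    """Divide every Cell Index reading by the reading at index k,
    the last time point before chemical addition."""
    baseline = ci[k]
    if baseline == 0:
        raise ValueError("CI at the normalization time point must be non-zero")
    return [value / baseline for value in ci]

def moving_average(values, window=3):
    """Centered simple moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

ci_readings = [1.0, 1.2, 1.5, 2.0, 2.4, 3.0]   # hypothetical CI time course
nci = normalized_cell_index(ci_readings, k=2)  # index 2 = last pre-dose reading
nci_smoothed = moving_average(nci, window=3)
print(nci)
print(nci_smoothed)
```

By construction the NCI equals 1 at the normalization time point, so later values read directly as fold change relative to the pre-dose state.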
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization includes:
3: pval.apid.medpcbyconc.max (Calculate the positive control value (pval) as the plate-wise maximum,
by assay plate ID (apid), of the medians of the corrected values (cval) for gain-of-signal single- or
multiple-concentration negative control wells (wllt = m or o) by apid, well type, and concentration.)
5: resp.pc (Calculate the normalized response (resp) as a percent of control, i.e. the ratio of the difference
between the corrected (cval) and baseline (bval) values divided by the difference between the positive control
(pval) and baseline (bval) values multiplied by 100; resp = (cval-bval)/(pval-bval)*100.)
17: bval.apid.nwllslowconc.med (Calculate the baseline value (bval) as the plate-wise median, by assay plate
ID (apid), of the corrected values (cval) of test compound wells (wllt = t) with a concentration index (cndx)
of 1 or 2 or neutral control wells (wllt = n).)
18: resp.shiftneg.3bmad (Shift all the normalized response values (resp) less than -3 multiplied by the
baseline median absolute deviation (bmad) to 0; if resp < -3*bmad, resp = 0.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.)
2: pc20 (Add a cutoff value of 20. Typically for percent of control data.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis
direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2*coff and |top| >= 0.8*coff.)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
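The arithmetic behind the Level 3 to Level 5 methods listed above can be sketched in Python. This is not the tcpl implementation (tcpl is an R package); the function names are illustrative, and the 1.4826 scale factor in the median absolute deviation is an assumption that mirrors the default of R's mad() function.

```python
# Sketch of the normalization, baseline, and cutoff arithmetic described above.
# Function names are hypothetical; only the formulas come from the method text.
from statistics import median

def resp_pc(cval, bval, pval):
    """Level 3 resp.pc: resp = (cval - bval) / (pval - bval) * 100."""
    return (cval - bval) / (pval - bval) * 100.0

def shift_neg_3bmad(resp, bmad):
    """Level 3 resp.shiftneg.3bmad: set responses below -3*bmad to 0."""
    return [0.0 if r < -3.0 * bmad else r for r in resp]

def mad(values, constant=1.4826):
    """Median absolute deviation; the scale constant is an assumption
    (it matches the default used by R's mad())."""
    center = median(values)
    return constant * median(abs(v - center) for v in values)

def cutoff(bmad, pc_floor=20.0):
    """Level 5: the higher of bmad3 (3 * bmad) and pc20 (20)."""
    return max(3.0 * bmad, pc_floor)

bmad_reported = 8.497            # bmad reported for this endpoint in Section 2.5
coff = cutoff(bmad_reported)
print(round(coff, 3))
```

With the reported bmad of 8.497, three times bmad exceeds the pc20 floor, so the bmad3 rule sets the cutoff. The result is within 0.001 of the reported cutoff of 25.492; the small difference presumably reflects rounding of the published bmad.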
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 3395 Number of chemicals tested: 3183
ACTIVITY HIT CALLS
Active hit count: hitc>0.9
401
Inactive hit count: 0
-------
power (pow) model:
linear-polynomial (poly1) model:
939
116
quadratic-polynomial (poly2) model: 421
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
4
774
17
147
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
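To make the relationship between these estimates concrete, the sketch below inverts a Hill curve analytically. It assumes the three-parameter Hill form f(x) = tp / (1 + (ga/x)^p), with tp the top, ga the AC50, and p the Hill coefficient; that parameterization, the function names, and the dictionary keys are assumptions for illustration and are not taken from this document. Other winning models would need their own inversions.

```python
def hill(conc, tp, ga, p):
    """Hill response at concentration conc (assumed parameterization)."""
    return tp / (1.0 + (ga / conc) ** p)


def hill_conc_at(response, tp, ga, p):
    """Concentration at which the Hill curve reaches `response`.

    Solves response = tp / (1 + (ga/x)^p) for x. The target must have
    the same sign as tp and be smaller in magnitude than tp.
    """
    if response == 0:
        raise ValueError("response must be non-zero")
    ratio = tp / response
    if ratio <= 1.0:
        raise ValueError("response is not reached by this curve")
    return ga / (ratio - 1.0) ** (1.0 / p)


def hill_pods(tp, ga, p, coff, bmr):
    """Point-of-departure estimates for a Hill fit (illustrative keys)."""
    return {
        "bmd": hill_conc_at(bmr, tp, ga, p),         # at the benchmark response
        "ac50": hill_conc_at(0.50 * tp, tp, ga, p),  # equals ga by construction
        "acc": hill_conc_at(coff, tp, ga, p),        # at the efficacy cutoff
        "ac10": hill_conc_at(0.10 * tp, tp, ga, p),
        "ac5": hill_conc_at(0.05 * tp, tp, ga, p),
    }


pods = hill_pods(tp=100.0, ga=10.0, p=1.0, coff=20.0, bmr=10.0)
print(pods)
```

With these example parameters the cutoff of 20 is reached at 2.5 concentration units, well below the AC50 of 10, which shows why the acc and ac50 can differ substantially for high-efficacy curves.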
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
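The selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be sketched as follows. This is an illustrative Python sketch rather than the tcplfit2 code; the function name pick_winning_model is hypothetical, and the AIC values and parameter counts in the example are invented for demonstration.

```python
def pick_winning_model(fits):
    """Pick the winning model name from {name: (aic, n_params)}.

    Sorting on the (aic, n_params) tuple implements "lowest AIC, then
    fewest parameters". Ties are detected by exact equality of AIC.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Invented example values: hill and poly1 tie on AIC, so poly1 wins
example_fits = {
    "hill": (101.5, 4),
    "poly1": (101.5, 2),
    "exp5": (103.2, 4),
}
print(pick_winning_model(example_fits))  # poly1
```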
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
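The binarization used herein can be expressed as a small helper. This is a sketch with a hypothetical function name; the 0.90 default is the threshold stated above and can be changed to suit the stringency required.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to the three categories used herein."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


for value in (0.97, 0.90, 0.45, -1):
    print(value, classify_hitc(value))
```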
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1.645
Neutral control median absolute deviation, by plate: nmad 0.109
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 6.98%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed 2.993
Positive control well median absolute deviation, by plate: pmad 0.203
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: 5.416
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed 1
Negative control well median absolute deviation value, by plate: mmad 0
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: -5.314
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
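The formulas quoted alongside each metric above can be collected into one helper, sketched below in Python. The function name and dictionary keys are hypothetical. The reported values are described as by-plate and across-plate summaries, so applying these formulas to the summary medians listed here need not reproduce them exactly.

```python
import math


def control_metrics(cmed, cmad, nmed, nmad):
    """Plate-level metrics for a control (cmed, cmad) vs. the neutral control.

    cmed/cmad are the median and MAD of the positive or negative control
    wells; nmed/nmad are those of the neutral control wells. Division by
    zero is not handled (e.g. nmad == 0 or cmed == nmed).
    """
    diff = cmed - nmed
    return {
        "cv_pct": nmad / nmed * 100.0,
        "z_prime": 1.0 - (3.0 * (cmad + nmad)) / abs(diff),
        "ssmd": diff / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": diff / nmad,
        "signal_to_background": cmed / nmed,
    }


# Invented single-plate values for illustration
print(control_metrics(cmed=3.0, cmad=0.2, nmed=1.5, nmad=0.1))
```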
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 147.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Xing JZ, Zhu L, Gabos S, Xie L. Microelectronic cell sensor assay for detection of cytotoxicity and
prediction of acute toxicity. Toxicol In Vitro. 2006 Sep;20(6):995-1004. Epub 2006 Feb 14. PubMed PMID:
16481145.
Rotroff DM, Dix DJ, Houck KA, Kavlock RJ, Knudsen TB, Martin MT, Reif DM, Richard AM, Sipes NS,
Abassi YA, Jin C, Stampfl M, Judson RS. Real-time growth kinetics measuring hormone mimicry for ToxCast
chemicals in T-47D human ductal carcinoma cells. Chem Res Toxicol. 2013 Jul 15;26(7):1097-107.
doi:10.1021/tx400117y. Epub 2013 Jun 10. PubMed PMID: 23682706.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1850
ACEA_AR_agonist_AUC_viability
1. General Information
1.1 Assay Title: ACEA Biosciences xCELLigence Real-Time Cell Analysis on Androgen Receptor Agonism for Viability
1.2 Assay Summary: ACEA_AR_agonist is a cell-based, single-readout assay that uses 22Rv1, a human prostate
cancer cell line, with measurements taken at 80 hours after chemical dosing in a 384-well plate, although T05
and T06 (mc0.srcf) used a 96-well plate. Differences in plate size can be ignored given data normalization.
ACEA_AR_80hr is one of two assay component(s) measured or calculated from the ACEA_AR assay. It is designed
to make measurements of real-time cell-growth kinetics, a form of growth reporter, as detected with electrical
impedance signals by Real-Time Cell Electrode Sensor (RT-CES) technology. Data from the assay component
ACEA_AR_AUC_viability was analyzed in the positive analysis fitting direction relative to DMSO as the negative
control and baseline of activity. Using a type of growth reporter, loss-of-signal activity can be used to
understand changes in the viability. Furthermore, this assay endpoint can be referred to as a secondary readout,
because this assay has produced multiple assay endpoints where this one serves a viability function. To
generalize the intended target to other relatable targets, this assay endpoint is annotated to the cell cycle
intended target family, where the subfamily is cytotoxicity.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ACEA Biosciences, Inc. (ACEA) is a privately owned biotechnology company that developed a
real-time, label-free cell growth assay system called xCELLigence based on a microelectronic impedance readout.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; xCELLigence RTCA software and biosensor technology are
available from ACEA Biosciences, Inc. and 22Rv1 cells are commercially available from American Type Culture
Collection (ATCC HTB-133) with signed Material Transfer Agreement (MTA).
1.9 Assay Throughput: 384-well plate. The assay is conducted on 96-well plates with each plate containing positive
controls for proliferation (testosterone) and cytotoxicity (MG132), negative controls (assay media, RPMI1640),
and two concentrations (0.5 percent and 0.125 percent) of DMSO solvent controls. Following a 24-hour
incubation period, the cells are exposed to test chemicals for 80 hours and response is monitored no less than
once per hour.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Electrical impedance is used to quantify changes in the growth of the cells, where increased
impedance is positively correlated with increased cell growth.
The ACEA_AR assay exposed human prostate cell (22Rv1) cultures to the ToxCast library of diverse
environmental chemicals using an eight-point, 1:4 dilution series concentration-response format (starting at a
maximum final concentration of 100 uM), using MG132 (cytotoxicity) and testosterone (proliferation) as positive
controls and assay media and DMSO as a negative control and solvent control, respectively. All control chemicals
were tested in quadruplicate on each plate. The ACEA_AR assay analyzed changes in cell adhesion and
morphology at the electrode-solution interface (located on the bottom of culture wells) using electronic
microsensors. Changes in electrical impedance were monitored in real-time at the plate surface to investigate
the potential activation of the androgen signaling pathway and subsequent increases in growth or changes in
cell structure following 80-hour incubation with the test chemicals. The electrical signal produced by the
experimental system can be used to detect changes in cell number, morphology and adhesion which occur in
response to xenoestrogenic activation of AR-mediated pathways, and concentration-response curves were
modeled for each chemical to determine half-maximal activity levels.
2.2 Scientific Principles: Endocrine disrupting chemicals (EDCs) interfere with normal hormone biosynthesis,
signaling or metabolism and impact regulatory pathways in humans and wildlife. Androgens, such as
testosterone, are widely recognized for their importance in sexual development and differentiation but also
play roles in metabolism, growth, development, and behavior and act as an intercellular signal (Bhasin et al.,
2007; Monks & Holmes, 2018; Sumpter, 2005). Agonism of the androgen receptor is listed as a molecular
initiating event in AOP #23, leading to reproductive dysfunction in fish (Villeneuve, 2021).
2.3 Experimental System: An adherent 22Rv1 cell line was used. 22Rv1 is a human prostate carcinoma epithelial
cell line derived from a xenograft that was serially propagated in mice.
2.4 Metabolic Competence: The 22Rv1 cell line expresses androgen receptor (AR) and prostate-specific antigen
(PSA), both of which are markers of prostate cancer. The presence of these markers in 22Rv1 cells confirms their
origin from prostate cancer tissue and highlights their relevance in studying the disease. Importantly, the 22Rv1
cell line is unique in that it expresses both full-length and truncated forms of ARs. This mixed expression pattern
is commonly observed in androgen deprivation resistant prostate cancers, making the 22Rv1 cell line a valuable
model for studying the mechanisms underlying resistance to hormonal therapies. Morphologically, 22Rv1 cells
exhibit epithelial characteristics and are cultured as adherent monolayers, providing a convenient system for in
vitro experimentation.
2.5 Exposure Regime: The xCELLigence system Multi-E-Plate stations were used to measure the time-dependent
response to chemicals. Each compound was tested in an eight-point, 1:4 serial dilution series starting at a
maximum final concentration of 100 uM. A maximum starting concentration of 0.5% DMSO was present in the
100 uM chemical samples and was diluted along with the test article dilution series. The screen was performed
in biological duplicate using two separate, 96-well, E-Plates 96 for each dilution series (n = 2). Positive controls
(MG132 and testosterone) and a negative control (assay media) were tested in quadruplicate on each testing
plate. Then, 0.5% and 0.125% DMSO were tested in duplicates in each plate to serve as solvent controls for the
2 highest concentrations of testing compounds: 100 uM and 25 uM. Reference compounds were tested with 8
concentrations with 1:5 serial dilutions. All screening was carried out by ACEA Biosciences, Inc. (San Diego, CA).
22Rv1 cells purchased from ATCC were maintained in media supplemented with 10% fetal bovine serum (FBS).
Before screening, 22Rv1 cells were preconditioned in assay medium. Cells were then detached and seeded in
E-Plates 96 in assay medium. After overnight monitoring of growth once every hour, compounds were added to
the cells and remained in the medium until the end of the experiment. Cellular responses were then recorded
once every 5 min for the first 5 h, and once every hour for an additional 100 h.
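As a worked example of the nominal design described above, the following Python sketch generates an eight-point, 1:4 serial dilution starting at 100 uM, along with the DMSO percentage that is diluted in parallel from 0.5%. The function name dilution_series is hypothetical, and the values are nominal; the concentrations actually recorded in the database may differ.

```python
def dilution_series(top, fold, n_points):
    """Nominal serial dilution: top, top/fold, top/fold^2, ..."""
    return [top / fold ** i for i in range(n_points)]


conc_um = dilution_series(100.0, 4, 8)   # test chemical, uM
dmso_pct = dilution_series(0.5, 4, 8)    # co-diluted DMSO, percent

for c, d in zip(conc_um, dmso_pct):
    print(f"{c:.6g} uM chemical in {d:.6g}% DMSO")
```

The two highest points (100 uM in 0.5% DMSO and 25 uM in 0.125% DMSO) correspond to the two DMSO solvent control levels tested on each plate.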
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 0.144 uM
Key positive control: MG132
Target (nominal) number of replicates: 3
Standard maximum concentration tested: 105 uM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 8.956
Response cutoff threshold used to determine hit calls: 26.869
Detection technology used: RT-CES (Label Free Technology)
-------
2.6 Response: Increased cell proliferation in response to xenoestrogenic interference with AR-mediated pathways
as measured by monitoring electrical impedance at the cell-plate interface. One possible effect of endocrine
disrupting chemicals is increased cell growth through perturbation of pathways linked to cell cycle regulation.
Activation of the androgen receptor (AR) signaling pathway, for example, is one possible mechanism that
underlies cell proliferation in hormonally sensitive tissues such as mammary and endometrial tissue. This assay
was designed to identify those chemicals with the potential to affect cell growth by activating the androgen
receptor-mediated cell proliferation pathway. The assay uses electronic microsensors located at the bottom of
the cell culture well to detect changes in cell number, morphology, and adhesion through electrical impedance
measurement at the electrode-solution interface following 80-hour incubation with test chemicals.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Cytotoxicity Burst: Assays used to define the cytotoxicity burst region
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Data were collected from the xCELLigence system which converts raw impedance values into the
Cell Index (CI) value; this is a measure of adhesion where CI = (impedance at time point n - impedance in the
absence of cells)/nominal impedance value. These data were then converted to a Normalized Cell Index
according to the equation NCI(Ti) = CI(Ti)/CI(Tk), for i = 1, 2, 3, ..., N, where CI(Tk) is the cell index at the
last time point before chemical addition, CI(Ti) is the cell index at the i-th measured time point, and N is the
total number of time points. Data were grouped by chemical and smoothed to combine replicates using a simple moving average (as
the replicates were assessed in duplicate on separate plates so the time points were not identical). DMSO
controls were considered as baseline for activity, and testosterone was used as a positive control and 100
percent activity for all the test chemicals on that plate. For cell loss, the NCI value at the time of compound
administration was considered to represent complete (100%) viability. MG132 (2 uM), a proteasome inhibitor
and known cytotoxic agent, was used as the positive control for cell loss and was tested in quadruplicate on
each plate. The minimum average response on each plate was used as a positive control for cell loss for all the
test chemicals on the corresponding plate. If a chemical sample was run on two different plates, then the
minimum NCI values for MG132 were averaged. If an NCI value for MG132 fell below zero, the response was
considered to be below the limit of detection and was replaced with the minimum value greater than zero across
all plates. All smoothed NCI values were then converted to a percentage of positive control, which was
considered to represent no (0%) viability. Concentration response curves were generated using smoothed NCI
values and all statistical analyses were conducted using R programming language, employing tcpl package to
generate model parameters and confidence intervals.
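The normalization and smoothing steps described above can be sketched as follows. This is an illustrative Python sketch, not the processing code used for the assay: the function names are hypothetical, and the trailing window of three points is an assumption, since the window used for the simple moving average is not stated here.

```python
def normalized_cell_index(ci, k):
    """NCI(Ti) = CI(Ti) / CI(Tk).

    ci is the list of Cell Index readings for one well and k is the
    position of the last reading before chemical addition.
    """
    return [value / ci[k] for value in ci]


def moving_average(values, window=3):
    """Trailing simple moving average (window size is an assumption).

    Early points are averaged over however many readings are available.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


ci_readings = [1.0, 2.0, 2.5, 3.0]       # invented values
nci = normalized_cell_index(ci_readings, k=1)
print(nci)                               # [0.5, 1.0, 1.25, 1.5]
print(moving_average(nci, window=2))
```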
-------
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
5: resp.pc (Calculate the normalized response (resp) as a percent of control, i.e. the ratio of the difference
between the corrected (cval) and baseline (bval) values divided by the difference between the positive
control (pval) and baseline (bval) values multiplied by 100; resp = (cval-bval)/(pval-bval)*100.), 6:
resp.multneg1 (Multiply the normalized response value (resp) by -1; -1*resp.), 15:
pval.apid.medncbyconc.min (Calculate the positive control value (pval) as the plate-wise minimum, by
assay plate ID (apid), of the medians of the corrected values (cval) for gain-of-signal single- or multiple-
concentration negative control wells (wllt = m or o) by apid, well type, and concentration.), 17:
bval.apid.nwllslowconc.med (Calculate the baseline value (bval) as the plate-wise median, by assay plate
ID (apid), of the corrected values (cval) of test compound wells (wllt = t) with a concentration index (cndx)
of 1 or 2 or neutral control wells (wllt = n).)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.), 2: pc20 (Add a cutoff value of 20. Typically for percent of control data.), 27:
ow_bidirectional_loss (Multiply winning model hitcall (hitc) by -1 for models fit in the positive analysis
direction. Typically used for endpoints where only negative responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2*coff or |top| >= 0.8*coff.), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 19: viability.gnls (Flag series with
an active hit call (hitc >= 0.9) if denoted as a cell viability assay and the winning model is gain-loss (gnls); if hitc
>= 0.9, modl=="gnls" and cell_viability_assay == 1, then flag.), 20: no.med.gt.3bmad (Flag series where
no median response values are greater than baseline as defined by 3 times the baseline median absolute
deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the
number of medians greater than 3*bmad/less than -3*bmad.)
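The Level 3 to Level 5 calculations listed above can be illustrated with a short Python sketch. It is not the tcpl implementation: the function names are hypothetical, and the scaling constant of 1.4826 applied to the median absolute deviation is an assumption (it is the default of R's mad function). The cutoff helper takes the higher of the two candidate thresholds named at Level 5, bmad3 and pc20.

```python
import statistics


def resp_pc(cval, bval, pval):
    """Percent of control: resp = (cval - bval) / (pval - bval) * 100."""
    return (cval - bval) / (pval - bval) * 100.0


def baseline_stats(low_conc_resps, mad_constant=1.4826):
    """bmad and onesd from responses at the two lowest concentrations.

    mad_constant is an assumed scaling factor; onesd is the sample
    standard deviation (denominator n - 1).
    """
    med = statistics.median(low_conc_resps)
    deviations = [abs(r - med) for r in low_conc_resps]
    bmad = mad_constant * statistics.median(deviations)
    onesd = statistics.stdev(low_conc_resps)
    return bmad, onesd


def efficacy_cutoff(bmad, pc_floor=20.0):
    """Higher of 3 * bmad (bmad3) and a fixed percent-of-control value (pc20)."""
    return max(3.0 * bmad, pc_floor)


# A well halfway between baseline (170) and the cell-loss control (70)
print(resp_pc(cval=120.0, bval=170.0, pval=70.0))   # 50.0
# With bmad = 8.956, the 3 * bmad threshold exceeds the floor of 20
print(efficacy_cutoff(8.956))
```

Method 6 (resp.multneg1) would then simply multiply the resulting resp by -1.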
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1845 Number of chemicals tested: 1830
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 173.792
Neutral control median absolute deviation, by plate: nmad 6.742
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 4.03%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed 223.133
Positive control well median absolute deviation, by plate: pmad 20.616
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: 2.438
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed - nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed 73.087
Negative control well median absolute deviation value, by plate: mmad 2.686
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: -12.678
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed - nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 188.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity
data-rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Judson R, Houck K, Paul Friedman K, Brown J, Browne P, Johnston PA, Close DA, Mansouri K,
Kleinstreuer N. Selecting a minimal set of androgen receptor assays for screening chemicals. Regul Toxicol
Pharmacol. 2020 Nov; 117:104764. doi: 10.1016/j.yrtph.2020.104764. Epub 2020 Aug 14. PMID: 32798611;
PMCID: PMC8356084.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1852
ACEA_ER_AUC_viability
1. General Information
1.1 Assay Title: ACEA Biosciences xCELLigence Real-Time Cell Analysis on Estrogen Receptor Agonism for Viability
1.2 Assay Summary: ACEA_ER is a cell-based, single-readout assay that uses T47D, a human breast cell line, with
measurements taken at 80 hours after chemical dosing in a 96-well plate, although T02 (mc0.srcf) used a
384-well plate. Differences in plate size can be ignored given data normalization. ACEA_ER_AUC_viability is one of
two assay component(s) measured or calculated from the ACEA_ER assay. It is designed to make measurements
of real-time cell-growth kinetics, a form of growth reporter, as detected with electrical impedance signals by
Real-Time Cell Electrode Sensor (RT-CES) technology. Data from the assay component ACEA_ER_AUC_viability
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of growth reporter, loss-of-signal activity can be used to understand changes in
viability. Furthermore, this assay endpoint can be referred to as a secondary readout, because this assay has
produced multiple assay endpoints where this one serves a viability function. To generalize the intended target
to other relatable targets, this assay endpoint is annotated to the cell cycle intended target family, where the
subfamily is cytotoxicity.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ACEA Biosciences, Inc. (ACEA) is a privately owned biotechnology company that developed a
real-time, label-free cell growth assay system called xCELLigence based on a microelectronic impedance readout.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; xCELLigence RTCA software and biosensor technology are
available from ACEA Biosciences, Inc. and T-47D cells are commercially available from American Type Culture
Collection (ATCC HTB-133) with signed Material Transfer Agreement (MTA).
1.9 Assay Throughput: 96-well plate. The assay is conducted on 96-well plates with each plate containing positive
controls for proliferation (17 beta-estradiol) and cytotoxicity (MG132), negative controls (assay media, RPMI
1640), and two concentrations (0.5 percent and 0.125 percent) of DMSO solvent controls. Following a 24-hour
incubation period, the cells are exposed to test chemicals for 80 hours and response is monitored no less than
once per hour.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Electrical impedance is used to quantify changes to the growth of the cells, where increased
impedance is positively correlated with increased cell growth.
The ACEA_ER assay exposed human breast carcinoma cell (T-47D) cultures to the ToxCast library of diverse
environmental chemicals using an eight-point, 1:4 dilution series concentration-response format (starting at a
maximum final concentration of 100 uM), using MG132 (cytotoxicity) and Estradiol (E2) (proliferation) as positive
controls and assay media and DMSO as a negative control and solvent control, respectively. All control chemicals
were tested in quadruplicate on each plate. The ACEA_ER assay analyzed changes in cell adhesion and
morphology at the electrode-solution interface (located on the bottom of culture wells) using electronic
microsensors. Changes in electrical impedance were monitored in real-time at the plate surface to investigate
the potential activation of the estrogen signaling pathway and subsequent increases in growth or changes in
cell structure following 80-hour incubation with the test chemicals. The electrical signal produced by the
experimental system can be used to detect changes in cell number, morphology and adhesion which occur in
response to xenoestrogenic activation of ER-mediated pathways, and concentration-response curves were
modeled for each chemical to determine half-maximal activity levels.
2.2 Scientific Principles: Endocrine disrupting chemicals (EDCs) interfere with normal hormone biosynthesis,
signaling or metabolism and impact regulatory pathways in humans and wildlife. Many EDCs interfere with
normal steroidal activity by impacting estrogenic signaling pathways. The estrogen receptor mediates gene
expression in response to estrogen exposure, and modulates the activity for a wide variety of physiological
processes. The activity of estrogenic chemicals is generally probed in vitro by monitoring ligand-binding in
experimental systems, however estrogenic potency is also a function of interaction with transcriptional
machinery and other signaling pathways. This assay was designed to identify chemical perturbagens which can
affect a cell proliferation response in human breast carcinoma cells by acting as xenoestrogenic compounds
which impact estrogen signaling pathways. While cell proliferation rates can be altered via multiple pathways,
growth responses in T47D cells are considered to be particularly reliable indicators of estrogenic activation. This
assay is intended for use as a part of an integrated testing strategy, to screen a large structurally diverse chemical
library for compounds which potentially affect endocrine systems in exposed populations by interacting with
estrogen receptor mediated signaling pathways. There is strong evidence that estrogen receptor activity in early
life is a molecular initiating event (MIE) in a developing Adverse Outcome Pathway (AOP) leading to breast
cancer in both animal and human models and to endometrial carcinoma in the mouse, and that ER agonism is the
MIE leading to reproductive dysfunction in oviparous vertebrates, and there is some evidence that estrogen receptor
activation is the MIE for putative adverse outcome pathways leading to reduced survival due to renal failure
and leading to skewed sex ratios due to altered sexual differentiation in males. ER antagonism has strong
evidence as the MIE for an AOP describing reduction of vitellogenin synthesis in liver, which can lead to reduced
cumulative fecundity in repeat-spawning fish species. Chemical-activity profiles derived from this assay can
inform prioritization decisions for compound selection in more resource intensive in vivo studies to further
investigate the involvement of ER interference in pathways leading to hazardous outcomes in biological systems.
2.3 Experimental System: An adherent T-47D cell line was used. T-47D is a human breast ductal carcinoma cell line,
originally derived in 1974 from the pleural effusion of a 57-year-old patient, which exhibits epithelial-like morphology
(Horwitz et al. 1978, Keydar et al. 1979).
2.4 Metabolic Competence: T-47D cells contain specific high affinity receptors for estradiol, progesterone,
glucocorticoid and androgen (Horwitz et al. 1978). Some potential for P450 mediated metabolism is present,
e.g. CYP1A1, CYP1A2, CYP1B1 (Angus et al. 1999, Hevir et al. 2011, MacPherson and Matthews 2010, Spink et
al. 2002, Spink et al. 1998), CYP2B6 (Lo et al. 2010), CYP3A4 (Nagaoka et al. 2006) and CYP2C8 (Mitra et al. 2011),
as well as some experimental evidence for the capacity to retain expression of some phase II metabolizing
enzymes, e.g., UGTs (Harrington et al. 2006, Hevir et al. 2011), GSTs (Hevir et al. 2011) and sulphotransferases
e.g., SULT1A3 (Miki et al. 2006), SULT1E1, SULT2B1 (Hevir et al. 2011).
2.5 Exposure Regime: The xCELLigence system Multi-E-Plate stations were used to measure the time-dependent
response to chemicals. Each compound was tested in an eight-point, 1:4 serial dilution series starting at a
maximum final concentration of 100 uM. A maximum starting concentration of 0.5% DMSO was present in the
100 uM chemical samples and was diluted along with the test article dilution series. The screen was performed
in biological duplicate using two separate, 96-well, E-Plates 96 for each dilution series (n = 2). Positive controls
(MG132 and E2) and a negative control (assay media) were tested in quadruplicate on each testing plate. Then,
0.5% and 0.125% DMSO were tested in duplicates in each plate to serve as solvent controls for the 2 highest
concentrations of testing compounds: 100 uM and 25 uM. Reference compounds were tested with 8
concentrations with 1:5 serial dilutions. All screening was carried out by ACEA Biosciences, Inc. (San Diego, CA).
T-47D cells purchased from ATCC were maintained in RPMI1640 media supplemented with 10% characterized
fetal bovine serum (FBS). Before screening, T-47D cells were preconditioned in assay medium: phenol red-free
RPMI1640 supplemented with 10% charcoal-stripped FBS. Cells were then detached and seeded in E-Plates 96
in assay medium. After overnight monitoring of growth once every hour, compounds were added to T-47D cells
and remained in the medium until the end of the experiment. Cellular responses were then recorded once every
5 min for the first 5 h, and once every hour for an additional 100 h.
Baseline median absolute deviation for the assay (bmad): 5.861
Response cutoff threshold used to determine hit calls: 20
Detection technology used: RT-CES (Label Free Technology)
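The dilution scheme described in this section can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name dilution_series is hypothetical and is not part of tcpl or the vendor's software.

```python
def dilution_series(top_conc, fold, n_points):
    """Return concentrations, highest first, for a serial dilution.

    top_conc : highest concentration in the series
    fold     : dilution factor between neighboring points (4 for a 1:4 series)
    n_points : number of points in the series
    """
    return [top_conc / fold**i for i in range(n_points)]


# Eight-point, 1:4 series starting at 100 uM, as described for test chemicals.
test_concs_um = dilution_series(100.0, 4, 8)

# DMSO is diluted along with the test article, starting at 0.5%.
dmso_percent = dilution_series(0.5, 4, 8)

for conc, dmso in zip(test_concs_um, dmso_percent):
    print(f"{conc:.6g} uM chemical in {dmso:.6g}% DMSO")
```

The two highest points come out as 100 uM in 0.5% DMSO and 25 uM in 0.125% DMSO, which agrees with the two solvent control concentrations listed above.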
2.6 Response: Increased cell proliferation in response to xenoestrogenic interference with ER-mediated pathways
as measured by monitoring electrical impedance at the cell-plate interface. One possible effect of endocrine
disrupting chemicals is increased cell growth through perturbation of pathways linked to cell cycle regulation.
Activation of the estrogen receptor (ER) signaling pathway, for example, is one possible mechanism that
underlies cell proliferation in hormonally sensitive tissues such as mammary and endometrial tissue. The role of
steroid hormones in the regulation of some mammary tumors has been well established (Russo and Russo 2006,
Yager and Davidson 2006) and has motivated the development of estrogen pathway-based chemotherapeutics.
This assay was designed to identify those chemicals with the potential to affect cell growth by activating the
estrogen receptor-mediated cell proliferation pathway. The assay uses electronic microsensors located at the
bottom of the cell culture well to detect changes in cell number, morphology, and adhesion through electrical
impedance measurement at the electrode-solution interface following 80-hour incubation with test chemicals.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Cytotoxicity Burst: Assays used to define the cytotoxicity burst region
ToxCast ER Pathway Model: Estrogen receptor assays used in ToxCast ER Pathway model
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 8
Target (nominal) number of replicates: 3
Standard minimum concentration tested: 25 nM
Standard maximum concentration tested: 250 nM
Key positive control: MG132
Neutral vehicle control: DMSO
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Data were collected from the xCELLigence system which converts raw impedance values into the
Cell Index (CI) value; this is a measure of adhesion where CI = (impedance at time point n - impedance in the
absence of cells)/nominal impedance value. These data were then converted to a Normalized Cell Index
according to the equation NCI(Ti) = CI(Ti)/CI(Tk), where i = 1, 2, 3, ..., N; CI(Tk) is the cell index at the last time
point before chemical addition; CI(Ti) is the cell index at the i-th measured time point; and N is the total number of time
points. Data were grouped by chemical and smoothed to combine replicates using a simple moving average (as
the replicates were assessed in duplicate on separate plates so the time points were not identical). DMSO
controls were considered as baseline for activity, and 17-beta-Estradiol was used as a positive control and 100
percent activity for all the test chemicals on that plate. For cell loss, the NCI value at the time of compound
administration was considered to represent complete (100%) viability. MG132 (2 uM), a proteasome inhibitor
and known cytotoxic agent, was used as the positive control for cell loss and was tested in quadruplicate on
each plate. The minimum average response on each plate was used as a positive control for cell loss for all the
test chemicals on the corresponding plate. If a chemical sample was run on two different plates, then the
minimum NCI values for MG132 were averaged. If an NCI value for MG132 fell below zero, the response was
considered to be below the limit of detection and was replaced with the minimum value greater than zero across
all plates. All smoothed NCI values were then converted to a percentage of positive control, which was
considered to represent no (0%) viability. Concentration response curves were generated using smoothed NCI
values and all statistical analyses were conducted using the R programming language, employing the tcpl package to
generate model parameters and confidence intervals.
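The Cell Index and Normalized Cell Index definitions above can be illustrated with a minimal Python sketch. The function names and the impedance values are hypothetical and chosen only to show the arithmetic; the actual conversion is performed by the xCELLigence software.

```python
def cell_index(impedance, background_impedance, nominal_impedance):
    """CI = (impedance at time point n - impedance without cells) / nominal impedance."""
    return (impedance - background_impedance) / nominal_impedance


def normalized_cell_index(ci_series, k):
    """NCI(Ti) = CI(Ti) / CI(Tk), where k indexes the last time point before dosing."""
    ci_ref = ci_series[k]
    if ci_ref == 0:
        raise ValueError("CI at the reference time point is zero; NCI is undefined")
    return [ci / ci_ref for ci in ci_series]


# Hypothetical impedance readings for one well (arbitrary units).
impedances = [20.0, 40.0, 60.0, 50.0]
ci = [cell_index(z, background_impedance=10.0, nominal_impedance=15.0) for z in impedances]

# Assume the chemical was added just after the second reading (index 1).
nci = normalized_cell_index(ci, k=1)
print(nci)
```

By construction the NCI equals 1 at the reference time point, so later values read directly as fold change relative to the well's state at dosing.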
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization methods include:
5: resp.pc (Calculate the normalized response (resp) as a percent of control, i.e. the ratio of the difference
between the corrected (cval) and baseline (bval) values divided by the difference between the positive
control (pval) and baseline (bval) values multiplied by 100; resp = (cval-bval)/(pval-bval)*100.)
6: resp.multneg1 (Multiply the normalized response value (resp) by -1; -1*resp.)
15: pval.apid.medncbyconc.min (Calculate the positive control value (pval) as the plate-wise minimum, by
assay plate ID (apid), of the medians of the corrected values (cval) for gain-of-signal single- or
multiple-concentration negative control wells (wllt = m or o) by apid, well type, and concentration.)
17: bval.apid.nwllslowconc.med (Calculate the baseline value (bval) as the plate-wise median, by assay plate
ID (apid), of the corrected values (cval) of test compound wells (wllt = t) with a concentration index (cndx)
of 1 or 2 or neutral control wells (wllt = n).)
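As an illustration of the percent-of-control normalization described for Level 3, the following Python sketch applies the resp.pc formula and the sign flip to a single well. The function names and the numeric values are hypothetical; in practice these steps are performed inside tcpl in R.

```python
def percent_of_control(cval, bval, pval):
    """resp = (cval - bval) / (pval - bval) * 100, following the resp.pc description."""
    if pval == bval:
        raise ValueError("pval equals bval; percent of control is undefined")
    return (cval - bval) / (pval - bval) * 100.0


def multiply_by_negative_one(resp):
    """Sign flip corresponding to the description of resp.multneg1."""
    return -1.0 * resp


# Hypothetical plate values: baseline at 100, positive control (minimum) at 20.
bval, pval = 100.0, 20.0
cval = 60.0  # corrected value for one test well

resp = percent_of_control(cval, bval, pval)
flipped = multiply_by_negative_one(resp)
print(resp, flipped)
```

Here a well halfway between the baseline and the positive control is assigned 50 percent of control, and the sign flip maps it to -50.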
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
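The two Level 4 quantities can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the response values are invented, the function names are hypothetical, and the median absolute deviation is computed without a scaling constant (R's mad() applies a constant of 1.4826 by default, and this document does not state which convention tcpl uses).

```python
from statistics import median, stdev


def baseline_mad(resp):
    """Unscaled median absolute deviation of the baseline responses (assumption: no scaling)."""
    center = median(resp)
    return median([abs(r - center) for r in resp])


def one_sd(resp):
    """Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))."""
    return stdev(resp)


# Hypothetical normalized responses from test wells at the two lowest concentrations.
low_conc_resp = [-2.0, 0.0, 1.0, 3.0, 8.0]

bmad = baseline_mad(low_conc_resp)
onesd = one_sd(low_conc_resp)
print(bmad, onesd)
```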
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.)
2: pc20 (Add a cutoff value of 20. Typically for percent of control data.)
27: ow_bidirectional_loss (Multiply winning model hitcall (hitc) by -1 for models fit in the positive analysis
direction. Typically used for endpoints where only negative responses are biologically relevant.)
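The cutoff selection at Level 5 can be checked against the values reported in Section 2.5 with a short Python sketch. The function name is hypothetical; only the rule (take the highest of the candidate cutoffs) and the numbers (bmad = 5.861, fixed cutoff of 20) come from this document.

```python
def efficacy_cutoff(bmad, fixed_cutoffs=(20.0,), bmad_multiplier=3.0):
    """Return the highest candidate cutoff: bmad_multiplier * bmad or any fixed cutoff."""
    candidates = [bmad_multiplier * bmad, *fixed_cutoffs]
    return max(candidates)


# bmad reported for this endpoint in Section 2.5.
bmad = 5.861
coff = efficacy_cutoff(bmad)
print(3 * bmad, coff)
```

Three times the bmad is about 17.58, which is below the fixed value of 20, so the pc20 candidate sets the cutoff. This matches the response cutoff threshold of 20 listed for this endpoint.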
Level 6: Cautionary flagging methods include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
19: viability.gnls (Flag series with an active hit call (hitc >= 0.9) if the endpoint is denoted as a cell
viability assay and the winning model is gain-loss (gnls); if hitc >= 0.9, modl == "gnls" and
cell_viability_assay == 1, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
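To make the flag logic concrete, the following Python sketch evaluates four of the simpler Level 6 rules as they are worded above (noise, low.nrep, low.nconc, and ac50.lowconc). The SeriesSummary class, its field names, and the example values are hypothetical; tcpl implements these flags in R against the invitrodb tables.

```python
from dataclasses import dataclass


@dataclass
class SeriesSummary:
    """Hypothetical per-series summary holding only the fields used below."""
    hitc: float      # continuous hit call
    rmse: float      # root mean square error of the winning model
    coff: float      # efficacy cutoff
    ac50: float      # concentration at 50 percent maximal response
    logc_min: float  # log10 of the lowest tested concentration
    nrep: float      # average number of replicates per concentration
    nconc: int       # number of concentrations tested


def cautionary_flags(s):
    """Return the names of the flags whose stated conditions are met."""
    flags = []
    if s.rmse > s.coff:
        flags.append("noise")
    if s.nrep < 2:
        flags.append("low.nrep")
    if s.nconc <= 4:
        flags.append("low.nconc")
    if s.hitc >= 0.9 and s.ac50 < 10 ** s.logc_min:
        flags.append("ac50.lowconc")
    return flags


example = SeriesSummary(hitc=0.95, rmse=25.0, coff=20.0, ac50=0.004,
                        logc_min=-2.2, nrep=2.0, nconc=8)
print(cautionary_flags(example))
```

In this invented example the series is flagged as noisy (rmse above the cutoff) and as having an AC50 below the lowest tested concentration, while the replicate and concentration counts pass.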
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 3395
Number of chemicals tested: 3183
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 895
Inactive hit count (0 <= hitc < 0.9): 1432
NA hit count (hitc < 0): 1068
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 115
gain-loss (gnls) model: 418
power (pow) model: 251
linear-polynomial (poly1) model: 918
quadratic-polynomial (poly2) model: 732
exponential-2 (exp2) model: 246
exponential-3 (exp3) model: 39
exponential-4 (exp4) model: 545
exponential-5 (exp5) model: 126
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each
concentration-response series is determined by calculating a continuous hit-call for the winning model, which is
the product of three proportional weights. The first weight reflects whether there is at least one median response
outside the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response)
is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
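The model selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be sketched in Python as follows. The AIC values and parameter counts are invented for illustration, and the function is not the tcplfit2 implementation.

```python
def select_winning_model(fits):
    """Pick the model with the lowest AIC; break exact ties by fewer parameters.

    fits maps a model name to a tuple of (aic, number_of_parameters).
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Hypothetical fit results for one concentration-response series.
fits = {
    "cnst": (120.0, 1),
    "hill": (98.4, 4),
    "poly1": (98.4, 2),
    "exp4": (101.2, 3),
}

winner = select_winning_model(fits)
print(winner)
```

Here hill and poly1 tie on AIC, so the simpler poly1 model is selected. Sorting on the (AIC, parameter count) pair keeps the tie-break explicit in one place.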
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
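The binarization of the continuous hit call described above can be written as a small Python helper. The function name and the returned labels are illustrative choices; the 0.90 default follows the threshold used herein, and users may pass a different value.

```python
def classify_hitc(hitc, active_threshold=0.90):
    """Map a continuous hit call to a category using the thresholds described above."""
    if hitc < 0:
        return "not applicable"
    if hitc >= active_threshold:
        return "active"
    return "inactive"


for value in (0.95, 0.90, 0.50, 0.0, -1.0):
    print(value, classify_hitc(value))
```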
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 108.849
Neutral control median absolute deviation, by plate (nmad): 4.323
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 4.2%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): 146.774
Positive control well median absolute deviation, by plate (pmad): 7.057
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): 4.208
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): 18.516
Negative control well median absolute deviation value, by plate (mmad): 3.052
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): -14.527
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
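The control-well formulas listed above can be written out as a minimal Python sketch. The function names are illustrative and not taken from tcpl. Because the tabulated values appear to be summarized by plate or across plates, plugging the aggregate medians into these functions will not necessarily reproduce the reported numbers.

```python
import math


def cv_percent(nmed: float, nmad: float) -> float:
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed: float, cmad: float, nmed: float, nmad: float) -> float:
    """Z prime factor for a control (c) against the neutral control (n)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed: float, cmad: float, nmed: float, nmad: float) -> float:
    """Strictly standardized mean difference for a control against neutral wells."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed: float, nmed: float, nmad: float) -> float:
    """Signal-to-noise: (cmed - nmed)/nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed: float, nmed: float) -> float:
    """Signal-to-background: cmed/nmed."""
    return cmed / nmed
```

The same functions serve both the positive ("p") and negative ("m") control rows: pass pmed and pmad, or mmed and mmad, as the control arguments.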
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 246.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Xing JZ, Zhu L, Gabos S, Xie L. Microelectronic cell sensor assay for detection of cytotoxicity and
prediction of acute toxicity. Toxicol In Vitro. 2006 Sep;20(6):995-1004. Epub 2006 Feb 14. PubMed PMID:
16481145.
Rotroff DM, Dix DJ, Houck KA, Kavlock RJ, Knudsen TB, Martin MT, Reif DM, Richard AM, Sipes NS,
Abassi YA, Jin C, Stampfl M, Judson RS. Real-time growth kinetics measuring hormone mimicry for ToxCast
chemicals in T-47D human ductal carcinoma cells. Chem Res Toxicol. 2013 Jul 15;26(7):1097-107.
doi:10.1021/tx400117y. Epub 2013 Jun 10. PubMed PMID: 23682706.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1855
ACEA_AR_agonist_80hr
1. General Information
1.1 Assay Title: ACEA Biosciences xCELLigence Real-Time Cell Analysis on Androgen Receptor Agonism for
Proliferation
1.2 Assay Summary: ACEA_AR_agonist is a cell-based, single-readout assay that uses 22Rv1, a human prostate
cancer cell line, with measurements taken at 80 hours after chemical dosing in a 384-well plate, although T05
and T06 (mc0.srcf) used a 96-well plate. Differences in plate size can be ignored given data normalization.
ACEA_AR_agonist_80hr is one of two assay components measured or calculated from the ACEA_AR assay. It
is designed to make measurements of real-time cell-growth kinetics, a form of growth reporter, as detected
with electrical impedance signals by Real-Time Cell Electrode Sensor (RT-CES) technology. Data from the assay
component ACEA_AR_agonist_80hr were analyzed in the positive analysis fitting direction relative to DMSO as
the negative control and baseline of activity. Using a type of growth reporter, measures of the cells for
gain-of-signal activity can be used to understand the signaling at the pathway level as they relate to the gene AR.
Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has produced
multiple assay endpoints where this one serves a signaling function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the
subfamily is steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ACEA Biosciences, Inc. (ACEA) is a privately owned biotechnology company that developed a
real-time, label-free cell growth assay system called xCELLigence, based on a microelectronic impedance readout.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; xCELLigence RTCA software and biosensor technology are
available from ACEA Biosciences, Inc. and 22Rv1 cells are commercially available from American Type Culture
Collection (ATCC HTB-133) with signed Material Transfer Agreement (MTA).
1.9 Assay Throughput: 384-well plate. The assay is conducted on 96-well plates with each plate containing positive
controls for proliferation (testosterone) and cytotoxicity (MG132), negative controls (assay media, RPMI1640),
and two concentrations (0.5 percent and 0.125 percent) of DMSO solvent controls. Following a 24-hour
incubation period, the cells are exposed to test chemicals for 80 hours and response is monitored no less than
once per hour.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Electrical impedance is used to quantify changes to the growth of the cells, where increased
impedance is positively correlated with increased cell growth.
The ACEA_AR assay exposed human prostate cell (22Rv1) cultures to the ToxCast library of diverse
environmental chemicals using an eight-point, 1:4 dilution series concentration-response format (starting at a
maximum final concentration of 100 uM), using MG132 (cytotoxicity) and testosterone (proliferation) as positive
controls and assay media and DMSO as a negative control and solvent control, respectively. All control chemicals
were tested in quadruplicate on each plate. The ACEA_AR assay analyzed changes in cell adhesion and
morphology at the electrode-solution interface (located on the bottom of culture wells) using electronic
microsensors. Changes in electrical impedance were monitored in real time at the plate surface to investigate
the potential activation of the androgen signaling pathway and subsequent increases in growth or changes in
cell structure following 80-hour incubation with the test chemicals. The electrical signal produced by the
experimental system can be used to detect changes in cell number, morphology, and adhesion which occur in
response to xenoestrogenic activation of AR-mediated pathways, and concentration-response curves were
modeled for each chemical to determine half-maximal activity levels.
2.2 Scientific Principles: Endocrine disrupting chemicals (EDCs) interfere with normal hormone biosynthesis,
signaling or metabolism and impact regulatory pathways in humans and wildlife. Androgens, such as
testosterone, are widely recognized for their importance in sexual development and differentiation but also
play roles in metabolism, growth, development, and behavior and act as an intercellular signal (Bhasin et al.,
2007; Monks & Holmes, 2018; Sumpter, 2005). Agonism of the androgen receptor is listed as a molecular
initiating event in AOP #23, leading to reproductive dysfunction in fish (Villeneuve, 2021).
2.3 Experimental System: The adherent 22Rv1 cell line was used. 22Rv1 is a human prostate carcinoma epithelial
cell line derived from a xenograft that was serially propagated in mice.
2.4 Metabolic Competence: The 22Rv1 cell line expresses androgen receptor (AR) and prostate-specific antigen
(PSA), both of which are markers of prostate cancer. The presence of these markers in 22Rv1 cells confirms their
origin from prostate cancer tissue and highlights their relevance in studying the disease. Importantly, the 22Rv1
cell line is unique in that it expresses both full-length and truncated forms of ARs. This mixed expression pattern
is commonly observed in androgen deprivation resistant prostate cancers, making the 22Rv1 cell line a valuable
model for studying the mechanisms underlying resistance to hormonal therapies. Morphologically, 22Rv1 cells
exhibit epithelial characteristics and are cultured as adherent monolayers, providing a convenient system for in
vitro experimentation.
2.5 Exposure Regime: The xCELLigence system Multi-E-Plate stations were used to measure the time-dependent
response to chemicals. Each compound was tested in an eight-point, 1:4 serial dilution series starting at a
maximum final concentration of 100 uM. A maximum starting concentration of 0.5% DMSO was present in the
100 uM chemical samples and was diluted along with the test article dilution series. The screen was performed
in biological duplicate using two separate, 96-well, E-Plates 96 for each dilution series (n = 2). Positive controls
(MG132 and testosterone) and a negative control (assay media) were tested in quadruplicate on each testing
plate. Then, 0.5% and 0.125% DMSO were tested in duplicates in each plate to serve as solvent controls for the
2 highest concentrations of testing compounds: 100 uM and 25 uM. Reference compounds were tested with 8
concentrations with 1:5 serial dilutions. All screening was carried out by ACEA Biosciences, Inc. (San Diego, CA).
22Rv1 cells purchased from ATCC were maintained in media supplemented with 10% fetal bovine serum (FBS).
Before screening, 22Rv1 cells were preconditioned in assay medium. Cells were then detached and seeded in
E-Plates 96 in assay medium. After overnight monitoring of growth once every hour, compounds were added to
the cells and remained in the medium until the end of the experiment. Cellular responses were then recorded
once every 5 min for the first 5 h, and once every hour for an additional 100 h.
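As a worked illustration of the exposure regime described above, the following Python sketch generates an eight-point, 1:4 serial dilution starting at 100 uM, together with the DMSO percentage that is diluted alongside it. The helper name dilution_series is an assumption for illustration; only the starting concentrations, the dilution factor, and the number of points come from the text.

```python
from typing import List


def dilution_series(top: float, factor: float, n_points: int) -> List[float]:
    """Return n_points concentrations, each 1:factor of the previous one."""
    return [top / factor ** i for i in range(n_points)]


# Test chemical: eight points, 1:4 dilution, 100 uM maximum final concentration.
chemical_um = dilution_series(100.0, 4, 8)

# DMSO is diluted along with the test article, starting at 0.5%.
dmso_percent = dilution_series(0.5, 4, 8)

if __name__ == "__main__":
    for conc, dmso in zip(chemical_um, dmso_percent):
        print(f"{conc:.6g} uM chemical in {dmso:.6g}% DMSO")
```

The first two points of the series (100 uM in 0.5% DMSO and 25 uM in 0.125% DMSO) correspond to the two solvent control levels named in the text.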
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 0.144 uM
Key positive control: testosterone
Target (nominal) number of replicates: 3
Standard maximum concentration tested: 105 uM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 7.544
Response cutoff threshold used to determine hit calls: 22.633
Detection technology used: RT-CES (Label Free Technology)
2.6 Response: Increased cell proliferation in response to xenoestrogenic interference with AR-mediated pathways
as measured by monitoring electrical impedance at the cell-plate interface. One possible effect of endocrine
disrupting chemicals is increased cell growth through perturbation of pathways linked to cell cycle regulation.
Activation of the androgen receptor (AR) signaling pathway, for example, is one possible mechanism that
underlies cell proliferation in hormonally sensitive tissues such as mammary and endometrial tissue. This assay
was designed to identify those chemicals with the potential to affect cell growth by activating the androgen
receptor-mediated cell proliferation pathway. The assay uses electronic microsensors located at the bottom of
the cell culture well to detect changes in cell number, morphology, and adhesion through electrical impedance
measurement at the electrode-solution interface following 80-hour incubation with test chemicals.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Data were collected from the xCELLigence system which converts raw impedance values into the
Cell Index (CI) value; this is a measure of adhesion where CI = (impedance at time point n - impedance in the
absence of cells)/nominal impedance value. These data were then converted to a Normalized Cell Index (NCI)
according to the equation NCI(Ti) = CI(Ti)/CI(Tk), where i = 1, 2, 3, ..., N; CI(Tk) is the cell index at the last
time point before chemical addition; CI(Ti) is the cell index at the i-th measured time point; and N is the total
number of time points. Data were grouped by chemical and smoothed to combine replicates using a simple moving
average (the replicates were assessed in duplicate on separate plates, so the time points were not identical). DMSO
controls were considered as baseline for activity, and testosterone was used as a positive control and 100
percent activity for all the test chemicals on that plate. For cell loss, the NCI value at the time of compound
administration was considered to represent complete (100%) viability. MG132 (2 uM), a proteasome inhibitor
and known cytotoxic agent, was used as the positive control for cell loss and was tested in quadruplicate on
each plate. The minimum average response on each plate was used as a positive control for cell loss for all the
test chemicals on the corresponding plate. If a chemical sample was run on two different plates, then the
minimum NCI values for MG132 were averaged. If an NCI value for MG132 fell below zero, the response was
considered to be below the limit of detection and was replaced with the minimum value greater than zero across
all plates. All smoothed NCI values were then converted to a percentage of positive control, which was
considered to represent no (0%) viability. Concentration-response curves were generated using smoothed NCI
values, and all statistical analyses were conducted using the R programming language, employing the tcpl package
to generate model parameters and confidence intervals.
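The Cell Index normalization described in this section can be sketched as follows. This is a minimal Python illustration, not the vendor or tcpl implementation: the function names are hypothetical, the moving average is assumed to be a trailing window, and the window size is an arbitrary choice because the text does not state one.

```python
from typing import List


def normalized_cell_index(ci: List[float], k: int) -> List[float]:
    """NCI(Ti) = CI(Ti) / CI(Tk), where k indexes the last reading before dosing."""
    reference = ci[k]
    if reference == 0:
        raise ValueError("CI(Tk) must be non-zero to normalize")
    return [value / reference for value in ci]


def simple_moving_average(values: List[float], window: int) -> List[float]:
    """Trailing simple moving average (window size is an assumed parameter)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


if __name__ == "__main__":
    # Hypothetical Cell Index readings; index 1 is the last point before dosing.
    ci_readings = [1.0, 2.0, 2.4, 3.0, 3.6]
    nci = normalized_cell_index(ci_readings, k=1)
    print(nci)
    print(simple_moving_average(nci, window=2))
```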
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
3: pval.apid.medpcbyconc.max (Calculate the positive control value (pval) as the plate-wise maximum,
by assay plate ID (apid), of the medians of the corrected values (cval) for gain-of-signal single- or
multiple-concentration negative control wells (wllt = m or o) by apid, well type, and concentration.)
5: resp.pc (Calculate the normalized response (resp) as a percent of control, i.e. the ratio of the
difference between the corrected (cval) and baseline (bval) values divided by the difference between the
positive control (pval) and baseline (bval) values multiplied by 100; resp = (cval-bval)/(pval-bval)*100.)
17: bval.apid.nwllslowconc.med (Calculate the baseline value (bval) as the plate-wise median, by assay plate
ID (apid), of the corrected values (cval) of test compound wells (wllt = t) with a concentration index (cndx)
of 1 or 2 or neutral control wells (wllt = n).)
18: resp.shiftneg.3bmad (Shift all the normalized response values (resp) less than -3 multiplied by the
baseline median absolute deviation (bmad) to 0; if resp < -3*bmad, resp = 0.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.)
2: pc20 (Add a cutoff value of 20. Typically for percent of control data.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis
direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then
flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
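The Level 3 normalization and Level 5 cutoff rules listed above reduce to short arithmetic, sketched here in Python. The function names are illustrative and not tcpl method names; the formulas follow the method descriptions (resp.pc, resp.shiftneg.3bmad, bmad3, and pc20), and the cutoff is taken as the higher of the two candidate values, as stated for Level 5.

```python
def resp_pc(cval: float, bval: float, pval: float) -> float:
    """Percent of control: resp = (cval - bval) / (pval - bval) * 100."""
    return (cval - bval) / (pval - bval) * 100


def shift_neg_3bmad(resp: float, bmad: float) -> float:
    """Set responses below -3*bmad to 0, leaving all other values unchanged."""
    return 0.0 if resp < -3 * bmad else resp


def response_cutoff(bmad: float, fixed_cutoff: float = 20.0) -> float:
    """Select the higher of 3*bmad (bmad3) and a fixed percent cutoff (pc20)."""
    return max(3 * bmad, fixed_cutoff)


if __name__ == "__main__":
    # With the bmad reported for this endpoint (7.544), the 3*bmad candidate
    # exceeds 20, so it becomes the cutoff.
    print(response_cutoff(7.544))
```

With the rounded bmad of 7.544, the sketch gives a cutoff of about 22.632, which agrees with the reported threshold of 22.633 to within rounding of the inputs.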
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1845 Number of chemicals tested: 1830
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
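The model selection rule described above (lowest AIC wins, with ties going to the model with fewer parameters) can be sketched in Python as follows. This is an illustration of the rule only, not the tcplfit2 code; the AIC values and parameter counts in the example are hypothetical, and the continuous hit call itself is not reproduced here.

```python
from typing import Dict, Tuple

# Each entry maps a model name to (AIC, number of parameters).
ModelFits = Dict[str, Tuple[float, int]]


def select_winning_model(fits: ModelFits) -> str:
    """Pick the lowest-AIC model; on equal AIC, prefer fewer parameters."""
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


def beats_constant(fits: ModelFits, winner: str, constant: str = "cnst") -> bool:
    """Check whether the winner has a lower AIC than the constant (flat) model."""
    return fits[winner][0] < fits[constant][0]


if __name__ == "__main__":
    # Hypothetical fit summaries, for illustration only.
    example_fits = {"hill": (120.0, 3), "gnls": (120.0, 5), "cnst": (150.0, 1)}
    winner = select_winning_model(example_fits)
    print(winner, beats_constant(example_fits, winner))
```

Sorting on the (AIC, parameter count) pair keeps the tie-break explicit, so a more complex model is chosen only when it has a strictly lower AIC.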
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using the
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 4.02
Neutral control median absolute deviation, by plate (nmad): 0.204
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 5.14%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): 6.915
Positive control well median absolute deviation, by plate (pmad): 0.853
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): 2.698
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): 1
Negative control well median absolute deviation value, by plate (mmad): 0
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): -14.766
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 106.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Judson R, Houck K, Paul Friedman K, Brown J, Browne P, Johnston PA, Close DA, Mansouri K,
Kleinstreuer N. Selecting a minimal set of androgen receptor assays for screening chemicals. Regul Toxicol
Pharmacol. 2020 Nov; 117:104764. doi: 10.1016/j.yrtph.2020.104764. Epub 2020 Aug 14. PMID: 32798611;
PMCID: PMC8356084.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1856
ACEA_AR_antagonist_80hr
1. General Information
1.1 Assay Title: ACEA Biosciences xCELLigence Real-Time Cell Analysis on Androgen Receptor Antagonism for
Proliferation
1.2 Assay Summary: ACEA_AR_antagonist is a cell-based, single-readout assay that uses 22Rv1, a human prostate
cancer cell line, with measurements taken at 80 hours after chemical dosing in a 384-well plate, although T05
and T06 (mc0.srcf) used a 96-well plate. Differences in plate size can be ignored given data normalization.
ACEA_AR_antagonist_80hr is one of two assay component(s) measured or calculated from the ACEA_AR assay.
It is designed to make measurements of real-time cell-growth kinetics, a form of growth reporter, as detected
with electrical impedance signals by Real-Time Cell Electrode Sensor (RT-CES) technology. Data from the assay
component ACEA_AR_antagonist_80hr was analyzed in the positive analysis fitting direction relative to DMSO
as the negative control and baseline of activity. Using a type of growth reporter, measures of the cells for loss-
of-signal activity can be used to understand the signaling at the pathway-level as they relate to the gene AR.
Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has produced
multiple assay endpoints where this one serves a signaling function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the
subfamily is steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ACEA Biosciences, Inc. (ACEA) is a privately owned biotechnology company that developed a
real-time, label-free cell growth assay system called xCELLigence based on a microelectronic impedance readout.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; xCELLigence RTCA software and biosensor technology are
available from ACEA Biosciences, Inc. and 22Rv1 cells are commercially available from American Type Culture
Collection (ATCC HTB-133) with signed Material Transfer Agreement (MTA).
1.9 Assay Throughput: 384-well plate. The assay is conducted on 96-well plates with each plate containing positive
controls for proliferation (testosterone) and cytotoxicity (MG132), negative controls (assay media, RPMI1640),
and two concentrations (0.5 percent and 0.125 percent) of DMSO solvent controls. Following a 24-hour
incubation period, the cells are exposed to test chemicals for 80 hours and response is monitored no less than
once per hour.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Electrical impedance is used to quantify changes to the growth of the cells, where increased
impedance is positively correlated with increased cell growth.
The ACEA_AR assay exposed human prostate cell (22Rv1) cultures to the ToxCast library of diverse
environmental chemicals using an eight-point, 1:4 dilution series concentration-response format (starting at a
maximum final concentration of 100 uM), using MG132 (cytotoxicity) and testosterone (proliferation) as positive
controls and assay media and DMSO as a negative control and solvent control, respectively. All control chemicals
were tested in quadruplicate on each plate. The ACEA_AR assay analyzed changes in cell adhesion and
morphology at the electrode-solution interface (located on the bottom of culture wells) using electronic
microsensors. Changes in electrical impedance were monitored in real-time at the plate surface to investigate
the potential activation of the androgen signaling pathway and subsequent increases in growth or changes in
cell structure following 80-hour incubation with the test chemicals. The electrical signal produced by the
experimental system can be used to detect changes in cell number, morphology and adhesion which occur in
response to xenoestrogenic activation of AR-mediated pathways, and concentration-response curves were
modeled for each chemical to determine half-maximal activity levels.
2.2 Scientific Principles: Endocrine disrupting chemicals (EDCs) interfere with normal hormone biosynthesis,
signaling or metabolism and impact regulatory pathways in humans and wildlife. Androgens, such as
testosterone, are widely recognized for their importance in sexual development and differentiation but also
play roles in metabolism, growth, development, and behavior and act as an intercellular signal (Bhasin et al.,
2007; Monks & Holmes, 2018; Sumpter, 2005). Agonism of the androgen receptor is listed as a molecular
initiating event in AOP #23, leading to reproductive dysfunction in fish (Villeneuve, 2021).
2.3 Experimental System: The adherent 22Rv1 cell line was used. 22Rv1 is a human prostate carcinoma epithelial
cell line derived from a xenograft that was serially propagated in mice.
2.4 Metabolic Competence: The 22Rv1 cell line expresses androgen receptor (AR) and prostate-specific antigen
(PSA), both of which are markers of prostate cancer. The presence of these markers in 22Rv1 cells confirms their
origin from prostate cancer tissue and highlights their relevance in studying the disease. Importantly, the 22Rv1
cell line is unique in that it expresses both full-length and truncated forms of ARs. This mixed expression pattern
is commonly observed in androgen deprivation resistant prostate cancers, making the 22Rv1 cell line a valuable
model for studying the mechanisms underlying resistance to hormonal therapies. Morphologically, 22Rv1 cells
exhibit epithelial characteristics and are cultured as adherent monolayers, providing a convenient system for in
vitro experimentation.
2.5 Exposure Regime: The xCELLigence system Multi-E-Plate stations were used to measure the time-dependent
response to chemicals. Each compound was tested in an eight-point, 1:4 serial dilution series starting at a
maximum final concentration of 100 uM. A maximum starting concentration of 0.5% DMSO was present in the
100 uM chemical samples and was diluted along with the test article dilution series. The screen was performed
in biological duplicate using two separate, 96-well, E-Plates 96 for each dilution series (n = 2). Positive controls
(MG132 and testosterone) and a negative control (assay media) were tested in quadruplicate on each testing
plate. Then, 0.5% and 0.125% DMSO were tested in duplicates in each plate to serve as solvent controls for the
2 highest concentrations of testing compounds: 100 uM and 25 uM. Reference compounds were tested with 8
concentrations with 1:5 serial dilutions. All screening was carried out by ACEA Biosciences, Inc. (San Diego, CA).
22Rv1 cells purchased from ATCC were maintained in media supplemented with 10% fetal bovine serum (FBS).
Before screening, 22Rv1 cells were preconditioned in assay medium. Cells were then detached and seeded in
E-Plates 96 in assay medium. After overnight monitoring of growth once every hour, compounds were added to
the 22Rv1 cells and remained in the medium until the end of the experiment. Cellular responses were then recorded
once every 5 min for the first 5 h, and once every hour for an additional 100 h.
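The dilution scheme described above can be checked with a little arithmetic. The following is a minimal sketch, not the vendor's protocol code: the function names are hypothetical, and it assumes only what the text states (an eight-point, 1:4 series starting at 100 uM, with 0.5% DMSO at the top concentration diluted along with the chemical).

```python
def dilution_series(top_um=100.0, fold=4.0, n_points=8):
    """Return nominal concentrations (uM), highest first, for a serial dilution."""
    return [top_um / fold**i for i in range(n_points)]


def dmso_percent(conc_um, top_um=100.0, top_dmso_pct=0.5):
    """DMSO percentage at a given concentration when solvent is diluted with the chemical."""
    return top_dmso_pct * conc_um / top_um


concs = dilution_series()
for conc in concs:
    # Print each nominal concentration next to the DMSO level carried with it.
    print(f"{conc:12.6f} uM   {dmso_percent(conc):.6f} % DMSO")
```

Under these assumptions the two highest concentrations are 100 uM and 25 uM, carrying 0.5% and 0.125% DMSO, which is consistent with the two solvent-control levels named in the text.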
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 0.144 uM
Key positive control: NA
Target (nominal) number of replicates: 3
Standard maximum concentration tested: 105 uM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.122
Response cutoff threshold used to determine hit calls: 0.366
Detection technology used: RT-CES (Label Free Technology)
2.6 Response: Increased cell proliferation in response to xenoestrogenic interference with AR-mediated pathways
as measured by monitoring electrical impedance at the cell-plate interface. One possible effect of endocrine
disrupting chemicals is increased cell growth through perturbation of pathways linked to cell cycle regulation.
Activation of the androgen receptor (AR) signaling pathway, for example, is one possible mechanism that
underlies cell proliferation in hormonally sensitive tissues such as mammary and endometrial tissue. This assay
was designed to identify those chemicals with the potential to affect cell growth by activating the androgen
receptor-mediated cell proliferation pathway. The assay uses electronic microsensors located at the bottom of
the cell culture well to detect changes in cell number, morphology, and adhesion through electrical impedance
measurement at the electrode-solution interface following 80-hour incubation with test chemicals.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Data were collected from the xCELLigence system, which converts raw impedance values into the
Cell Index (CI) value; this is a measure of adhesion where CI = (impedance at time point n - impedance in the
absence of cells)/nominal impedance value. These data were then converted to a Normalized Cell Index
according to the equation NCI(Ti) = CI(Ti)/CI(Tk), for i = 1, 2, 3, ..., N, where CI(Tk) is the cell index at the
last time point before chemical addition, CI(Ti) is the cell index at the i-th measured time point, and N is the
total number of time points. Data were grouped by chemical and smoothed to combine replicates using a simple
moving average (as the replicates were assessed in duplicate on separate plates so the time points were not
identical). DMSO controls were considered as baseline for activity, and testosterone was used as a positive control
and 100 percent activity for all the test chemicals on that plate. For cell loss, the NCI value at the time of compound
administration was considered to represent complete (100%) viability. MG132 (2 uM), a proteasome inhibitor
and known cytotoxic agent, was used as the positive control for cell loss and was tested in quadruplicate on
each plate. The minimum average response on each plate was used as a positive control for cell loss for all the
test chemicals on the corresponding plate. If a chemical sample was run on two different plates, then the
minimum NCI values for MG132 were averaged. If an NCI value for MG132 fell below zero, the response was
considered to be below the limit of detection and was replaced with the minimum value greater than zero across
all plates. All smoothed NCI values were then converted to a percentage of positive control, which was
considered to represent no (0%) viability. Concentration response curves were generated using smoothed NCI
values and all statistical analyses were conducted using the R programming language, employing the tcpl package
to generate model parameters and confidence intervals.
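The normalization steps in Section 3.2 can be illustrated with a short sketch. This is an illustration under stated assumptions rather than the pipeline's actual code: the function names and example numbers are hypothetical, the moving average uses a trailing window whose width is an assumption (the text does not specify one), and the viability scaling assumes NCI = 1 at dosing maps to 100% and the MG132 minimum maps to 0%.

```python
def normalized_cell_index(ci, k):
    """NCI(Ti) = CI(Ti) / CI(Tk), where k indexes the last reading before chemical addition."""
    reference = ci[k]
    if reference == 0:
        raise ValueError("CI at the reference time point is zero")
    return [value / reference for value in ci]


def moving_average(values, window=3):
    """Simple trailing moving average; early points average over the readings available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def percent_viability(nci, mg132_min, nci_at_dosing=1.0):
    """Scale NCI so the value at dosing is 100% and the MG132 minimum is 0% viability."""
    span = nci_at_dosing - mg132_min
    return [100.0 * (value - mg132_min) / span for value in nci]


# Hypothetical cell index trace; index 1 is the last reading before dosing.
ci_trace = [1.6, 2.0, 2.4, 3.0, 1.0]
nci = normalized_cell_index(ci_trace, k=1)
viability = percent_viability(nci, mg132_min=0.2)
print(nci)
print(viability)
```

Dividing by CI(Tk) puts every well on a common scale at the moment of dosing, which is why the text can treat NCI at compound administration as complete viability.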
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
7: resp.log2 (Transform the response values to log-scale (base 2).)
9: resp.fc (Calculate the normalized response (resp) as the fold change, i.e. the ratio of the corrected (cval)
and baseline (bval) values; resp = cval/bval.)
17: bval.apid.nwllslowconc.med (Calculate the baseline value (bval) as the plate-wise median, by assay plate
ID (apid), of the corrected values (cval) of test compound wells (wllt = t) with a concentration index (cndx)
of 1 or 2 or neutral control wells (wllt = n).)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/sample size - 1). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.)
27: ow_bidirectional_loss (Multiply winning model hitcall (hitc) by -1 for models fit in the positive analysis
direction. Typically used for endpoints where only negative responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
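The Level 3 to Level 5 methods listed above chain into a short calculation: a plate-wise baseline, a fold change, a log2 transform, a baseline median absolute deviation, and a cutoff of three times that deviation. The sketch below is an illustration in Python, not tcpl's R implementation. The function names and the example plate are hypothetical, the order of operations (baseline, then fold change, then log2) is an assumption, and the deviation is computed as a raw median absolute deviation without any scaling constant, so values may differ from those produced by the package.

```python
import math
from statistics import median


def plate_baseline(wells):
    """bval: median cval of neutral control wells and test wells with cndx 1 or 2."""
    values = [w["cval"] for w in wells
              if w["wllt"] == "n" or (w["wllt"] == "t" and w["cndx"] in (1, 2))]
    return median(values)


def normalize_plate(wells):
    """resp.fc followed by resp.log2: resp = log2(cval / bval)."""
    bval = plate_baseline(wells)
    return [dict(w, resp=math.log2(w["cval"] / bval)) for w in wells]


def bmad_lowconc(wells):
    """Raw median absolute deviation of resp in test wells with cndx 1 or 2."""
    resp = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = median(resp)
    return median([abs(r - center) for r in resp])


def cutoff_bmad3(bmad):
    """bmad3: cutoff equals three times the baseline median absolute deviation."""
    return 3 * bmad


# Hypothetical single plate: two neutral wells and three test wells.
plate = [
    {"wllt": "n", "cndx": 0, "cval": 2.0},
    {"wllt": "n", "cndx": 0, "cval": 2.0},
    {"wllt": "t", "cndx": 1, "cval": 1.9},
    {"wllt": "t", "cndx": 2, "cval": 2.1},
    {"wllt": "t", "cndx": 8, "cval": 8.0},
]
normalized = normalize_plate(plate)
top_resp = normalized[-1]["resp"]
bmad = bmad_lowconc(normalized)
print(top_resp, bmad, cutoff_bmad3(bmad))
print(round(cutoff_bmad3(0.122), 3))
```

The last line applies the bmad3 rule to the bmad reported in the Assay Design Summary for this endpoint (0.122), which gives the reported response cutoff of 0.366.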
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1850 Number of chemicals tested: 1835
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency, including comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
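The binarization rule and flag-based filtering described above can be sketched as follows. This is a hedged illustration only: the record layout, the flag helper names, and the `max_flags` filter are hypothetical choices made for the example, while the 0.90 threshold and the individual flag conditions (rmse > coff, nconc <= 4, nrep < 2) are taken from the text.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the thresholds stated in this document."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


def cautionary_flags(series, coff):
    """Return names of three of the Level 6 flags that apply to one series."""
    flags = []
    if series["rmse"] > coff:
        flags.append("noise")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    return flags


def high_confidence_actives(all_series, coff, max_flags=0):
    """Keep active series carrying no more than max_flags cautionary flags (assumed rule)."""
    return [s for s in all_series
            if classify_hitc(s["hitc"]) == "active"
            and len(cautionary_flags(s, coff)) <= max_flags]


# Hypothetical fitted series; coff is the cutoff reported for this endpoint.
fitted = [
    {"id": "A", "hitc": 0.97, "rmse": 0.10, "nconc": 8, "nrep": 2},
    {"id": "B", "hitc": 0.95, "rmse": 0.50, "nconc": 8, "nrep": 2},
    {"id": "C", "hitc": 0.40, "rmse": 0.05, "nconc": 8, "nrep": 2},
    {"id": "D", "hitc": -1.0, "rmse": 0.05, "nconc": 3, "nrep": 1},
]
kept = high_confidence_actives(fitted, coff=0.366)
print([s["id"] for s in kept])
```

How many flags to tolerate is a user decision; the sketch simply makes that stringency an explicit parameter.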
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 6.926
Neutral control median absolute deviation, by plate: nmad 0.984
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 14.38%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed 4.202
Positive control well median absolute deviation, by plate: pmad 0.198
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: -2.524
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed - nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed 1
Negative control well median absolute deviation value, by plate: mmad 0
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: -5.936
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed - nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
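The control-well statistics above follow directly from the formulas given in parentheses. The sketch below restates them in Python with hypothetical single-plate inputs; the function names are illustrative. The tabulated values appear to be summarized across plates, so substituting the tabulated medians into these functions need not reproduce the tabulated SSMD or CV exactly.

```python
import math


def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3 * (cmad + nmad) / abs(cmed - nmed)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad**2 + nmad**2)


# Hypothetical plate: control median 2 (MAD 1) versus neutral median 10 (MAD 1).
print(cv_percent(nmad=1.0, nmed=10.0))
print(z_prime(cmed=2.0, cmad=1.0, nmed=10.0, nmad=1.0))
print(ssmd(cmed=2.0, cmad=1.0, nmed=10.0, nmad=1.0))
```

A negative SSMD, as reported for both control types here, simply indicates that the control wells read lower than the neutral (DMSO) wells.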
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 205.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Judson R, Houck K, Paul Friedman K, Brown J, Browne P, Johnston PA, Close DA, Mansouri K,
Kleinstreuer N. Selecting a minimal set of androgen receptor assays for screening chemicals. Regul Toxicol
Pharmacol. 2020 Nov; 117:104764. doi: 10.1016/j.yrtph.2020.104764. Epub 2020 Aug 14. PMID: 32798611;
PMCID: PMC8356084.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1857
ACEA_AR_antagonist_AUC_viability
1. General Information
1.1 Assay Title: ACEA Biosciences xCELLigence Real-Time Cell Analysis on Androgen Receptor Antagonism for
Viability
1.2 Assay Summary: ACEA_AR_antagonist is a cell-based, single-readout assay that uses 22Rv1, a human prostate
cancer cell line, with measurements taken at 80 hours after chemical dosing in a 384-well plate, although T05
and T06 (mc0.srcf) used a 96-well plate. Differences in plate size can be ignored given data normalization.
ACEA_AR_80hr is one of two assay component(s) measured or calculated from the ACEA_ER assay. It is designed
to make measurements of real-time cell-growth kinetics, a form of growth reporter, as detected with electrical
impedance signals by Real-Time Cell Electrode Sensor (RT-CES) technology. Data from the assay component
ACEA_AR_antagonist_AUC_viability was analyzed in the positive analysis fitting direction relative to DMSO as
the negative control and baseline of activity. Using a type of growth reporter, loss-of-signal activity can be used
to understand changes in the viability. Furthermore, this assay endpoint can be referred to as a secondary
readout, because this assay has produced multiple assay endpoints where this one serves a viability function.
To generalize the intended target to other relatable targets, this assay endpoint is annotated to the cell cycle
intended target family, where the subfamily is cytotoxicity.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ACEA Biosciences, Inc. (ACEA) is a privately owned biotechnology company that developed a
real-time, label-free cell growth assay system called xCELLigence based on a microelectronic impedance readout.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; xCELLigence RTCA software and biosensor technology are
available from ACEA Biosciences, Inc. and 22Rv1 cells are commercially available from American Type Culture
Collection (ATCC HTB-133) with signed Material Transfer Agreement (MTA).
1.9 Assay Throughput: 384-well plate. The assay is conducted on 96-well plates with each plate containing positive
controls for proliferation (testosterone) and cytotoxicity (MG132), negative controls (assay media, RPMI1640),
and two concentrations (0.5 percent and 0.125 percent) of DMSO solvent controls. Following a 24-hour
incubation period, the cells are exposed to test chemicals for 80 hours and response is monitored no less than
once per hour.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Electrical impedance is used to quantify changes to the growth of the cells, where increased
impedance is positively correlated with increased cell growth.
The ACEA_AR assay exposed human prostate cell (22Rv1) cultures to the ToxCast library of diverse
environmental chemicals using an eight-point, 1:4 dilution series concentration-response format (starting at a
maximum final concentration of 100 uM), using MG132 (cytotoxicity) and testosterone (proliferation) as positive
controls and assay media and DMSO as a negative control and solvent control, respectively. All control chemicals
were tested in quadruplicate on each plate. The ACEA_AR assay analyzed changes in cell adhesion and
morphology at the electrode-solution interface (located on the bottom of culture wells) using electronic
microsensors. Changes in electrical impedance were monitored in real-time at the plate surface to investigate
the potential activation of the androgen signaling pathway and subsequent increases in growth or changes in
cell structure following 80-hour incubation with the test chemicals. The electrical signal produced by the
experimental system can be used to detect changes in cell number, morphology and adhesion which occur in
response to xenoestrogenic activation of AR-mediated pathways, and concentration-response curves were
modeled for each chemical to determine half-maximal activity levels.
2.2 Scientific Principles: Endocrine disrupting chemicals (EDCs) interfere with normal hormone biosynthesis,
signaling or metabolism and impact regulatory pathways in humans and wildlife. Androgens, such as
testosterone, are widely recognized for their importance in sexual development and differentiation but also
play roles in metabolism, growth, development, and behavior and act as an intercellular signal (Bhasin et al.,
2007; Monks & Holmes, 2018; Sumpter, 2005). Agonism of the androgen receptor is listed as a molecular
initiating event in AOP #23, leading to reproductive dysfunction in fish (Villeneuve, 2021).
2.3 Experimental System: adherent 22Rv1 cell line used. 22Rv1 is a human prostate carcinoma epithelial cell line
derived from a xenograft that was serially propagated in mice.
2.4 Metabolic Competence: The 22Rv1 cell line expresses androgen receptor (AR) and prostate-specific antigen
(PSA), both of which are markers of prostate cancer. The presence of these markers in 22Rv1 cells confirms their
origin from prostate cancer tissue and highlights their relevance in studying the disease. Importantly, the 22Rv1
cell line is unique in that it expresses both full-length and truncated forms of ARs. This mixed expression pattern
is commonly observed in androgen deprivation resistant prostate cancers, making the 22Rv1 cell line a valuable
model for studying the mechanisms underlying resistance to hormonal therapies. Morphologically, 22Rv1 cells
exhibit epithelial characteristics and are cultured as adherent monolayers, providing a convenient system for in
vitro experimentation.
2.5 Exposure Regime: The xCELLigence system Multi-E-Plate stations were used to measure the time-dependent
response to chemicals. Each compound was tested in an eight-point, 1:4 serial dilution series starting at a
maximum final concentration of 100 uM. A maximum starting concentration of 0.5% DMSO was present in the
100 uM chemical samples and was diluted along with the test article dilution series. The screen was performed
in biological duplicate using two separate, 96-well, E-Plates 96 for each dilution series (n = 2). Positive controls
(MG132 and testosterone) and a negative control (assay media) were tested in quadruplicate on each testing
plate. Then, 0.5% and 0.125% DMSO were tested in duplicates in each plate to serve as solvent controls for the
2 highest concentrations of testing compounds: 100 uM and 25 uM. Reference compounds were tested with 8
concentrations with 1:5 serial dilutions. All screening was carried out by ACEA Biosciences, Inc. (San Diego, CA).
22Rv1 cells purchased from ATCC were maintained in media supplemented with 10% fetal bovine serum (FBS).
Before screening, 22Rv1 cells were preconditioned in assay medium. Cells were then detached and seeded in
E-Plates 96 in assay medium. After overnight monitoring of growth once every hour, compounds were added to
the cells and remained in the medium until the end of the experiment. Cellular responses were then recorded
once every 5 min for the first 5 h, and once every hour for an additional 100 h.
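As an illustration of the dilution scheme described in this section, the following Python sketch generates an eight-point, 1:4 series from a 100 uM top concentration, together with the DMSO percentage that is diluted alongside the test article. The function name `dilution_series` and its parameters are hypothetical; they are not part of tcpl or of the vendor protocol.

```python
def dilution_series(top, factor, n_points):
    """Return n_points values starting at `top`, each divided by `factor`."""
    return [top / factor**i for i in range(n_points)]

# Test chemical concentrations in uM: eight points, 1:4 dilution from 100 uM
conc_um = dilution_series(100.0, 4, 8)
# DMSO (% v/v) is diluted with the chemical, starting at 0.5%
dmso_pct = dilution_series(0.5, 4, 8)

for c, d in zip(conc_um, dmso_pct):
    print(f"{c:10.4f} uM   {d:.6f} % DMSO")
```

The first two rows (100 uM at 0.5% DMSO and 25 uM at 0.125% DMSO) correspond to the two solvent control levels included on each plate.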
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 3
Standard minimum concentration tested: 0.144 uM
Standard maximum concentration tested: 105 uM
Key positive control: MG132
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 9.65
Response cutoff threshold used to determine hit calls: 28.951
Detection technology used: RT-CES (Label Free Technology)
-------
2.6 Response: Increased cell proliferation in response to xenoestrogenic interference with AR-mediated pathways
as measured by monitoring electrical impedance at the cell-plate interface. One possible effect of endocrine
disrupting chemicals is increased cell growth through perturbation of pathways linked to cell cycle regulation.
Activation of the androgen receptor (AR) signaling pathway, for example, is one possible mechanism that
underlies cell proliferation in hormonally sensitive tissues such as mammary and endometrial tissue. This assay
was designed to identify those chemicals with the potential to affect cell growth by activating the androgen
receptor-mediated cell proliferation pathway. The assay uses electronic microsensors located at the bottom of
the cell culture well to detect changes in cell number, morphology, and adhesion through electrical impedance
measurement at the electrode-solution interface following 80-hour incubation with test chemicals.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Cytotoxicity Burst: Assays used to define the cytotoxicity burst region
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Data were collected from the xCELLigence system which converts raw impedance values into the
Cell Index (CI) value; this is a measure of adhesion where CI = (impedance at time point n - impedance in the
absence of cells)/nominal impedance value. These data were then converted to a Normalized Cell Index
according to the equation NCI(Ti) = CI(Ti)/CI(Tk), where i = 1, 2, 3, ..., N; CI(Tk) is the cell index at the last time
point before chemical addition, CI(Ti) is the cell index at the i-th measured time point, and N is the total number of time
points. Data were grouped by chemical and smoothed to combine replicates using a simple moving average (as
the replicates were assessed in duplicate on separate plates so the time points were not identical). DMSO
controls were considered as baseline for activity, and testosterone was used as a positive control and 100
percent activity for all the test chemicals on that plate. For cell loss, the NCI value at the time of compound
administration was considered to represent complete (100%) viability. MG132 (2 uM), a proteasome inhibitor
and known cytotoxic agent, was used as the positive control for cell loss and was tested in quadruplicate on
each plate. The minimum average response on each plate was used as a positive control for cell loss for all the
test chemicals on the corresponding plate. If a chemical sample was run on two different plates, then the
minimum NCI values for MG132 were averaged. If an NCI value for MG132 fell below zero, the response was
considered to be below the limit of detection and was replaced with the minimum value greater than zero across
all plates. All smoothed NCI values were then converted to a percentage of positive control, which was
considered to represent no (0%) viability. Concentration response curves were generated using smoothed NCI
values and all statistical analyses were conducted using the R programming language, employing the tcpl package to
generate model parameters and confidence intervals.
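The Cell Index and Normalized Cell Index formulas given in this section can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, the impedance readings are made up, and the xCELLigence software performs these conversions itself.

```python
def cell_index(impedance, background, nominal):
    """CI = (impedance at time point n - impedance without cells) / nominal impedance."""
    return (impedance - background) / nominal

def normalized_cell_index(ci_series, k):
    """NCI(Ti) = CI(Ti) / CI(Tk); index k is the last time point before chemical addition."""
    ci_k = ci_series[k]
    return [ci / ci_k for ci in ci_series]

# Made-up impedance readings (arbitrary units) for a single well
impedance = [30.0, 50.0, 70.0, 90.0]
ci = [cell_index(z, background=10.0, nominal=20.0) for z in impedance]
nci = normalized_cell_index(ci, k=1)  # chemical added after the second reading

print(ci)
print(nci)
```

By construction the NCI equals 1 at the normalization time point Tk, so later values read directly as fold change relative to the state of the well just before dosing.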
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
5: resp.pc (Calculate the normalized response (resp) as a percent of control, i.e. the ratio of the difference
between the corrected (cval) and baseline (bval) values divided by the difference between the positive
control (pval) and baseline (bval) values multiplied by 100; resp = (cval-bval)/(pval-bval)*100.)
6: resp.multneg1 (Multiply the normalized response value (resp) by -1; -1*resp.)
15: pval.apid.medncbyconc.min (Calculate the positive control value (pval) as the plate-wise minimum, by
assay plate ID (apid), of the medians of the corrected values (cval) for gain-of-signal single- or
multiple-concentration negative control wells (wllt = m or o) by apid, well type, and concentration.)
17: bval.apid.nwllslowconc.med (Calculate the baseline value (bval) as the plate-wise median, by assay plate
ID (apid), of the corrected values (cval) of test compound wells (wllt = t) with a concentration index (cndx)
of 1 or 2 or neutral control wells (wllt = n).)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.)
2: pc20 (Add a cutoff value of 20. Typically for percent of control data.)
27: ow_bidirectional_loss (Multiply winning model hitcall (hitc) by -1 for models fit in the positive analysis
direction. Typically used for endpoints where only negative responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then
flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
19: viability.gnls (Flag series with an active hit call (hitc >= 0.9) if denoted as cell viability assay and
the winning model is gain-loss (gnls); if hitc >= 0.9, modl=="gnls" and cell_viability_assay == 1, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
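The arithmetic behind the Level 3 to Level 5 methods listed above can be sketched in plain Python. This is not tcpl's implementation: the function names are hypothetical, the example well values are invented, and the 1.4826 scaling constant in `bmad` is an assumption that mirrors the default of R's `mad()` function.

```python
import math
import statistics

def resp_pc(cval, bval, pval):
    """Level 3 resp.pc: percent of control, (cval - bval) / (pval - bval) * 100."""
    return (cval - bval) / (pval - bval) * 100.0

def resp_multneg1(resp):
    """Level 3 resp.multneg1: flip the sign of the normalized response."""
    return -1.0 * resp

def bmad(resps, constant=1.4826):
    """Level 4: median absolute deviation of low-concentration test-well responses.
    The 1.4826 scaling is an assumption taken from the default of R's mad()."""
    med = statistics.median(resps)
    return constant * statistics.median([abs(r - med) for r in resps])

def onesd(resps):
    """Level 4: sample standard deviation, sqrt(sum((resp - mean)^2) / (n - 1))."""
    mean = sum(resps) / len(resps)
    return math.sqrt(sum((r - mean) ** 2 for r in resps) / (len(resps) - 1))

def cutoff(bmad_value):
    """Level 5: the higher of bmad3 (3 * bmad) and pc20 (20)."""
    return max(3.0 * bmad_value, 20.0)

# Invented well: corrected value 150, baseline 200, positive control 100
resp = resp_multneg1(resp_pc(cval=150.0, bval=200.0, pval=100.0))
print(resp)

# Using the rounded bmad reported in the assay design summary (9.65)
print(cutoff(9.65))
```

With the rounded bmad of 9.65, three times bmad is about 28.95, which exceeds the pc20 value of 20 and is close to the reported cutoff of 28.951; the small difference presumably reflects rounding of the reported bmad.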
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1850
Number of chemicals tested: 1835
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
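The winning-model rule described above (lowest AIC, with ties going to the model with fewer parameters) can be illustrated with a short Python sketch. The AIC values and parameter counts below are invented for illustration, and the function name `select_winning_model` is hypothetical; this is not the tcplfit2 code.

```python
def select_winning_model(fits):
    """Pick the model with the lowest AIC; on ties, prefer fewer parameters.
    `fits` maps model name -> (aic, number_of_parameters)."""
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

# Invented AIC values and parameter counts, for illustration only
fits = {
    "cnst": (120.0, 1),
    "hill": (95.2, 4),
    "poly1": (95.2, 2),
    "gnls": (97.8, 6),
}

winner = select_winning_model(fits)
# Condition behind the third weight: is the winner a better fit than the constant model?
beats_constant = fits[winner][0] < fits["cnst"][0]

print(winner, beats_constant)
```

Sorting on the pair (AIC, parameter count) applies the tie-break only when the AIC values are exactly equal, which matches the rule as stated in the text.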
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior based on relative activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
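The binarization of the continuous hit call described above can be written as a small helper. This is a sketch using the thresholds stated in the text; the function name `classify_hitc` and the adjustable `threshold` argument are hypothetical and reflect the note that users may choose a different stringency.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the thresholds given in the text."""
    if hitc < 0:
        return "not applicable"
    if hitc >= threshold:
        return "active"
    return "inactive"

for value in (0.95, 0.90, 0.42, 0.0, -1.0):
    print(value, classify_hitc(value))
```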
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 216.845
Neutral control median absolute deviation, by plate (nmad): 21.373
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 8.91%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): 170.663
Positive control well median absolute deviation, by plate (pmad): 6.345
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): -2.199
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): 72.977
Negative control well median absolute deviation value, by plate (mmad): 2.556
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): -7.083
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
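The control-well metrics in this section follow directly from the stated formulas, as the Python sketch below shows. The function names and the input values are hypothetical. Because the reported figures are summaries of plate-level statistics, plugging the reported medians into these formulas will not necessarily reproduce them.

```python
import math

def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100.0

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad**2 + nmad**2)

def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed

# Invented single-plate values: control median/MAD and neutral median/MAD
cmed, cmad, nmed, nmad = 20.0, 3.0, 100.0, 4.0
print(cv_percent(nmad, nmed))
print(z_prime(cmed, cmad, nmed, nmad))
print(ssmd(cmed, cmad, nmed, nmad))
print(signal_to_noise(cmed, nmed, nmad))
print(signal_to_background(cmed, nmed))
```

Here `cmed` and `cmad` stand for either the positive control (pmed, pmad) or the negative control (mmed, mmad) values, since the same formulas are applied to both against the neutral control.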
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 199.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Judson R, Houck K, Paul Friedman K, Brown J, Browne P, Johnston PA, Close DA, Mansouri K,
Kleinstreuer N. Selecting a minimal set of androgen receptor assays for screening chemicals. Regul Toxicol
Pharmacol. 2020 Nov;117:104764. doi: 10.1016/j.yrtph.2020.104764. Epub 2020 Aug 14. PMID: 32798611;
PMCID: PMC8356084.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data
can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 4
APR_HepG2_CellCycleArrest_1hr
1. General Information
1.1 Assay Title: Apredica 1-hour HepG2 Human Liver Cell Assay for Cell Cycle Arrest
1.2 Assay Summary: APR_HepG2_1hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 1 hour after chemical dosing in a 384-well plate.
APR_HepG2_CellCycleArrest_1hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_1hr assay. It is designed to make measurements of cell phenotype, a form of morphology reporter,
as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the assay
component APR_HepG2_CellCycleArrest_1hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_CellCycleArrest_1hr, was analyzed with bidirectional fitting relative to DMSO as the negative
control and baseline of activity. Using a type of morphology reporter, measures of all nuclear DNA for gain or
loss-of-signal activity can be used to understand the signaling at the pathway-level as they relate to the gene.
Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has produced
multiple assay endpoints where this one serves a signaling function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the cell cycle intended target family, where the subfamily
is arrest.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on the human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Hoechst-33342 dye is used as a stain for DNA to understand morphology in the system.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery towards a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
-------
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano, 2006), which applies automated
image analysis techniques to capture multiple cytological features using fluorescent labels, to measure the
concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully metabolically
capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated capacity to
predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien et al. 2006;
Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-state
trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test period.
The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: adherent HepG2 cell line used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-fixed
cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
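The ten nominal treatment concentrations listed in this section are consistent with a two-fold serial dilution from 200 uM, with the listed values apparently rounded or truncated. The following Python sketch reproduces that arithmetic. The dilution factor is an assumption inferred from the listed values, and the function name is hypothetical.

```python
# Hypothetical reconstruction of the nominal dilution series (not vendor code).
TOP_CONC_UM = 200.0     # highest tested concentration, from the text
N_CONC = 10             # number of tested concentrations, from the text
DILUTION_FACTOR = 2.0   # assumption: inferred from the listed concentrations


def dilution_series(top, n, factor):
    """Return n concentrations in ascending order, ending at `top`."""
    return [top / factor ** i for i in range(n - 1, -1, -1)]


series = dilution_series(TOP_CONC_UM, N_CONC, DILUTION_FACTOR)
for conc in series:
    print(f"{conc:.3f} uM")
```

Under this assumption the lowest concentration is 200/2^9 = 0.390625 uM.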
Baseline median absolute deviation for the assay (bmad): 0.072
Response cutoff threshold used to determine hit calls: 0.725
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity level of Hoechst-33342 stained DNA indicates cell phenotypes which can be
used to identify cell cycle arrest.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.391 uM
Standard maximum concentration tested: 200 uM
Key positive control: Camptothecin; Anisomycin
Neutral vehicle control: DMSO
NA
-------
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., $|z| = 1$) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., $|z| > 1$). The LEC was estimated by
numerically solving for: |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as maximum positive or negative value of x.
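The smoothing, normalization, standardization, and LEC steps described above can be sketched in Python as follows. This is a minimal illustration, not the published implementation. The edge padding used during smoothing, the log-linear interpolation used to solve |z| = 1, and all function names are assumptions. The reference values (r_ref, x_ref, sigma_x) are supplied by the caller because the text does not specify how they are pooled.

```python
import numpy as np


def smooth_hamming(r, length=7):
    """Smooth a response series with a normalized Hamming window."""
    window = np.hamming(length)
    window = window / window.sum()  # weights sum to 1, preserving scale
    pad = length // 2
    # Assumption: edge values are repeated so the output keeps the input length.
    padded = np.pad(np.asarray(r, dtype=float), pad, mode="edge")
    return np.convolve(padded, window, mode="valid")


def standardized_perturbation(r, r_ref, x_ref, sigma_x):
    """Return x = log2(r / r_ref) and z = (x - x_ref) / sigma_x."""
    x = np.log2(np.asarray(r, dtype=float) / r_ref)
    z = (x - x_ref) / sigma_x
    return x, z


def lowest_effect_concentration(conc, z, threshold=1.0):
    """Lowest concentration at which |z| reaches the threshold.

    Assumption: the crossing is located by linear interpolation of |z|
    against log10(concentration). Returns None if |z| never reaches it.
    """
    conc = np.asarray(conc, dtype=float)
    abs_z = np.abs(np.asarray(z, dtype=float))
    if abs_z[0] >= threshold:
        return float(conc[0])
    logc = np.log10(conc)
    for i in range(1, len(abs_z)):
        if abs_z[i - 1] < threshold <= abs_z[i]:
            frac = (threshold - abs_z[i - 1]) / (abs_z[i] - abs_z[i - 1])
            return float(10 ** (logc[i - 1] + frac * (logc[i] - logc[i - 1])))
    return None


def efficacy(x):
    """Value of x with the largest magnitude, positive or negative."""
    x = np.asarray(x, dtype=float)
    return float(x[np.argmax(np.abs(x))])
```

Taking the first crossing from the low-concentration end corresponds to selecting the minimum solution when |z| = 1 has several.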
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of half of the
maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
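The Level 4 baseline statistics, the Level 5 cutoff selection, and a few of the simpler Level 6 flags listed above can be illustrated with the Python sketch below. It is not tcpl code. The function names are hypothetical, the 1.4826 scaling constant is the default of R's mad function and is assumed to apply here, and the border flag is read as |top| lying within 20 percent of the cutoff.

```python
import math

import numpy as np

MAD_CONSTANT = 1.4826  # assumption: default scaling constant of R's mad()


def baseline_stats(resp, wllt, cndx, mad_constant=MAD_CONSTANT):
    """Level 4 sketch: bmad and onesd from test wells with cndx 1 or 2."""
    resp = np.asarray(resp, dtype=float)
    mask = (np.asarray(wllt) == "t") & np.isin(np.asarray(cndx), [1, 2])
    base = resp[mask]
    bmad = mad_constant * np.median(np.abs(base - np.median(base)))
    onesd = np.std(base, ddof=1)  # sample standard deviation (n - 1)
    return float(bmad), float(onesd)


def level5_cutoff(bmad):
    """Level 5 sketch: the higher of log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10.0 * bmad)


def simple_level6_flags(top, coff, rmse, nrep, nconc, hitc, ac50, logc_min):
    """Level 6 sketch covering only the noise, border, low.nrep,
    low.nconc, and ac50.lowconc flags."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")
    return flags


# With the rounded bmad reported for this endpoint (0.072), 10 * bmad = 0.72
# exceeds log2(1.2), so the bmad-based cutoff is the one selected.
print(round(level5_cutoff(0.072), 3))
```

The small difference between 0.72 and the reported cutoff of 0.725 is presumably due to rounding of the reported bmad.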
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 320 Number of chemicals tested: 310
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. Second, the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
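The hit-call logic described above can be sketched in Python as follows. The three weights are taken as inputs because their calculation is not detailed in this section, and the function names are hypothetical. The 0.90 threshold is the one used herein, which users may change.

```python
def continuous_hitc(weight_response, weight_top, weight_aic):
    """Continuous hit call as the product of three proportional weights.

    Each weight is assumed to lie between 0 and 1; how each is derived
    from the fitted curve is outside the scope of this sketch.
    """
    return weight_response * weight_top * weight_aic


def classify_hitc(hitc, active_threshold=0.90):
    """Binarize a continuous hit call using the thresholds stated herein."""
    if hitc < 0:
        return "not applicable"
    if hitc >= active_threshold:
        return "active"
    return "inactive"


print(classify_hitc(continuous_hitc(0.99, 0.98, 0.97)))  # product is about 0.94
```

Raising active_threshold tightens the filter; lowering it admits more borderline curves for review alongside the fitc and mc6 flags.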
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
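The robustness formulas listed above can be computed as in the following Python sketch. The function name is hypothetical and the input values are illustrative only; they are not taken from this assay, for which these metrics are reported as NA.

```python
import math


def control_metrics(ctrl_med, ctrl_mad, nmed, nmad):
    """Robustness metrics for one control type against the neutral control.

    ctrl_med and ctrl_mad are the positive (pmed, pmad) or negative
    (mmed, mmad) control median and median absolute deviation.
    """
    return {
        "cv_pct": (nmad / nmed) * 100,
        "z_prime": 1 - (3 * (ctrl_mad + nmad)) / abs(ctrl_med - nmed),
        "ssmd": (ctrl_med - nmed) / math.sqrt(ctrl_mad ** 2 + nmad ** 2),
        "signal_to_noise": (ctrl_med - nmed) / nmad,
        "signal_to_background": ctrl_med / nmed,
    }


# Illustrative values only (hypothetical plate medians and MADs).
metrics = control_metrics(ctrl_med=10.0, ctrl_mad=1.0, nmed=2.0, nmad=1.0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The ratios are undefined when nmed or nmad is zero, which can occur for responses normalized to the neutral control.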
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
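As an illustration of the accuracy statistics mentioned above, sensitivity and specificity can be derived from counts of reference chemical outcomes as in this Python sketch. The function name and the counts are hypothetical; no reference chemical results are available for this assay.

```python
def accuracy_statistics(tp, fp, tn, fn):
    """Sensitivity, specificity, and balanced accuracy from outcome counts.

    tp/fn: positive reference chemicals called active/inactive.
    tn/fp: negative reference chemicals called inactive/active.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical counts for illustration only.
stats = accuracy_statistics(tp=8, fp=1, tn=9, fn=2)
```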
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 35.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah I, Setzer RW, Jack J, Houck KA, Judson RS, Knudsen TB, Liu J, Martin MT, Reif DM, Richard AM,
Thomas RS, Crofton KM, Dix DJ, Kavlock RJ. Using ToxCast™ Data to Reconstruct Dynamic Cell State
Trajectories and Estimate Toxicological Points of Departure. Environmental Health Perspectives.
2016;124(7):910-919. https://doi.org/10.1289/ehp.1409029
-------
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 6
APR_HepG2_CellLoss_1hr
1. General Information
1.1 Assay Title: Apredica 1-hour HepG2 Human Liver Cell Assay for Cell Loss
1.2 Assay Summary: APR_HepG2_1hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 1 hour after chemical dosing in a 384-well plate. APR_HepG2_CellLoss_1hr is
one of 10 assay components measured or calculated from the APR_HepG2_1hr assay. It is designed to make
measurements of cell number, a form of viability reporter, as detected with fluorescence intensity signals by
HCS Fluorescent Imaging technology. Data from the assay component APR_HepG2_CellLoss_1hr were analyzed
into 1 assay endpoint. This assay endpoint, APR_HepG2_CellLoss_1hr, was analyzed with bidirectional fitting
relative to DMSO as the negative control and baseline of activity. Using a type of viability reporter, measures of
all nuclear DNA for gain or loss-of-signal activity can be used to understand viability at the cellular level.
Furthermore, this assay endpoint can be referred to as a secondary readout, because this assay has produced
multiple assay endpoints where this one serves a viability function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the cell cycle intended target family, where the subfamily
is proliferation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Hoechst-33342 dye is used as a stain for DNA to understand viability in the system.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity and identify the mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide
m-chlorophenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 2
Standard minimum concentration tested: 0.391 uM
Standard maximum concentration tested: 200 uM
Key positive control: Camptothecin; Anisomycin; Paclitaxel; CCCP
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.056
Response cutoff threshold used to determine hit calls: 0.557
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity level of Hoechst-33342 stained DNA indicates cell phenotypes which can be
used to identify cell cycle arrest.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., $\sigma_x = 1$) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., $|\sigma_x| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
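The smoothing, normalization, standardization, and LEC steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used to generate the published data: the function names are hypothetical, the edge padding used to keep the series length is an assumption, the centering value and standard deviation are supplied by the caller because the text does not state which population they are computed over, and the LEC is located by linear interpolation on log10 concentration rather than by whatever numerical solver was actually used.

```python
import numpy as np

def smooth_hamming(raw, length=7):
    """Smooth a concentration-response series with a normalized Hamming window."""
    window = np.hamming(length)
    window = window / window.sum()          # weights sum to 1
    pad = length // 2
    # Assumption: repeat the edge values so the output keeps the input length.
    padded = np.pad(np.asarray(raw, dtype=float), pad, mode="edge")
    return np.convolve(padded, window, mode="valid")

def log2_fold_change(r, r_ref):
    """x = log2(r / r*), with r* the reference (median) response."""
    return np.log2(np.asarray(r, dtype=float) / r_ref)

def standardize(x, center, sigma):
    """z = (x - x*) / sigma_x; center and sigma are supplied by the caller."""
    return (np.asarray(x, dtype=float) - center) / sigma

def lowest_effect_concentration(conc, z):
    """Smallest concentration at which |z| reaches 1, or None if it never does."""
    conc = np.asarray(conc, dtype=float)
    absz = np.abs(np.asarray(z, dtype=float))
    if absz[0] >= 1:
        return float(conc[0])
    logc = np.log10(conc)
    for i in range(1, len(absz)):
        if absz[i - 1] < 1 <= absz[i]:
            # Linear interpolation between the two bracketing points.
            frac = (1 - absz[i - 1]) / (absz[i] - absz[i - 1])
            return float(10 ** (logc[i - 1] + frac * (logc[i] - logc[i - 1])))
    return None
```

Because the first crossing is returned, the sketch mirrors the rule that the minimum value is selected when several solutions exist.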
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
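As a rough illustration of the Level 4 description, the following Python sketch computes bmad and onesd from the two lowest concentration indices of test compound wells. The function name and argument layout are hypothetical. The 1.4826 scaling constant is an assumption: it is the default of R's mad() function, and the text above does not say whether a scaling constant is applied.

```python
import numpy as np

def baseline_stats(resp, wllt, cndx, scale=1.4826):
    """Return (bmad, onesd) for test wells (wllt == 't') with cndx 1 or 2."""
    resp = np.asarray(resp, dtype=float)
    mask = (np.asarray(wllt) == "t") & np.isin(cndx, [1, 2])
    base = resp[mask]
    # Median absolute deviation; scale=1.4826 is an assumption (R's mad() default).
    bmad = scale * np.median(np.abs(base - np.median(base)))
    # Sample standard deviation (divides by sample size - 1).
    onesd = np.std(base, ddof=1)
    return float(bmad), float(onesd)
```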
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
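The Level 5 rule can be checked against the values in the Assay Design Summary with a few lines of Python. This is a minimal sketch with a hypothetical function name. With the reported bmad of 0.056, 10 x bmad is 0.56, which exceeds log2(1.2) (about 0.263), so the bmad-based threshold is the one selected. The small difference from the reported cutoff of 0.557 is presumably because the reported bmad is rounded to three decimals, which is an assumption.

```python
import math

def response_cutoff(bmad):
    """Select the higher of the two candidate cutoffs: log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)

# Reported bmad for this endpoint is 0.056; the reported cutoff is 0.557.
cutoff = response_cutoff(0.056)
```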
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then
flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
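Several of the Level 6 flags reduce to simple comparisons, so a subset can be sketched directly from the descriptions above. The Python function below is illustrative only: its name and argument list are hypothetical, it covers just five of the listed flags, and it is not the tcpl implementation.

```python
import numpy as np

def caution_flags(resp, top, coff, rmse, hitc, ac50, logc_min, nrep, nconc):
    """Return the names of a subset of cautionary flags that apply to one series."""
    resp = np.asarray(resp, dtype=float)
    above = int(np.sum(resp > coff))     # responses beyond the cutoff, gain direction
    below = int(np.sum(resp < -coff))    # responses beyond the cutoff, loss direction
    flags = []
    # Directionality: flag when the opposite direction is comparatively well supported.
    if (top < 0 and below < 2 * above) or (top > 0 and above < 2 * below):
        flags.append("modl.directionality.fail")
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")
    return flags
```

For example, an active series with an AC50 of 0.1 uM, when the lowest tested concentration is 0.391 uM, would receive the ac50.lowconc flag.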
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 320 Number of chemicals tested: 310
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. whether the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
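The binarization of the continuous hit call described above can be written as a small helper. This is a sketch with a hypothetical function name; the 0.90 default follows the threshold used in this document, and the parameter is exposed because different thresholds may be used.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to 'active', 'inactive', or 'not applicable'."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"
```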
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): NA
Neutral control median absolute deviation, by plate (nmad): NA
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
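For readers who want to reproduce these metrics from their own plate-level summaries, the formulas in the table translate directly into Python. The sketch below uses a hypothetical function name and takes the control median and median absolute deviation as inputs; the same function applies to negative control wells by passing mmed and mmad in place of pmed and pmad.

```python
import math

def control_metrics(cmed, cmad, nmed, nmad):
    """Robustness metrics for a control well type versus neutral control wells."""
    return {
        "cv_percent": (nmad / nmed) * 100,
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }

# Illustrative values only (not taken from this assay, where all metrics are NA).
example = control_metrics(cmed=10.0, cmad=1.0, nmed=2.0, nmad=0.5)
```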
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 16.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
health perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast
Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 8
APR_HepG2_MicrotubuleCSK_1hr
1. General Information
1.1 Assay Title: Apredica 1-hour HepG2 Human Liver Cell Assay for Microtubule CSK Stability
1.2 Assay Summary: APR_HepG2_1hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 1 hour after chemical dosing in a 384-well plate.
APR_HepG2_MicrotubuleCSK_1hr is one of 10 assay components measured or calculated from the
APR_HepG2_1hr assay. It is designed to make measurements of protein conformation, a form of conformation
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_MicrotubuleCSK_1hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_MicrotubuleCSK_1hr, was analyzed with bidirectional fitting relative to DMSO as the negative
control and baseline of activity. Using a type of conformation reporter, measures of protein for gain or
loss-of-signal activity can be used to understand the signaling at the cellular level. Furthermore, this assay endpoint
can be referred to as a primary readout, because this assay has produced multiple assay endpoints where this
one serves a signaling function. To generalize the intended target to other relatable targets, this assay endpoint
is annotated to the cell morphology intended target family, where the subfamily is cell conformation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Anti-alpha-tubulin antibody is used to tag and quantify the level of tubulin, alpha 1a protein. Changes
in the signals are indicative of protein expression changes as a cellular response to stress in the system
[GeneSymbol:TUBA1A | GeneID:7846 | Uniprot_SwissProt_Accession:Q71U36].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery towards a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
-------
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide
m-chlorophenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.391 uM
Standard maximum concentration tested: 200 uM
Key positive control: Camptothecin; Anisomycin; Paclitaxel; CCCP
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.124
Response cutoff threshold used to determine hit calls: 1.236
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
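The ten tested concentrations listed in Section 2.5 are consistent with a two-fold dilution series starting from the 200 uM maximum. The short Python sketch below illustrates this under the assumption that the series is exactly two-fold; the function name is hypothetical, and the listed values of 3.12 and 6.24 uM differ slightly from the exact values of 3.125 and 6.25 uM.

```python
def twofold_series(top=200.0, n=10):
    """Return n concentrations in ascending order, halving from the top concentration."""
    return [top / 2 ** k for k in range(n)][::-1]

concentrations = twofold_series()  # lowest value is 200 / 2**9 = 0.390625 uM
```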
2.6 Response: Fluorescent intensity level of Hoechst-33342 stained DNA indicates cell phenotypes which can be
used to identify cell cycle arrest.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of cell morphology.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
-------
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., $\sigma_x = 1$) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., $|\sigma_x| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
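As a rough illustration of the Level 4 and Level 5 methods listed above, the following Python sketch computes bmad and onesd from low-concentration test-well responses and then selects the higher of the two candidate cutoffs. This is not tcpl code: the function names and example responses are hypothetical, and the scale constant applied to the median absolute deviation (1.4826, the default of R's mad function) is an assumption about how the deviation is scaled.

```python
import math
import statistics

# Assumption: scale constant matching the default of R's mad();
# pass scale=1.0 to obtain the unscaled median absolute deviation.
MAD_SCALE = 1.4826


def baseline_stats(resp, scale=MAD_SCALE):
    """Return (bmad, onesd) for normalized responses (resp) from test
    compound wells (wllt = t) at concentration index (cndx) 1 or 2."""
    med = statistics.median(resp)
    bmad = scale * statistics.median([abs(r - med) for r in resp])
    # Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))
    onesd = statistics.stdev(resp)
    return bmad, onesd


def select_cutoff(bmad):
    """Pick the higher of the two Level 5 candidates: log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)


if __name__ == "__main__":
    low_conc_resp = [0.0, 0.1, -0.1, 0.2, -0.2]  # hypothetical values
    bmad, onesd = baseline_stats(low_conc_resp)
    print(bmad, onesd, select_cutoff(bmad))
```

For example, with a hypothetical bmad of 0.036, 10 * bmad (0.36) exceeds log2(1.2) (about 0.263), so the bmad-based cutoff would be the one selected.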
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 320 Number of chemicals tested: 310
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
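The binarization described above can be sketched in a few lines of Python. The function name and the returned labels are hypothetical; the default threshold of 0.90 follows the convention stated in this document, and users may substitute a different value.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call (hitc) using the thresholds stated above.

    hitc >= threshold      -> "active"
    0 <= hitc < threshold  -> "inactive"
    hitc < 0               -> "not applicable"
    """
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


if __name__ == "__main__":
    for value in (0.95, 0.90, 0.42, 0.0, -1.0):  # hypothetical hit calls
        print(value, classify_hitc(value))
```

A stricter or looser analysis only needs a different threshold argument, which keeps the stringency decision explicit in the calling code.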
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
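The control-well formulas above translate directly into code. The Python sketch below is illustrative only: the function names and example numbers are hypothetical, and the inputs are assumed to be the per-plate medians and median absolute deviations already defined (nmed, nmad, pmed, pmad, mmed, mmad).

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) versus neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference versus neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


if __name__ == "__main__":
    # Hypothetical plate summary values, not taken from this assay.
    pmed, pmad, nmed, nmad = 100.0, 4.0, 10.0, 3.0
    print(cv_percent(nmed, nmad))
    print(z_prime(pmed, pmad, nmed, nmad))
    print(ssmd(pmed, pmad, nmed, nmad))
    print(signal_to_noise(pmed, nmed, nmad))
    print(signal_to_background(pmed, nmed))
```

The same functions serve the negative control metrics by passing mmed and mmad in place of pmed and pmad.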
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 19.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
-------
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 10
APR_HepG2_MitoMass_1hr
1. General Information
1.1 Assay Title: Apredica 1-hour HepG2 Human Liver Cell Assay for Mitochondrial Mass
1.2 Assay Summary: APR_HepG2_1hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 1 hour after chemical dosing in a 384-well plate. APR_HepG2_MitoMass_1hr
is one of 10 assay component(s) measured or calculated from the APR_HepG2_1hr assay. It is designed to make
measurements of cell phenotype, a form of morphology reporter, as detected with fluorescence intensity signals
by HCS Fluorescent Imaging technology. Data from the assay component APR_HepG2_MitoMass_1hr was
analyzed into 1 assay endpoint. This assay endpoint, APR_HepG2_MitoMass_1hr, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of morphology
reporter, gain or loss-of-signal activity can be used to understand changes in the signaling. Furthermore, this
assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a signaling function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the cell morphology intended target family, where the subfamily is
organelle conformation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on the human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: MitoTracker Red is used as a stain for the morphology of the mitochondria.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.391 nM
Standard maximum concentration tested: 200 nM
Key positive control: Camptothecin; Anisomycin; Paclitaxel; CCCP
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.036
Response cutoff threshold used to determine hit calls: 0.356
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity level of Hoechst-33342 stained DNA is used to quantify cell number to report
viability as a marker for cell death.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of cell morphology.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., |z| >= 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |z| > 1). The LEC was estimated by
numerically solving for |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of x.
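A minimal Python sketch of the normalization and LEC steps described above follows. It is not the authors' implementation: the Hamming-window smoothing is omitted, the function names are hypothetical, x* and sigma_x are assumed to be the median and sample standard deviation of a supplied reference set of x values, and the numerical solve for |z| = 1 is approximated by linear interpolation in log10 concentration between adjacent tested concentrations.

```python
import math
import statistics


def log2_fold_change(r, r_star):
    """Perturbation x = log2(r / r*), with r* the plate median response."""
    return math.log2(r / r_star)


def standardize(x_values, x_ref):
    """z = (x - x*) / sigma_x, with x* and sigma_x taken from x_ref (assumption)."""
    x_star = statistics.median(x_ref)
    sigma_x = statistics.stdev(x_ref)
    return [(x - x_star) / sigma_x for x in x_values]


def lowest_effect_conc(concs, z_values):
    """Lowest concentration at which |z| reaches 1, or None if it never does.

    concs must be positive and sorted ascending. The crossing point is found
    by interpolating z linearly against log10(concentration).
    """
    prev_c = prev_z = None
    for c, z in zip(concs, z_values):
        if abs(z) >= 1:
            if prev_c is None:
                return c  # already a hit at the lowest tested concentration
            target = 1.0 if z > 0 else -1.0
            frac = (target - prev_z) / (z - prev_z)
            log_c = math.log10(prev_c) + frac * (math.log10(c) - math.log10(prev_c))
            return 10 ** log_c
        prev_c, prev_z = c, z
    return None


if __name__ == "__main__":
    concs = [1.0, 10.0, 100.0]      # hypothetical concentrations (uM)
    z_values = [0.0, 0.5, 1.5]      # hypothetical standardized perturbations
    print(lowest_effect_conc(concs, z_values))
```

Because the loop returns at the first crossing, the minimum solution is selected when |z| = 1 is reached more than once along the series.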
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl uses its generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 7: bmad10 (Add a cutoff
value of 10 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated
using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
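Several of the Level 6 flags listed above reduce to simple threshold checks. The Python sketch below covers five of them as a hedged illustration rather than the tcpl implementation: the function name and example values are hypothetical, and the border flag is interpreted here as a band (|top| between 0.8 and 1.2 times the cutoff), which is an assumption about the intended logic.

```python
import math


def caution_flags(top, coff, rmse, nrep, nconc, hitc, ac50, logc_min):
    """Return the names of the flags triggered for one concentration series.

    Covers noise, border, low.nrep, low.nconc, and ac50.lowconc only.
    logc_min is the log10 of the lowest concentration tested.
    """
    flags = []
    if rmse > coff:
        flags.append("noise")
    # Assumption: borderline activity means |top| within 20 percent of coff.
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")
    return flags


if __name__ == "__main__":
    # Hypothetical series summary, not taken from this endpoint's results.
    print(caution_flags(top=0.4, coff=0.356, rmse=0.1, nrep=1, nconc=10,
                        hitc=0.95, ac50=0.2, logc_min=math.log10(0.391)))
```

Returning flag names rather than a single boolean lets a user decide which flags warrant excluding a curve and which only warrant a closer look.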
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 320 Number of chemicals tested: 310
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed - nmed) / nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed - nmed) / nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
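The control metrics listed above can be computed from plate-level medians and median absolute deviations. The following Python sketch applies those formulas to one control type (positive or negative) against the neutral control. The function name, argument names, and the numeric inputs are hypothetical; all values for this endpoint are reported as NA, so the numbers only demonstrate the arithmetic.

```python
import math


def control_metrics(ctrl_med, ctrl_mad, nmed, nmad):
    """Robustness metrics for a control (pmed/pmad or mmed/mmad) vs. the neutral control.

    Raises ZeroDivisionError if nmed, nmad, or (ctrl_med - nmed) is zero.
    """
    diff = ctrl_med - nmed
    return {
        "cv_pct": (nmad / nmed) * 100,                          # (nmad/nmed)*100
        "z_prime": 1 - (3 * (ctrl_mad + nmad)) / abs(diff),     # Z Prime Factor
        "ssmd": diff / math.sqrt(ctrl_mad ** 2 + nmad ** 2),    # SSMD
        "signal_to_noise": diff / nmad,
        "signal_to_background": ctrl_med / nmed,
    }


# Illustrative values only (not taken from this assay).
m = control_metrics(ctrl_med=100.0, ctrl_mad=5.0, nmed=10.0, nmad=2.0)
```

In this form the same function serves both the positive ("p") and negative ("m") control blocks, since the formulas differ only in which control's median and MAD are supplied.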
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 24.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
health perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 12
APR_HepG2_MitoMembPot_1hr
1. General Information
1.1 Assay Title: Apredica 1-hour HepG2 Human Liver Cell Assay for Mitochondrial Membrane Potential
1.2 Assay Summary: APR_HepG2_1hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 1 hour after chemical dosing in a 384-well plate.
APR_HepG2_MitoMembPot_1hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_1hr assay. It is designed to make measurements of dye binding, a form of membrane potential
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_MitoMembPot_1hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_MitoMembPot_1hr, was analyzed with bidirectional fitting relative to DMSO as the negative
control and baseline of activity. Using a type of membrane potential reporter, gain or loss-of-signal activity can
be used to understand changes in the signaling. Furthermore, this assay endpoint can be referred to as a primary
readout, because this assay has produced multiple assay endpoints where this one serves a signaling function.
To generalize the intended target to other relatable targets, this assay endpoint is annotated to the cell
morphology intended target family, where the subfamily is organelle conformation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. The assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: MitoTracker Red is used as a stain for the membrane potential of the mitochondria.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery towards a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-fixed
cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey
anti-rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG),
c-Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), and alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.391 uM
Standard maximum concentration tested: 200 uM
Key positive control: Camptothecin; Anisomycin; Paclitaxel; CCCP
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.179
Response cutoff threshold used to determine hit calls: 1.787
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
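The ten nominal concentrations listed in Section 2.5 are consistent with a two-fold serial dilution starting from the maximum tested concentration. The dilution factor is not stated explicitly in this document, so the sketch below treats it as an assumption; the function name and parameters are hypothetical.

```python
def dilution_series(top_conc_um=200.0, n_conc=10, fold=2.0):
    """Return nominal concentrations (uM), lowest first, for a serial dilution.

    Assumes a constant fold-dilution from the top concentration; the listed
    values in Section 2.5 (e.g. 3.12, 6.24) appear to be rounded or truncated.
    """
    return [top_conc_um / fold ** k for k in reversed(range(n_conc))]


series = dilution_series()
```

Under this assumption the lowest concentration is 200 / 2^9 = 0.390625 uM, which agrees with the standard minimum concentration reported in the assay design summary after rounding.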
2.6 Response: Fluorescent intensity level of Hoechst-33342 stained DNA is used to quantify cell number to report
viability as a marker for cell death.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of cell morphology.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., sigma_x = 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |sigma_x| > 1). The LEC was estimated by
numerically solving for |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of x.
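The smoothing, normalization, standardization, and LEC steps described above can be sketched as follows. This is an illustrative Python outline, not the authors' implementation: the function names are hypothetical, the edge handling of the Hamming window is an assumption, and the LEC is located by linear interpolation in log10 concentration, which is only one possible way to solve |z| = 1 numerically.

```python
import math


def hamming_window(length=7):
    """Hamming weights w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)), scaled to sum to 1."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]
    total = sum(w)
    return [v / total for v in w]


def smooth(values, length=7):
    """Weighted moving average over the concentration series.

    Assumption: near the ends the window is truncated and renormalised,
    since the source does not describe edge handling.
    """
    w = hamming_window(length)
    half = length // 2
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k, wk in enumerate(w):
            j = i + k - half
            if 0 <= j < len(values):
                num += wk * values[j]
                den += wk
        out.append(num / den)
    return out


def log2_fold_change(smoothed, r_star):
    """x = log2(r / r*), where r* is the median response on the plate."""
    return [math.log2(r / r_star) for r in smoothed]


def standardize(x, x_star, sigma_x):
    """z = (x - x*) / sigma_x."""
    return [(v - x_star) / sigma_x for v in x]


def lowest_effect_concentration(concs, z):
    """Smallest concentration where |z| reaches 1, or None if it never does.

    concs must be sorted in increasing order and aligned with z.
    """
    a = [abs(v) for v in z]
    if a[0] >= 1:
        return concs[0]
    for i in range(1, len(concs)):
        if a[i - 1] < 1 <= a[i]:
            frac = (1 - a[i - 1]) / (a[i] - a[i - 1])
            lo, hi = math.log10(concs[i - 1]), math.log10(concs[i])
            return 10 ** (lo + frac * (hi - lo))
    return None
```

Returning the first crossing when scanning from the lowest concentration corresponds to selecting the minimum value when there are multiple solutions.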
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit
call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then
flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
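The Level 4 baseline statistics and the Level 5 cutoff selection listed above can be illustrated with a short Python sketch. It is not tcpl code: the well records are represented as plain dictionaries with hypothetical keys, and the MAD scale factor is exposed as a parameter because R's mad() multiplies by 1.4826 by default, while this document does not state which convention applies.

```python
import math
import statistics


def baseline_stats(wells, scale=1.0):
    """Return (bmad, onesd) from test wells (wllt == "t") at cndx 1 or 2.

    wells: iterable of dicts with keys "wllt", "cndx", "resp" (assumed layout).
    scale: multiplier applied to the raw MAD (assumption; 1.0 gives the raw MAD).
    """
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(base)
    bmad = scale * statistics.median([abs(r - med) for r in base])
    onesd = statistics.stdev(base)  # sqrt(sum((resp - mean)^2) / (n - 1))
    return bmad, onesd


def response_cutoff(bmad):
    """Select the higher of the two candidate cutoffs: log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)


# Illustrative wells only (not assay data); the last two are excluded by the filter.
wells = [{"wllt": "t", "cndx": 1, "resp": r} for r in (-1.0, 0.0)]
wells += [{"wllt": "t", "cndx": 2, "resp": r} for r in (1.0, 2.0)]
wells += [{"wllt": "n", "cndx": 1, "resp": 50.0}, {"wllt": "t", "cndx": 5, "resp": 40.0}]
bmad, onesd = baseline_stats(wells)
```

With the bmad of 0.179 reported in the assay design summary, 10 * bmad is about 1.79, which exceeds log2(1.2) (about 0.26) and is in line with the reported response cutoff of 1.787 once rounding of bmad is taken into account.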
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 320 Number of chemicals tested: 310
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (the maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e., the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
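As an illustration of how such flags are generated programmatically, the Python sketch below evaluates five of the simpler Level 6 conditions for a single fitted series. The dictionary keys are hypothetical stand-ins for the fitted values, only a subset of the flags is covered, and the border flag is implemented as a band (0.8*coff <= |top| <= 1.2*coff), which is an interpretation of the listed condition rather than confirmed tcpl behavior.

```python
def cautionary_flags(series):
    """Return the names of the flags raised for one concentration-response series.

    series: dict with keys hitc, top, coff, rmse, ac50, logc_min, nrep, nconc
    (assumed layout for this example).
    """
    flags = []
    if series["rmse"] > series["coff"]:
        flags.append("noise")              # rmse > coff
    if 0.8 * series["coff"] <= abs(series["top"]) <= 1.2 * series["coff"]:
        flags.append("border")             # top within 20% of the cutoff
    if series["nrep"] < 2:
        flags.append("low.nrep")           # fewer than 2 replicates on average
    if series["nconc"] <= 4:
        flags.append("low.nconc")          # 4 or fewer concentrations tested
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")       # active hit with AC50 below tested range
    return flags


# Illustrative series only (not assay data).
clean = dict(hitc=0.95, top=5.0, coff=1.787, rmse=0.5,
             ac50=10.0, logc_min=-0.408, nrep=2, nconc=10)
flagged = dict(hitc=0.95, top=2.0, coff=1.787, rmse=2.0,
               ac50=0.1, logc_min=-0.408, nrep=1, nconc=4)
```

A user could then combine the returned flags with hitc and fitc to choose how strictly to filter curves for a given application.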
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed - nmed) / nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed - nmed) / nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 35.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T.,
Reif, D. M., Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using
ToxCast™ Data to Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of
Departure. Environmental Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
-------
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data
can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's
vignette for comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 14
APR_HepG2_MitoticArrest_1hr
1. General Information
1.1 Assay Title: Apredica 1-hour HepG2 Human Liver Cell Assay for Mitotic Arrest
1.2 Assay Summary: APR_HepG2_1hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 1 hour after chemical dosing in a 384-well plate.
APR_HepG2_MitoticArrest_1hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_1hr assay. It is designed to make measurements of cell phenotype, a form of morphology reporter,
as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the assay
component APR_HepG2_MitoticArrest_1hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_MitoticArrest_1hr, was analyzed with bidirectional fitting relative to DMSO as the negative control
and baseline of activity. Using a type of morphology reporter, measures of protein for gain or loss-of-signal
activity can be used to understand the signaling at the pathway-level as they relate to the gene. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a signaling function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the cell cycle intended target family, where the subfamily is arrest.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Anti-phospho-histone-H3 antibody is used to tag and quantify the level of phosphorylated H3
histone, family 3A protein. Changes in the signals are indicative of protein expression changes as a cellular
response to stress in the system [GeneSymbol:H3F3A | GeneID:3020 | Uniprot_SwissProt_Accession:P84243].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery towards a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
-------
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chloro-
phenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
-------
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.391 uM
Standard maximum concentration tested: 200 uM
Key positive control: Camptothecin; Anisomycin; Paclitaxel; CCCP
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.118
Response cutoff threshold used to determine hit calls: 1.178
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
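The ten concentrations listed in Section 2.5 and the minimum and maximum values in the summary above are
consistent with a twofold serial dilution from the top concentration. The following minimal Python sketch
illustrates that arithmetic. The function name dilution_series and the twofold factor are assumptions made
for illustration and are not taken from the assay protocol.

```python
def dilution_series(top_conc, n_conc, factor=2.0):
    """Return n_conc concentrations, lowest first, ending at top_conc.

    Assumption: each step divides the previous concentration by `factor`.
    """
    return [top_conc / factor ** k for k in range(n_conc - 1, -1, -1)]


# Ten concentrations ending at the documented maximum of 200.
series = dilution_series(200.0, 10)
for conc in series:
    print(round(conc, 3))
```

Under this assumption the lowest value is 200 / 2^9 = 0.390625, which rounds to the documented minimum of
0.391; the values 3.12 and 6.24 quoted in Section 2.5 correspond to 3.125 and 6.25 in the exact series.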
2.6 Response: Fluorescent intensity level of Hoechst-33342 stained DNA is used to quantify cell number to report
viability as a marker for cell death.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
-------
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation above or below the median value (i.e., $|z| = 1$). An absolute
perturbation greater than one standard deviation was called a "hit" (i.e., $|z| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
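The smoothing, normalization, and LEC steps described above can be sketched in Python as follows. This is
an illustration under stated assumptions and not the code used to generate the published results: the edge
handling of the Hamming window, the log-linear interpolation used to solve |z| = 1, and all function names
are hypothetical choices, and the center and standard deviation used for standardization are passed in by
the caller because the text does not specify how they were estimated.

```python
import math


def hamming_window(length):
    """Hamming window coefficients: 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def smooth(values, length=7):
    """Weighted moving average with a Hamming window.

    Assumption: near the ends the window is truncated and renormalized.
    """
    window = hamming_window(length)
    half = length // 2
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in enumerate(window):
            j = i + k - half
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        out.append(num / den)
    return out


def log2_fold_change(smoothed, reference):
    """x = log2(r / r*), where r* is the reference (median) response."""
    return [math.log2(r / reference) for r in smoothed]


def standardize(x, center, sigma):
    """z = (x - x*) / sigma_x, with center and sigma supplied by the caller."""
    return [(v - center) / sigma for v in x]


def lowest_effect_concentration(concs, z, threshold=1.0):
    """Lowest concentration at which |z| reaches the threshold.

    Assumption: |z| is interpolated linearly on a log10 concentration axis.
    Returns None when |z| never reaches the threshold.
    """
    if abs(z[0]) >= threshold:
        return concs[0]
    for i in range(1, len(z)):
        a, b = abs(z[i - 1]), abs(z[i])
        if a < threshold <= b:
            frac = (threshold - a) / (b - a)
            lo, hi = math.log10(concs[i - 1]), math.log10(concs[i])
            return 10 ** (lo + frac * (hi - lo))
    return None
```

Scanning from the lowest concentration upward returns the first crossing, which mirrors the rule that the
minimum solution is selected when |z| = 1 has multiple solutions.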
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By
default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
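To make the Level 4 to Level 6 descriptions above concrete, the following Python sketch mirrors a few of
them: the baseline statistics, the selection of the higher cutoff, and three of the cautionary flags. It is
not the tcpl implementation (tcpl is an R package), and the function names are hypothetical. The 1.4826
scaling constant in bmad is an assumption that follows the default of R's mad() function; it is exposed as
a parameter because the text above does not state it.

```python
import math
import statistics


def bmad(resp, constant=1.4826):
    """Median absolute deviation of baseline responses.

    Assumption: scaled by 1.4826, the default constant of R's mad().
    """
    med = statistics.median(resp)
    return constant * statistics.median([abs(r - med) for r in resp])


def onesd(resp):
    """Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))."""
    return statistics.stdev(resp)


def cutoff(bmad_value):
    """Level 5: the higher of log2(1.2) and 10 * bmad is selected."""
    return max(math.log2(1.2), 10 * bmad_value)


def flag_noise(rmse, coff):
    """Flag 10: fit error exceeds the cutoff."""
    return rmse > coff


def flag_low_nconc(nconc):
    """Flag 14: four or fewer concentrations were tested."""
    return nconc <= 4


def flag_directionality_fail(resp, coff, top):
    """Flag 5: the opposite direction would explain the data about as well."""
    above = sum(r > coff for r in resp)
    below = sum(r < -coff for r in resp)
    if top < 0:   # loss was the winning direction
        return below < 2 * above
    if top > 0:   # gain was the winning direction
        return above < 2 * below
    return False


print(cutoff(0.118))
```

With the bmad of 0.118 reported for this endpoint, 10 * bmad is about 1.18, which exceeds log2(1.2) (about
0.263) and agrees with the documented cutoff of 1.178 up to rounding of the reported bmad.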
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 320
Number of chemicals tested: 310
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity,
efficacy, and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
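The binarization rule stated above (hitc of at least 0.90 is active, hitc from 0 up to 0.90 is inactive, and
negative hitc is not applicable) can be written as a short Python sketch. The function name classify_hitc is
hypothetical, and the threshold is left as a parameter because the text notes that different thresholds may
be used.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to the categories used in this document."""
    if hitc < 0:
        return "not applicable"
    if hitc >= threshold:
        return "active"
    return "inactive"


for value in (0.95, 0.90, 0.42, 0.0, -1.0):
    print(value, classify_hitc(value))
```

A stricter analysis could pass a higher threshold, for example classify_hitc(0.93, threshold=0.95), which
would return "inactive".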
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): NA
Neutral control median absolute deviation, by plate (nmad): NA
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
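The control-well metrics listed above follow directly from the plate medians and median absolute deviations.
The Python sketch below restates those formulas. The function names are hypothetical, and the numbers in the
usage example are invented solely to exercise the arithmetic, since every metric is reported as NA for this
endpoint.

```python
import math


def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3*(cmad + nmad) / abs(cmed - nmed)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """SSMD: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


# Hypothetical values for illustration only (not measured data).
pmed, pmad, nmed, nmad = 10.0, 1.0, 2.0, 1.0
print(z_prime(pmed, pmad, nmed, nmad))
print(ssmd(pmed, pmad, nmed, nmad))
```

Here cmed and cmad stand for the median and median absolute deviation of whichever control is compared with
the neutral control: pmed and pmad for positive control wells, or mmed and mmad for negative control wells.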
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 31.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T.,
Reif, D. M., Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using
ToxCast™ Data to Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of
Departure. Environmental Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
-------
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data
can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's
vignette for comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 16
APR_HepG2_NuclearSize_1hr
1. General Information
1.1 Assay Title: Apredica 1-hour HepG2 Human Liver Cell Assay for Nuclear Size
1.2 Assay Summary: APR_HepG2_1hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 1 hour after chemical dosing in a 384-well plate. APR_HepG2_NuclearSize_1hr
is one of 10 assay component(s) measured or calculated from the APR_HepG2_1hr assay. It is designed to make
measurements of cell phenotype, a form of morphology reporter, as detected with fluorescence intensity signals
by HCS Fluorescent Imaging technology. Data from the assay component APR_HepG2_NuclearSize_1hr was
analyzed into 1 assay endpoint. This assay endpoint, APR_HepG2_NuclearSize_1hr, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of morphology
reporter, measures of all nuclear DNA for gain or loss-of-signal activity can be used to understand the signaling
at the nuclear-level. Furthermore, this assay endpoint can be referred to as a primary readout, because this
assay has produced multiple assay endpoints where this one serves a signaling function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the cell morphology intended
target family, where the subfamily is organelle conformation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Hoechst-33342 dye is used as a stain for DNA to understand morphology in the system.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery towards a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chloro-
phenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis are available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.391 uM
Standard maximum concentration tested: 200 uM
Key positive control: Camptothecin; Anisomycin; Paclitaxel; CCCP
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.008
Response cutoff threshold used to determine hit calls: 0.263
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
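The ten nominal concentrations and the minimum and maximum values in the summary above are consistent with a two-fold serial dilution from the top concentration. The following is a minimal sketch under that assumption; the function name `twofold_series` is hypothetical, and the uM unit is taken from the exposure description in Section 2.5.

```python
def twofold_series(top=200.0, n_conc=10):
    """Return an ascending two-fold dilution series ending at `top`.

    Assumption: each concentration is half of the next higher one,
    which reproduces the nominal 0.39 to 200 range described above.
    """
    return [top / 2 ** k for k in reversed(range(n_conc))]


series = twofold_series()
# The lowest value is 200 / 2**9 = 0.390625, i.e. about 0.391 when rounded.
lowest = round(series[0], 3)
```

The values listed in Section 2.5 appear to be rounded or truncated versions of this series, so small differences in the last digit are expected.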
2.6 Response: Protein stabilization of microtubules is identified through fluorescence intensity of anti-alpha-tubulin
antibody tagged tubulin, alpha 1a protein, and is a sign of cellular response to stress.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of cell morphology.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., sigma_x = 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |sigma_x| > 1). The LEC was estimated by
numerically solving for |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of x.
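The smoothing, normalization, and LEC steps described above can be illustrated with a short sketch. This is not the published implementation: the edge padding used during smoothing, the linear interpolation in log10 concentration used to solve |z| = 1, and all function names are assumptions made for illustration, and the centering value and standard deviation are passed in by the caller because the text does not specify how they are pooled.

```python
import math


def hamming_window(length=7):
    """Hamming window coefficients, normalized to sum to 1."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
         for n in range(length)]
    total = sum(w)
    return [v / total for v in w]


def smooth(responses, length=7):
    """Smooth a response series; ends are padded by repeating edge values."""
    w = hamming_window(length)
    half = length // 2
    padded = [responses[0]] * half + list(responses) + [responses[-1]] * half
    return [sum(w[j] * padded[i + j] for j in range(length))
            for i in range(len(responses))]


def log2_fold_change(r, r_ref):
    """x = log2(r / r*), where r_ref is the plate median response."""
    return [math.log2(v / r_ref) for v in r]


def standardize(x, center, sigma):
    """z = (x - x*) / sigma_x."""
    return [(v - center) / sigma for v in x]


def lowest_effect_concentration(conc, z):
    """Lowest concentration where |z| reaches 1 (conc sorted ascending).

    Returns None when |z| never reaches 1, i.e. no hit.
    """
    a = [abs(v) - 1.0 for v in z]
    if a[0] >= 0:
        return conc[0]
    for i in range(1, len(a)):
        if a[i] >= 0:
            # First upward crossing: a[i-1] < 0 <= a[i].
            frac = -a[i - 1] / (a[i] - a[i - 1])
            lo, hi = math.log10(conc[i - 1]), math.log10(conc[i])
            return 10 ** (lo + frac * (hi - lo))
    return None


# Toy example: |z| crosses 1 halfway between 1 and 10 on a log10 scale.
lec = lowest_effect_concentration([1.0, 10.0, 100.0], [0.0, 2.0, 3.0])
```

Scanning from the lowest concentration upward returns the first crossing, which corresponds to selecting the minimum value when there are multiple solutions.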
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplFit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then
flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
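The Level 4 to Level 6 method descriptions above can be sketched in a few lines. This is an illustrative approximation and not the tcpl source code: the function names are hypothetical, the 1.4826 scaling constant is an assumption borrowed from the default of R's `mad` function, and only four of the simpler flags are shown.

```python
import math
import statistics

# Assumption: median absolute deviation is scaled as in R's stats::mad default.
MAD_CONSTANT = 1.4826


def baseline_stats(resp, wllt, cndx):
    """Level 4 sketch: bmad and onesd from test wells (wllt == "t")
    at the two lowest concentration indices (cndx 1 or 2)."""
    base = [r for r, w, c in zip(resp, wllt, cndx) if w == "t" and c in (1, 2)]
    med = statistics.median(base)
    bmad = MAD_CONSTANT * statistics.median([abs(r - med) for r in base])
    mean = sum(base) / len(base)
    onesd = math.sqrt(sum((r - mean) ** 2 for r in base) / (len(base) - 1))
    return bmad, onesd


def response_cutoff(bmad):
    """Level 5 sketch: the higher of log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)


def simple_flags(hitc, ac50, rmse, coff, nrep, concs):
    """Level 6 sketch covering noise, low.nrep, low.nconc, and ac50.lowconc.

    `concs` holds tested concentrations in linear units, so min(concs)
    plays the role of 10^logc_min. Counting unique concentrations for
    nconc is an assumption.
    """
    flags = []
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if len(set(concs)) <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < min(concs):
        flags.append("ac50.lowconc")
    return flags


# With the bmad reported for this endpoint (0.008), 10 * bmad = 0.08 is
# smaller than log2(1.2), so the fold-change cutoff is the one selected.
coff = response_cutoff(0.008)
```

Rounded to three decimals, `coff` is 0.263, which agrees with the response cutoff threshold listed in the assay design summary.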
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 320 Number of chemicals tested: 310
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. Second, the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
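The hit-call logic described above can be sketched as follows. The product of three weights and the 0.90 threshold come from the text; the function names are hypothetical, and how each weight is actually computed by tcpl is not reproduced here.

```python
def continuous_hitc(w_median, w_top, w_aic):
    """Continuous hit call as the product of three proportional weights.

    Each weight is assumed to lie between 0 and 1; their derivation
    from the fitted model is outside the scope of this sketch.
    """
    return w_median * w_top * w_aic


def classify_hitc(hitc, active_threshold=0.90):
    """Binarize hitc using the thresholds stated in this documentation."""
    if hitc < 0:
        return "not applicable"
    if hitc >= active_threshold:
        return "active"
    return "inactive"


label = classify_hitc(continuous_hitc(1.0, 0.95, 1.0))
```

Exposing `active_threshold` as a parameter reflects the note that users may apply a different stringency.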
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): NA
Neutral control median absolute deviation, by plate (nmad): NA
Coefficient of variation (CV%) in neutral control wells ((nmad/nmed)*100): NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells
((pmed - nmed) / sqrt(pmad^2 + nmad^2)): NA
Positive control signal-to-noise ((pmed - nmed)/nmad): NA
Positive control signal-to-background (pmed/nmed): NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells
((mmed - nmed) / sqrt(mmad^2 + nmad^2)): NA
Signal-to-noise (median across all plates, using negative control wells)
((mmed - nmed)/nmad): NA
Signal-to-background (median across all plates, using negative control wells)
(mmed/nmed): NA
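All of these metrics are reported as NA for this endpoint, but the formulas themselves are simple to evaluate once control well medians and median absolute deviations are available. The sketch below implements the formulas as written above; the function names and the example numbers are hypothetical and are not taken from ToxCast data.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return (nmad / nmed) * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) vs. neutral."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control vs. neutral."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Control signal-to-noise relative to neutral control variability."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Control signal-to-background."""
    return cmed / nmed


# Hypothetical plate: positive control median 100 (MAD 5),
# neutral control median 10 (MAD 5).
zp = z_prime(100.0, 5.0, 10.0, 5.0)
```

The same functions apply to negative control wells by passing mmed and mmad in place of pmed and pmad.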
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 38.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast
Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 18
APR_HepG2_P-H2AX_1hr
1. General Information
1.1 Assay Title: Apredica 1-hour HepG2 Human Liver Cell Assay for H2AX Phosphorylation
1.2 Assay Summary: APR_HepG2_1hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 1 hour after chemical dosing in a 384-well plate. APR_HepG2_P-H2AX_1hr is
one of 10 assay component(s) measured or calculated from the APR_HepG2_1hr assay. It is designed to make
measurements of DNA content, a form of viability reporter, as detected with fluorescence intensity signals by
HCS Fluorescent Imaging technology. Data from the assay component APR_HepG2_P-H2AX_1hr was analyzed
into 1 assay endpoint. This assay endpoint, APR_HepG2_P-H2AX_1hr, was analyzed with bidirectional fitting
relative to DMSO as the negative control and baseline of activity. Using a type of viability reporter, measures of
protein for gain or loss-of-signal activity can be used to understand the signaling at the pathway-level as they
relate to the gene H2AFX. Furthermore, this assay endpoint can be referred to as a primary readout, because this
assay has produced multiple assay endpoints where this one serves a signaling function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the DNA binding intended target
family, where the subfamily is cellular response to DNA damage stimulus.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Anti-phospho-histone-H2AX antibody is used to tag and quantify the level of phosphorylated H2A
histone family, member X protein. Changes in the signals are indicative of protein expression changes as a
cellular response to stress in the system [GeneSymbol:H2AFX | GeneID:3014 |
Uniprot_SwissProt_Accession:P16104].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery towards a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chloro-
phenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.391 nM
Standard maximum concentration tested: 200 nM
Key positive control: Camptothecin;Anisomycin
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.077
Response cutoff threshold used to determine hit calls: 0.772
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Protein stabilization of microtubules is identified through fluorescent intensity of anti-alpha-tubulin
antibody tagged tubulin, alpha 1a protein, and is a sign of cellular response to stress.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., $|z| = 1$) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., $|z| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
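The smoothing, normalization, standardization, and LEC steps described above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the function names, the truncation of the window at the ends of the series, and the log-linear interpolation used to solve |z| = 1 are all assumptions, and the reference values (r*, x*, sigma_x) are passed in by the caller because the text does not state exactly how they are pooled.

```python
import math

def hamming_window(length):
    """Hamming window weights, normalized so they sum to 1."""
    raw = [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
           for n in range(length)]
    total = sum(raw)
    return [w / total for w in raw]

def smooth(values, length=7):
    """Hamming-weighted moving average over a concentration series.
    Assumption: near the ends of the series the window is truncated and
    the remaining weights are renormalized."""
    weights = hamming_window(length)
    half = length // 2
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < len(values):
                num += weights[k + half] * values[j]
                den += weights[k + half]
        smoothed.append(num / den)
    return smoothed

def log2_fold_change(r, r_ref):
    """x = log2(r / r*), with r* the median reference response."""
    return [math.log2(v / r_ref) for v in r]

def standardize(x, x_ref, sigma_x):
    """z = (x - x*) / sigma_x."""
    return [(v - x_ref) / sigma_x for v in x]

def lowest_effect_concentration(concs, z, threshold=1.0):
    """Smallest concentration at which |z| reaches the threshold.
    Assumption: linear interpolation of |z| against log10(concentration)
    between the two bracketing points. Returns None if never reached."""
    for i, zi in enumerate(z):
        if abs(zi) >= threshold:
            if i == 0:
                return concs[0]
            lo, hi = abs(z[i - 1]), abs(zi)
            frac = (threshold - lo) / (hi - lo)
            log_c = math.log10(concs[i - 1]) + frac * (
                math.log10(concs[i]) - math.log10(concs[i - 1]))
            return 10 ** log_c
    return None
```

Scanning from the lowest concentration upward returns the first crossing, which corresponds to selecting the minimum value when |z| = 1 has several solutions.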
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
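The Level 4 to Level 6 methods listed above can be illustrated with a short Python sketch. It is not tcpl code: the function names and the dictionary layout of a well record are hypothetical, only a subset of the Level 6 flags is shown, and the 1.4826 scaling of the median absolute deviation (the default of R's mad() function) is an assumption about how bmad is scaled.

```python
import math
from statistics import median, stdev

def mad(values, constant=1.4826):
    """Median absolute deviation. The 1.4826 scaling mirrors the default
    of R's mad(); applying it here is an assumption."""
    centre = median(values)
    return constant * median([abs(v - centre) for v in values])

def baseline_stats(wells):
    """Level 4: bmad and onesd from test compound wells (wllt == 't')
    at concentration index 1 or 2. `wells` is a hypothetical list of
    dicts with keys 'resp', 'wllt', and 'cndx'."""
    base = [w["resp"] for w in wells
            if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    return mad(base), stdev(base)  # stdev divides by (sample size - 1)

def response_cutoff(bmad):
    """Level 5: the higher of log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)

def caution_flags(resp, concs, top, coff, rmse, hitc, ac50, nrep):
    """Level 6: a subset of the cautionary flags for one series."""
    flags = []
    above = sum(1 for r in resp if r > coff)
    below = sum(1 for r in resp if r < -coff)
    if (top < 0 and below < 2 * above) or (top > 0 and above < 2 * below):
        flags.append("modl.directionality.fail")
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if len(set(concs)) <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < min(concs):
        flags.append("ac50.lowconc")
    return flags
```

With the bmad of 0.077 reported for this endpoint, 10 * bmad is about 0.77, which exceeds log2(1.2) (about 0.263) and agrees with the reported cutoff of 0.772 up to rounding of bmad.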
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 320
Number of chemicals tested: 310
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
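The winning-model rule and the product form of the continuous hit call can be sketched as below. This is an illustrative Python fragment rather than the tcpl or tcplfit2 implementation: the record layout, the function names, and the parameter counts in the usage example are hypothetical, and the three weights are taken as already-computed values between 0 and 1 because their calculation is not reproduced here.

```python
def select_winning_model(fits):
    """Pick the fit with the lowest AIC; on an exact AIC tie, prefer the
    model with fewer parameters. `fits` is a hypothetical list of dicts
    with keys 'model', 'aic', and 'n_params'."""
    return min(fits, key=lambda f: (f["aic"], f["n_params"]))

def continuous_hitcall(w_median_outside_cutoff, w_top_exceeds_cutoff,
                       w_beats_constant):
    """Continuous hit call as the product of the three weights."""
    return (w_median_outside_cutoff
            * w_top_exceeds_cutoff
            * w_beats_constant)

# Usage with made-up numbers: two models tie on AIC, the simpler one wins.
example_fits = [
    {"model": "hill", "aic": 10.0, "n_params": 3},
    {"model": "poly1", "aic": 10.0, "n_params": 1},
    {"model": "cnst", "aic": 12.0, "n_params": 0},
]
winner = select_winning_model(example_fits)
```

Because the hit call is a product, a single weight near zero is enough to pull the series toward an inactive call even when the other two weights are close to 1.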
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
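The binarization described above can be written as a small helper. This is a sketch in Python with a hypothetical function name; the default threshold of 0.90 is the one used in this documentation, and users may pass a different value.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to a category:
    hitc < 0 is not applicable, hitc >= threshold is active,
    anything in between is inactive."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"
```

A stricter analysis could, for example, call classify_hitc(hitc, threshold=0.95) and additionally exclude series carrying selected level 6 flags.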
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates:
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells:
((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates:
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells:
((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
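The control-well formulas in this section can be expressed directly in code. The Python sketch below uses hypothetical function names; `cmed` and `cmad` stand for the median and median absolute deviation of either the positive (p) or negative (m) control wells, and `nmed` and `nmad` for the neutral control wells. All metrics are reported as NA for this endpoint, so the numbers in the usage lines are invented purely for illustration.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / |cmed - nmed|."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference versus neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    """(cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    """cmed / nmed."""
    return cmed / nmed

# Illustrative (made-up) plate summary values.
pmed, pmad, nmed, nmad = 100.0, 5.0, 10.0, 2.0
example_z_prime = z_prime(pmed, pmad, nmed, nmad)
example_ssmd = ssmd(pmed, pmad, nmed, nmad)
```

Each function fails with a division error when its denominator is zero (for example, identical control and neutral medians in z_prime), which is one reason a metric can be unavailable for a plate.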
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 28.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
health perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 20
APR_HepG2_p53Act_1hr
1. General Information
1.1 Assay Title: Apredica 1-hour HepG2 Human Liver Cell Assay for p53 Activation
1.2 Assay Summary: APR_HepG2_1hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 1 hour after chemical dosing in a 384-well plate. APR_HepG2_p53Act_1hr is
one of 10 assay components measured or calculated from the APR_HepG2_1hr assay. It is designed to make
measurements of dna content, a form of viability reporter, as detected with fluorescence intensity signals by
HCS Fluorescent Imaging technology. Data from the assay component APR_HepG2_p53Act_1hr was analyzed
into 1 assay endpoint. This assay endpoint, APR_HepG2_p53Act_1hr, was analyzed with bidirectional fitting
relative to DMSO as the negative control and baseline of activity. Using a type of viability reporter, measures of
protein for gain or loss-of-signal activity can be used to understand the signaling at the pathway-level as they
relate to the gene TP53. Furthermore, this assay endpoint can be referred to as a primary readout, because this
assay has produced multiple assay endpoints where this one serves a signaling function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the dna binding intended target
family, where the subfamily is tumor suppressor.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Anti-p53 antibody is used to tag and quantify the level of tumor protein p53. Changes in
the signals are indicative of protein expression changes as a cellular response to stress in the system
[GeneSymbol:TP53 | GeneID:7157 | Uniprot_SwissProt_Accession:P04637].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: adherent HepG2 cell line used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.084
Response cutoff threshold used to determine hit calls: 0.843
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Protein stabilization of microtubules is identified through fluorescent intensity of anti-alpha-tubulin
antibody tagged tubulin, alpha 1a protein, and is a sign of cellular response to stress.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.391 uM
Standard maximum concentration tested: 200 uM
Key positive control: Camptothecin; Anisomycin
Neutral vehicle control: DMSO
NA
-------
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., $\sigma_x = 1$) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., $|\sigma_x| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
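The smoothing and normalization steps described above can be sketched as follows. This is a minimal illustration in Python, not the code used for the original analysis. The function names are hypothetical, the edge handling (truncating and renormalizing the window at the ends of a series) is an assumption because the text does not describe it, and the centering value x* and standard deviation sigma_x are passed in rather than derived.

```python
import math


def hamming_window(length=7):
    """Hamming window coefficients w[n] = 0.54 - 0.46*cos(2*pi*n/(length-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def smooth(responses, length=7):
    """Weighted moving average of a concentration series with a Hamming window.

    Assumption: near the ends of the series the window is truncated and the
    remaining weights are renormalized.
    """
    weights = hamming_window(length)
    half = length // 2
    smoothed = []
    for i in range(len(responses)):
        total = 0.0
        weight_sum = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(responses):
                total += w * responses[j]
                weight_sum += w
        smoothed.append(total / weight_sum)
    return smoothed


def log2_fold_change(smoothed, r_star):
    """x = log2(r / r*), where r* is the median response on the plate."""
    return [math.log2(r / r_star) for r in smoothed]


def standardize(x, x_center, x_sd):
    """z = (x - x*) / sigma_x."""
    return [(value - x_center) / x_sd for value in x]
```

A flat response series is unchanged by the smoothing step, and a smoothed response equal to twice the plate median gives x = 1.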
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 7: bmad10 (Add a cutoff
value of 10 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated
using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
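The Level 4 baseline statistics and the Level 5 cutoff selection listed above can be illustrated with a short sketch. It is written in Python for illustration only and is not the tcpl implementation. The well record layout (dictionaries with wllt, cndx, and resp keys) and the function names are hypothetical, and the 1.4826 scaling constant mirrors the default of R's mad() function; whether that scaling applies here is an assumption.

```python
import math
import statistics


def baseline_stats(wells, mad_constant=1.4826):
    """Compute bmad and onesd from low-concentration test compound wells.

    `wells` is a list of dicts with keys 'wllt', 'cndx', and 'resp'
    (a hypothetical layout). `mad_constant` is an assumed scaling factor.
    """
    baseline = [w["resp"] for w in wells
                if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(baseline)
    bmad = mad_constant * statistics.median(abs(r - center) for r in baseline)
    onesd = statistics.stdev(baseline)  # sample SD, divides by n - 1
    return bmad, onesd


def response_cutoff(bmad):
    """Level 5: select the larger of log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)
```

With the reported bmad of 0.084, 10 * bmad is about 0.84, which exceeds log2(1.2) (about 0.263) and agrees with the reported cutoff of 0.843 to within rounding of bmad.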
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 320 Number of chemicals tested: 310
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
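The binarization of hitc described above, together with three of the simpler cautionary flags listed under Level 6 in Section 3.2 (noise, low.nrep, low.nconc), can be sketched as below. This is an illustrative Python sketch with hypothetical function names, not the tcpl code; the threshold of 0.90 is the one used in this documentation, and users may choose a different value.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the thresholds in this document."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


def simple_flags(rmse, coff, nrep, nconc):
    """Return the names of three Level 6 flags that apply to a series.

    Only noise (rmse > coff), low.nrep (nrep < 2), and low.nconc (nconc <= 4)
    are covered; the remaining flags need the fitted model and are omitted.
    """
    flags = []
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags
```

For example, a series with a single replicate per concentration would carry the low.nrep flag regardless of its hit call, which is one reason to review flags alongside hitc when filtering results.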
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
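The control-well metrics defined in this section can be computed as in the following sketch. It is a minimal Python illustration with a hypothetical function name. No control values are reported for this endpoint (all entries are NA), so the numbers in the usage note are illustrative only and are not measured data.

```python
import math


def control_metrics(cmed, cmad, nmed, nmad):
    """Robustness metrics for a control (positive or negative) versus neutral wells.

    cmed and cmad are the control median and median absolute deviation
    (pmed/pmad or mmed/mmad); nmed and nmad are the neutral control values.
    """
    return {
        "cv_percent": (nmad / nmed) * 100,
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }
```

With illustrative inputs cmed = 10, cmad = 1, nmed = 2, and nmad = 1, the Z prime factor is 1 - 6/8 = 0.25 and the signal-to-background ratio is 5.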
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
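Once reference chemical classifications become available, the accuracy statistics mentioned above could be estimated along the following lines. This Python sketch uses a hypothetical input layout (pairs of reference classification and assay activity call) and is not part of tcpl.

```python
def accuracy_statistics(calls):
    """Summarize assay calls against reference chemical classifications.

    `calls` is a list of (reference_positive, assay_active) boolean pairs.
    Returns sensitivity and specificity, or None where undefined.
    """
    tp = sum(1 for ref, act in calls if ref and act)
    fn = sum(1 for ref, act in calls if ref and not act)
    tn = sum(1 for ref, act in calls if not ref and not act)
    fp = sum(1 for ref, act in calls if not ref and act)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return {"sensitivity": sensitivity, "specificity": specificity}
```

The false negative rate is 1 minus sensitivity and the false positive rate is 1 minus specificity.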
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 21.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
-------
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 22
APR_HepG2_StressKinase_1hr
1. General Information
1.1 Assay Title: Apredica 1-hour HepG2 Human Liver Cell Assay for Stress Kinase
1.2 Assay Summary: APR_HepG2_1hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 1 hour after chemical dosing in a 384-well plate.
APR_HepG2_StressKinase_1hr is one of 10 assay components measured or calculated from the
APR_HepG2_1hr assay. It is designed to make measurements of enzyme activity, a form of enzyme reporter, as
detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the assay
component APR_HepG2_StressKinase_1hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_StressKinase_1hr, was analyzed with bidirectional fitting relative to DMSO as the negative control
and baseline of activity. Using a type of enzyme reporter, measures of protein for gain or loss-of-signal activity
can be used to understand the signaling at the pathway level as they relate to the gene JUN. Furthermore, this
assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a signaling function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the cell cycle intended target family, where the subfamily is stress
response.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Anti-phospho-c-jun antibody is used to tag and quantify the level of phosphorylated jun
proto-oncogene protein. Changes in the signals are indicative of protein expression changes as a cellular response to
stress in the system [GeneSymbol:JUN | GeneID:3725 | Uniprot_SwissProt_Accession:P05412].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery towards a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements,
pre-fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.391 uM
Standard maximum concentration tested: 200 uM
Key positive control: Camptothecin; Anisomycin
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.079
Response cutoff threshold used to determine hit calls: 0.791
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
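The design summary values (ten nominal concentrations, a minimum of 0.391, and a maximum of 200) are consistent with the two-fold serial dilution listed in Section 2.5. The short Python sketch below reproduces that series; the function name and the assumption of an exact two-fold factor are illustrative, and Section 2.5 reports some values rounded (for example 3.12 and 6.24).

```python
def dilution_series(top=200.0, n_points=10, factor=2.0):
    """Concentrations of a serial dilution, from lowest to highest."""
    return [top / factor ** i for i in range(n_points - 1, -1, -1)]
```

Under these assumptions the lowest concentration is 200 / 2^9 = 0.390625, which rounds to the 0.391 shown in the summary.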
2.6 Response: Fluorescent intensity of MitoTracker Red stained mitochondria is an indicator of mitochondrial
morphology and cell cycle staging.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., $\sigma_x = 1$) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., $|\sigma_x| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
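The LEC estimation described above, solving |z| = 1 and taking the lowest solution, can be sketched as follows. This Python illustration is not the original analysis code: the function name is hypothetical, and locating the crossing by linear interpolation of |z| against log10 concentration is an assumption, since the text says only that the equation was solved numerically.

```python
import math


def lowest_effect_concentration(concs, z_scores, threshold=1.0):
    """Return the lowest concentration at which |z| reaches the threshold.

    `concs` must be sorted in ascending order. Crossings between tested
    concentrations are located by linear interpolation of |z| against
    log10(concentration) (an assumption). Returns None when |z| never
    reaches the threshold.
    """
    abs_z = [abs(z) for z in z_scores]
    if abs_z[0] >= threshold:
        return concs[0]
    for i in range(1, len(concs)):
        lo, hi = abs_z[i - 1], abs_z[i]
        if lo < threshold <= hi:
            frac = (threshold - lo) / (hi - lo)
            log_c = math.log10(concs[i - 1]) + frac * (
                math.log10(concs[i]) - math.log10(concs[i - 1]))
            return 10 ** log_c
    return None
```

Because the series is scanned from the lowest concentration upward, the first crossing found is the minimum solution when several exist.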
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 7: bmad10 (Add a cutoff
value of 10 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated
using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest concentration tested, where series is an
active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest concentration tested, where series is an
active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or fewer were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
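The Level 4 and Level 5 calculations listed above can be illustrated with a short sketch. It is not the tcpl implementation: the function names are hypothetical, the input is assumed to be the normalized responses already restricted to test compound wells at concentration index 1 or 2, and the median absolute deviation is left unscaled by default. R's mad() applies a consistency constant of 1.4826 by default; whether the pipeline applies such a constant is not stated in this text, so it is exposed as a parameter.

```python
import math
import statistics

def baseline_mad(resp, constant=1.0):
    """Median absolute deviation of the baseline responses.

    Assumption: constant=1.0 gives the unscaled MAD; pass 1.4826 to mimic
    the default scaling of R's mad().
    """
    med = statistics.median(resp)
    return constant * statistics.median([abs(v - med) for v in resp])

def one_sd(resp):
    """Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))."""
    return statistics.stdev(resp)

def response_cutoff(bmad):
    """Highest of the two assigned thresholds: log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)
```

For example, with bmad = 0.07 the second threshold is about 0.70, which exceeds log2(1.2) (about 0.263), so the bmad-based value would be selected as the cutoff.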
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 320 Number of chemicals tested: 310
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
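The winning-model rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched as below. The function name and the dictionary layout are hypothetical and the model names are used only as labels; the sketch does not reproduce how tcpl stores or compares fits, and it treats a tie as exact equality of the AIC values.

```python
def select_winning_model(fits):
    """Pick the fit with the lowest AIC; break exact ties by fewer parameters.

    Each fit is a dict with hypothetical keys 'name', 'aic', and 'n_params'.
    """
    return min(fits, key=lambda f: (f["aic"], f["n_params"]))

# Illustrative values only, not taken from any assay.
example_fits = [
    {"name": "hill", "aic": 10.0, "n_params": 3},
    {"name": "poly1", "aic": 10.0, "n_params": 1},
    {"name": "cnst", "aic": 12.0, "n_params": 0},
]
winner = select_winning_model(example_fits)
```

Sorting on the (AIC, parameter count) pair keeps the tie-break explicit and leaves the fitted parameters untouched.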
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control well median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: (pmed - nmed) / sqrt(pmad^2 + nmad^2) NA
Positive control signal-to-noise: (pmed - nmed)/nmad NA
Positive control signal-to-background: pmed/nmed NA
NEGATIVE CONTROL (well type = "m")
Negative control well median response value, by plate: mmed NA
Negative control well median absolute deviation, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: (mmed - nmed) / sqrt(mmad^2 + nmad^2) NA
Signal-to-noise (median across all plates, using negative control wells): (mmed - nmed)/nmad NA
Signal-to-background (median across all plates, using negative control wells): mmed/nmed NA
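The control-well formulas listed above translate directly into code. The sketch below is only an illustration with hypothetical function names; it takes the plate-level medians and median absolute deviations as inputs and does not show how they are aggregated across plates. The same functions apply to the negative control by passing mmed and mmad in place of pmed and pmad.

```python
import math

def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3 * (cmad + nmad) / |cmed - nmed|."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference of control vs. neutral wells."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    """(cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    """cmed / nmed."""
    return cmed / nmed
```

With illustrative inputs pmed = 100, pmad = 5, nmed = 10, and nmad = 5, the Z prime factor is 1 - 30/90, or about 0.67, and the signal-to-background ratio is 10.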
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 30.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah I, Setzer RW, Jack J, Houck KA, Judson RS, Knudsen TB, Liu J, Martin MT, Reif DM, Richard AM,
Thomas RS, Crofton KM, Dix DJ, Kavlock RJ. Using ToxCast™ Data to Reconstruct Dynamic Cell State
Trajectories and Estimate Toxicological Points of Departure. Environmental Health Perspectives. 2016;
124(7):910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data
can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 24
APR_HepG2_CellCycleArrest_24hr
1. General Information
1.1 Assay Title: Apredica 24-hour HepG2 Human Liver Cell Assay for Cell Cycle Arrest
1.2 Assay Summary: APR HepG2 24hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 24 hours after chemical dosing in a 384-well plate.
APR_HepG2_CellCycleArrest_24hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_24hr assay. It is designed to make measurements of cell phenotype, a form of morphology
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_CellCycleArrest_24hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_CellCycleArrest_24hr, was analyzed with bidirectional fitting relative to DMSO as the negative
control and baseline of activity. Using a type of morphology reporter, measures of all nuclear DNA for gain or
loss-of-signal activity can be used to understand the signaling at the pathway-level as they relate to the gene.
Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has produced
multiple assay endpoints where this one serves a signaling function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the cell cycle intended target family, where the subfamily
is arrest.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on the human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Hoechst-33342 dye is used as a stain for DNA to understand morphology in the system.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
-------
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. The human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-fixed
cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-rabbit
IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-Jun
(rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), and alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.07
Response cutoff threshold used to determine hit calls: 0.705
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity of MitoTracker Red stained mitochondria is an indicator of mitochondrial
morphology and cell cycle staging.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Standard minimum concentration tested: 0.58 nM
Key positive control: Camptothecin; Anisomycin
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 297 nM
Neutral vehicle control: DMSO
NA
-------
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/σ_x) to evaluate the
importance of perturbations (where σ_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., σ_x = 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |σ_x| > 1). The LEC was estimated by
numerically solving for: |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as maximum positive or negative value of x.
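The LEC and efficacy definitions above can be illustrated with a small sketch. The text states only that |z| = 1 was solved numerically and that the minimum solution was kept; the piecewise-linear interpolation of z against log10 concentration used here is an assumption, as are the function names. Efficacy is read as the value of x with the largest absolute magnitude, sign retained.

```python
import math

def lowest_effect_concentration(concs, z, threshold=1.0):
    """Lowest concentration at which |z| reaches the threshold, or None.

    concs must be positive and sorted in increasing order; z holds the
    standardized response at each concentration. Assumption: between tested
    concentrations, z varies linearly with log10(concentration).
    """
    if abs(z[0]) >= threshold:
        return concs[0]
    for i in range(1, len(concs)):
        z0, z1 = z[i - 1], z[i]
        if abs(z1) >= threshold:
            # z0 is still inside the band, so the first crossing is here.
            level = threshold if z1 > 0 else -threshold
            frac = (level - z0) / (z1 - z0)
            lo, hi = math.log10(concs[i - 1]), math.log10(concs[i])
            return 10 ** (lo + frac * (hi - lo))
    return None

def efficacy(x):
    """Value of x with the largest absolute magnitude (sign retained)."""
    return max(x, key=abs)
```

Scanning from the lowest concentration upward returns the smallest solution when the curve crosses the band more than once.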
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 7: bmad10 (Add a cutoff
value of 10 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated
using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest concentration tested, where series is an
active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest concentration tested, where series is an
active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or fewer were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
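Several of the Level 6 flags listed above reduce to simple comparisons, which the sketch below illustrates for bmd.high, noise, low.nrep, low.nconc, and ac50.lowconc. It is a hedged illustration rather than the tcpl code: the function name and argument layout are hypothetical, ac50 and bmd are assumed to be on the same concentration scale as 10^logc_min, and the flags that need the full response series (for example the single-point and directionality flags) are left out.

```python
def simple_level6_flags(*, hitc, rmse, coff, nrep, nconc, ac50, bmd, logc_min):
    """Return the names of the simple cautionary flags that apply.

    ac50 and bmd may be None when no model estimate is available.
    """
    flags = []
    if bmd is not None and ac50 is not None and bmd > ac50:
        flags.append("bmd.high")       # 9: BMD greater than AC50
    if rmse > coff:
        flags.append("noise")          # 10: rmse > coff
    if nrep < 2:
        flags.append("low.nrep")       # 13: fewer than 2 replicates on average
    if nconc <= 4:
        flags.append("low.nconc")      # 14: 4 or fewer concentrations
    if hitc >= 0.9 and ac50 is not None and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")   # 18: active with AC50 below lowest conc
    return flags
```

Returning flag names rather than booleans mirrors the idea that a series can carry several cautionary flags at once.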
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108 Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. Second, the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
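The binarization described above can be sketched as a small Python function. The function name and the default threshold argument are illustrative assumptions; the 0.90 value is the threshold used in this document, and users may choose another.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call (hypothetical helper, not part of tcpl)."""
    if hitc < 0:
        # Negative hit calls are treated as not applicable
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"
```

For example, classify_hitc(0.95) gives "active", whereas a stricter user-chosen threshold such as classify_hitc(0.95, threshold=0.99) gives "inactive".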
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using the
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
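The control metrics defined above can be computed from the control medians and median absolute deviations as in the following sketch. The function names are hypothetical, and because this endpoint reports NA for all control values, any inputs are purely illustrative. The same function serves positive (pmed, pmad) and negative (mmed, mmad) controls.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation (CV%) in neutral control wells."""
    return nmad / nmed * 100

def control_metrics(ctrl_med, ctrl_mad, nmed, nmad):
    """Robustness metrics for a control (positive or negative) versus neutral wells.

    Hypothetical helper: no guarding against zero denominators is attempted.
    """
    diff = ctrl_med - nmed
    return {
        # Z prime factor: 1 - 3 * (sum of MADs) / |difference of medians|
        "z_prime": 1 - (3 * (ctrl_mad + nmad)) / abs(diff),
        # Strictly standardized mean difference
        "ssmd": diff / math.sqrt(ctrl_mad ** 2 + nmad ** 2),
        "signal_to_noise": diff / nmad,
        "signal_to_background": ctrl_med / nmed,
    }
```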
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 158.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
health perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
-------
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 26
APR_HepG2_CellLoss_24hr
1. General Information
1.1 Assay Title: Apredica 24-hour HepG2 Human Liver Cell Assay for Cell Loss
1.2 Assay Summary: APR_HepG2_24hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 24 hours after chemical dosing in a 384-well plate.
APR_HepG2_CellLoss_24hr is one of 10 assay component(s) measured or calculated from the APR_HepG2_24hr
assay. It is designed to make measurements of cell number, a form of viability reporter, as detected with
fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the assay component
APR_HepG2_CellLoss_24hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_CellLoss_24hr, was analyzed with bidirectional fitting relative to DMSO as the negative control and
baseline of activity. Using a type of viability reporter, measures of all nuclear DNA for gain or loss-of-signal
activity can be used to understand the viability at the cellular-level. Furthermore, this assay endpoint can be
referred to as a secondary readout, because this assay has produced multiple assay endpoints where this one
serves a viability function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the cell cycle intended target family, where the subfamily is proliferation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
ToxCast: US EPA's Toxicity Forecaster Program
tcpl: ToxCast Data Analysis Pipeline R Package
SSMD: Strictly Standardized Mean Difference
2. Test Method Description
2.1 Purpose: Hoechst-33342 dye is used as a stain for DNA to understand viability in the system.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 2
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Camptothecin; Anisomycin; Paclitaxel; CCCP
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.066
Response cutoff threshold used to determine hit calls: 0.662
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity of MitoTracker Red stained mitochondria is an indicator of mitochondrial
morphology and cell cycle staging.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., sigma_x = 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |sigma_x| > 1). The LEC was estimated by
numerically solving for: |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as maximum positive or negative value of x.
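The normalization, standardization, and LEC steps described above can be sketched as follows. This is a minimal illustration, not the published analysis code: the Hamming-window smoothing step is omitted, the function names are hypothetical, the centering value and standard deviation are passed in by the caller because the text does not specify the set of values they are computed over, and the LEC is found by linear interpolation of |z| against log10 concentration between adjacent tested concentrations, which is an assumed numerical method.

```python
import math

def normalize(r, r_ref):
    """Log2 fold change of each response relative to the reference median r*."""
    return [math.log2(v / r_ref) for v in r]

def standardize(x, center, sigma):
    """z = (x - x*) / sigma_x, with center and sigma supplied by the caller."""
    return [(v - center) / sigma for v in x]

def efficacy(x):
    """Largest-magnitude perturbation, keeping its sign."""
    return max(x, key=abs)

def lowest_effect_conc(conc, z):
    """Smallest concentration at which |z| reaches 1, or None if it never does.

    conc must be sorted ascending; interpolation is linear in log10(conc),
    which is an assumption of this sketch.
    """
    prev_c, prev_a = None, None
    for c, zi in zip(conc, z):
        a = abs(zi)
        if a >= 1:
            if prev_c is None:
                return c  # already a hit at the lowest tested concentration
            frac = (1 - prev_a) / (a - prev_a)
            log_c = math.log10(prev_c) + frac * (math.log10(c) - math.log10(prev_c))
            return 10 ** log_c
        prev_c, prev_a = c, a
    return None
```

Returning at the first crossing corresponds to selecting the minimum value when there are multiple solutions.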
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 7: bmad10 (Add a cutoff
value of 10 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated
using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) and | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
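The level 4 baseline estimates and the level 5 cutoff selection listed above can be sketched as follows. This is an illustrative sketch rather than the tcpl code; the function names are hypothetical. R's mad() function multiplies by a consistency constant of 1.4826 by default, and whether that scaling applies to the reported bmad is treated here as an assumption exposed through the scale argument.

```python
import math
import statistics

def baseline_stats(resp_low, scale=1.4826):
    """bmad and onesd from test-well responses at concentration index 1 or 2.

    resp_low: normalized responses (resp) for wllt = t wells with cndx in {1, 2}.
    scale: consistency constant (assumed; set to 1 for the unscaled MAD).
    """
    med = statistics.median(resp_low)
    bmad = scale * statistics.median([abs(v - med) for v in resp_low])
    onesd = statistics.stdev(resp_low)  # sample standard deviation (n - 1)
    return bmad, onesd

def select_cutoff(bmad):
    """Level 5: take the higher of the candidate cutoffs log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)
```

With the bmad of 0.066 reported in the assay design summary, select_cutoff returns 10 * bmad (about 0.66), in line with the reported response cutoff of 0.662 once rounding of the displayed bmad is taken into account.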
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108 Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. Second, the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
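To make the structure of the product concrete, the following is a deliberately simplified sketch, not the tcplfit2 calculation. It assumes 0/1 indicators for the first two weights, whereas the package uses continuous weights, and it uses the standard Akaike weight of the winning model against the constant model for the third. All function and argument names are hypothetical.

```python
import math

def akaike_weight(aic_win, aic_cnst):
    """Akaike weight of the winning model relative to the constant model."""
    # Clamp the difference so exp() cannot overflow for very poor fits
    d = min(aic_win - aic_cnst, 1000.0)
    return 1.0 / (1.0 + math.exp(d / 2.0))

def simplified_hitc(med_resp, top, coff, aic_win, aic_cnst):
    """Product of three weights (simplified; indicators stand in for weights 1 and 2)."""
    # Weight 1: at least one median response outside the cutoff band
    w1 = 1.0 if any(abs(m) > coff for m in med_resp) else 0.0
    # Weight 2: modeled top exceeds the cutoff in magnitude
    w2 = 1.0 if abs(top) > coff else 0.0
    # Weight 3: winning model preferred over a flat line by AIC
    w3 = akaike_weight(aic_win, aic_cnst)
    return w1 * w2 * w3
```

Because each factor lies between 0 and 1, a series needs support from all three criteria to approach a hit call of 1.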
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using the
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 121.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 28
APR_HepG2_MicrotubuleCSK_24hr
1. General Information
1.1 Assay Title: Apredica 24-hour HepG2 Human Liver Cell Assay for Microtubule CSK Stability
1.2 Assay Summary: APR_HepG2_24hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 24 hours after chemical dosing in a 384-well plate.
APR_HepG2_MicrotubuleCSK_24hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_24hr assay. It is designed to make measurements of protein conformation, a form of conformation
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_MicrotubuleCSK_24hr were analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_MicrotubuleCSK_24hr, was analyzed with bidirectional fitting relative to DMSO as the negative
control and baseline of activity. Using a type of conformation reporter, measures of protein for gain- or
loss-of-signal activity can be used to understand signaling at the cellular level. Furthermore, this assay endpoint
can be referred to as a primary readout, because this assay has produced multiple assay endpoints where this
one serves a signaling function. To generalize the intended target to other relatable targets, this assay endpoint
is annotated to the cell morphology intended target family, where the subfamily is cell conformation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
ToxCast: US EPA's Toxicity Forecaster Program
AOP: Adverse Outcome Pathway
tcpl: ToxCast Data Analysis Pipeline R Package
CV: Coefficient of Variation
SSMD: Strictly Standardized Mean Difference
DMSO: Dimethyl Sulfoxide
2. Test Method Description
2.1 Purpose: Anti-alpha-tubulin antibody is used to tag and quantify the level of tubulin, alpha 1a protein. Changes
in the signals are indicative of protein expression changes as a cellular response to stress in the system
[GeneSymbol:TUBA1A | GeneID:7846 | Uniprot_SwissProt_Accession:Q71U36].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery towards a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
-------
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide
m-chlorophenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.082
Response cutoff threshold used to determine hit calls: 0.818
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescence intensity of MitoTracker Red-stained mitochondria is used as a membrane potential
reporter for mitochondrial depolarization as indicated by the level of dye binding.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell morphology.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Standard minimum concentration tested: 0.58 nM
Key positive control: Camptothecin; Anisomycin
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 297 nM
Neutral vehicle control: DMSO
NA
-------
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., sigma_x = 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |sigma_x| > 1). The LEC was estimated by
numerically solving for: |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of x.
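The smoothing, normalization, and LEC steps described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the published analysis code: the edge handling of the Hamming smoother, the linear interpolation on log10 concentration used to solve |z| = 1, and the example values of r*, x*, and the standard deviation are all assumptions, and the raw responses are hypothetical.

```python
import numpy as np


def smooth_hamming(r, length=7):
    """Smooth a response series with a normalized Hamming window.

    Reflect padding at the edges is an assumption; the source only states
    that a Hamming window of length 7 was used.
    """
    w = np.hamming(length)
    w = w / w.sum()
    pad = length // 2
    padded = np.pad(np.asarray(r, dtype=float), pad, mode="reflect")
    return np.convolve(padded, w, mode="valid")


def standardize(x, x_star, sigma_x):
    """z = (x - x*) / sigma_x."""
    return (np.asarray(x, dtype=float) - x_star) / sigma_x


def lowest_effect_concentration(conc, z, threshold=1.0):
    """Smallest concentration where |z| reaches the threshold.

    Uses linear interpolation on log10(conc) between bracketing points
    (an assumption). Returns None if the threshold is never reached.
    """
    conc = np.asarray(conc, dtype=float)
    logc = np.log10(conc)
    a = np.abs(np.asarray(z, dtype=float))
    if a[0] >= threshold:
        return float(conc[0])
    for i in range(1, len(a)):
        if a[i] >= threshold:
            frac = (threshold - a[i - 1]) / (a[i] - a[i - 1])
            return float(10 ** (logc[i - 1] + frac * (logc[i] - logc[i - 1])))
    return None


# Concentrations (uM) from the exposure regime; raw responses are hypothetical.
conc = np.array([0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, 200])
raw = np.array([100, 102, 98, 101, 105, 110, 130, 160, 210, 260], dtype=float)

r = smooth_hamming(raw)                    # smoothed response
x = np.log2(r / 100.0)                     # r* = 100 is a hypothetical plate median
z = standardize(x, x_star=0.0, sigma_x=0.2)  # x* and sigma_x are hypothetical
lec = lowest_effect_concentration(conc, z)
efficacy = x[np.argmax(np.abs(x))]         # maximum positive or negative value of x
```

Scanning from the lowest concentration upward returns the minimum solution when |z| = 1 is crossed more than once, matching the rule stated above.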
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
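A minimal Python sketch of this Level 4 baseline calculation is given below for illustration; tcpl itself is an R package and this is not its code. The function name is hypothetical, and the 1.4826 scaling constant (the default of R's mad function) is an assumption about how the median absolute deviation is scaled.

```python
import numpy as np


def baseline_stats(resp, wllt, cndx, mad_scale=1.4826):
    """Return (bmad, onesd) from test wells at the two lowest concentration indices.

    mad_scale=1.4826 mirrors the default constant of R's mad(); whether that
    constant applies here is an assumption.
    """
    resp = np.asarray(resp, dtype=float)
    mask = (np.asarray(wllt) == "t") & np.isin(np.asarray(cndx), [1, 2])
    base = resp[mask]
    bmad = mad_scale * np.median(np.abs(base - np.median(base)))
    onesd = np.std(base, ddof=1)  # sqrt(sum((resp - mean)^2) / (n - 1))
    return bmad, onesd


# Hypothetical wells: the last two are excluded (cndx 3, and a neutral control well).
resp = [0.0, 0.1, -0.1, 0.2, 5.0, 9.0]
wllt = ["t", "t", "t", "t", "t", "n"]
cndx = [1, 1, 2, 2, 3, 1]
bmad, onesd = baseline_stats(resp, wllt, cndx)
```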
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
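As a worked illustration of the Level 5 rule that the higher candidate cutoff is selected, the short Python sketch below compares the two candidates. The function name is hypothetical. With the rounded bmad of 0.082 reported in Section 2.5, the 10 x bmad candidate is about 0.82, close to the reported cutoff of 0.818; the small difference is presumably due to rounding of the displayed bmad.

```python
import math


def select_cutoff(bmad):
    """Return (method_name, cutoff) for the larger of the candidate cutoffs."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # fold-change based cutoff
        "bmad10": 10 * bmad,         # 10 x baseline median absolute deviation
    }
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


name, coff = select_cutoff(0.082)  # bmad as reported (rounded) for this endpoint
```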
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
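Several of the Level 6 rules above reduce to simple comparisons, which the following Python sketch illustrates for a single concentration-response series. It is not the tcpl implementation: the function and argument names are hypothetical, only five of the flags are covered, and the border rule is read here as the band 0.8*coff <= |top| <= 1.2*coff, which is an interpretation of the text.

```python
import numpy as np


def caution_flags(resp, top, coff, rmse, nconc, nrep):
    """Return a list of flag names raised for one series (subset of mc6 flags)."""
    resp = np.asarray(resp, dtype=float)
    flags = []

    # Directionality: would the opposite direction have captured more responses?
    n_up = int(np.sum(resp > coff))
    n_down = int(np.sum(resp < -1 * coff))
    if (top < 0 and n_down < 2 * n_up) or (top > 0 and n_up < 2 * n_down):
        flags.append("modl.directionality.fail")

    if rmse > coff:  # noisy fit
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:  # borderline efficacy (assumed band)
        flags.append("border")
    if nrep < 2:  # average replicates per concentration
        flags.append("low.nrep")
    if nconc <= 4:  # too few concentrations tested
        flags.append("low.nconc")
    return flags


# Hypothetical series evaluated against the cutoff reported for this endpoint.
flags = caution_flags(
    resp=[0.1, 0.2, 0.9, 1.0, -0.9],
    top=0.9, coff=0.818, rmse=0.3, nconc=10, nrep=1,
)
```

In this hypothetical series the modeled top lies within 20 percent of the cutoff and there is one replicate per concentration, so the border and low.nrep flags are raised.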
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108
Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
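The binarization described above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the three weights are taken as given inputs between 0 and 1 because their calculation is not detailed here.

```python
def continuous_hitc(w_response, w_top, w_aic):
    """Product of three proportional weights, each assumed to lie in [0, 1]."""
    for w in (w_response, w_top, w_aic):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights are assumed to lie between 0 and 1")
    return w_response * w_top * w_aic


def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the thresholds stated in this document."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


# Hypothetical weights for one concentration-response series.
hitc = continuous_hitc(1.0, 0.95, 0.99)
call = classify_hitc(hitc)
```

A stricter or more lenient threshold can be passed to classify_hitc, reflecting that different thresholds may be used.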
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
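The control metrics listed in this section can be computed as in the Python sketch below. The function names and input values are hypothetical (the metrics are reported as NA for this endpoint), and the by-plate aggregation step is omitted for brevity.

```python
import math


def control_metrics(cmed, cmad, nmed, nmad):
    """Metrics for a control (positive 'p' or negative 'm') against neutral control 'n'.

    cmed, cmad: median and median absolute deviation of the control wells.
    nmed, nmad: median and median absolute deviation of the neutral control wells.
    """
    diff = cmed - nmed
    return {
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(diff),
        "ssmd": diff / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": diff / nmad,
        "signal_to_background": cmed / nmed,
    }


def cv_percent(nmad, nmed):
    """Coefficient of variation (CV%) in neutral control wells."""
    return (nmad / nmed) * 100


# Hypothetical medians and MADs, for illustration only.
metrics = control_metrics(cmed=100.0, cmad=5.0, nmed=10.0, nmad=2.0)
cv = cv_percent(nmad=2.0, nmed=10.0)
```

The same function serves both control types: pass pmed and pmad for the positive control, or mmed and mmad for the negative control.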
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 140.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
-------
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 30
APR_HepG2_MitoMass_24hr
1. General Information
1.1 Assay Title: Apredica 24-hour HepG2 Human Liver Cell Assay for Mitochondrial Mass
1.2 Assay Summary: APR_HepG2_24hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 24 hours after chemical dosing in a 384-well plate.
APR_HepG2_MitoMass_24hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_24hr assay. It is designed to make measurements of cell phenotype, a form of morphology
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_MitoMass_24hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_MitoMass_24hr, was analyzed with bidirectional fitting relative to DMSO as the negative control
and baseline of activity. Using a type of morphology reporter, gain or loss-of-signal activity can be used to
understand changes in the signaling. Furthermore, this assay endpoint can be referred to as a primary readout,
because this assay has produced multiple assay endpoints where this one serves a signaling function. To
generalize the intended target to other relatable targets, this assay endpoint is annotated to the cell morphology
intended target family, where the subfamily is organelle conformation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: MitoTracker Red is used as a stain for the morphology of the mitochondria.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial mass measurements,
pre-fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200),
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey
anti-rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG),
c-Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), and alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.05
Response cutoff threshold used to determine hit calls: 0.498
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity of MitoTracker Red stained mitochondria is used as a membrane potential
reporter for mitochondrial depolarization as indicated by the level of dye binding.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell morphology.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Paclitaxel; CCCP
Neutral vehicle control: DMSO
NA
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., $|z| = 1$) above or below the median value. An absolute
perturbation greater than one standard deviation was called a "hit" (i.e., $|z| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
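As an illustration of the smoothing and normalization steps described in Section 3.2, the following Python sketch applies a length-7 Hamming window, computes log2 fold changes and z-scores, and interpolates the concentration at which |z| first reaches 1. The function names, the edge-padding rule, the interpolation on log10 concentration, and the reference values passed in as r_star, x_star, and sigma_x are assumptions made for this example; they are not taken from the original analysis code.

```python
import math


def hamming_window(length):
    """Hamming window weights, normalized so they sum to 1."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]
    total = sum(w)
    return [v / total for v in w]


def smooth(r, length=7):
    """Smooth a response series with a Hamming window.

    Assumption: the series is padded by repeating its end values so the
    output has the same length as the input.
    """
    half = length // 2
    padded = [r[0]] * half + list(r) + [r[-1]] * half
    w = hamming_window(length)
    return [sum(w[k] * padded[i + k] for k in range(length)) for i in range(len(r))]


def log2_fold_change(r, r_star):
    """x = log2(r / r*), where r* is the reference (median) response."""
    return [math.log2(v / r_star) for v in r]


def z_scores(x, x_star, sigma_x):
    """z = (x - x*) / sigma_x."""
    return [(v - x_star) / sigma_x for v in x]


def lowest_effect_concentration(conc, z):
    """Lowest concentration where |z| reaches 1, or None if it never does.

    Assumption: linear interpolation on log10(concentration) between the two
    tested concentrations that bracket the crossing.
    """
    az = [abs(v) for v in z]
    if az[0] >= 1:
        return conc[0]
    for i in range(1, len(az)):
        if az[i - 1] < 1 <= az[i]:
            frac = (1 - az[i - 1]) / (az[i] - az[i - 1])
            lo, hi = math.log10(conc[i - 1]), math.log10(conc[i])
            return 10 ** (lo + frac * (hi - lo))
    return None
```

Because the scan runs from the lowest concentration upward, the first crossing found is the minimum solution, matching the rule stated above for series with multiple solutions.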
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplFit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default, bmad
is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
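The Level 4 to Level 6 method descriptions above can be sketched in a few lines of Python. This is a simplified illustration, not the tcpl implementation: the function names and the dictionary layout for wells are hypothetical, only four of the simpler flags are shown, and the 1.4826 scaling constant for the median absolute deviation is an assumption that mirrors the default of R's mad() function.

```python
import math
import statistics


def bmad_and_onesd(wells, mad_constant=1.4826):
    """Baseline MAD and one standard deviation from low-concentration test wells.

    wells: list of dicts with keys 'wllt', 'cndx', and 'resp' (hypothetical layout).
    mad_constant: assumed scaling factor, matching the default of R's mad().
    """
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(base)
    bmad = mad_constant * statistics.median([abs(v - med) for v in base])
    onesd = statistics.stdev(base)  # sample standard deviation (n - 1 denominator)
    return bmad, onesd


def cutoff(bmad):
    """Level 5: the higher of log2(1.2) and 10 * bmad is selected."""
    return max(math.log2(1.2), 10 * bmad)


def caution_flags(hitc, rmse, coff, nconc, nrep, ac50, logc_min):
    """A subset of the Level 6 flags, using the conditions quoted in the list above."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")
    return flags
```

For example, a bmad of about 0.0498 gives 10 * bmad = 0.498, which exceeds log2(1.2) (about 0.263), so the bmad-based value is the one selected as the cutoff.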
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108 Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
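A minimal sketch of the binarization described above, assuming the default threshold of 0.90 used in this documentation. The function name and the returned labels are hypothetical, and users may substitute a different threshold.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the thresholds stated in the text."""
    if hitc < 0:
        return "not applicable"
    if hitc >= threshold:
        return "active"
    return "inactive"


# Hypothetical hit-call values, for illustration only.
examples = [0.97, 0.90, 0.45, 0.0, -1.0]
labels = [classify_hitc(h) for h in examples]
```

Raising the threshold makes the active call more stringent; combining the label with the Level 6 cautionary flags is one way to narrow results to high-confidence curves.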
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: (pmed - nmed) / sqrt(pmad^2 + nmad^2) NA
Positive control signal-to-noise: (pmed - nmed)/nmad NA
Positive control signal-to-background: pmed/nmed NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: (mmed - nmed) / sqrt(mmad^2 + nmad^2) NA
Signal-to-noise (median across all plates, using negative control wells): (mmed - nmed)/nmad NA
Signal-to-background (median across all plates, using negative control wells): mmed/nmed NA
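The control-well formulas listed above can be written out as a short Python sketch. The function name and the input values in the usage note are hypothetical and serve only to show the arithmetic; all of these metrics are reported as NA for this endpoint, so no assay values are implied.

```python
import math


def control_metrics(cmed, cmad, nmed, nmad):
    """Performance metrics for a control well type compared to neutral control wells.

    cmed, cmad: median and median absolute deviation of the positive (pmed, pmad)
                or negative (mmed, mmad) control wells.
    nmed, nmad: median and median absolute deviation of the neutral control wells.
    Assumes nmed, nmad, and (cmed - nmed) are non-zero.
    """
    return {
        "cv_percent": (nmad / nmed) * 100,
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }


# Hypothetical values: control median 100 (MAD 4), neutral median 10 (MAD 3).
metrics = control_metrics(cmed=100.0, cmad=4.0, nmed=10.0, nmad=3.0)
```

With these illustrative inputs the Z prime factor is 1 - 21/90 (about 0.77) and the SSMD is 90/5 = 18.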
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 138.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 32
APR_HepG2_MitoMembPot_24hr
1. General Information
1.1 Assay Title: Apredica 24-hour HepG2 Human Liver Cell Assay for Mitochondrial Membrane Potential
1.2 Assay Summary: APR_HepG2_24hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 24 hours after chemical dosing in a 384-well plate.
APR_HepG2_MitoMembPot_24hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_24hr assay. It is designed to make measurements of dye binding, a form of membrane potential
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_MitoMembPot_24hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_MitoMembPot_24hr, was analyzed with bidirectional fitting relative to DMSO as the negative
control and baseline of activity. Using a type of membrane potential reporter, gain or loss-of-signal activity can
be used to understand changes in the signaling. Furthermore, this assay endpoint can be referred to as a primary
readout, because this assay has produced multiple assay endpoints where this one serves a signaling function.
To generalize the intended target to other relatable targets, this assay endpoint is annotated to the cell
morphology intended target family, where the subfamily is organelle conformation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: MitoTracker Red is used as a stain for the membrane potential of the mitochondria.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements,
pre-fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.083
Response cutoff threshold used to determine hit calls: 0.831
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
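The ten concentrations listed in Section 2.5 are consistent with a two-fold serial dilution from 200 uM. The sketch below regenerates such a series under that assumption; the function name and the fixed dilution factor are illustrative and are not taken from the assay protocol. The listed values agree with this series to within rounding (for example, 6.25 appears as 6.24).

```python
# Assumed scheme: two-fold serial dilution from the top concentration.
TOP_CONC_UM = 200.0  # highest tested concentration given in the text (uM)
N_CONC = 10          # number of concentrations given in the text


def dilution_series(top, n, factor=2.0):
    """Return n concentrations, lowest first, spaced `factor`-fold apart."""
    return [top / factor ** k for k in reversed(range(n))]


series = dilution_series(TOP_CONC_UM, N_CONC)
for conc in series:
    print(f"{conc:.3f} uM")
```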
2.6 Response: Fluorescent intensity of MitoTracker Red stained mitochondria is used as a membrane potential
reporter for mitochondrial depolarization as indicated by the level of dye binding.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell morphology.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Paclitaxel; CCCP
Neutral vehicle control: DMSO
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., |z| = 1) above or below the median value. An absolute
perturbation greater than one standard deviation was called a "hit" (i.e., |z| > 1). The LEC was estimated by
numerically solving for |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of x.
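The smoothing, normalization, and LEC steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the edge padding used for smoothing, the use of separate reference wells to obtain x* and sigma_x, the interpolation on log10 concentration, and all example values are assumptions made here.

```python
import math
import statistics


def hamming_window(length):
    """Return the Hamming window coefficients for the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]


def smooth(values, length=7):
    """Hamming-weighted moving average; ends are padded by repeating the
    first and last values (an assumed edge treatment)."""
    weights = hamming_window(length)
    half = length // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    total = sum(weights)
    return [sum(w * padded[i + j] for j, w in enumerate(weights)) / total
            for i in range(len(values))]


def log2_fold_change(r, r_star):
    """x = log2(r / r*), with r* the reference (median) response."""
    return [math.log2(v / r_star) for v in r]


def standardize(x, x_star, sigma_x):
    """z = (x - x*) / sigma_x."""
    return [(v - x_star) / sigma_x for v in x]


def lowest_effect_concentration(concs, z, threshold=1.0):
    """Lowest concentration at which |z| reaches the threshold, found by
    linear interpolation of z against log10(concentration)."""
    if abs(z[0]) >= threshold:
        return concs[0]
    for i in range(1, len(z)):
        if abs(z[i - 1]) < threshold <= abs(z[i]):
            target = math.copysign(threshold, z[i])
            frac = (target - z[i - 1]) / (z[i] - z[i - 1])
            lo, hi = math.log10(concs[i - 1]), math.log10(concs[i])
            return 10 ** (lo + frac * (hi - lo))
    return None  # no crossing: no LEC within the tested range


# Hypothetical example data (not taken from the assay).
concs = [0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, 200]  # uM
raw = [101, 99, 100, 102, 98, 105, 120, 150, 190, 240]
reference = [100, 97, 103, 99, 101, 104, 96, 100]  # assumed control wells

r_star = statistics.median(reference)
x = log2_fold_change(smooth(raw), r_star)
x_ref = log2_fold_change(reference, r_star)
z = standardize(x, statistics.median(x_ref), statistics.stdev(x_ref))
lec = lowest_effect_concentration(concs, z)
print("LEC (uM):", lec)
```

Scanning from the lowest concentration upward returns the first crossing, which corresponds to selecting the minimum solution when |z| = 1 has more than one.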
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
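The Level 4 baseline calculation can be illustrated with a short sketch. The record layout (dictionaries keyed by wllt, cndx, and resp) is hypothetical and only mirrors the field names in the method description. The 1.4826 scaling constant is an assumption based on the default of R's mad() function; it is not stated in this document.

```python
import statistics

MAD_SCALE = 1.4826  # assumption: default consistency constant of R's mad()


def bmad_and_onesd(wells):
    """Compute bmad and onesd from test wells at the two lowest
    concentration indices. `wells` is a list of dicts with keys
    'wllt', 'cndx', and 'resp' (hypothetical layout)."""
    base = [w["resp"] for w in wells
            if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(base)
    bmad = MAD_SCALE * statistics.median(abs(v - center) for v in base)
    onesd = statistics.stdev(base)  # sample SD, (n - 1) denominator
    return bmad, onesd


# Hypothetical wells: only the first four meet the baseline criteria.
wells = [
    {"wllt": "t", "cndx": 1, "resp": 0.0},
    {"wllt": "t", "cndx": 1, "resp": 1.0},
    {"wllt": "t", "cndx": 2, "resp": 2.0},
    {"wllt": "t", "cndx": 2, "resp": 3.0},
    {"wllt": "t", "cndx": 5, "resp": 9.0},  # excluded: higher concentration
    {"wllt": "n", "cndx": 1, "resp": 7.0},  # excluded: not a test well
]
bmad, onesd = bmad_and_onesd(wells)
print(bmad, onesd)
```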
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
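As a worked illustration of the Level 5 rule, the sketch below evaluates both candidate cutoffs and keeps the larger one. The function and dictionary names are illustrative only. With the rounded bmad of 0.083 reported for this endpoint, 10 x bmad is about 0.83, in line with the reported cutoff of 0.831; the small difference presumably reflects rounding of the reported bmad.

```python
import math


def response_cutoff(bmad):
    """Return (method name, cutoff) for the larger of the two candidates."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # fixed fold-change cutoff
        "bmad10": 10 * bmad,         # ten times the baseline MAD
    }
    method = max(candidates, key=candidates.get)
    return method, candidates[method]


method, cutoff = response_cutoff(0.083)
print(method, round(cutoff, 3))
```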
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
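Four of the simpler Level 6 rules can be written directly from their stated conditions. The sketch below is an illustration only: the function signature is hypothetical, the argument names follow the abbreviations used in the flag descriptions, and it does not reproduce tcpl's own flagging code.

```python
def cautionary_flags(hitc, rmse, coff, nrep, nconc, ac50, logc_min):
    """Return the names of the flags triggered for one concentration series.

    Covers only noise, low.nrep, low.nconc, and ac50.lowconc, each using the
    condition stated in its description.
    """
    flags = []
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")
    return flags


# Hypothetical series: active hit with an AC50 below the lowest tested
# concentration (10**-0.4 is roughly 0.4 uM) and one replicate per concentration.
example = cautionary_flags(hitc=0.95, rmse=0.3, coff=0.831,
                           nrep=1, nconc=10, ac50=0.1, logc_min=-0.4)
print(example)
```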
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108
Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior based on relative activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
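The relationship between the three weights, the continuous hit call, and the binarized call can be sketched as below. The weights are taken as given inputs between 0 and 1; how tcpl derives each weight is not reproduced here, and the function names are hypothetical.

```python
def continuous_hitc(w_response, w_top, w_aic):
    """Multiply three proportional weights (each in [0, 1]) into a hit call."""
    for w in (w_response, w_top, w_aic):
        if not 0.0 <= w <= 1.0:
            raise ValueError("each weight must lie between 0 and 1")
    return w_response * w_top * w_aic


def binarize(hitc, threshold=0.90):
    """Apply the thresholds described in the text to a continuous hit call."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


# Hypothetical weights for two series.
strong = continuous_hitc(0.99, 0.98, 0.99)
weak = continuous_hitc(0.99, 0.60, 0.99)
print(binarize(strong), binarize(weak))
```

Because the hit call is a product, one low weight is enough to pull a series below the 0.90 threshold even when the other two are near 1.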
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): NA
Neutral control median absolute deviation, by plate (nmad): NA
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
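For readers who want to compute these metrics from their own plate data, the formulas listed above translate directly into code. This sketch uses hypothetical control values, since the values for this endpoint are reported as NA; the function name is illustrative.

```python
import math


def control_metrics(cmed, cmad, nmed, nmad):
    """Robustness metrics for a control (median cmed, MAD cmad) compared
    with the neutral control (median nmed, MAD nmad)."""
    return {
        "cv_percent": (nmad / nmed) * 100,
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }


# Hypothetical positive-control and neutral-control summary values.
metrics = control_metrics(cmed=100.0, cmad=4.0, nmed=10.0, nmad=3.0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The same function applies to the negative control block by passing mmed and mmad in place of pmed and pmad.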
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
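Once reference chemical calls are available, the accuracy statistics named above follow from a standard confusion matrix. The sketch below uses hypothetical counts and is not based on any reference chemical set for this assay.

```python
def accuracy_stats(tp, fp, tn, fn):
    """Standard accuracy statistics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical counts: 20 reference positives and 30 reference negatives.
stats = accuracy_stats(tp=16, fp=3, tn=27, fn=4)
print(stats)
```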
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 148.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T.,
Reif, D. M., Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using
ToxCast™ Data to Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of
Departure. Environmental Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 34
APR_HepG2_MitoticArrest_24hr
1. General Information
1.1 Assay Title: Apredica 24-hour HepG2 Human Liver Cell Assay for Mitotic Arrest
1.2 Assay Summary: APR HepG2 24hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 24 hours after chemical dosing in a 384-well plate.
APR_HepG2_MitoticArrest_24hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_24hr assay. It is designed to make measurements of cell phenotype, a form of morphology
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_MitoticArrest_24hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_MitoticArrest_24hr, was analyzed with bidirectional fitting relative to DMSO as the negative
control and baseline of activity. Using a type of morphology reporter, measures of protein for gain- or
loss-of-signal activity can be used to understand the signaling at the pathway-level as they relate to the gene H3F3A.
Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has produced
multiple assay endpoints where this one serves a signaling function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the cell cycle intended target family, where the subfamily
is arrest.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. The assay was conducted on the human hepatocellular carcinoma cell line HepG2
(HB-8065) in 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Anti-phospho-histone-H3 antibody is used to tag and quantify the level of phosphorylated H3
histone, family 3A protein. Changes in the signals are indicative of protein expression changes as a cellular
response to stress in the system [GeneSymbol:H3F3A | GeneID:3020 | Uniprot_SwissProt_Accession:P84243].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover towards a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements,
pre-fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.11
Response cutoff threshold used to determine hit calls: 1.102
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity of anti-phospho-histone-H3 antibody tagged phosphorylated H3 histone is
indicative of mitotic arrest due to protein expression changes as a cellular response to stress in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Standard minimum concentration tested: 0.58 nM
Key positive control: Paclitaxel; CCCP
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 297 nM
Neutral vehicle control: DMSO
NA
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., |z| = 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |z| > 1). The LEC was estimated by
numerically solving for: |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of x.
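The smoothing, normalization, and LEC steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the published implementation: the function names are hypothetical, the window is truncated and renormalized at the ends of the series, the reference values (r*, x*, sigma_x) are supplied by the caller, and |z| = 1 is solved by linear interpolation on log10 concentration.

```python
import math

def hamming_weights(n=7):
    # Standard Hamming window: w[k] = 0.54 - 0.46 * cos(2*pi*k / (n - 1))
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def smooth(values, n=7):
    """Hamming-weighted moving average over a concentration series.

    Assumption: near the ends the window is truncated and the remaining
    weights are renormalized; the source does not describe edge handling.
    """
    w = hamming_weights(n)
    half = n // 2
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k in range(n):
            j = i + k - half
            if 0 <= j < len(values):
                num += w[k] * values[j]
                den += w[k]
        out.append(num / den)
    return out

def log2_fold_change(r, r_ref):
    # x = log2(r / r*), with r* the median response supplied by the caller
    return [math.log2(v / r_ref) for v in r]

def standardize(x, x_ref, sigma_x):
    # z = (x - x*) / sigma_x
    return [(v - x_ref) / sigma_x for v in x]

def lowest_effect_concentration(concs, z, threshold=1.0):
    """Smallest concentration at which |z| reaches the threshold.

    Assumption: crossings are located by linear interpolation on log10(conc).
    Returns None when |z| never reaches the threshold.
    """
    a = [abs(v) for v in z]
    if a[0] >= threshold:
        return concs[0]
    for i in range(1, len(a)):
        if a[i - 1] < threshold <= a[i]:
            frac = (threshold - a[i - 1]) / (a[i] - a[i - 1])
            lo, hi = math.log10(concs[i - 1]), math.log10(concs[i])
            return 10 ** (lo + frac * (hi - lo))
    return None
```

Scanning from the lowest concentration upward returns the first crossing, which corresponds to selecting the minimum value when |z| = 1 has multiple solutions.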
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalizations include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
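The Level 4 to Level 6 method descriptions listed above can be illustrated with a short Python sketch. It is not tcpl code: the function names and the well layout (dicts with resp, wllt, and cndx keys) are hypothetical, the scale factor on the median absolute deviation mirrors the default of R's stats::mad and is an assumption here, only a handful of the simpler flags are shown, and the border flag is read as "top within 20 percent of the cutoff".

```python
import math
from statistics import median, stdev

def baseline_stats(wells, mad_constant=1.4826):
    """Level 4 sketch: bmad and onesd from test wells (wllt = t) with cndx 1 or 2.

    mad_constant is an assumption (the default scale factor of R's stats::mad);
    pass 1.0 to get the unscaled median absolute deviation.
    """
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = median(base)
    bmad = mad_constant * median(abs(v - center) for v in base)
    onesd = stdev(base)  # sample standard deviation, denominator (n - 1)
    return bmad, onesd

def response_cutoff(bmad):
    """Level 5 sketch: the higher of log2(1.2) and 10 * bmad is selected."""
    return max(math.log2(1.2), 10 * bmad)

def simple_flags(hitc, top, coff, rmse, nrep, nconc, ac50, logc_min):
    """Level 6 sketch: a subset of the cautionary flags described above."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")
    return flags
```

As a rough check, the rounded bmad of 0.11 reported for this endpoint gives 10 * bmad = 1.1, which exceeds log2(1.2) (about 0.263) and is close to the reported cutoff of 1.102; the small difference is presumably due to rounding of the reported bmad.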
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108
Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
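The model selection rule described above (lowest AIC wins, with ties going to the model with fewer parameters) can be sketched as follows. This is an illustrative Python fragment rather than the tcplfit2 implementation; the function name, the tuple layout, and the AIC values and parameter counts in the usage note are hypothetical.

```python
def select_winning_model(fits):
    """Pick the winning model from (name, aic, n_params) tuples.

    Sorting key: lowest AIC first; on an exact AIC tie, the model with
    fewer parameters (the simpler model) wins.
    """
    return min(fits, key=lambda fit: (fit[1], fit[2]))[0]
```

For example, given the illustrative fits `[("cnst", 10.0, 1), ("hill", 8.0, 3), ("poly1", 8.0, 1)]`, the two lowest AIC values tie and the simpler of the two models is selected.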
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
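The binarization used in this documentation can be expressed as a small helper. The sketch below is illustrative Python, not part of tcpl; the function name is hypothetical, and the threshold is exposed as a parameter because other thresholds may be appropriate.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the convention stated above.

    hitc >= threshold      -> "active"
    0 <= hitc < threshold  -> "inactive"
    hitc < 0               -> "not applicable"
    """
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"
```

A stricter or more permissive analysis only needs a different `threshold` value; the handling of negative hitc values stays the same.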
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): NA
Neutral control median absolute deviation, by plate (nmad): NA
Coefficient of variation (CV%) in neutral control wells ((nmad/nmed)*100): NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells
((pmed - nmed) / sqrt(pmad^2 + nmad^2)): NA
Positive control signal-to-noise ((pmed - nmed)/nmad): NA
Positive control signal-to-background (pmed/nmed): NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells
((mmed - nmed) / sqrt(mmad^2 + nmad^2)): NA
Signal-to-noise (median across all plates, using negative control wells)
((mmed - nmed)/nmad): NA
Signal-to-background (median across all plates, using negative control wells)
(mmed/nmed): NA
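The control-well formulas listed above can be collected into one helper, shown here as a minimal Python sketch. The function name and dictionary keys are hypothetical, and because every metric is reported as NA for this endpoint, any input values are purely illustrative. The same function applies to the negative control by passing mmed and mmad in place of pmed and pmad.

```python
import math

def control_metrics(cmed, cmad, nmed, nmad):
    """Plate-level performance metrics for a control against the neutral control.

    cmed, cmad: median and median absolute deviation of the control wells
                (pmed/pmad for positive, mmed/mmad for negative control).
    nmed, nmad: median and median absolute deviation of the neutral wells.
    Raises ZeroDivisionError if nmed, nmad, or (cmed - nmed) is zero.
    """
    return {
        "cv_percent": (nmad / nmed) * 100,
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }
```

The sketch makes no attempt to handle missing controls; when control wells are absent, as the NA entries suggest for this endpoint, the metrics cannot be computed.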
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 128.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis
Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation
describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 36
APR_HepG2_NuclearSize_24hr
1. General Information
1.1 Assay Title: Apredica 24-hour HepG2 Human Liver Cell Assay for Nuclear Size
1.2 Assay Summary: APR HepG2 24hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 24 hours after chemical dosing in a 384-well plate.
APR_HepG2_NuclearSize_24hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_24hr assay. It is designed to make measurements of cell phenotype, a form of morphology
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_NuclearSize_24hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_NuclearSize_24hr, was analyzed with bidirectional fitting relative to DMSO as the negative control
and baseline of activity. Using a type of morphology reporter, measures of all nuclear DNA for gain or loss-of-signal
activity can be used to understand the signaling at the nuclear level. Furthermore, this assay endpoint
can be referred to as a primary readout, because this assay has produced multiple assay endpoints where this
one serves a signaling function. To generalize the intended target to other relatable targets, this assay endpoint
is annotated to the cell morphology intended target family, where the subfamily is organelle conformation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on the human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Hoechst-33342 dye is used as a stain for DNA to understand morphology in the system.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements,
pre-fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey
anti-rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG),
c-Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), and alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.01
Response cutoff threshold used to determine hit calls: 0.263
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity of anti-phospho-histone-H3 antibody tagged phosphorylated H3 histone is
indicative of mitotic arrest due to protein expression changes as a cellular response to stress in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell morphology.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Paclitaxel; CCCP
Neutral vehicle control: DMSO
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., one unit of $\sigma_x$) above or below the median value. An absolute
perturbation greater than one standard deviation was called a "hit" (i.e., $|z| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
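The smoothing and standardization steps described in Section 3.2 can be illustrated with a minimal sketch. It is not the authors' implementation. The edge handling (truncating and renormalizing the window at the ends of the series), the choice of reference values used to estimate the center and standard deviation of x, and all function names are assumptions made for illustration.

```python
import math
import statistics

def hamming_window(length=7):
    """Hamming weights w[n] = 0.54 - 0.46*cos(2*pi*n/(length-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def smooth(values, length=7):
    """Weighted moving average with a centered Hamming window.

    Assumption: near the ends of the series the window is truncated
    and its weights are renormalized.
    """
    weights = hamming_window(length)
    half = length // 2
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed

def log2_fold_change(r, r_star):
    """x = log2(r / r*), where r* is the median response on the plate."""
    return [math.log2(v / r_star) for v in r]

def standardize(x, reference_x):
    """z = (x - center) / sigma_x.

    Assumption: center and sigma_x are the mean and sample standard
    deviation of a reference set of x values supplied by the caller.
    """
    center = statistics.mean(reference_x)
    sigma_x = statistics.stdev(reference_x)
    return [(v - center) / sigma_x for v in x]
```

A typical call chain would be `smooth(raw)`, then `log2_fold_change(smoothed, r_star)`, then `standardize(x, reference_x)`, with `r_star` and `reference_x` chosen according to the plate layout.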
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl uses generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or fewer were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
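The Level 4 baseline statistics and the Level 5 cutoff selection listed above can be sketched as follows. This is an illustration under stated assumptions, not the tcpl code: the well records are hypothetical dictionaries, and the `constant` argument is an assumption that exposes the scaling factor applied to the median absolute deviation (R's `mad()` uses 1.4826 by default, whereas the sketch defaults to 1.0, i.e. an unscaled deviation).

```python
import math
import statistics

def baseline_stats(wells, constant=1.0):
    """Return (bmad, onesd) from test-compound wells at the two lowest
    concentration indices.

    wells: list of dicts with keys 'wllt', 'cndx', 'resp'
           (hypothetical layout, for illustration only).
    constant: scaling factor applied to the median absolute deviation.
    """
    base = [w["resp"] for w in wells
            if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(base)
    bmad = constant * statistics.median([abs(v - med) for v in base])
    onesd = statistics.stdev(base)  # sample SD, divisor n - 1
    return bmad, onesd

def select_cutoff(bmad):
    """Pick the higher of the two candidate cutoffs: log2(1.2) or 10*bmad."""
    return max(math.log2(1.2), 10 * bmad)
```

With a small baseline deviation the fold-change cutoff log2(1.2) dominates; with a larger one, 10*bmad is selected.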
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108 Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
Activity for each concentration-response series is determined by calculating a continuous hit-call for the
winning model, which is the product of three proportional weights. The first weight reflects whether there is
at least one median response outside the efficacy cutoff band. The second weight reflects whether the top (or
maximal change in the predicted response) is larger than the cutoff. The last weight reflects whether the AIC
of the winning model is less than that of the constant model, i.e. the winning model is a better fit than a
flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and are stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
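The binarization of hitc described above can be sketched in a few lines. The three weights are taken as already-computed inputs, because this section does not specify how each weight is derived. The function names and the default threshold argument are illustrative assumptions.

```python
def continuous_hitc(w_median_outside, w_top_exceeds, w_better_than_flat):
    """Continuous hit call as the product of three weights.

    The weights are hypothetical inputs; how each is computed is
    outside the scope of this sketch.
    """
    return w_median_outside * w_top_exceeds * w_better_than_flat

def classify_hitc(hitc, threshold=0.90):
    """Binarize hitc: >= threshold is active, 0 up to threshold is
    inactive, and negative values are not applicable."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"
```

A user who wants a more or less stringent call can pass a different `threshold`, which mirrors the statement that different thresholds may be used.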
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed - nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed - nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
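The control-well formulas listed above can be computed as in the following sketch for a single plate. All values for this endpoint are reported as NA, so the inputs in any usage are hypothetical. The function names, the unscaled median absolute deviation (scaling constant of 1.0), and the single-plate scope are assumptions; aggregation across plates is not shown.

```python
import math
import statistics

def median_and_mad(values, constant=1.0):
    """Median and (optionally scaled) median absolute deviation."""
    med = statistics.median(values)
    mad = constant * statistics.median([abs(v - med) for v in values])
    return med, mad

def control_metrics(control, neutral):
    """Performance metrics for one control type versus neutral wells.

    control: responses from positive ("p") or negative ("m") control wells
    neutral: responses from neutral ("n") control wells
    Raises ZeroDivisionError if nmed, nmad, or the control separation is 0.
    """
    cmed, cmad = median_and_mad(control)
    nmed, nmad = median_and_mad(neutral)
    return {
        "cv_percent": nmad / nmed * 100,
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }
```

For example, `control_metrics([48, 50, 52], [9, 10, 11])` uses made-up responses in which the control wells sit well above the neutral wells.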
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 156.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T.,
Reif, D. M., Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using
ToxCast™ Data to Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of
Departure. Environmental Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 38
APR_HepG2_P-H2AX_24hr
1. General Information
1.1 Assay Title: Apredica 24-hour HepG2 Human Liver Cell Assay for H2AX Phosphorylation
1.2 Assay Summary: APR_HepG2_24hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 24 hours after chemical dosing in a 384-well plate.
APR_HepG2_P-H2AX_24hr is one of 10 assay components measured or calculated from the APR_HepG2_24hr assay. It is
designed to make measurements of DNA content, a form of viability reporter, as detected with fluorescence
intensity signals by HCS Fluorescent Imaging technology. Data from the assay component
APR_HepG2_P-H2AX_24hr were analyzed into 1 assay endpoint. This assay endpoint, APR_HepG2_P-H2AX_24hr, was
analyzed with bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a
type of viability reporter, measures of protein for gain or loss-of-signal activity can be used to understand
the signaling at the pathway-level as they relate to the gene. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
signaling function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the DNA binding intended target family, where the subfamily is cellular response to DNA damage
stimulus.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. The assay was conducted on the human hepatocellular carcinoma cell line HepG2
(HB-8065) in 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Anti-phospho-histone-H2AX antibody is used to tag and quantify the level of phosphorylated H2A
histone family, member X protein. Changes in the signals are indicative of protein expression changes as a
cellular response to stress in the system [GeneSymbol:H2AFX | GeneID:3014 |
Uniprot_SwissProt_Accession:P16104].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity and identify the mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. The human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chloro-
phenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Camptothecin; Anisomycin
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.082
Response cutoff threshold used to determine hit calls: 0.821
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
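The ten treatment concentrations listed in the Section 2.5 text (0.39 to 200 uM) are consistent with a two-fold serial dilution from 200 uM. That reading is an inference from the listed values rather than something the source states, and the function name below is hypothetical. A short sketch generates such a series for comparison; the exact halved values 3.125 and 6.25 differ slightly from the 3.12 and 6.24 given in the text.

```python
def twofold_series(top, n_points):
    """Concentrations from lowest to highest, halving down from `top`.

    Assumes a two-fold dilution scheme; this is an inference, not a
    documented protocol detail.
    """
    return [top / 2 ** k for k in range(n_points - 1, -1, -1)]

series = twofold_series(200.0, 10)
for conc in series:
    print(f"{conc:.4g} uM")
```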
2.6 Response: Fluorescent intensity of anti-phospho-histone-H3 antibody tagged phosphorylated H3 histone is
indicative of mitotic arrest due to protein expression changes as a cellular response to stress in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of DNA binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., one unit of $\sigma_x$) above or below the median value. An absolute
perturbation greater than one standard deviation was called a "hit" (i.e., $|z| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
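The LEC and efficacy definitions in Section 3.2 can be illustrated with a small sketch. The source says only that |z| = 1 was solved numerically; the linear interpolation on log10 concentration used here, and the function names, are assumptions chosen for simplicity.

```python
import math

def lowest_effect_concentration(conc, z, threshold=1.0):
    """Smallest concentration at which |z| first reaches `threshold`.

    conc: increasing concentrations; z: standardized responses.
    Assumption: z is interpolated linearly against log10(concentration)
    between the two tested points that bracket the crossing.
    Returns None if |z| never reaches the threshold.
    """
    if abs(z[0]) >= threshold:
        return conc[0]
    for i in range(1, len(conc)):
        z0, z1 = z[i - 1], z[i]
        if abs(z1) >= threshold:
            # Crossing is at +threshold for gains, -threshold for losses.
            target = threshold if z1 > 0 else -threshold
            frac = (target - z0) / (z1 - z0)
            lo, hi = math.log10(conc[i - 1]), math.log10(conc[i])
            return 10 ** (lo + frac * (hi - lo))
    return None

def efficacy(x):
    """Perturbation with the largest magnitude, keeping its sign."""
    return max(x, key=abs)
```

Because the scan runs from the lowest concentration upward and stops at the first crossing, it returns the minimum solution when the curve crosses the threshold more than once.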
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl uses generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or fewer were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108 Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
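The model-selection rule described above (lowest AIC wins; on a tie, the model with fewer parameters wins) can be sketched as follows. This is a minimal illustration, not tcpl's implementation: the `Fit` record, the model labels, and the AIC and parameter-count values are hypothetical, and ties are detected by exact equality of the AIC values.

```python
from typing import NamedTuple


class Fit(NamedTuple):
    name: str       # model label (hypothetical, for illustration)
    aic: float      # Akaike Information Criterion of the fitted model
    n_params: int   # number of free parameters in the model


def select_winning_model(fits):
    """Return the fit with the lowest AIC; ties go to the model with fewer parameters."""
    return min(fits, key=lambda f: (f.aic, f.n_params))


# Illustrative values only; these are not results from this endpoint.
fits = [
    Fit("cnst", 42.0, 1),
    Fit("hill", 30.5, 4),
    Fit("gnls", 30.5, 6),
]
winner = select_winning_model(fits)
print(winner.name)
```

Sorting on the pair (AIC, parameter count) applies both rules in one comparison, so the simpler model is only preferred when the AIC values are equal.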
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior based on relative activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
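As an illustration of the binarization described above, the sketch below maps a continuous hitc to the three categories used in this documentation. The function name and the `threshold` argument are hypothetical; the default of 0.90 is the threshold stated in the text, and users may choose a different one.

```python
def classify_hitc(hitc, threshold=0.9):
    """Binarize a continuous hit call using the thresholds stated in the text."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


for value in (0.95, 0.90, 0.45, 0.0, -1.0):
    print(value, classify_hitc(value))
```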
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 140.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
health perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 40
APR_HepG2_p53Act_24hr
1. General Information
1.1 Assay Title: Apredica 24-hour HepG2 Human Liver Cell Assay for p53 Activation
1.2 Assay Summary: APR_HepG2_24hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 24 hours after chemical dosing in a 384-well plate.
APR_HepG2_p53Act_24hr is one of 10 assay component(s) measured or calculated from the APR_HepG2_24hr
assay. It is designed to make measurements of dna content, a form of viability reporter, as detected with
fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the assay component
APR_HepG2_p53Act_24hr was analyzed into 1 assay endpoint. This assay endpoint, APR_HepG2_p53Act_24hr,
was analyzed with bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using
a type of viability reporter, measures of protein for gain or loss-of-signal activity can be used to understand the
signaling at the pathway-level as they relate to the gene TP53. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
signaling function. To generalize the intended target to other relatable targets, this assay endpoint is annotated
to the dna binding intended target family, where the subfamily is tumor suppressor.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: An anti-p53 antibody is used to tag and quantify the level of tumor protein p53. Changes in
the signals are indicative of protein expression changes as a cellular response to stress in the system
[GeneSymbol:TP53 | GeneID:7157 | Uniprot_SwissProt_Accession:P04637].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
-------
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide
m-chlorophenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.097
Response cutoff threshold used to determine hit calls: 0.972
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity of Hoechst-33342 stained DNA is used to measure nuclear size as an identifier
of cell phenotypes to understand the morphology in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Camptothecin;Anisomycin
Neutral vehicle control: DMSO
NA
-------
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., sigma_x = 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |sigma_x| > 1). The LEC was estimated by
numerically solving for: |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of x.
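The smoothing, normalization, and LEC steps described above can be sketched as follows. This is an illustrative reading of the text rather than the original analysis code. Several details are assumptions: the Hamming window is truncated and renormalized at the ends of the series, the reference values r*, x*, and sigma_x are passed in rather than derived per plate, and the LEC is found by linear interpolation on log10 concentration between adjacent tested concentrations. All function names are hypothetical.

```python
import math


def hamming_smooth(values, length=7):
    """Smooth a series with a normalized Hamming window of the given odd length.

    Assumption: near the ends of the series the window is truncated and its
    weights are renormalized so they still sum to 1.
    """
    half = length // 2
    weights = [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
               for n in range(length)]
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def log2_fold_change(r, r_star):
    """x = log2(r / r*), where r* is the median (reference) response."""
    return [math.log2(v / r_star) for v in r]


def standardize(x, x_star, sigma_x):
    """z = (x - x*) / sigma_x."""
    return [(v - x_star) / sigma_x for v in x]


def lowest_effect_concentration(concs, z):
    """Smallest concentration at which |z| reaches 1, or None if it never does.

    Assumption: the crossing is located by linear interpolation on log10(conc).
    """
    a = [abs(v) for v in z]
    if a[0] >= 1:
        return concs[0]
    for i in range(1, len(a)):
        if a[i] >= 1:
            frac = (1 - a[i - 1]) / (a[i] - a[i - 1])
            lo, hi = math.log10(concs[i - 1]), math.log10(concs[i])
            return 10 ** (lo + frac * (hi - lo))
    return None


# Illustrative z-scores only; not data from this assay.
lec = lowest_effect_concentration([1.0, 10.0, 100.0, 1000.0], [0.0, 0.5, 1.5, 2.0])
print(lec)
```

Scanning from the lowest concentration upward returns the first crossing, which matches the rule of taking the minimum value when |z| = 1 has several solutions.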
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 7: bmad10 (Add a cutoff
value of 10 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated
using test compound wells (wllt = t) for the endpoint.)
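The level 4 and level 5 methods above can be sketched as follows: the baseline statistics are taken from test compound wells at the two lowest concentration indices, and the cutoff is the higher of the two candidate values. This is an illustration under stated assumptions, not tcpl code. In particular, the 1.4826 scaling constant on the median absolute deviation is an assumption (it is the default of R's `mad` function), and the function names are hypothetical.

```python
import math
import statistics


def mad(values, constant=1.4826):
    """Median absolute deviation; the scaling constant is an assumption (R's default)."""
    med = statistics.median(values)
    return constant * statistics.median([abs(v - med) for v in values])


def baseline_stats(resp, cndx, wllt):
    """Return (bmad, onesd) from test wells (wllt == 't') with cndx of 1 or 2."""
    base = [r for r, c, w in zip(resp, cndx, wllt) if w == "t" and c in (1, 2)]
    return mad(base), statistics.stdev(base)  # stdev uses the (n - 1) denominator


def response_cutoff(bmad):
    """Higher of the two assigned level 5 candidates: log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)


# With the bmad reported for this endpoint (0.097), the 10 * bmad candidate wins.
print(round(response_cutoff(0.097), 3))
```

With the rounded bmad of 0.097 this gives 0.97, in line with the reported cutoff of 0.972; the small difference is presumably due to rounding of the reported bmad.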
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction were opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was the winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of half of the
maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
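A few of the level 6 rules listed above reduce to simple threshold checks, sketched below. This is a minimal illustration and not the tcpl implementation. The function and argument names are hypothetical, only a subset of the flags is covered, and the border rule is read as a band in which |top| lies between 0.8 and 1.2 times the cutoff (both inequalities holding at once).

```python
def directionality_flag(resp, top, coff):
    """modl.directionality.fail: too few responses beyond the cutoff in the winning direction."""
    up = sum(1 for r in resp if r > coff)
    down = sum(1 for r in resp if r < -1 * coff)
    if top < 0:                  # loss was the winning direction
        return down < 2 * up
    if top > 0:                  # gain was the winning direction
        return up < 2 * down
    return False


def cautionary_flags(hitc, top, coff, rmse, ac50, conc_min, nconc, nrep):
    """Return the names of the simple threshold flags triggered for one series."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < conc_min:  # conc_min corresponds to 10^logc_min
        flags.append("ac50.lowconc")
    return flags


# Illustrative series values; coff is the cutoff reported for this endpoint.
example = cautionary_flags(hitc=0.95, top=1.0, coff=0.972, rmse=0.2,
                           ac50=0.1, conc_min=0.39, nconc=10, nrep=1)
print(example)
```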
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108 Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior based on relative activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 176.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
health perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
-------
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 42
APR_HepG2_StressKinase_24hr
1. General Information
1.1 Assay Title: Apredica 24-hour HepG2 Human Liver Cell Assay for Stress Kinase
1.2 Assay Summary: APR_HepG2_24hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 24 hours after chemical dosing in a 384-well plate.
APR_HepG2_StressKinase_24hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_24hr assay. It is designed to make measurements of enzyme activity, a form of enzyme reporter,
as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the assay
component APR_HepG2_StressKinase_24hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_StressKinase_24hr, was analyzed with bidirectional fitting relative to DMSO as the negative control
and baseline of activity. Using a type of enzyme reporter, measures of protein for gain or loss-of-signal activity
can be used to understand the signaling at the pathway level as they relate to the gene JUN. Furthermore, this
assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a signaling function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the cell cycle intended target family, where the subfamily is stress
response.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Anti-phospho-c-Jun antibody is used to tag and quantify the level of phosphorylated jun proto-oncogene
protein. Changes in the signals are indicative of protein expression changes as a cellular response to
stress in the system [GeneSymbol:JUN | GeneID:3725 | Uniprot_SwissProt_Accession:P05412].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery towards a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity and identify mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: Adherent HepG2 cell line used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.092
Response cutoff threshold used to determine hit calls: 0.924
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity of Hoechst-33342 stained DNA is used to measure nuclear size as an identifier
of cell phenotypes to understand the morphology in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data are
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Standard minimum concentration tested: 0.58 nM
Key positive control: Camptothecin;Anisomycin
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 297 nM
Neutral vehicle control: DMSO
NA
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation above or below the median value (i.e., |z| = 1). An absolute
perturbation greater than one standard deviation was called a "hit" (i.e., |z| > 1). The LEC was estimated by
numerically solving for |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of x.
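The smoothing, normalization, and LEC steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code used for the assay: the function names are hypothetical, the edge padding for the Hamming window is an assumption, the centering value and standard deviation are supplied by the caller because the source does not say how they are pooled, and the |z| = 1 crossing is located by linear interpolation in log10 concentration.

```python
import math

def hamming_smooth(values, length=7):
    """Smooth a series with a normalized Hamming window of odd length.
    Edges are padded by repeating the end values (an assumption)."""
    half = length // 2
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
              for n in range(length)]
    total = sum(window)
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [
        sum(w * padded[i + k] for k, w in enumerate(window)) / total
        for i in range(len(values))
    ]

def log2_fold_change(smoothed, reference):
    """x = log2(r / r*), where r* is the reference (median) response."""
    return [math.log2(r / reference) for r in smoothed]

def standardize(x, center, sigma):
    """z = (x - x*) / sigma_x, with center and sigma supplied by the caller."""
    return [(xi - center) / sigma for xi in x]

def lowest_effect_concentration(concs, z):
    """Lowest concentration at which |z| reaches 1, or None if it never does.
    Concentrations must be in ascending order; the crossing is interpolated
    linearly in log10(concentration), which is an assumption."""
    if abs(z[0]) >= 1:
        return concs[0]
    for i in range(1, len(z)):
        a0, a1 = abs(z[i - 1]), abs(z[i])
        if a0 < 1 <= a1:
            frac = (1 - a0) / (a1 - a0)
            lo, hi = math.log10(concs[i - 1]), math.log10(concs[i])
            return 10 ** (lo + frac * (hi - lo))
    return None
```

Returning the first crossing in an ascending series corresponds to selecting the minimum value when several concentrations satisfy |z| = 1.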
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
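As a minimal sketch of the Level 4 baseline calculation, the Python function below computes bmad and onesd from test compound wells at the two lowest concentration indices. The record layout (dicts with wllt, cndx, and resp keys) and the function name are hypothetical. The default scaling constant of 1.4826 mirrors the default of R's mad() function; whether tcpl applies that scaling here is an assumption, so it is exposed as a parameter.

```python
from statistics import median, stdev

def baseline_stats(wells, mad_constant=1.4826):
    """Return (bmad, onesd) from test wells (wllt == "t") with cndx 1 or 2.

    wells: iterable of dicts with keys "wllt", "cndx", and "resp"
           (a hypothetical layout chosen for this sketch).
    mad_constant: scaling applied to the raw median absolute deviation;
                  1.4826 is R's mad() default, use 1.0 for the unscaled value.
    """
    base = [w["resp"] for w in wells
            if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = median(base)
    bmad = mad_constant * median([abs(r - center) for r in base])
    # Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))
    onesd = stdev(base)
    return bmad, onesd
```

Neutral control wells and higher concentration indices are ignored by the filter, so only the low-concentration test wells define the baseline.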
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated
using test compound wells (wllt = t) for the endpoint.)
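The cutoff selection can be illustrated with the values reported for this endpoint. The sketch below, with a hypothetical function name, takes the larger of the two candidate cutoffs. With the rounded bmad of 0.092 the bmad-based candidate is about 0.92, which exceeds log2(1.2) (about 0.263) and is consistent with the reported response cutoff of 0.924; the small difference presumably reflects rounding of bmad in this document.

```python
import math

def response_cutoff(bmad):
    """Return (method_name, cutoff), choosing the higher candidate cutoff."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # fold-change based cutoff
        "bmad10": 10 * bmad,         # 10 x baseline median absolute deviation
    }
    name = max(candidates, key=candidates.get)
    return name, candidates[name]

name, cutoff = response_cutoff(0.092)
print(name, round(cutoff, 3))
```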
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
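Several of the Level 6 flags reduce to simple comparisons on per-series summary values. The Python sketch below evaluates five of them (noise, border, low.nrep, low.nconc, and ac50.lowconc). It is not the tcpl implementation: the function name and argument list are hypothetical, and the border check is interpreted here as a band, i.e. |top| lying between 0.8 and 1.2 times the cutoff, which is an assumption about the intent of that flag.

```python
def cautionary_flags(hitc, top, coff, rmse, nrep, nconc, ac50, logc_min):
    """Return the names of the (subset of) level 6 flags raised for one series.

    logc_min is the log10 of the lowest tested concentration, in the same
    units as ac50.
    """
    flags = []
    if rmse > coff:
        flags.append("noise")
    # Assumed reading of the border flag: top falls within +/-20% of coff.
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")
    return flags
```

For example, an active series with top = 1.0 against a cutoff of 0.924, a single replicate per concentration, and an AC50 below the lowest tested concentration would carry the border, low.nrep, and ac50.lowconc flags.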
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108 Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
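The model selection rule stated above (lowest AIC wins, with ties going to the model with fewer parameters) can be sketched as follows. The function name, the input layout, and the model names and parameter counts in the usage example are illustrative assumptions rather than values taken from tcplfit2.

```python
def select_winning_model(fits):
    """Pick the model with the lowest AIC; break ties by fewer parameters.

    fits: dict mapping model name -> (aic, n_params).
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

# Hypothetical fits: two models tie on AIC, so the simpler one is chosen.
example_fits = {"cnst": (20.0, 1), "hill": (12.5, 3), "gnls": (12.5, 5)}
print(select_winning_model(example_fits))
```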
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
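The binarization of the continuous hit call described above can be written as a small helper. This is a sketch with a hypothetical function name; the 0.90 default follows the threshold used in this documentation and can be changed when a different stringency is needed.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to a label using the stated convention."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

for value in (0.95, 0.90, 0.42, -1):
    print(value, classify_hitc(value))
```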
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
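Although the control metrics are reported as NA for this endpoint, the formulas listed above are straightforward to compute when plate-level control statistics are available. The Python sketch below uses a hypothetical function name and made-up example numbers; it applies equally to positive ("p") or negative ("m") control wells compared against neutral ("n") wells, and it does not guard against a zero denominator.

```python
import math

def control_metrics(sig_med, sig_mad, nmed, nmad):
    """Separation metrics between a signal control and the neutral control.

    sig_med, sig_mad: median and MAD of the positive or negative control wells.
    nmed, nmad: median and MAD of the neutral control wells.
    """
    diff = sig_med - nmed
    return {
        "z_prime": 1 - (3 * (sig_mad + nmad)) / abs(diff),
        "ssmd": diff / math.sqrt(sig_mad ** 2 + nmad ** 2),
        "signal_to_noise": diff / nmad,
        "signal_to_background": sig_med / nmed,
        "cv_percent": (nmad / nmed) * 100,
    }

# Illustrative values only; these are not measurements from the assay.
metrics = control_metrics(sig_med=100.0, sig_mad=5.0, nmed=10.0, nmad=2.0)
```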
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
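Once reference chemical calls are available, sensitivity and specificity follow directly from the confusion counts. The sketch below assumes a hypothetical input of paired booleans (whether the reference chemical is expected to be positive, and whether the assay called it active); it is an illustration of the statistics named above, not a procedure defined in this documentation.

```python
def sensitivity_specificity(calls):
    """calls: iterable of (is_reference_positive, is_assay_active) booleans.
    Returns (sensitivity, specificity); either is None if undefined."""
    tp = fn = tn = fp = 0
    for ref_positive, active in calls:
        if ref_positive and active:
            tp += 1
        elif ref_positive:
            fn += 1
        elif active:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```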
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 127.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah I, Setzer RW, Jack J, Houck KA, Judson RS, Knudsen TB, Liu J, Martin MT, Reif DM, Richard AM,
Thomas RS, Crofton KM, Dix DJ, Kavlock RJ. Using ToxCast™ Data to Reconstruct Dynamic Cell State
Trajectories and Estimate Toxicological Points of Departure. Environmental Health Perspectives.
2016;124(7):910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 44
APR_HepG2_CellCycleArrest_72hr
1. General Information
1.1 Assay Title: Apredica 72-hour HepG2 Human Liver Cell Assay for Cell Cycle Arrest
1.2 Assay Summary: APR_HepG2_72hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 72 hours after chemical dosing in a 384-well plate.
APR_HepG2_CellCycleArrest_72hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_72hr assay. It is designed to make measurements of cell phenotype, a form of morphology
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_CellCycleArrest_72hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_CellCycleArrest_72hr, was analyzed with bidirectional fitting relative to DMSO as the negative
control and baseline of activity. Using a type of morphology reporter, measures of all nuclear DNA for gain or
loss-of-signal activity can be used to understand the signaling at the pathway-level as they relate to the gene.
Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has produced
multiple assay endpoints where this one serves a signaling function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the cell cycle intended target family, where the subfamily
is arrest.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Hoechst-33342 dye is used as a stain for DNA to understand morphology in the system.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
-------
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: Adherent HepG2 cell line used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide
m-chlorophenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
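The ten test concentrations listed in Section 2.5 are consistent with a two-fold serial dilution starting from 200 uM (the listed values appear rounded). The following minimal sketch reproduces that series; the function name and the assumption that the plates were prepared as an exact two-fold dilution are illustrative and are not stated in the source.

```python
def dilution_series(top_conc, n_points, fold=2.0):
    """Return concentrations, lowest to highest, for a serial dilution.

    Assumption: an exact `fold`-fold dilution from `top_conc`; the source
    only lists the resulting concentrations.
    """
    series = [top_conc / fold ** k for k in range(n_points)]
    return sorted(series)


# Ten-point, two-fold series from 200 uM (hypothetical reconstruction).
concs_uM = dilution_series(200.0, 10)
```

Rounding `concs_uM` to two or three significant figures gives values close to those listed in the exposure regime (0.39 to 200 uM).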
Baseline median absolute deviation for the assay (bmad): 0.093
Response cutoff threshold used to determine hit calls: 0.927
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity of Hoechst-33342 stained DNA is used to measure nuclear size as an identifier
of cell phenotypes to understand the morphology in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Standard minimum concentration tested: 0.58 nM
Key positive control: Camptothecin;Anisomycin
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 297 nM
Neutral vehicle control: DMSO
NA
-------
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., $z = \pm 1$) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., $|z| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
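The smoothing, normalization, and LEC steps described above can be sketched as follows. This is an illustrative outline only, not the code used to generate the data: the edge handling of the Hamming window, the choice of reference values `r_star`, `x_star`, and `sigma_x`, and the log-linear interpolation used to solve |z| = 1 are all assumptions, and every function name is hypothetical.

```python
import math


def hamming_window(length):
    """Hamming weights w[k] = 0.54 - 0.46*cos(2*pi*k/(length-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
            for k in range(length)]


def smooth(values, length=7):
    """Weighted moving average with a Hamming window.

    Assumption: near the ends of the series the window is truncated and
    its weights renormalized; the source does not describe edge handling.
    """
    weights = hamming_window(length)
    half = length // 2
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, weight in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                num += weight * values[j]
                den += weight
        smoothed.append(num / den)
    return smoothed


def standardize(r, r_star, x_star, sigma_x):
    """Return (x, z) with x = log2(r / r_star) and z = (x - x_star) / sigma_x."""
    x = [math.log2(v / r_star) for v in r]
    z = [(v - x_star) / sigma_x for v in x]
    return x, z


def lowest_effect_concentration(concs, z, threshold=1.0):
    """Lowest concentration at which |z| reaches `threshold`.

    Assumption: z is interpolated linearly against log10(concentration)
    between adjacent tested concentrations. Returns None if |z| never
    reaches the threshold.
    """
    if abs(z[0]) >= threshold:
        return concs[0]
    for i in range(1, len(z)):
        z0, z1 = z[i - 1], z[i]
        if abs(z1) < threshold:
            continue  # still inside the +/- threshold band
        target = threshold if z1 > 0 else -threshold
        frac = (target - z0) / (z1 - z0)
        lo, hi = math.log10(concs[i - 1]), math.log10(concs[i])
        return 10 ** (lo + frac * (hi - lo))
    return None
```

Because the loop returns at the first crossing, the result corresponds to the minimum solution when |z| = 1 is met more than once along the series.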
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
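The Level 4 to Level 6 method descriptions above can be illustrated with a short sketch. It is written in Python for illustration only and is not the tcpl implementation (tcpl is an R package); all function names and the dictionary layout are hypothetical. Two assumptions are made explicit: the MAD is scaled by 1.4826 (the default of R's `mad()`), and the border flag is read as the band 0.8*coff <= |top| <= 1.2*coff. Only a subset of the flags is shown.

```python
import math
from statistics import median


def baseline_stats(wells, mad_constant=1.4826):
    """Level 4 sketch: bmad and onesd from test wells (wllt = t) at cndx 1 or 2.

    `wells` is a list of dicts with keys 'wllt', 'cndx', and 'resp'.
    Assumption: the MAD is scaled by 1.4826, as R's mad() does by default.
    """
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = median(base)
    bmad = mad_constant * median([abs(r - center) for r in base])
    mean = sum(base) / len(base)
    onesd = math.sqrt(sum((r - mean) ** 2 for r in base) / (len(base) - 1))
    return bmad, onesd


def response_cutoff(bmad):
    """Level 5 sketch: the higher of log2(1.2) and 10 * bmad is selected."""
    return max(math.log2(1.2), 10 * bmad)


def cautionary_flags(series):
    """Level 6 sketch covering a subset of the flags listed above."""
    coff, top, hitc = series["coff"], series["top"], series["hitc"]
    ac50, bmd = series["ac50"], series["bmd"]
    active = hitc >= 0.9
    flags = []
    if bmd is not None and ac50 is not None and bmd > ac50:
        flags.append("bmd.high")
    if series["rmse"] > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:  # assumed band interpretation
        flags.append("border")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    if active and ac50 is not None and ac50 < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags
```

For example, the bmad of 0.093 reported in Section 2.5 gives 10 * bmad of about 0.93, which exceeds log2(1.2) (about 0.263); this is consistent with the reported response cutoff of 0.927 if that cutoff was computed from the unrounded bmad.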
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108 Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e., the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
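The binarization of the continuous hit call described above can be expressed as a small helper. This is a sketch with hypothetical names; the 0.90 default follows the threshold used in this document, and a user may pass a different value.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to a category.

    hitc >= threshold     -> "active"
    0 <= hitc < threshold -> "inactive"
    hitc < 0              -> "not applicable"
    """
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"
```

A stricter or looser `threshold` changes only the boundary between active and inactive; negative values are always treated as not applicable.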
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): NA
Neutral control median absolute deviation, by plate (nmad): NA
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
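For readers who want to compute the control-based metrics listed above from their own plate data, the formulas can be sketched as below. The function and key names are hypothetical; `cmed` and `cmad` stand for the median and median absolute deviation of either the positive ("p") or negative ("m") control wells. Returning None when a denominator is zero is an assumption made here for illustration.

```python
import math


def _ratio(numerator, denominator):
    """Divide, returning None when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator


def control_metrics(cmed, cmad, nmed, nmad):
    """Robustness metrics for a control (cmed, cmad) versus neutral wells."""
    diff = cmed - nmed
    scaled = _ratio(3 * (cmad + nmad), abs(diff))
    return {
        "cv_percent": _ratio(nmad * 100, nmed),
        "z_prime": None if scaled is None else 1 - scaled,
        "ssmd": _ratio(diff, math.sqrt(cmad ** 2 + nmad ** 2)),
        "signal_to_noise": _ratio(diff, nmad),
        "signal_to_background": _ratio(cmed, nmed),
    }
```

All metrics for this endpoint are reported as NA in the table above, so no values from this document can be used to check the sketch; the example inputs in any test are made up.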
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 168.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah I, Setzer RW, Jack J, Houck KA, Judson RS, Knudsen TB, Liu J, Martin MT, Reif DM, Richard AM,
Thomas RS, Crofton KM, Dix DJ, Kavlock RJ. Using ToxCast™ Data to Reconstruct Dynamic Cell State
Trajectories and Estimate Toxicological Points of Departure. Environmental Health Perspectives.
2016;124(7):910-919. https://doi.org/10.1289/ehp.1409029
-------
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 46
APR_HepG2_CellLoss_72hr
1. General Information
1.1 Assay Title: Apredica 72-hour HepG2 Human Liver Cell Assay for Cell Loss
1.2 Assay Summary: APR_HepG2_72hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 72 hours after chemical dosing in a 384-well plate.
APR_HepG2_CellLoss_72hr is one of 10 assay component(s) measured or calculated from the APR_HepG2_72hr
assay. It is designed to make measurements of cell number, a form of viability reporter, as detected with
fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the assay component
APR_HepG2_CellLoss_72hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_CellLoss_72hr, was analyzed with bidirectional fitting relative to DMSO as the negative control and
baseline of activity. Using a type of viability reporter, measures of all nuclear DNA for gain or loss-of-signal
activity can be used to understand the viability at the cellular-level. Furthermore, this assay endpoint can be
referred to as a secondary readout, because this assay has produced multiple assay endpoints where this one
serves a viability function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the cell cycle intended target family, where the subfamily is proliferation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Hoechst-33342 dye is used as a stain for DNA to understand viability in the system.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chloro-
phenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 2
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Camptothecin; Anisomycin; Paclitaxel; CCCP
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.089
Response cutoff threshold used to determine hit calls: 0.887
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
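As a rough check on the nominal concentrations listed in the exposure regime, the ten-point series can be regenerated as a two-fold serial dilution from 200 uM. The following is a minimal Python sketch; the function name `two_fold_series` is hypothetical and is not part of tcpl. Exact halving gives 6.25 and 3.125 uM where the text lists 6.24 and 3.12 uM, so the listed values appear to be rounded or truncated.

```python
def two_fold_series(top_um=200.0, n_points=10):
    """Return n_points concentrations (uM) in ascending order,
    obtained by repeatedly halving the top concentration."""
    series = [top_um / (2 ** i) for i in range(n_points)]
    return sorted(series)


concs = two_fold_series()
# Print the series rounded to two decimals for comparison with the text.
print([round(c, 2) for c in concs])
```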
2.6 Response: An anti-p53 antibody is used to tag tumor protein p53, and the level of the protein is quantified
via fluorescence intensity. Changes in the signals are indicative of protein expression changes as a cellular
response to stress in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation above or below the median value (i.e., $|z| = 1$). An absolute
perturbation greater than one standard deviation was called a "hit" (i.e., $|z| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
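The smoothing, normalization, standardization, and LEC steps described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the published implementation: all function names are hypothetical, the window is renormalized where it overhangs the ends of the series, the center and scale used for standardization are supplied by the caller, and the LEC is located by linear interpolation of |z| against log10 concentration.

```python
import math


def hamming_smooth(values, length=7):
    """Smooth a series with a Hamming window of the given length.
    Assumption: at the edges the truncated window is renormalized."""
    half = length // 2
    weights = [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
               for n in range(length)]
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def log2_fold_change(responses, reference):
    """x = log2(r / r*), where r* is the reference (median) response."""
    return [math.log2(r / reference) for r in responses]


def standardize(x, center, scale):
    """z = (x - center) / scale; center and scale are caller-supplied."""
    return [(v - center) / scale for v in x]


def lowest_effect_concentration(concs, z, threshold=1.0):
    """Lowest concentration at which |z| reaches the threshold.
    Assumption: linear interpolation of |z| on a log10 concentration axis.
    Returns None when |z| never reaches the threshold."""
    mag = [abs(v) for v in z]
    if mag[0] >= threshold:
        return concs[0]
    for i in range(1, len(mag)):
        if mag[i - 1] < threshold <= mag[i]:
            frac = (threshold - mag[i - 1]) / (mag[i] - mag[i - 1])
            lo, hi = math.log10(concs[i - 1]), math.log10(concs[i])
            return 10 ** (lo + frac * (hi - lo))
    return None
```

Because the loop scans from the lowest concentration upward, the first crossing found is the minimum solution, which matches the rule that the minimum value is selected when there are multiple solutions.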
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
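The level 4 baseline calculation can be illustrated with a short Python sketch. It is not the tcpl implementation (tcpl is an R package); the function name `baseline_stats` and the dictionary layout of the wells are hypothetical. R's `mad()` multiplies the raw median absolute deviation by 1.4826 by default, so that scaling is exposed here as a parameter, and whether it applies to a given endpoint is an assumption to verify against tcpl.

```python
import statistics


def baseline_stats(wells, mad_constant=1.4826):
    """Compute (bmad, onesd) from test compound wells (wllt == 't')
    at the two lowest concentration indices (cndx 1 or 2).

    wells: iterable of dicts with keys 'resp', 'wllt', and 'cndx'.
    mad_constant: scaling applied to the raw MAD (assumed R default).
    """
    base = [w["resp"] for w in wells
            if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    if len(base) < 2:
        raise ValueError("need at least two baseline wells")
    med = statistics.median(base)
    bmad = mad_constant * statistics.median([abs(r - med) for r in base])
    onesd = statistics.stdev(base)  # sample SD, divides by (n - 1)
    return bmad, onesd
```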
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
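The selection of the higher of the two candidate cutoffs can be sketched in Python as below; the function name `response_cutoff` is hypothetical. With the bmad of 0.089 reported in the assay design summary, 10 x bmad is about 0.89, which is consistent with the reported cutoff of 0.887 if bmad is stored at higher precision than it is displayed.

```python
import math


def response_cutoff(bmad):
    """Return (method_name, cutoff), choosing the larger of the two
    candidate cutoffs assigned to this endpoint at level 5."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # fixed fold-change cutoff
        "bmad10": 10 * bmad,         # ten times the baseline MAD
    }
    name = max(candidates, key=candidates.get)
    return name, candidates[name]
```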
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g.
top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
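A few of the simpler level 6 flags can be expressed as plain comparisons, as in the Python sketch below. The function name and argument names are hypothetical and this is not tcpl code. Two interpretive assumptions are made: the border flag is read as the band 0.8*coff <= |top| <= 1.2*coff, because a literal "or" would flag every series, and the lowest tested concentration is passed on the linear scale rather than as logc_min.

```python
def simple_level6_flags(hitc, top, coff, rmse, ac50, nconc, nrep, min_conc):
    """Return the names of a subset of level 6 flags that apply to one
    concentration-response series (illustrative subset only)."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    # Assumption: borderline means |top| lies within 20% of the cutoff.
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    # min_conc is on the linear scale, i.e. 10 ** logc_min.
    if hitc >= 0.9 and ac50 < min_conc:
        flags.append("ac50.lowconc")
    return flags
```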
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108 Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
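The binarization of the continuous hit call described above can be written as a small Python helper. The function name `classify_hitc` is hypothetical, and the 0.90 default mirrors the threshold used in this documentation; users may pass a different value.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to a category:
    hitc < 0 is not applicable, hitc >= threshold is active,
    and anything in between is inactive."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"
```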
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): NA
Neutral control median absolute deviation, by plate (nmad): NA
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
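The control-well formulas listed above can be computed with the Python sketch below, which takes plate-level medians and median absolute deviations as inputs. The function names are hypothetical; the same function serves positive ("p") or negative ("m") control wells compared against neutral ("n") wells. All values are reported as NA for this endpoint, so the numbers in the usage lines are invented purely for illustration.

```python
import math


def control_metrics(cmed, cmad, nmed, nmad):
    """Performance metrics for a control well type (median cmed, MAD cmad)
    relative to neutral control wells (median nmed, MAD nmad)."""
    return {
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }


def cv_percent(nmed, nmad):
    """Coefficient of variation (%) in neutral control wells."""
    return (nmad / nmed) * 100


# Illustrative values only; not taken from the assay.
metrics = control_metrics(cmed=100.0, cmad=5.0, nmed=10.0, nmad=5.0)
neutral_cv = cv_percent(nmed=10.0, nmad=5.0)
```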
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 133.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast
Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 48
APR_HepG2_MicrotubuleCSK_72hr
1. General Information
1.1 Assay Title: Apredica 72-hour HepG2 Human Liver Cell Assay for Microtubule CSK Stability
1.2 Assay Summary: APR HepG2 72hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 72 hours after chemical dosing in a 384-well plate.
APR_HepG2_MicrotubuleCSK_72hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_72hr assay. It is designed to make measurements of protein conformation, a form of conformation
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_MicrotubuleCSK_72hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_MicrotubuleCSK_72hr, was analyzed with bidirectional fitting relative to DMSO as the negative
control and baseline of activity. Using a type of conformation reporter, measures of protein for gain or loss-of-
signal activity can be used to understand the signaling at the cellular-level. Furthermore, this assay endpoint
can be referred to as a primary readout, because this assay has produced multiple assay endpoints where this
one serves a signaling function. To generalize the intended target to other relatable targets, this assay endpoint
is annotated to the cell morphology intended target family, where the subfamily is cell conformation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on the human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion ToxCast: US EPA's Toxicity Forecaster Program
AOP: Adverse Outcome Pathway tcpl: ToxCast Data Analysis Pipeline R Package
CV: Coefficient of Variation SSMD: Strictly Standardized Mean Difference
DMSO: Dimethyl Sulfoxide
2. Test Method Description
2.1 Purpose: An anti-alpha-tubulin antibody is used to tag and quantify the level of tubulin, alpha 1a protein. Changes
in the signals are indicative of protein expression changes as a cellular response to stress in the system
[GeneSymbol:TUBA1A | GeneID:7846 | Uniprot_SwissProt_Accession:Q71U36].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chloro-
phenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.104
Response cutoff threshold used to determine hit calls: 1.038
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: An anti-p53 antibody is used to tag tumor protein p53, and the protein level is quantified via
fluorescence intensity. Changes in the signals are indicative of protein expression changes as a cellular
response to stress in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell morphology.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Camptothecin; Anisomycin
Neutral vehicle control: DMSO
Target (nominal) number of replicates: 1
NA
-------
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., $|z| = 1$) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., $|z| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
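The smoothing, normalization, and LEC steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the edge padding used at the ends of the series, the linear interpolation in log10 concentration used to solve |z| = 1, and the way r*, x*, and sigma_x are supplied by the caller are all assumptions, since the text does not specify them.

```python
import numpy as np

def smooth_hamming(r, length=7):
    """Smooth a response series with a normalized Hamming window.

    Edge padding is an assumption; the source does not say how the
    ends of the concentration series were handled.
    """
    w = np.hamming(length)
    w = w / w.sum()                      # weights sum to 1
    half = length // 2
    padded = np.pad(np.asarray(r, dtype=float), half, mode="edge")
    return np.convolve(padded, w, mode="valid")

def standardize(r_smooth, r_star, x_star, sigma_x):
    """Return log2 fold change x and standardized perturbation z."""
    x = np.log2(np.asarray(r_smooth, dtype=float) / r_star)
    z = (x - x_star) / sigma_x
    return x, z

def lowest_effect_conc(conc, z, threshold=1.0):
    """Lowest concentration at which |z| reaches the threshold.

    Uses linear interpolation in log10(conc) between tested points
    (an assumed numerical scheme). Returns None if never reached.
    """
    conc = np.asarray(conc, dtype=float)
    a = np.abs(np.asarray(z, dtype=float))
    if a[0] >= threshold:
        return float(conc[0])
    logc = np.log10(conc)
    for i in range(1, len(a)):
        if a[i - 1] < threshold <= a[i]:
            frac = (threshold - a[i - 1]) / (a[i] - a[i - 1])
            return float(10 ** (logc[i - 1] + frac * (logc[i] - logc[i - 1])))
    return None

def efficacy(x):
    """Largest-magnitude perturbation, keeping its sign."""
    x = np.asarray(x, dtype=float)
    return float(x[np.argmax(np.abs(x))])
```

Scanning from the lowest concentration upward returns the first crossing, which matches the stated rule of selecting the minimum value when there are multiple solutions.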
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl uses its generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By
default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
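As an illustration of how the Level 4 to Level 6 methods listed above fit together, the following Python sketch computes bmad and onesd from the two lowest concentration indices of test wells, selects the higher of the two candidate cutoffs, and evaluates four of the simpler flags. It is a hedged approximation, not tcpl code: the function names and the `mad_constant` parameter are hypothetical, applying the 1.4826 scaling factor (the default of R's `mad`) is an assumption, and the border flag is interpreted as |top| lying between 0.8 and 1.2 times the cutoff.

```python
import math
import numpy as np

def baseline_stats(resp, cndx, wllt, mad_constant=1.4826):
    """bmad and onesd from test wells (wllt == 't') with cndx 1 or 2.

    mad_constant is an assumed scaling factor; set it to 1.0 for an
    unscaled median absolute deviation.
    """
    resp = np.asarray(resp, dtype=float)
    mask = (np.asarray(wllt) == "t") & np.isin(np.asarray(cndx), [1, 2])
    base = resp[mask]
    bmad = mad_constant * np.median(np.abs(base - np.median(base)))
    onesd = base.std(ddof=1)  # sample standard deviation (n - 1)
    return float(bmad), float(onesd)

def cutoff(bmad):
    """Higher of the two Level 5 candidates: log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)

def simple_flags(top, coff, rmse, nconc, nrep):
    """Evaluate a few Level 6 flags for one concentration series."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:  # assumed reading of 'border'
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags
```

For the bmad of 0.104 reported for this endpoint, 10 x bmad is about 1.04, which exceeds log2(1.2) (about 0.263) and agrees with the reported cutoff of 1.038 up to rounding of bmad.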
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1092
Number of chemicals tested: 1051
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
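The binarization described above can be written as a small Python helper. This is only a sketch: the function name and the returned labels are illustrative, and the 0.90 default mirrors the threshold used in this documentation, which users may change.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to a category.

    hitc >= threshold      -> "active"
    0 <= hitc < threshold  -> "inactive"
    hitc < 0               -> "not applicable"
    """
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"
```

A stricter analysis could pass a higher threshold, for example `classify_hitc(0.93, threshold=0.95)`, which returns "inactive".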
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): NA
Neutral control median absolute deviation, by plate (nmad): NA
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
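The control-well metrics defined above can be computed for a single plate with a short Python sketch like the one below. The function names and the `constant` parameter are hypothetical, and whether the median absolute deviation is scaled (1.4826 is the default in R's `mad`) is an assumption; the aggregation across plates mentioned in the labels is not shown.

```python
import math
import numpy as np

def mad(values, constant=1.4826):
    """Median absolute deviation; 'constant' is an assumed scaling factor."""
    v = np.asarray(values, dtype=float)
    return float(constant * np.median(np.abs(v - np.median(v))))

def control_metrics(control, neutral, constant=1.4826):
    """Robustness metrics for one plate.

    'control' holds positive (p) or negative (m) control well responses,
    'neutral' holds neutral (n) control well responses.
    """
    cmed = float(np.median(control))
    nmed = float(np.median(neutral))
    cmad = mad(control, constant)
    nmad = mad(neutral, constant)
    return {
        "cv_pct": nmad / nmed * 100,
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }
```

The sketch does not guard against a zero nmad or nmed, which would need handling before applying it to real plate data.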
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
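Once reference chemical calls are available, sensitivity and specificity could be estimated along the lines of the following Python sketch. The function name and the boolean input format are illustrative assumptions; no reference chemical data are implied for this endpoint.

```python
def sensitivity_specificity(predicted_active, reference_active):
    """Sensitivity and specificity from paired boolean calls.

    predicted_active: assay hit calls (True = active)
    reference_active: reference classification (True = positive)
    Returns None for a statistic whose denominator is zero.
    """
    tp = fp = tn = fn = 0
    for pred, ref in zip(predicted_active, reference_active):
        if ref and pred:
            tp += 1
        elif ref and not pred:
            fn += 1
        elif not ref and pred:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```

The false negative rate is then 1 - sensitivity and the false positive rate is 1 - specificity.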
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 144.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah I, Setzer RW, Jack J, Houck KA, Judson RS, Knudsen TB, Liu J, Martin MT, Reif DM, Richard AM,
Thomas RS, Crofton KM, Dix DJ, Kavlock RJ. Using ToxCast™ Data to Reconstruct Dynamic Cell State
Trajectories and Estimate Toxicological Points of Departure. Environ Health Perspect. 2016;124(7):910-919.
doi: 10.1289/ehp.1409029.
-------
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data
can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 50
APR_HepG2_MitoMass_72hr
1. General Information
1.1 Assay Title: Apredica 72-hour HepG2 Human Liver Cell Assay for Mitochondrial Mass
1.2 Assay Summary: APR HepG2 72hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 72 hours after chemical dosing in a 384-well plate.
APR_HepG2_MitoMass_72hr is one of 10 assay components measured or calculated from the
APR_HepG2_72hr assay. It is designed to make measurements of cell phenotype, a form of morphology
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_MitoMass_72hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_MitoMass_72hr, was analyzed with bidirectional fitting relative to DMSO as the negative control
and baseline of activity. Using a type of morphology reporter, gain or loss-of-signal activity can be used to
understand changes in the signaling. Furthermore, this assay endpoint can be referred to as a primary readout,
because this assay has produced multiple assay endpoints where this one serves a signaling function. To
generalize the intended target to other relatable targets, this assay endpoint is annotated to the cell morphology
intended target family, where the subfamily is organelle conformation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: MitoTracker Red is used as a stain for the morphology of the mitochondria.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity and identify the mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: Adherent HepG2 cell line used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide
m-chlorophenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial mass measurements,
pre-fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.068
Response cutoff threshold used to determine hit calls: 0.684
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: An anti-p53 antibody is used to tag tumor protein p53, and the protein level is quantified via
fluorescence intensity. Changes in the signals are indicative of protein expression changes as a cellular
response to stress in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell morphology.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Paclitaxel; CCCP
Neutral vehicle control: DMSO
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., |z| >= 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |z| > 1). The LEC was estimated by
numerically solving for |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of x.
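The smoothing, normalization, and LEC steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the edge-padding choice for the window, and the log-linear interpolation used to solve |z| = 1 are assumptions, and x* and sigma_x are passed in by the caller because the text does not state how they were estimated.

```python
import numpy as np

def smooth_hamming(r, length=7):
    """Smooth a response series with a normalized Hamming window."""
    w = np.hamming(length)
    w = w / w.sum()  # weights sum to 1 so a flat series stays flat
    pad = length // 2
    # Assumption: repeat the edge values so the output keeps the input length.
    padded = np.pad(np.asarray(r, dtype=float), pad, mode="edge")
    return np.convolve(padded, w, mode="valid")

def log2_fold_change(r, r_star):
    """x = log2(r / r*), where r* is the median (reference) response."""
    return np.log2(np.asarray(r, dtype=float) / r_star)

def standardize(x, x_star, sigma_x):
    """z = (x - x*) / sigma_x."""
    return (np.asarray(x, dtype=float) - x_star) / sigma_x

def lowest_effect_concentration(conc, z, threshold=1.0):
    """Lowest concentration at which |z| reaches the threshold, else None.

    Assumption: the crossing is located by linear interpolation of |z|
    against log10(concentration) between adjacent tested concentrations.
    """
    conc = np.asarray(conc, dtype=float)
    a = np.abs(np.asarray(z, dtype=float))
    if a[0] >= threshold:
        return float(conc[0])
    logc = np.log10(conc)
    for i in range(1, len(a)):
        if a[i - 1] < threshold <= a[i]:
            frac = (threshold - a[i - 1]) / (a[i] - a[i - 1])
            return float(10 ** (logc[i - 1] + frac * (logc[i] - logc[i - 1])))
    return None
```

Scanning from the lowest concentration upward returns the first crossing, which matches the stated rule of taking the minimum when several solutions exist.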
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
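The Level 4 and Level 5 calculations can be sketched as below. This is an illustration under stated assumptions rather than the tcpl source: the function names are hypothetical, and the 1.4826 scaling constant (the default of R's mad() function) is assumed to apply to bmad.

```python
import math
import statistics

def bmad_and_onesd(resp_lowconc, mad_constant=1.4826):
    """Baseline statistics from test-well responses at cndx 1 or 2.

    resp_lowconc: normalized responses (resp) for wllt = t wells at the
    two lowest concentration indices.
    mad_constant: assumed scaling factor (default of R's mad()).
    """
    values = list(resp_lowconc)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    bmad = mad_constant * mad
    onesd = statistics.stdev(values)  # sample SD, divisor n - 1
    return bmad, onesd

def response_cutoff(bmad):
    """Level 5: take the higher of log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)

# With the bmad reported for this endpoint (0.068), 10 * bmad exceeds
# log2(1.2), so the bmad-based threshold is the one selected.
coff = response_cutoff(0.068)
```

The result (about 0.68) is close to the reported cutoff of 0.684; the small gap would be expected if the reported bmad is rounded for display, which is an inference rather than something the text states.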
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); 0.8(coff) <= |top| <= 1.2(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or fewer were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
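A few of the Level 6 flags are simple threshold rules and can be sketched as below. This is an illustrative reading of the flag descriptions, not tcpl code: the function names and argument layout are hypothetical, concentrations are taken on the linear scale, and the border rule is interpreted as the band from 0.8 to 1.2 times the cutoff.

```python
def directionality_fail(resp, top, coff):
    """Flag 5: would the opposite direction have captured more responses?"""
    above = sum(1 for v in resp if v > coff)
    below = sum(1 for v in resp if v < -1 * coff)
    if top < 0:   # loss was the winning direction
        return below < 2 * above
    if top > 0:   # gain was the winning direction
        return above < 2 * below
    return False

def cautionary_flags(resp, top, coff, rmse, hitc, ac50, conc, nrep):
    """Return the names of the flags (subset shown) raised for one series.

    conc: tested concentrations (linear scale); nrep: average number of
    replicates per concentration.
    """
    flags = []
    if directionality_fail(resp, top, coff):
        flags.append("modl.directionality.fail")
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:  # assumed reading of the rule
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if len(set(conc)) <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < min(conc):
        flags.append("ac50.lowconc")
    return flags
```

For example, a series with top = 0.7 against a cutoff of 0.684, one replicate per concentration, and a fitted AC50 below the lowest tested concentration would carry the border, low.nrep, and ac50.lowconc flags.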
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1092
Number of chemicals tested: 1051
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity,
efficacy, and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
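The binarization of the continuous hit call described above can be expressed as a small helper. This is a sketch with a hypothetical function name; the 0.90 default follows the threshold used in this documentation and can be changed by the user.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to the categories used in this document."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

# Example: a stricter threshold reclassifies a borderline series.
default_call = classify_hitc(0.92)                 # active at 0.90
strict_call = classify_hitc(0.92, threshold=0.95)  # inactive at 0.95
```

Keeping the threshold as a parameter reflects the point that stringency is a user decision, to be weighed together with fitc and the Level 6 flags.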
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): NA
Neutral control median absolute deviation, by plate (nmad): NA
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
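The positive-control formulas listed above can be computed as in the sketch below. The function name and the input values are hypothetical and serve only to show the arithmetic; all metrics for this endpoint are reported as NA.

```python
import math

def control_metrics(pmed, pmad, nmed, nmad):
    """Plate-level performance metrics from control medians and MADs."""
    return {
        "cv_percent": (nmad / nmed) * 100,
        "z_prime": 1 - (3 * (pmad + nmad)) / abs(pmed - nmed),
        "ssmd": (pmed - nmed) / math.sqrt(pmad ** 2 + nmad ** 2),
        "signal_to_noise": (pmed - nmed) / nmad,
        "signal_to_background": pmed / nmed,
    }

# Hypothetical control values, not taken from any ToxCast plate.
metrics = control_metrics(pmed=100.0, pmad=4.0, nmed=20.0, nmad=3.0)
```

The negative-control metrics use the same expressions with mmed and mmad in place of pmed and pmad.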
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
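Once reference chemical calls are available, sensitivity and specificity follow directly from the confusion counts. The sketch below uses a hypothetical function name and invented counts purely to show the calculation.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: reference positives called active/inactive by the assay.
    tn/fp: reference negatives called inactive/active by the assay.
    """
    sensitivity = tp / (tp + fn)  # 1 - false negative rate
    specificity = tn / (tn + fp)  # 1 - false positive rate
    return sensitivity, specificity

# Invented counts for illustration only.
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=45, fp=5)
```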
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 128.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast
Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 52
APR_HepG2_MitoMembPot_72hr
1. General Information
1.1 Assay Title: Apredica 72-hour HepG2 Human Liver Cell Assay for Mitochondrial Membrane Potential
1.2 Assay Summary: APR_HepG2_72hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 72 hours after chemical dosing in a 384-well plate.
APR_HepG2_MitoMembPot_72hr is one of 10 assay components measured or calculated from the
APR_HepG2_72hr assay. It is designed to make measurements of dye binding, a form of membrane potential
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_MitoMembPot_72hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_MitoMembPot_72hr, was analyzed with bidirectional fitting relative to DMSO as the negative
control and baseline of activity. Using a type of membrane potential reporter, gain or loss-of-signal activity can
be used to understand changes in the signaling. Furthermore, this assay endpoint can be referred to as a primary
readout, because this assay has produced multiple assay endpoints where this one serves a signaling function.
To generalize the intended target to other relatable targets, this assay endpoint is annotated to the cell
morphology intended target family, where the subfamily is organelle conformation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. The assay was conducted on the human hepatocellular carcinoma cell line
HepG2 (HB-8065) in 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging
from 0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: MitoTracker Red is used as a stain for the membrane potential of the mitochondria.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. The human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chloro-
phenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
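The ten treatment concentrations listed in this section are consistent with a two-fold serial dilution from 200 uM. The sketch below regenerates such a series; the function name is hypothetical, and exact two-fold spacing is an assumption, since the text gives only the concentrations themselves.

```python
def twofold_series(top_conc=200.0, n_points=10):
    """Ascending two-fold dilution series ending at top_conc (uM)."""
    return [top_conc / 2 ** k for k in reversed(range(n_points))]

series = twofold_series()
```

Under this assumption the fourth and fifth points come out as 3.125 and 6.25 uM, so the listed values of 3.12 and 6.24 uM would reflect rounding or truncation in the source.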
Baseline median absolute deviation for the assay (bmad): 0.073
Response cutoff threshold used to determine hit calls: 0.729
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescence intensity of the MitoTracker Red dye is used to quantify mitochondrial membrane
potential (see Sections 2.1 and 2.5). Changes in the signal are indicative of changes in mitochondrial membrane
potential as a cellular response to stress in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell morphology.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Paclitaxel; CCCP
Neutral vehicle control: DMSO
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., |z| >= 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |z| > 1). The LEC was estimated by
numerically solving for |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of x.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was the winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 or fewer concentrations were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
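The level 4 baseline statistics and the level 5 cutoff selection listed above can be sketched in a few lines of Python. This is an illustration only and not tcpl code: the function names and the example wells are hypothetical, the MAD is assumed to be scaled by 1.4826 (the default consistency constant of R's mad()), and onesd is assumed to be the sample standard deviation with an n - 1 denominator.

```python
import math
import statistics

MAD_SCALE = 1.4826  # assumption: same consistency constant as R's mad()

def baseline_stats(wells):
    """Return (bmad, onesd) from test wells (wllt == 't') at cndx 1 or 2.

    wells: iterable of dicts with keys 'wllt', 'cndx', and 'resp'.
    """
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(base)
    bmad = MAD_SCALE * statistics.median(abs(r - med) for r in base)
    onesd = statistics.stdev(base)  # sqrt(sum((resp - mean)^2) / (n - 1))
    return bmad, onesd

def select_cutoff(bmad):
    """Take the higher of the two candidate cutoffs: log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)

# Hypothetical wells: only the four test wells at cndx 1 or 2 enter the baseline.
example_wells = [
    {"wllt": "t", "cndx": 1, "resp": 0.0},
    {"wllt": "t", "cndx": 1, "resp": 0.1},
    {"wllt": "t", "cndx": 2, "resp": -0.1},
    {"wllt": "t", "cndx": 2, "resp": 0.2},
    {"wllt": "t", "cndx": 3, "resp": 2.0},   # higher concentration, excluded
    {"wllt": "n", "cndx": 1, "resp": 0.05},  # neutral control, excluded
]

bmad, onesd = baseline_stats(example_wells)
cutoff = select_cutoff(bmad)
```

With a small bmad the fixed log2(1.2) value becomes the cutoff; with a larger bmad the 10 * bmad term dominates.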
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1092 Number of chemicals tested: 1051
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. Second, the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
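The binarization of the continuous hit call described above can be written as a small helper. This is a sketch in Python, not part of tcpl; the function name and the returned labels are hypothetical, and the 0.90 default is the threshold used in this documentation.

```python
def classify_hitc(hitc, threshold=0.90):
    """Bin a continuous hit call into the three groups used in this document."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

# Hypothetical hit calls for four concentration-response series.
labels = [classify_hitc(h) for h in (0.97, 0.90, 0.42, -1)]
```

A user who needs a stricter screen can pass a higher threshold, for example classify_hitc(0.92, threshold=0.95).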
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
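For readers who want to recompute the plate-level metrics in this section from their own control wells, the formulas can be expressed as below. This is a minimal Python sketch; the function names are hypothetical and the numeric inputs are invented for illustration, since the values for this endpoint are reported as NA.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) against neutral wells."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference against neutral wells."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    return cmed / nmed

# Hypothetical plate: neutral median 100 (MAD 5), positive median 400 (MAD 15).
nmed, nmad, pmed, pmad = 100.0, 5.0, 400.0, 15.0
metrics = {
    "cv": cv_percent(nmed, nmad),
    "z_prime": z_prime(pmed, pmad, nmed, nmad),
    "ssmd": ssmd(pmed, pmad, nmed, nmad),
    "s2n": signal_to_noise(pmed, nmed, nmad),
    "s2b": signal_to_background(pmed, nmed),
}
```

The same functions apply to negative control wells by passing mmed and mmad in place of pmed and pmad.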
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 147.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah I, Setzer RW, Jack J, Houck KA, Judson RS, Knudsen TB, Liu J, Martin MT, Reif DM, Richard AM, Thomas RS,
Crofton KM, Dix DJ, Kavlock RJ. Using ToxCast™ Data to Reconstruct Dynamic Cell State Trajectories and
Estimate Toxicological Points of Departure. Environ Health Perspect. 2016;124(7):910-919. doi:
10.1289/ehp.1409029.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 54
APR_HepG2_MitoticArrest_72hr
1. General Information
1.1 Assay Title: Apredica 72-hour HepG2 Human Liver Cell Assay for Mitotic Arrest
1.2 Assay Summary: APR HepG2 72hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 72 hours after chemical dosing in a 384-well plate.
APR_HepG2_MitoticArrest_72hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_72hr assay. It is designed to make measurements of cell phenotype, a form of morphology
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_MitoticArrest_72hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_MitoticArrest_72hr, was analyzed with bidirectional fitting relative to DMSO as the negative
control and baseline of activity. Using a type of morphology reporter, measures of protein for gain or loss-of-
signal activity can be used to understand the signaling at the pathway-level as they relate to the gene .
Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has produced
multiple assay endpoints where this one serves a signaling function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the cell cycle intended target family, where the subfamily
is arrest.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: An anti-phospho-histone-H3 antibody is used to tag and quantify the level of phosphorylated H3
histone, family 3A protein. Changes in the signals are indicative of protein expression changes as a cellular
response to stress in the system [GeneSymbol:H3F3A | GeneID:3020 | Uniprot_SwissProt_Accession:P84243].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide
m-chlorophenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.142
Response cutoff threshold used to determine hit calls: 1.419
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescence intensity from the anti-phospho-histone-H3 antibody is used to quantify the level of
phosphorylated H3 histone, family 3A protein. Changes in the signals are indicative of protein
expression changes as a cellular response to stress in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell cycle.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Paclitaxel;CCCP
Neutral vehicle control: DMSO
NA
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., sigma_x = 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |sigma_x| > 1). The LEC was estimated by
numerically solving for |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of x.
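The smoothing, normalization, and LEC steps described above can be illustrated with a short Python sketch. It is not the original analysis code. Several details are assumptions made only for this example: the window weights are renormalized at the ends of the series, the reference values r*, x*, and sigma_x are supplied by the caller, and the |z| = 1 crossing is located by linear interpolation of |z| on a log10 concentration axis. All function names are hypothetical.

```python
import math

def hamming(n):
    """Hamming window weights of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def smooth(values, length=7):
    """Weighted moving average with a Hamming window (edges renormalized)."""
    w = hamming(length)
    half = length // 2
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k, wk in enumerate(w):
            j = i + k - half
            if 0 <= j < len(values):
                num += wk * values[j]
                den += wk
        out.append(num / den)
    return out

def log2_fold_change(r, r_ref):
    """x = log2(r / r*), with r* the reference (median) response."""
    return [math.log2(v / r_ref) for v in r]

def standardize(x, x_ref, sigma_x):
    """z = (x - x*) / sigma_x."""
    return [(v - x_ref) / sigma_x for v in x]

def lowest_effect_conc(concs, z):
    """Lowest concentration at which |z| reaches 1, or None if it never does."""
    if abs(z[0]) >= 1:
        return concs[0]
    for i in range(1, len(z)):
        a0, a1 = abs(z[i - 1]), abs(z[i])
        if a0 < 1 <= a1:
            frac = (1 - a0) / (a1 - a0)
            lo, hi = math.log10(concs[i - 1]), math.log10(concs[i])
            return 10 ** (lo + frac * (hi - lo))
    return None

# Hypothetical series: |z| crosses 1 halfway (in log10 units) between 10 and 100.
lec = lowest_effect_conc([1.0, 10.0, 100.0], [0.0, 0.5, 1.5])
```

Scanning from the lowest concentration returns the first crossing, which matches the rule of selecting the minimum value when there are multiple solutions.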
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was the winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 or fewer concentrations were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
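To make the flag logic concrete, a subset of the level 6 rules above is sketched below in Python. This is not the tcpl implementation; the function name and the example series are hypothetical. The border rule is interpreted here as |top| falling between 0.8 and 1.2 times the cutoff (both inequalities holding), which is an assumption about the intent of that flag. The example uses the response cutoff reported for this endpoint (1.419).

```python
def caution_flags(resp, top, coff, rmse, nrep, nconc, hitc, ac50, logc_min):
    """Return names of the flags raised for one series (illustrative subset only)."""
    flags = []
    n_above = sum(r > coff for r in resp)
    n_below = sum(r < -1 * coff for r in resp)
    loss_fail = top < 0 and n_below < 2 * n_above
    gain_fail = top > 0 and n_above < 2 * n_below
    if loss_fail or gain_fail:
        flags.append("modl.directionality.fail")
    if rmse > coff:
        flags.append("noise")
    # Assumption: borderline means |top| lies within 20 percent of the cutoff.
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")
    return flags

# Hypothetical gain-direction series with one replicate per concentration.
example_flags = caution_flags(
    resp=[0.1, -0.2, 0.3, 1.5, 1.6, 1.8],
    top=1.6, coff=1.419, rmse=0.2, nrep=1, nconc=10,
    hitc=0.95, ac50=20.0, logc_min=-0.41,
)
```

A series can carry several flags at once, so returning a list rather than a single label keeps each rule independent.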
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1092 Number of chemicals tested: 1051
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the constant
model, i.e. whether the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
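The winning-model rule described above (lowest AIC, with ties going to the model with fewer parameters) can be illustrated with a short sketch. The model names, AIC values, and parameter counts below are made up for illustration, and pick_winning_model is a hypothetical helper rather than a tcpl function.

```python
def pick_winning_model(fits):
    """fits maps model name -> (aic, n_params).

    The lowest AIC wins; when AIC values are equal, the model with
    fewer parameters wins.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

# Illustrative fits only; "hill" and "exp4" tie on AIC, so the simpler one is chosen.
fits = {
    "cnst": (12.4, 1),
    "hill": (-35.1, 4),
    "exp4": (-35.1, 3),
    "gnls": (-30.2, 6),
}
winner = pick_winning_model(fits)
print(winner)
```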
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
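As a worked illustration of the Z Prime Factor and SSMD expressions listed above, the sketch below evaluates them for made-up control statistics. All values in this section are reported as NA, so the numbers here are purely hypothetical, as are the function names.

```python
import math

def z_prime(ctrl_med, ctrl_mad, nmed, nmad):
    """Z' = 1 - 3 * (ctrl_mad + nmad) / |ctrl_med - nmed|."""
    return 1 - (3 * (ctrl_mad + nmad)) / abs(ctrl_med - nmed)

def ssmd(ctrl_med, ctrl_mad, nmed, nmad):
    """SSMD = (ctrl_med - nmed) / sqrt(ctrl_mad^2 + nmad^2)."""
    return (ctrl_med - nmed) / math.sqrt(ctrl_mad ** 2 + nmad ** 2)

# Hypothetical plate-level medians and MADs (not taken from this assay).
pmed, pmad = 100.0, 4.0   # positive control
nmed, nmad = 10.0, 3.0    # neutral control

print(round(z_prime(pmed, pmad, nmed, nmad), 3))
print(ssmd(pmed, pmad, nmed, nmad))
```

The same two functions apply to the negative control by passing mmed and mmad in place of pmed and pmad.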
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
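Once reference chemical calls are available, the accuracy statistics named above follow directly from a confusion table. The sketch below uses hypothetical counts; no reference chemical results are reported for this endpoint.

```python
def accuracy_stats(tp, fp, tn, fn):
    """Sensitivity, specificity, and error rates from confusion-table counts."""
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }

# Hypothetical counts: 20 positive and 30 negative reference chemicals.
stats = accuracy_stats(tp=18, fp=3, tn=27, fn=2)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```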
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 156.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 56
APR_HepG2_NuclearSize_72hr
1. General Information
1.1 Assay Title: Apredica 72-hour HepG2 Human Liver Cell Assay for Nuclear Size
1.2 Assay Summary: APR HepG2 72hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 72 hours after chemical dosing in a 384-well plate.
APR_HepG2_NuclearSize_72hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_72hr assay. It is designed to make measurements of cell phenotype, a form of morphology
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component APR_HepG2_NuclearSize_72hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_NuclearSize_72hr, was analyzed with bidirectional fitting relative to DMSO as the negative control
and baseline of activity. Using a type of morphology reporter, measures of all nuclear DNA for gain- or loss-of-signal
activity can be used to understand the signaling at the nuclear level. Furthermore, this assay endpoint
can be referred to as a primary readout, because this assay has produced multiple assay endpoints where this
one serves a signaling function. To generalize the intended target to other relatable targets, this assay endpoint
is annotated to the cell morphology intended target family, where the subfamily is organelle conformation.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted in human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Hoechst-33342 dye is used as a stain for DNA to understand morphology in the system.
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity and identify the mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements,
pre-fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey
anti-rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG),
c-Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), and alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
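The ten treatment concentrations listed in this section are consistent with a two-fold dilution series ending at 200 uM. That reading is an inference from the listed values rather than something the source states, and the sketch below only shows the arithmetic. The helper name is hypothetical.

```python
def twofold_series(top_conc, n_conc):
    """Return n_conc concentrations in ascending order, halving from top_conc."""
    return [top_conc / 2 ** i for i in range(n_conc - 1, -1, -1)]

# Ten points ending at 200 uM, as in the exposure regime described above.
series = twofold_series(200.0, 10)
print([round(c, 2) for c in series])
```

Exact halving gives 3.125 and 6.25 uM where the text lists 3.12 and 6.24, so the listed values appear to be nominal figures carried to two decimals.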
Baseline median absolute deviation for the assay (bmad): 0.015
Response cutoff threshold used to determine hit calls: 0.263
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity of anti-phospho-histone-H2AX antibody is used to tag and quantify the level of
phosphorylated H2A histone family, member X protein. Changes in the signals are indicative of protein
expression changes as a cellular response to stress in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell morphology.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Paclitaxel; CCCP
Neutral vehicle control: DMSO
NA
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., $\sigma_x = 1$) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., $|\sigma_x| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
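The smoothing, normalization, and LEC steps described above can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the edge handling of the Hamming window, the choice of reference response, the center and sigma used for standardization, and the linear interpolation on log10 concentration are all assumptions, and every function name is hypothetical.

```python
import math
from statistics import median

def hamming_window(length):
    """Hamming weights w[i] = 0.54 - 0.46 * cos(2*pi*i / (length - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1)) for i in range(length)]

def smooth(values, length=7):
    """Weighted moving average; near the ends the window is truncated and
    renormalized (an assumption, the source does not describe edge handling)."""
    weights = hamming_window(length)
    half = length // 2
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed

def log2_fold_change(responses, reference):
    """x = log2(r / r*), with r* the reference (median) response."""
    return [math.log2(r / reference) for r in responses]

def standardize(x, center, sigma):
    """z = (x - center) / sigma."""
    return [(v - center) / sigma for v in x]

def lowest_effect_conc(concs, z):
    """Smallest concentration at which |z| reaches 1, found by linear
    interpolation of z on log10(concentration); None if never reached."""
    if abs(z[0]) >= 1:
        return concs[0]
    for i in range(1, len(z)):
        z0, z1 = z[i - 1], z[i]
        if abs(z0) < 1 <= abs(z1):
            target = math.copysign(1.0, z1)
            frac = (target - z0) / (z1 - z0)
            lo, hi = math.log10(concs[i - 1]), math.log10(concs[i])
            return 10 ** (lo + frac * (hi - lo))
    return None

# Hypothetical ten-point series (uM) and raw responses, for illustration only.
concs = [0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, 200]
raw = [100, 102, 99, 101, 105, 112, 125, 150, 190, 240]
r_star = median(raw[:4])                    # assumed reference response
x = log2_fold_change(smooth(raw), r_star)
z = standardize(x, center=0.0, sigma=0.2)   # assumed center and sigma
lec = lowest_effect_conc(concs, z)
efficacy = max(x, key=abs)                  # largest positive or negative change
print(lec, efficacy)
```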
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 7: bmad10 (Add a cutoff
value of 10 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated
using test compound wells (wllt = t) for the endpoint.)
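The Level 4 baseline statistics and the Level 5 cutoff selection can be sketched as below. The helper names are hypothetical. The scale constant 1.4826 is the default of R's mad() function, and applying it here is an assumption about how bmad is scaled.

```python
import math
from statistics import median, stdev

def mad(values, constant=1.4826):
    """Median absolute deviation, scaled like R's mad() default (assumed)."""
    values = list(values)
    center = median(values)
    return constant * median(abs(v - center) for v in values)

def baseline_stats(wells):
    """wells: iterable of (wllt, cndx, resp) tuples.

    Uses test compound wells (wllt == "t") at the two lowest concentration
    indices (cndx 1 or 2) and returns (bmad, onesd).
    """
    base = [resp for wllt, cndx, resp in wells if wllt == "t" and cndx in (1, 2)]
    return mad(base), stdev(base)   # stdev uses the (sample size - 1) denominator

def level5_cutoff(bmad):
    """The higher of the two candidate cutoffs: log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)

# With the bmad reported for this endpoint (0.015), 10 * bmad = 0.15 is below
# log2(1.2), so log2(1.2) is selected, matching the reported cutoff of 0.263.
print(round(level5_cutoff(0.015), 3))
```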
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
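Several of the simpler Level 6 rules above reduce to one-line comparisons, sketched here for illustration. The function is hypothetical and is not the tcpl code. The border rule is read as |top| lying between 0.8 and 1.2 times the cutoff, which is an interpretation of the flag description.

```python
def level6_flags(top, coff, rmse, nrep, nconc):
    """Return the names of the simple cautionary flags a series would receive."""
    flags = []
    if rmse > coff:
        flags.append("noise")        # fit error exceeds the cutoff
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")       # modeled top is close to the cutoff (assumed reading)
    if nrep < 2:
        flags.append("low.nrep")     # fewer than 2 replicates per concentration on average
    if nconc <= 4:
        flags.append("low.nconc")    # 4 or fewer concentrations tested
    return flags

# Illustrative series using this endpoint's cutoff (0.263) and one replicate.
print(level6_flags(top=0.3, coff=0.263, rmse=0.1, nrep=1, nconc=10))
```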
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108 Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the constant
model, i.e. whether the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
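The binarization of the continuous hit call described above can be expressed as a small helper. The default threshold of 0.9 follows this documentation, and users may pass a different value. The function name and the example hitc values are hypothetical.

```python
from collections import Counter

def classify_hitc(hitc, threshold=0.9):
    """Map a continuous hit call to active, inactive, or not applicable."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

# Hypothetical hit calls for five concentration series.
hitcs = [0.99, 0.91, 0.45, 0.0, -1.0]
counts = Counter(classify_hitc(h) for h in hitcs)
print(dict(counts))
```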
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
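For readers computing these metrics from their own plate data, the coefficient of variation, signal-to-noise, and signal-to-background expressions in this section reduce to the following. The control values are hypothetical, since all metrics here are reported as NA, and the function names are illustrative.

```python
def neutral_cv_percent(nmed, nmad):
    """CV% in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100

def signal_to_noise(ctrl_med, nmed, nmad):
    """(control median - neutral median) / neutral MAD."""
    return (ctrl_med - nmed) / nmad

def signal_to_background(ctrl_med, nmed):
    """control median / neutral median."""
    return ctrl_med / nmed

# Hypothetical plate-level statistics (not taken from this assay).
nmed, nmad, pmed = 10.0, 0.5, 100.0
print(neutral_cv_percent(nmed, nmad))
print(signal_to_noise(pmed, nmed, nmad))
print(signal_to_background(pmed, nmed))
```

Note that a neutral control median near zero makes the CV and signal-to-background ratios unstable, which is one reason such metrics may be reported as NA for normalized endpoints.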
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 140.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
health perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast
Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 58
APR_HepG2_P-H2AX_72hr
1. General Information
1.1 Assay Title: Apredica 72-hour HepG2 Human Liver Cell Assay for H2AX Phosphorylation
1.2 Assay Summary: APR_HepG2_72hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 72 hours after chemical dosing in a 384-well plate. APR_HepG2_P-
H2AX_72hr is one of 10 assay component(s) measured or calculated from the APR_HepG2_72hr assay. It is
designed to make measurements of dna content, a form of viability reporter, as detected with fluorescence
intensity signals by HCS Fluorescent Imaging technology. Data from the assay component APR_HepG2_P-
H2AX_72hr was analyzed into 1 assay endpoint. This assay endpoint, APR_HepG2_P-H2AX_72hr, was analyzed
with bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of
viability reporter, measures of protein for gain or loss-of-signal activity can be used to understand the signaling
at the pathway-level as they relate to the gene H2AFX. Furthermore, this assay endpoint can be referred to as a
primary readout, because this assay has produced multiple assay endpoints where this one serves a signaling
function. To generalize the intended target to other relatable targets, this assay endpoint is annotated to the
dna binding intended target family, where the subfamily is cellular response to DNA damage stimulus.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Anti-phospho-histone-H2AX antibody is used to tag and quantify the level of phosphorylated H2A
histone family, member X protein. Changes in the signals are indicative of protein expression changes as a
cellular response to stress in the system [GeneSymbol:H2AFX | GeneID:3014 |
Uniprot_SwissProt_Accession:P16104].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not recover toward a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity and identify the mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chloro-
phenylhydrazone (CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Camptothecin;Anisomycin
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.11
Response cutoff threshold used to determine hit calls: 1.097
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity of anti-phospho-c-jun antibody is used to tag and quantify the level of
phosphorylated jun proto-oncogene protein. Changes in the signals are indicative of protein expression changes
as a cellular response to stress in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., sigma_x = 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |sigma_x| > 1). The LEC was estimated by
numerically solving for |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of x.
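The smoothing, normalization, and LEC steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to produce the data: the function names, the edge padding applied at the ends of the series, and the linear interpolation in log10 concentration used to solve |z| = 1 are all assumptions. The center and scale used for standardization are passed in by the caller, because the text does not state which wells define them.

```python
import numpy as np


def smooth_hamming(r, length=7):
    """Smooth a 1-D response series with a unit-sum Hamming window.

    The series is edge-padded so the output has the same length as the input
    (the padding choice is an assumption of this sketch).
    """
    window = np.hamming(length)
    window = window / window.sum()
    pad = length // 2
    padded = np.pad(np.asarray(r, dtype=float), pad, mode="edge")
    return np.convolve(padded, window, mode="valid")


def log2_fold_change(r, r_ref):
    """x = log2(r / r*), where r_ref is the median response used as reference."""
    return np.log2(np.asarray(r, dtype=float) / r_ref)


def standardize(x, center, scale):
    """z = (x - center) / scale, with scale the standard deviation of x."""
    return (np.asarray(x, dtype=float) - center) / scale


def lowest_effect_conc(conc, z, threshold=1.0):
    """Lowest concentration at which |z| reaches the threshold.

    conc must be sorted in increasing order. The crossing is located by
    linear interpolation in log10(conc) between adjacent tested points;
    returns None if |z| never reaches the threshold.
    """
    conc = np.asarray(conc, dtype=float)
    a = np.abs(np.asarray(z, dtype=float))
    if a[0] >= threshold:
        return float(conc[0])
    for i in range(1, len(a)):
        if a[i] >= threshold:  # a[i - 1] is below the threshold here
            frac = (threshold - a[i - 1]) / (a[i] - a[i - 1])
            lo, hi = np.log10(conc[i - 1]), np.log10(conc[i])
            return float(10 ** (lo + frac * (hi - lo)))
    return None
```

Scanning from the lowest concentration upward returns the first crossing, which matches the rule of taking the minimum value when there are multiple solutions.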
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
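As an illustration of the kind of documented manual transformation described above, the sketch below sets wllq to 0 for every well in one plate row and records the justification. The text states that Level 0 pre-processing is done in R; Python is used here only for illustration. The record layout and the field names plate, row, and wllq_note are hypothetical; only wllq is taken from the text.

```python
def flag_plate_row(wells, plate_id, row, reason):
    """Return copies of the well records with wllq set to 0 for one plate row.

    wells: list of dicts with (hypothetical) keys 'plate', 'row', and 'wllq'.
    The justification is stored alongside the change so the manual
    transformation stays documented.
    """
    flagged = []
    for well in wells:
        well = dict(well)  # copy so the input records are not modified
        if well["plate"] == plate_id and well["row"] == row:
            well["wllq"] = 0
            well["wllq_note"] = reason
        flagged.append(well)
    return flagged
```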
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then
flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
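The Level 4 baseline, the Level 5 cutoff, and a few of the simpler Level 6 flags listed above can be sketched in Python as follows. This is an illustration, not tcpl itself (which is an R package). The well records are plain dicts keyed by the field names used in the text (wllt, cndx, resp), and the function names are hypothetical. Whether bmad includes a consistency constant is an assumption: the default below uses 1.4826, the default of R's mad(), and can be set to 1.0 for an unscaled value. logc_min is assumed to be the log10 of the lowest tested concentration.

```python
import math
import statistics


def bmad_lowconc_twells(wells, constant=1.4826):
    """Baseline MAD and one SD of resp over test wells at the two lowest concentrations.

    wells: list of dicts with keys 'wllt', 'cndx', and 'resp'.
    Returns (bmad, onesd); onesd is the sample SD (n - 1 denominator).
    """
    resp = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(resp)
    bmad = constant * statistics.median(abs(r - med) for r in resp)
    onesd = statistics.stdev(resp)
    return bmad, onesd


def response_cutoff(bmad):
    """Level 5: the higher of log2(1.2) and 10 * bmad is selected."""
    return max(math.log2(1.2), 10 * bmad)


def caution_flags(rmse, coff, nrep, nconc, hitc, ac50, logc_min):
    """Evaluate four of the Level 6 flags exactly as worded in the list above."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")
    return flags
```

With the bmad of 0.11 reported in the Assay Design Summary, 10 * bmad is 1.1, which exceeds log2(1.2) (about 0.263). This would be consistent with the reported cutoff of 1.097 if the displayed bmad is a rounded value.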
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108 Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity,
efficacy, and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
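The model-selection rule (lowest AIC, with ties going to the model with fewer parameters) and the hit-call binarization described above can be illustrated with a short Python sketch. The function names and the dict layout are hypothetical, and the parameter counts in the usage note are illustrative values only. The continuous hit call itself is computed by tcplfit2 and is not reproduced here.

```python
def select_winning_model(fits):
    """Pick the fit with the lowest AIC; on an exact tie, prefer fewer parameters.

    fits: list of dicts with (hypothetical) keys 'name', 'aic', and 'n_params'.
    """
    return min(fits, key=lambda f: (f["aic"], f["n_params"]))


def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the thresholds stated in the text."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"
```

For example, given fits named "hill" (AIC 10.0, 3 parameters) and "poly1" (AIC 10.0, 1 parameter), select_winning_model returns the "poly1" record. The threshold argument lets a user apply a different stringency, as the text allows.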
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed - nmed) / nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed / nmed) NA
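The control-well formulas listed above can be written out for a single plate as in the Python sketch below. The function names and dict keys are hypothetical, and the MAD here is unscaled (no consistency constant), which is an assumption. The values reported in this document are medians of such per-plate metrics across all plates, which is not shown. The sketch does not guard against a zero nmad, nmed, or median difference.

```python
import math
import statistics


def mad(values):
    """Unscaled median absolute deviation."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def control_metrics(control, neutral):
    """Per-plate robustness metrics for control wells (p or m) versus neutral wells (n)."""
    cmed, nmed = statistics.median(control), statistics.median(neutral)
    cmad, nmad = mad(control), mad(neutral)
    return {
        "cv_neutral_pct": (nmad / nmed) * 100,
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }
```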
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 154.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
health perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-
forecasting. The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-
research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 60
APR_HepG2_p53Act_72hr
1. General Information
1.1 Assay Title: Apredica 72-hour HepG2 Human Liver Cell Assay for p53 Activation
1.2 Assay Summary: APR_HepG2_72hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 72 hours after chemical dosing in a 384-well plate.
APR_HepG2_p53Act_72hr is one of 10 assay component(s) measured or calculated from the APR_HepG2_72hr
assay. It is designed to make measurements of DNA content, a form of viability reporter, as detected with
fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the assay component
APR_HepG2_p53Act_72hr was analyzed into 1 assay endpoint. This assay endpoint, APR_HepG2_p53Act_72hr,
was analyzed with bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using
a type of viability reporter, measures of protein for gain or loss-of-signal activity can be used to understand the
signaling at the pathway-level as they relate to the gene TP53. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
signaling function. To generalize the intended target to other relatable targets, this assay endpoint is annotated
to the DNA binding intended target family, where the subfamily is tumor suppressor.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on the human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: An anti-p53 antibody is used to tag and quantify the level of tumor protein p53. Changes in
the signals are indicative of protein expression changes as a cellular response to stress in the system
[GeneSymbol:TP53 | GeneID:7157 | Uniprot_SwissProt_Accession:P04637].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery towards a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
-------
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
Baseline median absolute deviation for the assay (bmad): 0.118
Response cutoff threshold used to determine hit calls: 1.182
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity of anti-phospho-c-jun antibody is used to tag and quantify the level of
phosphorylated jun proto-oncogene protein. Changes in the signals are indicative of protein expression changes
as a cellular response to stress in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of DNA binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Target (nominal) number of replicates: 1
Key positive control: Camptothecin; Anisomycin
Neutral vehicle control: DMSO
NA
-------
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data ($r$) for end points measured on each plate
were normalized to the median response ($r^*$) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes ($x = \log_2(r/r^*)$) were also standardized ($z = (x - x^*)/\sigma_x$) to evaluate the
importance of perturbations (where $\sigma_x$ is the standard deviation of $x$). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., $1\,\sigma_x$) above or below the median value. An absolute
perturbation greater than one standard deviation was called a "hit" (i.e., $|z| > 1$). The LEC was estimated by
numerically solving for $|z| = 1$ (the minimum value was selected if there were multiple solutions). The efficacy
was measured as the maximum positive or negative value of $x$.
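The smoothing, normalization, and LEC steps described above can be sketched as follows. This is a minimal illustration in Python, not the code used to produce the data. The function names, the use of a normalized weighted moving average for the Hamming smoothing, the truncated window at the ends of the series, and the linear interpolation in log10 concentration used to solve |z| = 1 are all assumptions. The reference response r*, the center x*, and sigma_x are supplied by the caller because the text does not state how they are pooled.

```python
import math


def hamming_smooth(values, length=7):
    """Weighted moving average with Hamming weights (assumed smoothing form).

    Near the ends of the series the window is truncated and renormalized,
    which is an assumption; the source does not describe edge handling.
    """
    weights = [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
               for k in range(length)]
    half = length // 2
    smoothed = []
    for i in range(len(values)):
        num = 0.0
        den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def log2_fold_change(smoothed, reference):
    """x = log2(r / r*), with r* the reference (median) response."""
    return [math.log2(r / reference) for r in smoothed]


def standardize(x, center, sigma):
    """z = (x - x*) / sigma_x."""
    return [(v - center) / sigma for v in x]


def lowest_effect_concentration(concs, z, threshold=1.0):
    """Lowest concentration at which |z| reaches the threshold.

    Scans from the lowest concentration upward, so the minimum solution is
    returned. Interpolation is linear in log10(concentration), an assumption.
    Returns None when |z| never reaches the threshold.
    """
    mags = [abs(v) for v in z]
    if mags[0] >= threshold:
        return concs[0]
    for i in range(1, len(mags)):
        if mags[i - 1] < threshold <= mags[i]:
            frac = (threshold - mags[i - 1]) / (mags[i] - mags[i - 1])
            lo = math.log10(concs[i - 1])
            hi = math.log10(concs[i])
            return 10 ** (lo + frac * (hi - lo))
    return None


# Hypothetical example values, for illustration only.
concs = [0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, 200]
raw = [100, 102, 98, 101, 105, 120, 150, 200, 260, 300]
x = log2_fold_change(hamming_smooth(raw), reference=100.0)
z = standardize(x, center=0.0, sigma=0.2)
lec = lowest_effect_concentration(concs, z)
efficacy = max(x, key=abs)  # maximum positive or negative value of x
```

Scanning upward from the lowest concentration means the first crossing found is the minimum solution, which matches the rule stated in the text for series with multiple solutions.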
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute value (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplFit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
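The Level 4 to Level 6 rules listed above can be illustrated with a short sketch. It is written in Python for illustration only and is not the tcpl implementation, which is an R package. All function names are hypothetical. The 1.4826 scaling factor mirrors the default of R's mad() function, and whether tcpl applies it is an assumption. The border rule is read here as a band between 0.8 and 1.2 times the cutoff, which is an interpretation of the method description. Only four of the flags are shown.

```python
import math
import statistics


def lowconc_test_wells(resp, cndx, wllt):
    """Responses from test compound wells (wllt == 't') at cndx 1 or 2."""
    return [r for r, c, w in zip(resp, cndx, wllt) if w == "t" and c in (1, 2)]


def bmad_lowconc(resp, cndx, wllt, constant=1.4826):
    """Baseline MAD of low-concentration test wells.

    The scaling constant is an assumption (default of R's mad()).
    """
    base = lowconc_test_wells(resp, cndx, wllt)
    med = statistics.median(base)
    return constant * statistics.median([abs(r - med) for r in base])


def onesd_lowconc(resp, cndx, wllt):
    """Sample standard deviation (divisor n - 1) of the same wells."""
    return statistics.stdev(lowconc_test_wells(resp, cndx, wllt))


def select_cutoff(bmad):
    """Level 5: the larger of log2(1.2) and 10 * bmad is kept."""
    return max(math.log2(1.2), 10 * bmad)


def caution_flags(top, rmse, coff, nrep, nconc):
    """A subset of the Level 6 flags, evaluated for one series."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:  # band interpretation (assumption)
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags


# The bmad reported for this endpoint in Section 2.5 is 0.118.
cutoff = select_cutoff(0.118)
```

With the reported bmad of 0.118, the bmad10 candidate (about 1.18) exceeds log2(1.2) (about 0.26), so it is the one selected. This agrees with the reported cutoff of 1.182 to within the rounding of the bmad value.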
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108 Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than that of the constant model, i.e. whether the winning
model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
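The hit-call logic described above can be sketched in a few lines. This Python fragment is illustrative only: the function and field names are hypothetical, the three weights are assumed to have been computed already as values between 0 and 1, and the 0.90 threshold is the one used in this documentation, which users may change.

```python
def continuous_hitcall(w_median, w_top, w_aic):
    """Product of the three weights described in the text.

    Each weight is assumed to be precomputed and to lie in [0, 1].
    """
    for w in (w_median, w_top, w_aic):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
    return w_median * w_top * w_aic


def classify_hitc(hitc, active_threshold=0.90):
    """Binarize a continuous hit call using the thresholds stated above."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= active_threshold else "inactive"


def high_confidence(records, max_flags=0):
    """Keep active series with at most max_flags cautionary flags.

    Each record is a dict with hypothetical keys 'hitc' and 'flags'.
    """
    return [r for r in records
            if classify_hitc(r["hitc"]) == "active"
            and len(r["flags"]) <= max_flags]


# Hypothetical records, for illustration only.
records = [
    {"id": "series_a", "hitc": 0.98, "flags": []},
    {"id": "series_b", "hitc": 0.95, "flags": ["border", "noise"]},
    {"id": "series_c", "hitc": 0.40, "flags": []},
]
kept = high_confidence(records)
```

Raising max_flags relaxes the filter, which is one way to express the stringency choice that the text leaves to the user.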
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): NA
Neutral control median absolute deviation, by plate (nmad): NA
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
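The control-well formulas above can be evaluated for a single plate with the sketch below. It is a minimal Python illustration with hypothetical function names, not the code used to populate this table. It uses the unscaled median absolute deviation, which is an assumption, and it does not perform the median-across-plates step. The same function applies to positive ("p") or negative ("m") control wells compared against neutral ("n") wells.

```python
import math
import statistics


def median_and_mad(values):
    """Median and unscaled median absolute deviation of one well group."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, mad


def control_metrics(control, neutral):
    """Plate-level metrics for control wells relative to neutral wells.

    Raises ZeroDivisionError if nmed, nmad, or the median difference is 0;
    a real pipeline would need to handle those cases explicitly.
    """
    cmed, cmad = median_and_mad(control)
    nmed, nmad = median_and_mad(neutral)
    return {
        "cv_percent": nmad / nmed * 100,
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }


# Hypothetical well values, for illustration only.
metrics = control_metrics(control=[9.0, 10.0, 11.0], neutral=[1.0, 2.0, 3.0])
```

Because every metric is built from medians and MADs, a single outlying well has limited influence on the result.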
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 152.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Giuliano KA, Gough AH, Taylor DL, Vernetti LA, Johnston PA. Early safety assessment using cellular
systems biology yields insights into mechanisms of action. J Biomol Screen. 2010 Aug;15(7):783-97. doi:
10.1177/1087057110376413. Epub 2010 Jul 16. PubMed PMID: 20639501.
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
health perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
-------
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 62
APR_HepG2_StressKinase_72hr
1. General Information
1.1 Assay Title: Apredica 72-hour HepG2 Human Liver Cell Assay for Stress Kinase
1.2 Assay Summary: APR HepG2 72hr is a cell-based, multiplexed-readout assay that uses HepG2, a human liver
cell line, with measurements taken at 72 hours after chemical dosing in a 384-well plate.
APR_HepG2_StressKinase_72hr is one of 10 assay component(s) measured or calculated from the
APR_HepG2_72hr assay. It is designed to make measurements of enzyme activity, a form of enzyme reporter,
as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the assay
component APR_HepG2_StressKinase_72hr was analyzed into 1 assay endpoint. This assay endpoint,
APR_HepG2_StressKinase_72hr, was analyzed with bidirectional fitting relative to DMSO as the negative control
and baseline of activity. Using a type of enzyme reporter, measures of protein for gain or loss-of-signal activity
can be used to understand the signaling at the pathway-level as they relate to the gene . Furthermore, this
assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a signaling function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the cell cycle intended target family, where the subfamily is stress
response.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Apredica, a part of Cyprotex, is a preclinical Contract Research Organization (CRO) that provides
services including the CellCiphr High Content Imaging system.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: Assay is non-proprietary; imaging and analysis software are available from Cellomics, Inc.
1.9 Assay Throughput: 384-well plate. Assay was conducted on the human hepatocellular carcinoma cell line HepG2
(HB-8065) on 384-well plates. HCI was used to evaluate the effects of chemicals (in concentrations ranging from
0.4 to 200 uM) on HepG2 cells over a 72-hr exposure period.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: An anti-phospho-c-Jun antibody is used to tag and quantify the level of phosphorylated jun
proto-oncogene protein. Changes in the signals are indicative of protein expression changes as a cellular response to
stress in the system [GeneSymbol:JUN | GeneID:3725 | Uniprot_SwissProt_Accession:P05412].
High-content screening methods in early safety assessments are critical to understand the complex biology
triggered by potentially harmful molecules in cells of target organs. High-content imaging (HCI) allows
simultaneous measurement of multiple cellular phenotypic changes, which can be an important tool for
evaluating the biological activity of chemicals. To analyze dynamic cellular changes, HCI was used to identify the
"tipping point" at which the cells did not show recovery towards a normal phenotypic state. The goal of
integrating cellular toxicology models with HCS detection is to generate a platform that can both predict the
safety risk liability of a compound with high specificity and sensitivity while also identifying mechanism(s) of
action of the toxic response.
2.2 Scientific Principles: Researchers used high-content imaging (HCI) (Giuliano et al. 2006), which applies
automated image analysis techniques to capture multiple cytological features using fluorescent labels, to
measure the concentration-dependent dynamic changes in the state of HepG2 cells. Although they are not fully
metabolically capable, HepG2 cells can undergo continuous proliferation in culture and have a demonstrated
capacity to predict hepatotoxicity of pharmaceutical compounds with good sensitivity and specificity (O'Brien
et al. 2006; Abraham et al. 2008). Researchers used computational tools to deconvolute HCI responses into cell-
state trajectories and to analyze them for their propensity to recover to normal (basal) conditions over the test
period. The critical concentrations associated with nonrecoverable cellular trajectories were determined, where
possible, and compiled into a novel chemical classification scheme.
2.3 Experimental System: An adherent HepG2 cell line was used. Human hepatocellular carcinoma cell line HepG2
(HB-8065), used for the BrdU incorporation assay, was purchased from American Type Culture Collection (ATCC)
and used before passage 20. Cells were maintained and expanded in complete media [10% fetal bovine serum
(FBS) in Minimum Essential Medium with Earle's Balanced Salt Solution (MEM/EBSS) supplemented with
penicillin/streptomycin, L-glutamine, and non-essential amino acids]. Cell culture reagents were obtained from
VWR International. HepG2 cells were harvested by trypsinization and plated at different densities in 25 uL of
culture medium, depending on incubation time, in clear-bottom, 384-well microplates (Falcon 3962) that were
coated with rat tail collagen I. The cells were incubated overnight to allow attachment and spreading.
2.4 Metabolic Competence: HepG2 cells are an immortalized cell line with characteristics that differ from those of
normal hepatocytes. For example, these cells easily proliferate in culture but have limited metabolic activity
compared with primary hepatocytes. The HepG2 cell model used was a two-dimensional monoculture that does
not reflect the complex cell-to-cell interactions present in intact organs that have multiple cell types.
2.5 Exposure Regime: Cells were treated with dimethyl sulfoxide (DMSO) as a solvent control at a final concentration
of 0.5% v/v or with compounds in DMSO with a resulting final DMSO concentration of 0.5% v/v. Compound
treatment was done at concentrations of 0.39, 0.78, 1.56, 3.12, 6.24, 12.5, 25, 50, 100, and 200 uM in duplicate
on each plate. Cells were treated with compounds for 1, 24, or 72 hr. Carbonyl cyanide m-chlorophenylhydrazone
(CCCP) and taxol were used as positive controls for mitochondrial function and cytoskeletal
stability, respectively; DMSO served as the negative control for this experiment. Cells were fixed by the direct
addition of 50 uL formaldehyde in Hank's Balanced Salt Solution (HBSS) to a final concentration of 3.7%. After
incubation in the fixation medium for 30 min at room temperature (293-298 K), cells were rinsed twice with
HBSS and treated with cell permeabilization buffer (16 uL of 0.5% Triton X-100) for 10 min at room temperature
(293-298 K) before labeling. For mitochondrial membrane potential and mitochondrial measurements, pre-
fixed cells were incubated with 50 uL of MitoTracker Red CMXRos (Invitrogen) at a concentration of 250 nM for
30 min before fixation. In the remaining cases, post-fixed cells were labeled by incubation with a multiplexed
mixture of primary antibodies in HBSS for 60 min at room temperature (293-298 K) to detect immunoreactivity
of c-Jun (1:500), phospho-histone H3 (1:100), phospho-histone H2A.X (1:200), p53 (1:400), alpha-tubulin (1:200)
and Hoechst 33342 (2 ug/mL). Cells were labeled for multiplexed imaging on two separate plates: a) Hoechst
33342, MitoTracker Red, phospho-histone H3, and alpha-tubulin, and b) Hoechst 33342, phospho-histone
H2A.X, and c-Jun. A final rinse with HBSS (50 uL) was performed before analysis. The primary and secondary
antibodies for the proteins were phospho-histone H3 (rabbit anti-phospho-histone H3 and FITC-donkey anti-
rabbit IgG), phospho-histone H2A.X (mouse anti-phospho-histone H2A.X and FITC-donkey anti-mouse IgG), c-
Jun (rabbit anti-phospho-c-Jun and Cy3-donkey anti-rabbit IgG), p53 (sheep anti-p53 and Cy5-donkey anti-sheep
IgG), alpha-tubulin (mouse anti-alpha-tubulin and Cy5-donkey anti-mouse IgG). These antibodies are available
as the CellCiphr HepG2 assay kit (Millipore). Digital images of each well were captured using a Cellomics
ArrayScan VTI (Thermo Scientific Cellomics) (0.8 NA objective, 0.63x optical coupler, and XF-93 filter set) at 20x
magnification. The images were acquired using the autofocus feature of the ArrayScan instrument, which entails
the following steps. First, the camera focuses on channel 1 (Hoechst 33342), where nuclei are identified. Second,
a Z offset of 1 um is used for capturing mitochondria (MitoTracker Red). Third, a Z offset of -2 um is used for
capturing the cytoskeleton (tubulin). Six digital images were captured in each well and analyzed using
BioApplication software, which was provided with the instrument. All images were analyzed using the
Compartmental Analysis and Cell Cycle Analysis BioApplication software from Cellomics. The Cell Cycle
BioApplication software used the nuclear stain to identify valid cells, to measure nuclear diameter, and to
quantify DNA content. These features were used to calculate the average nuclear size, cell cycle arrest (ratio of
2N/4N), and cell number. The Compartmental Analysis BioApplication software module was used to measure
the average cell intensities for c-Jun phosphorylation, p53 protein activation, phospho-histone H2A.X activation,
mitochondria, and alpha-tubulin. The average intensity of mitochondria was used to define mitochondrial
membrane potential, and the total intensity was used to define mitochondrial mass. Data from cellular features
measured in the nucleus were excluded for wells where there was a significant decrease in nuclear size and
brightness. Detailed documentation about the algorithms and parameters used by the BioApplication software
for this analysis is available upon request. Cellular features were aggregated at the well level to quantify the
following end points: p53 activation, c-Jun activation (stress kinase), phospho-histone H2A.X (DNA damage
produced by oxidative stress), phospho-histone H3 (mitotic arrest), alpha-tubulin (microtubules), mitochondrial
membrane potential, mitochondrial mass, cell cycle arrest, nuclear size, and cell number.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 10
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.58 nM
Standard maximum concentration tested: 297 nM
Key positive control: Camptothecin; Anisomycin
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.109
Response cutoff threshold used to determine hit calls: 1.088
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: Fluorescent intensity of anti-phospho-c-jun antibody is used to tag and quantify the level of
phosphorylated jun proto-oncogene protein. Changes in the signals are indicative of protein expression changes
as a cellular response to stress in the system.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Concentration response data from the HCI experiment were smoothed and normalized for every
chemical, end point, and time. The raw concentration responses were smoothed using a Hamming window
(Blackman and Tukey 1958) of length 7. Next, the smoothed data (r) for end points measured on each plate
were normalized to the median response (r*) to calculate perturbations as the logarithm (base 2) of fold change
values. The normalized changes (x = log2(r/r*)) were also standardized (z = (x - x*)/sigma_x) to evaluate the
importance of perturbations (where sigma_x is the standard deviation of x). The lowest effect concentration
(LEC) for each chemical and end point was calculated as the concentration that produced a fold change
perturbation at least one standard deviation (i.e., |z| = 1) above or below the median value. An absolute
perturbation > one standard deviation was called a "hit" (i.e., |z| > 1). The LEC was estimated by
numerically solving for: |z| = 1 (the minimum value was selected if there were multiple solutions). The efficacy
was measured as maximum positive or negative value of x.
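The smoothing, normalization, and LEC steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the published analysis: the function names are hypothetical, the Hamming window is truncated and renormalized at the ends of the series, the reference value x* and the standard deviation sigma_x are supplied by the caller, and the |z| = 1 crossing is located by linear interpolation on log10 concentration, which is only one possible numerical solver.

```python
import math

def hamming_window(length=7):
    """Hamming weights w[i] = 0.54 - 0.46*cos(2*pi*i/(length - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]

def smooth(responses, length=7):
    """Weighted moving average with a Hamming window.
    Assumption: near the ends the window is truncated and renormalized."""
    weights = hamming_window(length)
    half = length // 2
    out = []
    for i in range(len(responses)):
        num = den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(responses):
                num += w * responses[j]
                den += w
        out.append(num / den)
    return out

def log2_fold_change(smoothed, r_ref):
    """x = log2(r / r*), with r* the plate median response."""
    return [math.log2(r / r_ref) for r in smoothed]

def standardize(x, x_ref, sigma_x):
    """z = (x - x*) / sigma_x; x* and sigma_x are caller-supplied."""
    return [(xi - x_ref) / sigma_x for xi in x]

def lowest_effect_concentration(concs, z):
    """Lowest concentration at which |z| reaches 1, or None if it never does.
    Assumption: z varies linearly in log10(concentration) between test points."""
    if abs(z[0]) >= 1:
        return concs[0]
    for i in range(1, len(z)):
        z0, z1 = z[i - 1], z[i]
        if abs(z1) >= 1:
            target = 1.0 if z1 > 0 else -1.0
            frac = (target - z0) / (z1 - z0)
            lc0, lc1 = math.log10(concs[i - 1]), math.log10(concs[i])
            return 10 ** (lc0 + frac * (lc1 - lc0))
    return None
```

Returning the first crossing in order of increasing concentration corresponds to selecting the minimum value when there are multiple solutions.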
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
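As an illustration of the Level 4 baseline calculation, the sketch below computes bmad and onesd from a list of well records in Python. The record layout (dictionaries with resp, wllt, and cndx keys) and the function name are hypothetical. The scale argument defaults to 1.4826, the default constant of R's mad function; whether tcpl applies that constant is an assumption here and is not stated in this document.

```python
import statistics

def bmad_and_onesd(wells, scale=1.4826):
    """Baseline noise estimates from the two lowest concentrations.

    wells: list of dicts with keys 'resp', 'wllt', 'cndx' (hypothetical layout).
    scale: multiplier on the raw median absolute deviation (assumption:
           1.4826 mirrors the default of R's mad(); use 1.0 for the raw MAD).
    """
    baseline = [w["resp"] for w in wells
                if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(baseline)
    deviations = [abs(r - center) for r in baseline]
    bmad = scale * statistics.median(deviations)
    # Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))
    onesd = statistics.stdev(baseline)
    return bmad, onesd
```

Only test compound wells (wllt = t) at concentration index 1 or 2 contribute; control wells and higher concentrations are ignored.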
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
7: bmad10 (Add a cutoff value of 10 multiplied by the baseline median absolute deviation (bmad). By
default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
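The cutoff selection can be checked with a short calculation. The sketch below (the function name is hypothetical) takes the larger of the two assigned Level 5 thresholds. With the bmad of 0.109 reported in the Assay Design Summary, 10 * bmad = 1.09 exceeds log2(1.2), which is about 0.263, so the bmad-based threshold applies; the small difference from the reported cutoff of 1.088 is presumably due to rounding of the displayed bmad, which is an assumption.

```python
import math

def response_cutoff(bmad):
    """Larger of the two assigned thresholds: log2(1.2) and 10 * bmad."""
    return max(math.log2(1.2), 10 * bmad)

# Using the rounded bmad shown in the Assay Design Summary
coff = response_cutoff(0.109)
```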
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
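Several of these flags reduce to simple comparisons on summary values for a concentration series. The following Python sketch evaluates five of them (noise, border, low.nrep, low.nconc, ac50.lowconc). It is an illustration only: the function name and the dictionary layout are hypothetical, logc_min is assumed to be the log10 of the lowest tested concentration, and the border condition is read as the band 0.8*coff <= |top| <= 1.2*coff, since a literal "or" would flag every series. That reading is an assumption, not a statement of how tcpl implements the flag.

```python
def cautionary_flags(series):
    """Return the names of the simple Level 6 flags raised for one series.

    series: dict with keys rmse, coff, top, nrep, nconc, hitc, ac50,
            logc_min (hypothetical layout; logc_min assumed to be log10
            of the lowest tested concentration).
    """
    flags = []
    coff = series["coff"]
    if series["rmse"] > coff:
        flags.append("noise")
    # Assumption: borderline means |top| lies within 20 percent of the cutoff
    if 0.8 * coff <= abs(series["top"]) <= 1.2 * coff:
        flags.append("border")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags
```

The remaining flags (for example modl.directionality.fail or singlept.hit.high) need the per-concentration responses rather than summary values and are not covered by this sketch.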
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 1108
Number of chemicals tested: 1066
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. whether the winning model is a better fit than a flat line.
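The model selection rule described above (lowest AIC wins, with ties going to the model with fewer parameters) can be sketched as follows. The function names, the input layout, and the parameter counts in the usage example are illustrative assumptions; the model fitting itself, and the calculation of the three hit-call weights, are performed by tcpl and tcplfit2 and are not reproduced here.

```python
def select_winning_model(fits):
    """Pick the fit with the lowest AIC; break exact ties by fewer parameters.

    fits: list of dicts with keys 'model', 'aic', 'n_params' (hypothetical layout).
    """
    return min(fits, key=lambda f: (f["aic"], f["n_params"]))

def beats_constant(winner, constant_fit):
    """True if the winning model has a lower AIC than the constant (flat) model."""
    return winner["aic"] < constant_fit["aic"]

# Illustrative values only; these are not results from this endpoint.
example_fits = [
    {"model": "hill", "aic": 10.0, "n_params": 3},
    {"model": "poly1", "aic": 10.0, "n_params": 1},
    {"model": "gnls", "aic": 12.0, "n_params": 5},
]
winner = select_winning_model(example_fits)
```

Sorting on the pair (aic, n_params) encodes both rules in one comparison: AIC is compared first, and the parameter count only matters when the AIC values are equal.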
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc is used to summarize curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
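The binarization used in this documentation can be written as a small helper. The function name is hypothetical, and the threshold is exposed as an argument because different stringency levels may be appropriate for different uses.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call as described in this section.

    hitc >= threshold       -> "active"
    0 <= hitc < threshold   -> "inactive"
    hitc < 0                -> "not applicable"
    """
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"
```

For example, a series with hitc = 0.85 is inactive under the default threshold but would be called active if a user chose a threshold of 0.80.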
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed - nmed) / nmad) NA
Positive control signal-to-background: (pmed / nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed - nmed) / nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed / nmed) NA
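The robustness metrics listed above are simple functions of the control medians and median absolute deviations. The Python sketch below evaluates them for one control type against the neutral control; the function name and the returned key names are hypothetical, and the inputs in the usage note are illustrative values, since the metrics are reported as NA for this endpoint.

```python
import math

def control_metrics(cmed, cmad, nmed, nmad):
    """Performance metrics for a control (positive 'p' or negative 'm')
    relative to the neutral control ('n').

    cmed, cmad: control median and median absolute deviation
    nmed, nmad: neutral control median and median absolute deviation
    Assumes nmed, nmad, and (cmed - nmed) are nonzero.
    """
    return {
        "cv_pct": (nmad / nmed) * 100,
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }
```

With illustrative inputs cmed = 100, cmad = 5, nmed = 10, and nmad = 1, the function gives a CV of 10%, a Z prime of 0.8, a signal-to-noise of 90, and a signal-to-background of 10.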
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 153.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Giuliano, K. A., Gough, A. H., Taylor, D. L., Vernetti, L. A., & Johnston, P. A. (2010). Early safety
assessment using cellular systems biology yields insights into mechanisms of action. Journal of Biomolecular
Screening, 15(7), 783-797. https://doi.org/10.1177/1087057110376413
Shah, I., Setzer, R. W., Jack, J., Houck, K. A., Judson, R. S., Knudsen, T. B., Liu, J., Martin, M. T., Reif, D. M.,
Richard, A. M., Thomas, R. S., Crofton, K. M., Dix, D. J., & Kavlock, R. J. (2016). Using ToxCast™ Data to
Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure. Environmental
Health Perspectives, 124(7), 910-919. https://doi.org/10.1289/ehp.1409029
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1825
ArunA_CellTiter_hNP
1. General Information
1.1 Assay Title: Viability Assessment in the ArunA Biomedical's Oris Neural Crest (hNC) Cell Migration Assay
1.2 Assay Summary: ArunA_CellTiter_hNP is a cell-based, single-readout assay that uses human H9-derived
neuroprogenitor stem cells (hNP1). Measurements were taken 72 hours after chemical dosing in a 96-well plate.
ArunA_CellTiter_hNP is an assay component measured from the ArunA_CellTiter_hNP assay. It is designed to
make measurements of viability, a form of viability reporter, as detected with fluorescence intensity signals by
HCS Fluorescent Imaging technology. Data from the assay component ArunA_CellTiter_hNP was analyzed at
the endpoint, ArunA_CellTiter_hNP, in the positive analysis fitting direction relative to DMSO as the negative
control and baseline of activity. Using a type of viability reporter, loss-of-signal activity can be used to
understand the viability. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the cell cycle intended target family, where the subfamily is cytotoxicity.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ArunA Biomedical is a privately owned biotechnology company and Contract Research
Organization (CRO) formerly providing toxicology screening using neural stem cell-based assays.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: This protocol describes the use of ArunA Biomedical's hNP1 Neural Progenitor Cells in
conjunction with an Oris Cell Migration Assembly Kit-FLEX to measure the effect of neuroactive compounds and
biologics that modulate proliferation and migration of neural progenitor cells. Certain uses of these products
may be covered by U.S. Pat. No. 6,200,806; No. 7,531,354,B2 licensed to ARUNA and U.S. Pat. No. 7,842,499;
No. 7,018,838; No. 10/597,118; No. 11/342,413; No. 11/890,740; and No. 12/195,007 licensed to PLATYPUS.
1.9 Assay Throughput: 96-well plate. ArunA systems offer high throughput chemical screening in a 96-well format
for the human neuroprogenitor (hNP) and human neural crest (hNC) migration and cell titer endpoints.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes in fluorescence intensity related to the number of [3H]-thymidine labelled nuclei are
indicative of the viability of the system.
Chemical-induced perturbations to cellular key events across neurogenic outcomes, including migration
(neuroprogenitor and neural crest cells) and neural network formation (neurite length, neurites per neuron, and
branch points for neurites), in vitro can inform on cell-based prioritization of neurodevelopmental hazard
potential.
2.2 Scientific Principles: During the development of the nervous system, many processes occur to give rise to a
functional and healthy neural network and hence nervous system. These important neurodevelopmental
processes may be disrupted by potential toxicants, resulting in developmental neurotoxicity. Among these
processes is migration of neuroprogenitors and neural crest cells (NCCs). Impaired neuroprogenitor and NCC
migration can lead to cerebral malformations and neurodevelopmental disorders, such as
diencephalic-mesencephalic dysplasia syndrome, cerebral palsy, cerebellar ataxia, and microcephaly.
2.3 Experimental System: adherent hNP1 cell line used. The hN2 cell line is derived from neuroepithelial cells of
WA09 hESC (Thomson et al., 1998) origin according to a previously described protocol (Shin et al., 2005, Shin et
al., 2006). Importantly, as opposed to other methods of deriving neural progenitors through three-dimensional
neurosphere and embryoid body formations (Reubinoff et al., 2001, Zhang et al., 2001), these adherent
monolayer cultures are uniformly exposed to growth factors and/or morphogens throughout their propagation.
Neurogenic lineages from human embryonic stem cell line WA09 were locked into three neural differentiation
states: neuroprogenitor (hNP1 - Cat no. 7009), neural crest (hNC - Cat no. 7029), and neural network (hNN -
Cat no. hNJL7014). Prior to differentiation into hN2 cells the population was confirmed karyotypically normal,
>95% nestin positive and <3% OCT-4 positive (Shin et al., 2006). The cells were produced in bulk by propagation
for an additional 2 weeks beyond the neuroepithelial stage by removal of bFGF from the media and
cryopreserved (ArunA Biomedical, Athens, GA) for end user applications. The hNP and hNC cell endpoints
consisted of cell titer and migratory measurements whereas hNN cell endpoints consisted of neuron count and
three neurite-specific metrics to assess network formation: neurite length, neurites per neuron, and branch
points for neurites. For this study, ArunA Bio extended the differentiation period of the hNN cells by
approximately two weeks more than in the original hN2 protocol. This allowed for increased neural network cell
axonation leading to better quantitation of network-specific endpoints. The utility of dissociated hN2 cultures
as an in vitro model for neurite outgrowth was assessed using automated high-content image analysis (HCA). In
addition, the molecular phenotype of these cells was examined using immunocytochemical staining.
2.4 Metabolic Competence: H9-derived cells are locked at different neuronal developmental states of interest to
DNT investigations of chemical exposures. Xenobiotic biotransformation potential has not been characterized.
2.5 Exposure Regime: To assess the hNP and hNC migration and cell titer endpoints, 60,000 cells per well were
plated onto Matrigel in basal growth medium with LIF and bFGF in a 96-well plate format. Plates were incubated
for 16 h at 37C followed by a 72 h exposure to chemical in the test medium. For the migration endpoints, cells
were seeded and incubated in presence of 'seeding stoppers' to prevent cell migration and growth into the
detection zone. At the beginning of chemical exposure, stoppers were removed, and growth medium was
replaced with test medium. In the case of the stopper control wells, stoppers remained in place following
replacement of growth medium with test medium. Following 72-h exposure to the test medium, cells were
stained at 37C for 30-60 min with calcein-AM. Cell viability in the detection zone was quantitated using a
Flexstation3 microplate reader (ex 494 nm/em 517 nm). Corresponding cell titer endpoints were assessed for
the hNP and hNC cells using the Promega CellTiter Aqueous One Solution Cell Proliferation Assay (Cat no. G3581;
CellTiter 96). Finally, to gain insight into the mechanisms by which cells migrate into the detection zone, Ki-67
expression was quantified for 10 additional chemicals in the hNP and hNC systems. Additionally, cytochalasin D
was used as a positive control to inhibit cell migration. Supplementing the AB2 Basal Medium: 1.
Decontaminate the external surfaces of all supplement vials and the medium bottle with ethanol or isopropanol.
2. Aseptically open each supplement vial and add the amount indicated below to the basal medium with a
pipette. To make 100 mL of complete medium: AB2 Neural Medium 96 mL, ANS Supplement 2 mL, bFGF (50
ug/mL) 40 uL, LIF (10 ug/mL) 100 uL, L-Glutamine (200 mM) 1 mL, Penicillin (5,000 U/mL)/Streptomycin (5,000
ug/mL) 1 mL. 3. Supplemented medium should be stored at 2-8C, protected from light. The complete medium
should be given a 2 week expiration date. Dispense the complete medium into aliquots to avoid repeated
heating prior to each use. Plate Coating Protocol for hNP1 Neural Progenitor Expansion: To coat dishes perform
the following steps: 1. Thaw BD Matrigel at 2-8C overnight. Matrix will gel rapidly at 22C to 35C. Keep Matrigel
on ice and use pre-cooled pipettes, plates and tubes when preparing. Gelled Matrigel may be re-liquified if
placed at 2-8C on ice for 24 to 48 hours. 2. Handle using aseptic technique in a laminar flow hood. 3. Once BD
Matrigel Matrix is thawed, swirl vial to be sure that material is evenly dispersed. 4. Place thawed vial of BD
Matrigel Matrix in sterile area, decontaminate the external surfaces with ethanol or isopropanol and air dry. BD
Matrigel Matrix may be gently pipetted using a pre-cooled pipette to ensure homogeneity. 5. Dilute Matrigel
1:200 with cooled Dulbecco's Modified Eagle's Medium. Keep on ice. 6. Add 2 mL diluted Matrigel to a 35-mm
dish. Swirl to ensure the entire surface of the 35-mm dish is covered with the Matrigel solution. 7. Place dishes
at 2-8C for 1-3 hours. 8. Rinse thoroughly with PBS. 9. Remove PBS and use immediately. Cell Thawing Protocol
for hNP1 Neural Progenitor Expansion: To plate the cells, perform the following steps: 1. Do not thaw the cells
until the recommended medium and appropriately coated plasticware and/or glassware are on hand. 2. Remove
the vial from liquid nitrogen and incubate in a 37C water bath. Closely monitor until the cells are completely
thawed. Maximum cell viability is dependent on the rapid and complete thawing of frozen cells. IMPORTANT:
Do not vortex the cells. Breaking cells down to single cell suspensions will significantly increase cell death. 3. As
soon as the cells are completely thawed, disinfect the outside of the vial with 70% ethanol or isopropanol.
Proceed immediately to the next step. 4. In a laminar flow hood, use a 1 or 2 mL pipette to transfer the cells to
a sterile 15 mL conical tube. Be careful to not introduce any bubbles during the transfer process. 5. Using a 10
mL pipette, slowly add dropwise 9 mL of fully supplemented AB2 Neural Medium (pre-warmed to 37C) to the
15 mL conical tube. IMPORTANT: Do not add the whole volume of medium at once to the cells. This may result
in decreased cell viability due to osmotic shock. 6. Gently mix the cell suspension by slow pipetting up and down
twice. Be careful to not introduce any bubbles. IMPORTANT: Do not vortex the cells. Breaking cells down to
single cell suspensions will significantly increase cell death. 7. Centrifuge the tube at room temperature at 200
x g for 4 minutes to pellet the cells. 8. Aspirate as much of the supernatant as possible. Steps 4-8 are necessary
to remove residual cryopreservative (DMSO). 9. Resuspend the cells in a total volume of 2 mL of fully
supplemented AB2 Neural Medium (pre-warmed to 37C). 10. Plate the 2 mL cell suspension of hNP1 cells onto
a Matrigel-coated 35 mm dish. 11. Incubate the cells at 37C in a 5% CO2 humidified incubator. 12. Exchange the
medium with fresh fully supplemented AB2 Neural Medium 24 hours post plating. Exchange with fresh medium
every other day thereafter. Use caution not to dislodge the cells; do not pipette media directly onto the cells
but rather onto the side of the culture dish. 13. Once the hNP1 cells reach 100% confluence, they can be
dissociated manually for passaging (e.g., by cell scraping or by gentle and slow pipetting up and down to detach
the cells). The cells should be maintained at a high density at all times - the recommended passaging ratio is
1:2. Subculture of hNP1 Cells: 1. Once the hNP1 cells reach 100% confluence, carefully remove the medium
from the 35 mm dish. 2. Apply 2 mL fully supplemented AB2 Neural Medium (pre-warmed to 37C) to the cells
so that the cells can be harvested in fresh medium. 3. Using a pipette, manually detach the cells from the dish
by slow pipetting up and down the dish. Be careful to avoid introducing any bubbles. We recommend using a
200 uL or 1000 uL manual pipette to dislodge the attached cells. Alternatively, cells can be dislodged with a
sterile cell scraper. IMPORTANT: We do NOT recommend enzymatic methods for passaging the hNP1 cells.
Doing so reduces the long term viability of the cells and can cause karyotypic abnormalities. 4. Plates should be
observed to ensure that all cells have been removed. This is most easily accomplished by working under a
dissection microscope within a laminar flow hood, but can also be achieved by frequent observation under a
bright field or phase contrast microscope. 5. Transfer the dissociated cells to a 50 mL conical tube. Inspect the
plate to ensure that all the cells have been removed. 6. If necessary, count the cells and calculate the cell
concentration. Cells can be centrifuged at 200 x g for 4 minutes in order to concentrate the cell suspension for
higher plating densities. 7. Plate the cells at the desired density into the appropriately coated flasks, plates or
wells in fully supplemented AB2 Neural Medium. We recommend keeping the cells at a high cell density by
passaging 1:2. 8. Incubate the cells at 37C in a 5% CO2 humidified incubator. 9. Exchange the medium with fresh
fully supplemented AB2 Neural Medium 24 hours post plating. Exchange with fresh medium every other day
thereafter. Use caution not to dislodge the cells; do not pipette media directly onto the cells but rather onto
the side of the culture dish. Plate Coating Protocol for Cell Migration Assay: 1. Thaw BD Matrigel at 2-8C
overnight. Since it will gel rapidly at 22C to 35C, keep Matrigel on ice and use pre-cooled pipettes, plates and
tubes when preparing. Gelled Matrigel may re-liquefy if placed at 2-8C on ice for 24 to 48 hours. 2. Handle using
aseptic technique in a laminar flow hood. 3. Once the Matrigel is thawed, swirl vial to be sure that material is
evenly dispersed. 4. Place thawed vial of Matrigel in sterile area, decontaminate the external surfaces with
ethanol or isopropanol and air dry. Matrigel may be gently pipetted using a pre-cooled pipette to ensure
homogeneity. 5. Dilute Matrigel 1:200 with cooled AB2 Neural Culture Medium. Prepare 1 mL diluted Matrigel
for each column (8 wells) to be used. Keep on ice. 6. Add 100 uL of diluted Matrigel to each well intended for
use in the 96 well plate. 7. Tap the plate gently to ensure the entire surface of the well is covered with diluted
Matrigel. 8. Place dishes at 2-8C for 1-3 hours. 9. Remove the residual coating solution and rinse each well twice
with 200 uL of PBS per well. 10. Remove PBS and insert the Oris Cell Seeding Stoppers into the coated wells of
the 96-well plate. 11. Visually inspect to ensure that the Oris Cell Seeding Stoppers are firmly sealed. Cell
Migration Assay Protocol: 1. Harvest cells as described in steps 1-5 of section Subculture of hNP1 Neural
Progenitor cells. 2. Count cells and adjust cell suspension volume to the following concentration: 600,000
cells/mL. 3. Plate 100 uL of suspended cells into each stoppered well for a cell density of 60,000 cells per well. 4.
Incubate the cells at 37C in a 5% CO2 humidified incubator overnight (16-24 hours) to permit cell attachment.
5. Using the Oris Stopper Tool, remove all stoppers, except for those in "no migration controls" which will remain
in place until time of staining. 6. Carefully remove the seeding media from the wells and add 200 uL medium
containing the test compound per well. 7. Briefly examine the wells by phase contrast microscopy to ensure
continued adherence of the cells. 8. Incubate the cells at 37C/5% CO2 for 72 hours to permit cell migration. 9.
After 72 hours, mix 5 uL Calcein AM, 5 uL Hoechst 33342, and 10 mL phenol red-free Neurobasal medium with
0.1% BSA. 10. Carefully remove stoppers from the "no migration controls". 11. Carefully remove the test
medium from all wells and add 100 uL of diluted Calcein/Hoechst solution to each well. 12. Incubate plate at
37C/5% CO2 for 30-60 minutes with the lid on and in the dark (the darkness of a standard incubator will suffice).
13. For use with a fluorescence microplate reader, attach the Oris Detection Mask and read promptly for Calcein
fluorescence (ex 494 nm/ em 517 nm). 14. For image analysis, photomicrograph wells using epifluorescence
illumination with or without the Oris Detection mask. Images can then be analyzed using either area closure
with the calcein stain or number of cells (nuclei) using the Hoechst stain. ImageJ freeware available from the
NIH (http://rsbweb.nih.gov/ij/) can be used for migration data analysis as percent area closure or cellular
enumeration.
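The volumes quoted in the protocol above imply a few final concentrations that are easy to check by hand. The following is a minimal sketch of that arithmetic, not part of the vendor protocol. The function names are illustrative, the complete medium is treated as a nominal 100 mL, and "1:200" is assumed to mean one part Matrigel stock in 200 parts final volume.

```python
# Illustrative arithmetic only; function names are hypothetical and the
# stock concentrations and volumes are copied from the protocol text above.

def final_conc_ng_per_ml(stock_ug_per_ml, added_ul, total_ml):
    """Final concentration (ng/mL) after adding added_ul of stock to total_ml."""
    mass_ng = stock_ug_per_ml * 1000.0 * (added_ul / 1000.0)  # ug/mL -> ng/mL, uL -> mL
    return mass_ng / total_ml


def matrigel_split(total_ml, ratio=200):
    """Return (stock_ml, diluent_ml) for a 1:ratio dilution.

    Assumes 1:200 means 1 part stock in 200 parts final volume.
    """
    stock_ml = total_ml / ratio
    return stock_ml, total_ml - stock_ml


def cells_per_well(cells_per_ml, plated_ul):
    """Number of cells delivered to a well for a given plating volume."""
    return cells_per_ml * plated_ul / 1000.0


total_ml = 100.0  # nominal volume of complete medium (assumption)
print("bFGF ng/mL:", final_conc_ng_per_ml(50.0, 40.0, total_ml))
print("LIF ng/mL:", final_conc_ng_per_ml(10.0, 100.0, total_ml))
print("Matrigel per column (mL stock, mL medium):", matrigel_split(1.0))
print("cells per well:", cells_per_well(600_000, 100.0))
```

Under these assumptions the recipe works out to roughly 20 ng/mL bFGF and 10 ng/mL LIF, and the seeding step reproduces the stated 60,000 cells per well.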
Baseline median absolute deviation for the assay (bmad): 0.115
Response cutoff threshold used to determine hit calls: 0.344
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: The ArunA migration assay measures growth and survival in human embryonic neuroprogenitor
(hNP) and human neural crest (hNC) cells by tracking the presence/absence of viable nuclei movement into a
defined circular area in each microplate well. These different measurements are assessed following 72 hour
incubations with test chemical to evaluate the potential to disrupt neural migration in developing human
embryos.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists: NA
Additionally, this assay was annotated to the intended target family of cell cycle.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 5
Target (nominal) number of replicates: 4
Standard minimum concentration tested: 1.2 nM
Standard maximum concentration tested: 100 nM
Key positive control: NA
Neutral vehicle control: DMSO
3. Data Interpretation
-------
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: The migration of neuroprogenitor and neural crest cells into the detection zone was assessed by
comparing the percent migration of proliferative Ki-67 cells to total migrating cells following 72 hours exposure.
This was accomplished by determining the percentage of total cells migrating into the detection zone, i.e. the
migration index (MI), compared to the percentage of migrating cells that expressed the Ki-67 proliferative
marker within the detection zone, i.e. the proliferative index (PI). Normalized response values for each assay
endpoint were calculated as resp = 100 x (rval-bval) / (pval-bval) where rval, bval, and pval correspond to the
raw value, the plate level DMSO control median, and the plate level positive/negative control median,
respectively. In the parallel viability assessment, normalized response was calculated as resp = log2(rval/bval).
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
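The two normalization formulas given in Section 3.2 can be sketched as follows. This is an illustration of the arithmetic only, not the tcpl implementation; the well values are hypothetical, and bval is taken as the plate-level median of the DMSO wells as the text describes.

```python
import math
from statistics import median


def percent_of_control(rval, bval, pval):
    """resp = 100 x (rval - bval) / (pval - bval)."""
    return 100.0 * (rval - bval) / (pval - bval)


def log2_fold_change(rval, bval):
    """resp = log2(rval / bval), as used for the parallel viability readout."""
    return math.log2(rval / bval)


# Hypothetical raw values for one plate (not taken from the source data).
dmso_wells = [1.30, 1.35, 1.33, 1.31]
control_wells = [0.40, 0.42, 0.38]
bval = median(dmso_wells)      # plate-level DMSO control median
pval = median(control_wells)   # plate-level positive/negative control median

for rval in (1.32, 0.86, 0.40):
    print(rval,
          round(percent_of_control(rval, bval, pval), 2),
          round(log2_fold_change(rval, bval), 3))
```

A raw value equal to bval maps to 0 on both scales, while a raw value equal to pval maps to 100 on the percent-of-control scale.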
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
7: resp.log2 (Transform the response values to log-scale (base 2).), 9: resp.fc (Calculate the normalized
response (resp) as the fold change, i.e. the ratio of the corrected (cval) and baseline (bval) values; resp =
cval/bval.), 11: bval.apid.nwlls.med (Calculate the baseline value (bval) as the plate-wise median, by assay
plate ID (apid), of the corrected values (cval) for neutral control wells (wllt = n).)
Level 4: Baseline and required tcplFit2 parameters defined by:
2: bmad.aeid.lowconc.nwells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for neutral control wells (wllt = n). Calculate one
standard deviation of the normalized response for neutral control wells (wllt = n); onesd = sqrt(sum((resp
- mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.), 27: ow_bidirectional_loss (Multiply winning model hitcall (hitc) by -1 for models fit in the
positive analysis direction. Typically used for endpoints where only negative responses are biologically
relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 19: viability.gnls (Flag series with
an active hit call (hitc >= 0.9) if denoted as cell viability assay with winning model is gain-loss (gnls); if hitc
>= 0.9, modl=="gnls" and cell_viability_assay == 1, then flag.), 20: no.med.gt.3bmad (Flag series where
no median response values are greater than baseline as defined by 3 times the baseline median absolute
deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the
number of medians greater than 3*bmad/less than -3*bmad.)
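The Level 4 to Level 6 descriptions above translate into short calculations. The sketch below is illustrative and is not the tcpl code: the function and dictionary key names are hypothetical, the neutral-well responses are invented, and scaling the median absolute deviation by 1.4826 (the default of R's mad function) is an assumption. The reported cutoff of 0.344 is close to 3 x 0.115 = 0.345, which would be consistent with bmad3 applied to an unrounded bmad; that reading is an inference rather than something the source states.

```python
from statistics import median, stdev


def mad(values, constant=1.4826):
    """Median absolute deviation; the 1.4826 scaling is an assumption."""
    center = median(values)
    return constant * median([abs(v - center) for v in values])


def level4_baseline(neutral_resp):
    """bmad and onesd from normalized responses in neutral control wells."""
    return {"bmad": mad(neutral_resp), "onesd": stdev(neutral_resp)}


def level5_cutoff(bmad, multipliers=(3,)):
    """Highest candidate cutoff; bmad3 corresponds to a multiplier of 3."""
    return max(m * bmad for m in multipliers)


def level6_flags(series):
    """Apply four of the simpler cautionary flags described above."""
    flags = []
    if series["rmse"] > series["coff"]:
        flags.append("noise")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags


neutral_resp = [-0.10, 0.05, 0.00, 0.12, -0.04, 0.08]  # hypothetical values
baseline = level4_baseline(neutral_resp)
coff = level5_cutoff(baseline["bmad"])
example = {"rmse": 0.5, "coff": coff, "nrep": 4, "nconc": 5,
           "hitc": 0.95, "ac50": 0.0005, "logc_min": -2.92}
print(baseline, coff, level6_flags(example))
```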
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 63
Number of chemicals tested: 58
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 20
Inactive hit count: 0
-------
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
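The model selection rule and the structure of the continuous hit call can be illustrated with a short sketch. It is not the tcplfit2 implementation: the AIC values and parameter counts are hypothetical, and the three weights are treated as already-computed inputs because the source does not give their formulas.

```python
def pick_winner(fits):
    """Return the model with the lowest AIC; ties go to fewer parameters.

    fits maps a model name to a tuple (aic, n_params).
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


def continuous_hitcall(w_median_outside, w_top_exceeds, w_beats_constant):
    """Product of the three proportional weights described above."""
    return w_median_outside * w_top_exceeds * w_beats_constant


# Hypothetical fit summary for one concentration-response series.
fits = {
    "cnst": (42.0, 1),
    "hill": (30.5, 4),
    "poly1": (30.5, 2),
    "exp2": (31.2, 3),
}
print(pick_winner(fits))
print(continuous_hitcall(0.99, 0.97, 1.0))
```

In this example hill and poly1 share the lowest AIC, so the simpler poly1 model is selected.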
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
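The binarization described above is a simple threshold rule. A minimal sketch follows, with the 0.90 threshold exposed as a parameter because the text notes that other thresholds may be used; the function name is illustrative.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to active, inactive, or not applicable."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


for value in (0.97, 0.90, 0.45, 0.0, -1.0):
    print(value, classify_hitc(value))
```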
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed = 1.332
Neutral control median absolute deviation, by plate: nmad = 0.103
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 = 7.79%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed = NA
Positive control well median absolute deviation, by plate: pmad = NA
Z Prime Factor for median positive and neutral control across all plates:
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)) = NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells:
(pmed - nmed) / sqrt(pmad^2 + nmad^2) = NA
Positive control signal-to-noise: (pmed - nmed)/nmad = NA
Positive control signal-to-background: pmed/nmed = NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed = NA
Negative control well median absolute deviation value, by plate: mmad = NA
Z Prime Factor for median negative and neutral control across all plates:
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)) = NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells:
(mmed - nmed) / sqrt(mmad^2 + nmad^2) = NA
Signal-to-noise (median across all plates, using negative control wells):
(mmed - nmed)/nmad = NA
Signal-to-background (median across all plates, using negative control wells):
mmed/nmed = NA
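The robustness metrics listed above can be computed from the control-well medians and median absolute deviations. The sketch below is a plain transcription of those formulas with hypothetical function names; it returns None when a control is not available, mirroring the NA entries. The positive-control values in the example are invented for illustration, since this endpoint reports none.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100.0


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3*(cmad + nmad) / abs(cmed - nmed)."""
    if cmed is None or cmad is None:
        return None
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """SSMD: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    if cmed is None or cmad is None:
        return None
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return None if cmed is None else (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return None if cmed is None else cmed / nmed


# Neutral control values as reported for this endpoint.
nmed, nmad = 1.332, 0.103
print("CV%:", round(cv_percent(nmed, nmad), 2))
print("Z' with no positive control:", z_prime(None, None, nmed, nmad))

# Hypothetical control values, for illustration only.
print("Z':", z_prime(0.5, 0.05, 1.0, 0.05))
print("SSMD:", ssmd(0.5, 0.05, 1.0, 0.05))
```

Applied to the aggregate nmed and nmad, the CV formula gives a value slightly below the listed 7.79%. One possible explanation is that the reported CV is summarized across per-plate values, but the source does not say so.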
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 7.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Zurlinden TJ, Saili KS, Baker NC, Toimela T, Heinonen T, Knudsen TB. A cross-platform approach to
characterize and screen potential neurovascular unit toxicants. Reprod Toxicol. 2020 Jun 24;96:300-315. doi:
10.1016/j.reprotox.2020.06.010. Epub ahead of print. PMID: 32590145.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1826
ArunA_CellTiter_hNC
1. General Information
1.1 Assay Title: Viability Assessment in the ArunA Biomedical's Oris Neural Crest (hNC) Cell Migration Assay
1.2 Assay Summary: ArunA_CellTiter_hNC is a cell-based, single-readout assay that uses human H9-derived
embryonic neural crest stem cells (hNC). Measurements were taken 72 hours after chemical dosing in a 96-well
plate. ArunA_CellTiter_hNC is an assay component measured from the ArunA_CellTiter_hNC assay. It is
designed to make measurements of viability, a form of viability reporter, as detected with fluorescence intensity
signals by HCS Fluorescent Imaging technology. Data from the assay component ArunA_CellTiter_hNC was
analyzed at the endpoint, ArunA_CellTiter_hNC, in the positive analysis fitting direction relative to DMSO as the
negative control and baseline of activity. Using a type of viability reporter, loss-of-signal activity can be used to
understand viability. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the cell cycle intended target family, where the subfamily is cytotoxicity.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ArunA Biomedical is a privately owned biotechnology company and Contract Research
Organization (CRO) formerly providing toxicology screening using neural stem cell-based assays.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: This protocol describes the use of ArunA Biomedical's hNP1 Neural Progenitor Cells in
conjunction with an Oris Cell Migration Assembly Kit- FLEX to measure the effect of neuroactive compounds and
biologics that modulate proliferation and migration of neural progenitor cells. Certain uses of these products
may be covered by U.S. Pat. No. 6,200,806; No. 7,531,354 B2 licensed to ARUNA and U.S. Pat. No. 7,842,499;
No. 7,018,838; No. 10/597,118; No. 11/342,413; No. 11/890,740; and No. 12/195,007 licensed to PLATYPUS.
1.9 Assay Throughput: 96-well plate. ArunA systems offer high throughput chemical screening in a 96-well format
for the human neuroprogenitor (hNP) and human neural crest (hNC) migration and cell titer endpoints.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes in fluorescence intensity related to the number of [3H]-thymidine labelled nuclei are
indicative of the viability of the system.
Chemical-induced perturbations to cellular key events across neurogenic outcomes, including migration
(neuroprogenitor and neural crest cells) and neural network formation (neurite length, neurites per neuron, and
branch points for neurites), in vitro can inform on cell-based prioritization of neurodevelopmental hazard
potential.
2.2 Scientific Principles: During the development of the nervous system, many processes occur to give rise to a
functional and healthy neural network and hence nervous system. These important neurodevelopmental
processes may be disrupted by potential toxicants, resulting in developmental neurotoxicity. Among these
processes is migration of neuroprogenitors and neural crest cells (NCCs). Impaired neuroprogenitor and NCC
migration can lead to cerebral malformations and neurodevelopmental disorders, such as diencephalic-
mesencephalic dysplasia syndrome, cerebral palsy, cerebellar ataxia, and microcephaly.
2.3 Experimental System: An adherent hNC cell line was used. The hN2 cell line is derived from neuroepithelial cells of
WA09 hESC (Thomson et al., 1998) origin according to a previously described protocol (Shin et al., 2005, Shin et
al., 2006). Importantly, as opposed to other methods of deriving neural progenitors through three-dimensional
neurosphere and embryoid body formations (Reubinoff et al., 2001, Zhang et al., 2001), these adherent
monolayer cultures are uniformly exposed to growth factors and/or morphogens throughout their propagation.
Neurogenic lineages from human embryonic stem cell line WA09 were locked into three neural differentiation
states: neuroprogenitor (hNP1 - Cat no. 7009), neural crest (hNC - Cat no. 7029), and neural network (hNN -
Cat no. hNJL7014). Prior to differentiation into hN2 cells the population was confirmed karyotypically normal,
>95% nestin positive and <3% OCT-4 positive (Shin et al., 2006). The cells were produced in bulk by propagation
for an additional 2 weeks beyond the neuroepithelial stage by removal of bFGF from the media and
cryopreserved (ArunA Biomedical, Athens, GA) for end user applications. The hNP and hNC cell endpoints
consisted of cell titer and migratory measurements whereas hNN cell endpoints consisted of neuron count and
three neurite-specific metrics to assess network formation: neurite length, neurites per neuron, and branch
points for neurites. For this study, ArunA Bio extended the differentiation period of the hNN cells by
approximately two weeks more than in the original hN2 protocol. This allowed for increased neural network cell
axonation leading to better quantitation of network-specific endpoints. The utility of dissociated hN2 cultures
as an in vitro model for neurite outgrowth was assessed using automated high-content image analysis (HCA). In
addition, the molecular phenotype of these cells was examined using immunocytochemical staining.
2.4 Metabolic Competence: H9-derived cells are locked at different neuronal developmental states of interest to
DNT investigations of chemical exposures. Xenobiotic biotransformation potential has not been characterized.
2.5 Exposure Regime: To assess the hNP and hNC migration and cell titer endpoints, 60,000 cells per well were
plated onto Matrigel in basal growth medium with LIF and bFGF in a 96-well plate format. Plates were incubated
for 16 h at 37C followed by a 72 h exposure to chemical in the test medium. For the migration endpoints, cells
were seeded and incubated in presence of 'seeding stoppers' to prevent cell migration and growth into the
detection zone. At the beginning of chemical exposure, stoppers were removed, and growth medium was
replaced with test medium. In the case of the stopper control wells, stoppers remained in place following
replacement of growth medium with test medium. Following 72-h exposure to the test medium, cells were
stained at 37C for 30-60 min with calcein-AM. Cell viability in the detection zone was quantitated using a
Flexstation3 microplate reader (ex 494 nm/em 517 nm). Corresponding cell titer endpoints were assessed for
the hNP and hNC cells using the Promega CellTiter Aqueous One Solution Cell Proliferation Assay (Cat no. G3581;
CellTiter 96). Finally, to gain insight into the mechanisms by which cells migrate into the detection zone, Ki-67
expression was quantified for 10 additional chemicals in the hNP and hNC systems. Additionally, cytochalasin D
was used as a positive control to inhibit cell migration. Supplementing the AB2 Basal Medium: 1.
Decontaminate the external surfaces of all supplement vials and the medium bottle with ethanol or isopropanol.
2. Aseptically open each supplement vial and add the amount indicated below to the basal medium with a
pipette. To make 100 ml of complete medium: AB2 Neural Medium 96 mL, ANS Supplement 2 mL, bFGF (50
ug/mL) 40 uL, LIF (10 ug/mL) 100 uL, L-Glutamine (200 mM) 1 mL, Penicillin (5,000 U/mL)/Streptomycin (5,000
ug/mL) 1 mL. 3. Supplemented medium should be stored at 2-8C, protected from light. The complete medium
should be given a 2 week expiration date. Dispense the complete medium into aliquots to avoid repeated
heating prior to each use. Plate Coating Protocol for hNP1 Neural Progenitor Expansion: To coat dishes perform
the following steps: 1. Thaw BD Matrigel at 2-8C overnight. Matrix will gel rapidly at 22C to 35C. Keep Matrigel
on ice and use pre-cooled pipettes, plates and tubes when preparing. Gelled Matrigel may be re-liquified if
placed at 2-8C on ice for 24 to 48 hours. 2. Handle using aseptic technique in a laminar flow hood. 3. Once BD
Matrigel Matrix is thawed, swirl vial to be sure that material is evenly dispersed. 4. Place thawed vial of BD
Matrigel Matrix in sterile area, decontaminate the external surfaces with ethanol or isopropanol and air dry. BD
Matrigel Matrix may be gently pipetted using a pre-cooled pipette to ensure homogeneity. 5. Dilute Matrigel
1:200 with cooled Dulbecco's Modified Eagle's Medium. Keep on ice. 6. Add 2 mL diluted Matrigel to a 35-mm
dish. Swirl to ensure the entire surface of the 35-mm dish is covered with the Matrigel solution. 7. Place dishes
at 2-8C for 1-3 hours. 8. Rinse thoroughly with PBS. 9. Remove PBS and use immediately. Cell Thawing Protocol
for hNP1 Neural Progenitor Expansion: To plate the cells perform the following steps: 1. Do not thaw the cells
until the recommended medium and appropriately coated plasticware and/or glassware are on hand. 2. Remove
the vial from liquid nitrogen and incubate in a 37C water bath. Closely monitor until the cells are completely
thawed. Maximum cell viability is dependent on the rapid and complete thawing of frozen cells. IMPORTANT:
Do not vortex the cells. Breaking cells down to single cell suspensions will significantly increase cell death. 3. As
soon as the cells are completely thawed, disinfect the outside of the vial with 70% ethanol or isopropanol.
Proceed immediately to the next step. 4. In a laminar flow hood, use a 1 or 2 mL pipette to transfer the cells to
a sterile 15 mL conical tube. Be careful to not introduce any bubbles during the transfer process. 5. Using a 10
mL pipette, slowly add dropwise 9 mL of fully supplemented AB2 Neural Medium (pre-warmed to 37C) to the
15 mL conical tube. IMPORTANT: Do not add the whole volume of medium at once to the cells. This may result
in decreased cell viability due to osmotic shock. 6. Gently mix the cell suspension by slow pipetting up and down
twice. Be careful to not introduce any bubbles. IMPORTANT: Do not vortex the cells. Breaking cells down to
single cell suspensions will significantly increase cell death. 7. Centrifuge the tube at room temperature at 200
x g for 4 minutes to pellet the cells. 8. Aspirate as much of the supernatant as possible. Steps 4-8 are necessary
to remove residual cryopreservative (DMSO). 9. Resuspend the cells in a total volume of 2 mL of fully
supplemented AB2 Neural Medium (pre-warmed to 37C). 10. Plate the 2 mL cell suspension of hNP1 cells onto
a Matrigel-coated 35 mm dish. 11. Incubate the cells at 37C in a 5% CO2 humidified incubator. 12. Exchange the
medium with fresh fully supplemented AB2 Neural Medium 24 hours post plating. Exchange with fresh medium
every other day thereafter. Use caution not to dislodge the cells; do not pipette media directly onto the cells
but rather onto the side of the culture dish. 13. Once the hNP1 cells reach 100% confluence, they can be
dissociated manually for passaging (e.g., by cell scraping or by gentle and slow pipetting up and down to detach
the cells). The cells should be maintained at a high density at all times - the recommended passaging ratio is
1:2. Subculture of hNP1 Cells: 1. Once the hNP1 cells reach 100% confluence, carefully remove the medium
from the 35 mm dish. 2. Apply 2 mL fully supplemented AB2 Neural Medium (pre-warmed to 37C) to the cells
so that the cells can be harvested in fresh medium. 3. Using a pipette, manually detach the cells from the dish
by slow pipetting up and down the dish. Be careful to avoid introducing any bubbles. We recommend using a
200 uL or 1000 uL manual pipette to dislodge the attached cells. Alternatively, cells can be dislodged with a
sterile cell scraper. IMPORTANT: We do NOT recommend enzymatic methods for passaging the hNP1 cells.
Doing so reduces the long term viability of the cells and can cause karyotypic abnormalities. 4. Plates should be
observed to ensure that all cells have been removed. This is most easily accomplished by working under a
dissection microscope within a laminar flow hood, but can also be achieved by frequent observation under a
bright field or phase contrast microscope. 5. Transfer the dissociated cells to a 50 mL conical tube. Inspect the
plate to ensure that all the cells have been removed. 6. If necessary, count the cells and calculate the cell
concentration. Cells can be centrifuged at 200 x g for 4 minutes in order to concentrate the cell suspension for
higher plating densities. 7. Plate the cells at the desired density into the appropriately coated flasks, plates or
wells in fully supplemented AB2 Neural Medium. We recommend keeping the cells at a high cell density by
passaging 1:2. 8. Incubate the cells at 37C in a 5% CO2 humidified incubator. 9. Exchange the medium with fresh
fully supplemented AB2 Neural Medium 24 hours post plating. Exchange with fresh medium every other day
thereafter. Use caution not to dislodge the cells; do not pipette media directly onto the cells but rather onto
the side of the culture dish. Plate Coating Protocol for Cell Migration Assay: 1. Thaw BD Matrigel at 2-8C
overnight. Since it will gel rapidly at 22C to 35C, keep Matrigel on ice and use pre-cooled pipettes, plates and
tubes when preparing. Gelled Matrigel may re-liquefy if placed at 2-8C on ice for 24 to 48 hours. 2. Handle using
aseptic technique in a laminar flow hood. 3. Once the Matrigel is thawed, swirl vial to be sure that material is
evenly dispersed. 4. Place thawed vial of Matrigel in sterile area, decontaminate the external surfaces with
ethanol or isopropanol and air dry. Matrigel may be gently pipetted using a pre-cooled pipette to ensure
homogeneity. 5. Dilute Matrigel 1:200 with cooled AB2 Neural Culture Medium. Prepare 1 mL diluted Matrigel
for each column (8 wells) to be used. Keep on ice. 6. Add 100 uL of diluted Matrigel to each well intended for
use in the 96 well plate. 7. Tap the plate gently to ensure the entire surface of the well is covered with diluted
Matrigel. 8. Place dishes at 2-8C for 1-3 hours. 9. Remove the residual coating solution and rinse each well twice
with 200 uL of PBS per well. 10. Remove PBS and insert the Oris Cell Seeding Stoppers into the coated wells of
the 96-well plate. 11. Visually inspect to ensure that the Oris Cell Seeding Stoppers are firmly sealed. Cell
Migration Assay Protocol: 1. Harvest cells as described in steps 1-5 of section Subculture of hNP1 Neural
Progenitor cells. 2. Count cells and adjust cell suspension volume to the following concentration: 600,000
cells/mL 3. Plate 100 uL of suspended cells into each stoppered well for a cell density of 60,000 cells per well. 4.
Incubate the cells at 37C in a 5% CO2 humidified incubator overnight (16-24 hours) to permit cell attachment.
5. Using the Oris Stopper Tool, remove all stoppers, except for those in "no migration controls" which will remain
in place until time of staining. 6. Carefully remove the seeding media from the wells and add 200 uL medium
containing the test compound per well. 7. Briefly examine the wells by phase contrast microscopy to ensure
continued adherence of the cells. 8. Incubate the cells at 37C/5% CO2 for 72 hours to permit cell migration. 9.
After 72 hours, mix 5 uL Calcein AM, 5 uL Hoechst 33342, and 10 mL phenol red-free Neurobasal medium with
0.1% BSA. 10. Carefully remove stoppers from the "no migration controls". 11. Carefully remove the test
medium from all wells and add 100 uL of diluted Calcein/Hoechst solution to each well. 12. Incubate plate at
37C/5% CO2 for 30-60 minutes with the lid on and in the dark (the darkness of a standard incubator will suffice).
13. For use with a fluorescence microplate reader, attach the Oris Detection Mask and read promptly for Calcein
fluorescence (ex 494 nm/ em 517 nm). 14. For image analysis, photomicrograph wells using epifluorescence
illumination with or without the Oris Detection mask. Images can then be analyzed using either area closure
with the calcein stain or number of cells (nuclei) using the Hoechst stain. ImageJ freeware available from the
NIH (http://rsbweb.nih.gov/ij/) can be used for migration data analysis as percent area closure or cellular
enumeration
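The plating and medium-preparation steps above reduce to simple arithmetic. The following Python sketch checks the stated seeding density and works out the final growth factor concentrations implied by the medium recipe. The function names are illustrative only and do not belong to any ArunA or tcpl tooling.

```python
def cells_per_well(cells_per_ml: float, volume_ul: float) -> float:
    """Number of cells delivered to one well from a suspension."""
    return cells_per_ml * volume_ul / 1000.0  # uL -> mL


def final_conc_ng_per_ml(stock_ug_per_ml: float, added_ul: float, total_ml: float) -> float:
    """Final concentration (ng/mL) of a supplement diluted into total_ml of medium."""
    mass_ng = stock_ug_per_ml * added_ul  # 1 ug/mL equals 1 ng/uL
    return mass_ng / total_ml


# Component volumes (mL) for the complete medium, as listed in the protocol.
recipe_ml = {
    "AB2 Neural Medium": 96.0,
    "ANS Supplement": 2.0,
    "bFGF (50 ug/mL)": 0.040,
    "LIF (10 ug/mL)": 0.100,
    "L-Glutamine (200 mM)": 1.0,
    "Penicillin/Streptomycin": 1.0,
}
total_ml = sum(recipe_ml.values())  # slightly above the nominal 100 mL

seeded = cells_per_well(600_000, 100)           # cells per stoppered well
bfgf = final_conc_ng_per_ml(50, 40, total_ml)   # ng/mL in complete medium
lif = final_conc_ng_per_ml(10, 100, total_ml)   # ng/mL in complete medium

print(f"Cells per well: {seeded:.0f}")
print(f"Total medium volume: {total_ml:.2f} mL")
print(f"bFGF: {bfgf:.2f} ng/mL, LIF: {lif:.2f} ng/mL")
```

Against the nominal 100 mL volume, the recipe corresponds to about 20 ng/mL bFGF and 10 ng/mL LIF; the listed component volumes sum to marginally more than 100 mL, so the exact values are slightly lower.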
Baseline median absolute deviation for the assay (bmad): 0.127
Response cutoff threshold used to determine hit calls: 0.381
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: The ArunA migration assay measures growth and survival in human embryonic neuroprogenitor
(hNP) and human neural crest (hNC) cells by tracking the presence/absence of viable nuclei movement into a
defined circular area in each microplate well. These different measurements are assessed following 72 hour
incubations with test chemical to evaluate the potential to disrupt neural migration in developing human
embryos.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists: NA
Additionally, this assay was annotated to the intended target family of cell cycle.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 5
Target (nominal) number of replicates: 4
Standard minimum concentration tested: 1.2 nM
Standard maximum concentration tested: 100 nM
Key positive control: NA
Neutral vehicle control: DMSO
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: The migration of neuroprogenitor and neural crest cells into the detection zone was assessed by
comparing the percent migration of proliferative Ki-67 cells to total migrating cells following 72 hours exposure.
This was accomplished by determining the percentage of total cells migrating into the detection zone, i.e. the
migration index (MI), compared to the percentage of migrating cells that expressed the Ki-67 proliferative
marker within the detection zone, i.e. the proliferative index (PI). Normalized response values for each assay
endpoint were calculated as resp = 100 x (rval-bval) / (pval-bval) where rval, bval, and pval correspond to the
raw value, the plate level DMSO control median, and the plate level positive/negative control median,
respectively. In the parallel viability assessment, normalized response was calculated as resp = log2(rval/bval).
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
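The two normalization formulas quoted above can be written out directly. The sketch below is a plain Python illustration of those formulas, not the tcpl implementation; the function names and the guard against a zero denominator are assumptions added for the example.

```python
import math


def percent_of_control(rval: float, bval: float, pval: float) -> float:
    """resp = 100 x (rval - bval) / (pval - bval)."""
    if pval == bval:
        raise ValueError("pval and bval must differ to define a response range")
    return 100.0 * (rval - bval) / (pval - bval)


def log2_ratio(rval: float, bval: float) -> float:
    """resp = log2(rval / bval), as used for the parallel viability assessment."""
    return math.log2(rval / bval)


# Hypothetical well values, chosen only to illustrate the arithmetic.
print(percent_of_control(rval=0.5, bval=0.25, pval=0.75))
print(log2_ratio(rval=0.5, bval=0.25))
```

A raw value halfway between the baseline and control medians yields a response of 50, and a raw value twice the baseline yields a log2 response of 1.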
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
7: resp.log2 (Transform the response values to log-scale (base 2).), 9: resp.fc (Calculate the normalized
response (resp) as the fold change, i.e. the ratio of the corrected (cval) and baseline (bval) values; resp =
cval/bval.), 11: bval.apid.nwlls.med (Calculate the baseline value (bval) as the plate-wise median, by assay
plate ID (apid), of the corrected values (cval) for neutral control wells (wllt = n).)
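The Level 3 methods listed above can be sketched as a small pipeline. This Python illustration assumes the methods run in the order baseline, fold change, then log2 (the listing above is by method ID, which is not necessarily the execution order) and represents wells as plain dictionaries; it is not the tcpl code.

```python
import math
from collections import defaultdict
from statistics import median


def normalize_wells(wells):
    """Add plate-wise bval and log2 fold-change resp to each well record.

    Each well is a dict with keys 'apid' (assay plate ID), 'wllt' (well type)
    and 'cval' (corrected value). A plate with no neutral wells raises KeyError.
    """
    neutral = defaultdict(list)
    for w in wells:
        if w["wllt"] == "n":  # neutral control wells
            neutral[w["apid"]].append(w["cval"])
    bval_by_plate = {apid: median(vals) for apid, vals in neutral.items()}

    normalized = []
    for w in wells:
        bval = bval_by_plate[w["apid"]]
        fold_change = w["cval"] / bval  # resp.fc
        resp = math.log2(fold_change)   # resp.log2
        normalized.append({**w, "bval": bval, "resp": resp})
    return normalized


# Hypothetical plate: three neutral control wells and one test well.
example = [
    {"apid": "P1", "wllt": "n", "cval": 1.0},
    {"apid": "P1", "wllt": "n", "cval": 2.0},
    {"apid": "P1", "wllt": "n", "cval": 3.0},
    {"apid": "P1", "wllt": "t", "cval": 4.0},
]
for row in normalize_wells(example):
    print(row)
```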
Level 4: Baseline and required tcplFit2 parameters defined by:
2: bmad.aeid.lowconc.nwells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for neutral control wells (wllt = n). Calculate one
standard deviation of the normalized response for neutral control wells (wllt = n); onesd = sqrt(sum((resp
- mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
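The two Level 4 quantities can be illustrated as follows. This is a minimal Python sketch of the definitions given above, not the tcpl code. The `constant` argument is an assumption: it is left at 1.0 here, whereas R's `stats::mad` applies a consistency constant of 1.4826 by default, so the scaling should be checked against the pipeline before comparing numbers.

```python
from statistics import median, stdev


def baseline_mad(resp_neutral, constant=1.0):
    """Median absolute deviation of neutral control responses."""
    center = median(resp_neutral)
    return constant * median([abs(r - center) for r in resp_neutral])


def one_sd(resp_neutral):
    """Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))."""
    return stdev(resp_neutral)


# Hypothetical normalized responses from neutral control wells.
neutral_resp = [-0.2, -0.1, 0.0, 0.1, 0.2]
print(baseline_mad(neutral_resp))
print(one_sd(neutral_resp))
```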
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.), 27: ow_bidirectional_loss (Multiply winning model hitcall (hitc) by -1 for models fit in the
positive analysis direction. Typically used for endpoints where only negative responses are biologically
relevant.)
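The cutoff reported for this endpoint in Section 2.5 (0.381) is consistent with the bmad3 method applied to the reported bmad (0.127). The Python sketch below reproduces that arithmetic and illustrates the sign flip described for ow_bidirectional_loss. Treating "fit in the positive analysis direction" as `top > 0` is an assumption made for the example, and the function names are hypothetical.

```python
def bmad3_cutoff(bmad: float) -> float:
    """Cutoff candidate equal to 3 times the baseline median absolute deviation."""
    return 3.0 * bmad


def select_cutoff(candidates):
    """Where several cutoff methods are assigned, the highest value is used."""
    return max(candidates)


def apply_bidirectional_loss(hitc: float, top: float) -> float:
    """Multiply hitc by -1 when the winning model was fit in the positive direction."""
    return -hitc if top > 0 else hitc


coff = select_cutoff([bmad3_cutoff(0.127)])
print(round(coff, 3))
print(apply_bidirectional_loss(hitc=0.95, top=1.2))
print(apply_bidirectional_loss(hitc=0.95, top=-1.2))
```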
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 19: viability.gnls (Flag series with
an active hit call (hitc >= 0.9) if denoted as cell viability assay with winning model is gain-loss (gnls); if hitc
>= 0.9, modl=="gnls" and cell_viability_assay == 1, then flag.), 20: no.med.gt.3bmad (Flag series where
no median response values are greater than baseline as defined by 3 times the baseline median absolute
deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the
number of medians greater than 3*bmad/less than -3*bmad.)
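Several of the Level 6 flags above are simple threshold rules. The following Python sketch applies a subset of them (noise, low.nrep, low.nconc, ac50.lowconc, viability.gnls) to one concentration series. The dictionary keys mirror the names used in the flag descriptions, but the record layout and function name are assumptions for illustration, not the tcpl data model.

```python
def cautionary_flags(series: dict) -> list:
    """Return the names of the threshold-based flags raised for one series."""
    flags = []
    active = series["hitc"] >= 0.9

    if series["rmse"] > series["coff"]:
        flags.append("noise")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    # logc_min is the log10 of the lowest tested concentration.
    if active and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    if active and series["modl"] == "gnls" and series["cell_viability_assay"] == 1:
        flags.append("viability.gnls")
    return flags


# Hypothetical series using this endpoint's cutoff and nominal design
# (5 concentrations, 4 replicates); all other values are invented.
example_series = {
    "hitc": 0.95, "rmse": 0.5, "coff": 0.381, "nrep": 4, "nconc": 5,
    "ac50": 0.0005, "logc_min": -3, "modl": "hill", "cell_viability_assay": 0,
}
print(cautionary_flags(example_series))
```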
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 63 Number of chemicals tested: 58
ACTIVITY HIT CALLS
Active hit count: hitc>0.9
14
Inactive hit count: 0
-------
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. Second, the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
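The winning-model rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched in a few lines of Python. This is an illustration of the selection rule only; it is not the tcplfit2 implementation, and the AIC values and parameter counts in the example are invented.

```python
def select_winning_model(fits):
    """Pick the fit with the lowest AIC; break exact ties by fewer parameters.

    Each fit is a dict with keys 'model', 'aic' and 'n_params'.
    """
    best = min(fits, key=lambda f: (f["aic"], f["n_params"]))
    return best["model"]


# Hypothetical fit summaries for one concentration-response series.
candidate_fits = [
    {"model": "cnst", "aic": 15.0, "n_params": 0},
    {"model": "hill", "aic": 10.0, "n_params": 3},
    {"model": "poly1", "aic": 10.0, "n_params": 1},
]
print(select_winning_model(candidate_fits))
```

Sorting on the tuple (AIC, parameter count) keeps the tie-break explicit: the parameter count only matters when two AIC values are exactly equal.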
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
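The binarization described above can be expressed as a small helper. This Python sketch uses the 0.90 threshold stated in this document as a default that the user may change; the function name and return labels are assumptions for illustration.

```python
def classify_hitc(hitc: float, threshold: float = 0.90) -> str:
    """Binarize a continuous hit call into active, inactive, or not applicable."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


for value in (0.95, 0.90, 0.45, 0.0, -0.95):
    print(value, classify_hitc(value))
```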
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using the
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed
0.248
Neutral control median absolute deviation, by plate: nmad
0.021
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100
9.44%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed
NA
Positive control well median absolute deviation, by plate: pmad
NA
Z Prime Factor for median positive and neutral control across all plates:
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells:
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
NA
Positive control signal-to-noise: ((pmed-nmed)/nmad)
NA
Positive control signal-to-background: (pmed/nmed)
NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed
NA
Negative control well median absolute deviation value, by plate: mmad
NA
Z Prime Factor for median negative and neutral control across all plates:
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells:
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
NA
Signal-to-noise (median across all plates, using negative control wells):
((mmed-nmed)/nmad)
NA
Signal-to-background (median across all plates, using negative control wells):
(mmed/nmed)
NA
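The performance metrics listed in this section follow directly from the control medians and median absolute deviations. The Python sketch below implements the formulas as written above, with `cmed` and `cmad` standing for either the positive (pmed, pmad) or negative (mmed, mmad) control values. The control values in the usage example are hypothetical, since this endpoint reports NA for both control types.

```python
import math


def cv_percent(nmad: float, nmed: float) -> float:
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100.0


def z_prime(cmed: float, cmad: float, nmed: float, nmad: float) -> float:
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed: float, cmad: float, nmed: float, nmad: float) -> float:
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed: float, nmed: float, nmad: float) -> float:
    return (cmed - nmed) / nmad


def signal_to_background(cmed: float, nmed: float) -> float:
    return cmed / nmed


# Hypothetical control statistics for a single plate.
print(z_prime(cmed=1.0, cmad=0.05, nmed=0.25, nmad=0.02))
print(ssmd(cmed=1.0, cmad=0.05, nmed=0.25, nmad=0.02))
print(signal_to_noise(cmed=1.0, nmed=0.25, nmad=0.02))
print(signal_to_background(cmed=1.0, nmed=0.25))
```

Applying `cv_percent` to the rounded summary values reported above (nmad 0.021, nmed 0.248) gives roughly 8.5%, not the reported 9.44%; one possible explanation is that the reported CV is summarized from per-plate values rather than computed from the two summary statistics.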
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 5.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Zurlinden TJ, Saili KS, Baker NC, Toimela T, Heinonen T, Knudsen TB. A cross-platform approach to
characterize and screen potential neurovascular unit toxicants. Reprod Toxicol. 2020 Jun 24;96:300-315. doi:
10.1016/j.reprotox.2020.06.010. Epub ahead of print. PMID: 32590145.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1827
ArunA_Migration_hNP
1. General Information
1.1 Assay Title: ArunA Biomedical's Oris Neuroprogenitor (hNP) Cell Migration Assay
1.2 Assay Summary: ArunA_Migration_hNP is a cell-based, single-readout assay that uses human H9-derived
neuroprogenitor stem cells (hNP1). Measurements were taken 72 hours after chemical dosing in a 96-well plate.
ArunA_Migration_hNP is an assay component measured from the ArunA_Migration_hNP assay. It is designed
to make measurements of cell migration, a form of distribution reporter, as detected with fluorescence
intensity signals by HCS Fluorescent Imaging technology. Data from the assay component
ArunA_Migration_hNP was analyzed at the endpoint, ArunA_Migration_hNP, in the positive analysis fitting
direction relative to DMSO as the negative control and baseline of activity. Using a type of distribution reporter,
loss-of-signal activity can be used to understand the cell migration. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the neurodevelopment intended target family, where the
subfamily is migration.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ArunA Biomedical is a privately owned biotechnology company and Contract Research
Organization (CRO) formerly providing toxicology screening using neural stem cell-based assays.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: This protocol describes the use of ArunA Biomedical's hNP1 Neural Progenitor Cells in
conjunction with an Oris Cell Migration Assembly Kit-FLEX to measure the effect of neuroactive compounds and
biologics that modulate proliferation and migration of neural progenitor cells. Certain uses of these products
may be covered by U.S. Pat. No. 6,200,806; No. 7,531,354,B2 licensed to ARUNA and U.S. Pat. No. 7,842,499;
No. 7,018,838; No. 10/597,118; No. 11/342,413; No. 11/890,740; and No. 12/195,007 licensed to PLATYPUS.
1.9 Assay Throughput: 96-well plate. ArunA systems offer high throughput chemical screening in a 96-well format
for the human neuroprogenitor (hNP) and human neural crest (hNC) migration and cell titer endpoints.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes in fluorescence intensity related to Ki-67 expression are indicative of cell migration.
Chemical-induced perturbations to cellular key events across neurogenic outcomes, including migration
(neuroprogenitor and neural crest cells) and neural network formation (neurite length, neurites per neuron, and
branch points for neurites), in vitro can inform on cell-based prioritization of neurodevelopmental hazard
potential.
2.2 Scientific Principles: During the development of the nervous system, many processes occur to give rise to a
functional and healthy neural network and hence nervous system. These important neurodevelopmental
processes may be disrupted by potential toxicants, resulting in developmental neurotoxicity. Among these
processes is migration of neuroprogenitors and neural crest cells (NCCs). Impaired neuroprogenitor and NCC
migration can lead to cerebral malformations and neurodevelopmental disorders, such as diencephalic-
mesencephalic dysplasia syndrome, cerebral palsy, cerebellar ataxia, and microcephaly.
2.3 Experimental System: The adherent hNP1 cell line was used. The hN2 cell line is derived from neuroepithelial cells of
WA09 hESC (Thomson et al., 1998) origin according to a previously described protocol (Shin et al., 2005, Shin et
al., 2006). Importantly, as opposed to other methods of deriving neural progenitors through three-dimensional
neurosphere and embryoid body formations (Reubinoff et al., 2001, Zhang et al., 2001), these adherent
monolayer cultures are uniformly exposed to growth factors and/or morphogens throughout their propagation.
Neurogenic lineages from human embryonic stem cell line WA09 were locked into three neural differentiation
states: neuroprogenitor (hNP1 - Cat no. 7009), neural crest (hNC - Cat no. 7029), and neural network (hNN -
Cat no. hNJL7014). Prior to differentiation into hN2 cells the population was confirmed karyotypically normal,
>95% nestin positive and <3% OCT-4 positive (Shin et al., 2006). The cells were produced in bulk by propagation
for an additional 2 weeks beyond the neuroepithelial stage by removal of bFGF from the media and
cryopreserved (ArunA Biomedical, Athens, GA) for end user applications. The hNP and hNC cell endpoints
consisted of cell titer and migratory measurements whereas hNN cell endpoints consisted of neuron count and
three neurite-specific metrics to assess network formation: neurite length, neurites per neuron, and branch
points for neurites. For this study, ArunA Bio extended the differentiation period of the hNN cells by
approximately two weeks more than in the original hN2 protocol. This allowed for increased neural network cell
axonation leading to better quantitation of network-specific endpoints. The utility of dissociated hN2 cultures
as an in vitro model for neurite outgrowth was assessed using automated high-content image analysis (HCA). In
addition, the molecular phenotype of these cells was examined using immunocytochemical staining.
2.4 Metabolic Competence: H9-derived cells are locked at different neuronal developmental states of interest to
DNT investigations of chemical exposures. Xenobiotic biotransformation potential has not been characterized.
2.5 Exposure Regime: To assess the hNP and hNC migration and cell titer endpoints, 60,000 cells per well were
plated onto Matrigel in basal growth medium with LIF and bFGF in a 96-well plate format. Plates were incubated
for 16 h at 37 °C followed by a 72 h exposure to chemical in the test medium. For the migration endpoints, cells
were seeded and incubated in the presence of 'seeding stoppers' to prevent cell migration and growth into the
detection zone. At the beginning of chemical exposure, stoppers were removed, and growth medium was
replaced with test medium. In the case of the stopper control wells, stoppers remained in place following
replacement of growth medium with test medium. Following the 72-h exposure to the test medium, cells were
stained at 37 °C for 30-60 min with calcein-AM. Cell viability in the detection zone was quantitated using a
Flexstation3 microplate reader (ex 494 nm/em 517 nm). Corresponding cell titer endpoints were assessed for
the hNP and hNC cells using the Promega CellTiter Aqueous One Solution Cell Proliferation Assay (Cat no. G3581;
CellTiter 96). Finally, to gain insight into the mechanisms by which cells migrate into the detection zone, Ki-67
expression was quantified for 10 additional chemicals in the hNP and hNC systems. Additionally, cytochalasin D
was used as a positive control to inhibit cell migration.
Supplementing the AB2 Basal Medium:
1. Decontaminate the external surfaces of all supplement vials and the medium bottle with ethanol or isopropanol.
2. Aseptically open each supplement vial and add the amount indicated below to the basal medium with a
pipette. To make 100 mL of complete medium: AB2 Neural Medium 96 mL, ANS Supplement 2 mL, bFGF (50
µg/mL) 40 µL, LIF (10 µg/mL) 100 µL, L-Glutamine (200 mM) 1 mL, Penicillin (5,000 U/mL)/Streptomycin (5,000
µg/mL) 1 mL.
3. Supplemented medium should be stored at 2-8 °C, protected from light. The complete medium
should be given a 2 week expiration date. Dispense the complete medium into aliquots to avoid repeated
heating prior to each use.
Plate Coating Protocol for hNP1 Neural Progenitor Expansion: To coat dishes perform the following steps:
1. Thaw BD Matrigel at 2-8 °C overnight. Matrix will gel rapidly at 22 °C to 35 °C. Keep Matrigel
on ice and use pre-cooled pipettes, plates and tubes when preparing. Gelled Matrigel may be re-liquefied if
placed at 2-8 °C on ice for 24 to 48 hours.
2. Handle using aseptic technique in a laminar flow hood.
3. Once BD Matrigel Matrix is thawed, swirl vial to be sure that material is evenly dispersed.
4. Place thawed vial of BD Matrigel Matrix in sterile area, decontaminate the external surfaces with ethanol
or isopropanol and air dry. BD Matrigel Matrix may be gently pipetted using a pre-cooled pipette to ensure
homogeneity.
5. Dilute Matrigel 1:200 with cooled Dulbecco's Modified Eagle's Medium. Keep on ice.
6. Add 2 mL diluted Matrigel to a 35-mm dish. Swirl to ensure the entire surface of the 35-mm dish is covered
with the Matrigel solution.
7. Place dishes at 2-8 °C for 1-3 hours.
8. Rinse thoroughly with PBS.
9. Remove PBS and use immediately.
Cell Thawing Protocol for hNP1 Neural Progenitor Expansion: To plate the cells perform the following steps:
1. Do not thaw the cells until the recommended medium and appropriately coated plasticware and/or glassware
are on hand.
2. Remove the vial from liquid nitrogen and incubate in a 37 °C water bath. Closely monitor until the cells are
completely thawed. Maximum cell viability is dependent on the rapid and complete thawing of frozen cells.
IMPORTANT: Do not vortex the cells. Breaking cells down to single cell suspensions will significantly increase
cell death.
3. As soon as the cells are completely thawed, disinfect the outside of the vial with 70% ethanol or isopropanol.
Proceed immediately to the next step.
4. In a laminar flow hood, use a 1 or 2 mL pipette to transfer the cells to a sterile 15 mL conical tube. Be
careful to not introduce any bubbles during the transfer process.
5. Using a 10 mL pipette, slowly add dropwise 9 mL of fully supplemented AB2 Neural Medium (pre-warmed to
37 °C) to the 15 mL conical tube. IMPORTANT: Do not add the whole volume of medium at once to the cells. This
may result in decreased cell viability due to osmotic shock.
6. Gently mix the cell suspension by slow pipetting up and down twice. Be careful to not introduce any bubbles.
IMPORTANT: Do not vortex the cells. Breaking cells down to single cell suspensions will significantly increase
cell death.
7. Centrifuge the tube at room temperature at 200 x g for 4 minutes to pellet the cells.
8. Aspirate as much of the supernatant as possible. Steps 4-8 are necessary to remove residual cryopreservative
(DMSO).
9. Resuspend the cells in a total volume of 2 mL of fully supplemented AB2 Neural Medium (pre-warmed to 37 °C).
10. Plate the 2 mL cell suspension of hNP1 cells onto a Matrigel-coated 35 mm dish.
11. Incubate the cells at 37 °C in a 5% CO2 humidified incubator.
12. Exchange the medium with fresh fully supplemented AB2 Neural Medium 24 hours post plating. Exchange with
fresh medium every other day thereafter. Use caution not to dislodge the cells; do not pipette media directly
onto the cells but rather onto the side of the culture dish.
13. Once the hNP1 cells reach 100% confluence, they can be dissociated manually for passaging (e.g., by cell
scraping or by gentle and slow pipetting up and down to detach the cells). The cells should be maintained at a
high density at all times - the recommended passaging ratio is 1:2.
Subculture of hNP1 Cells:
1. Once the hNP1 cells reach 100% confluence, carefully remove the medium from the 35 mm dish.
2. Apply 2 mL fully supplemented AB2 Neural Medium (pre-warmed to 37 °C) to the cells so that the cells can be
harvested in fresh medium.
3. Using a pipette, manually detach the cells from the dish by slow pipetting up and down the dish. Be careful
to avoid introducing any bubbles. We recommend using a 200 µL or 1000 µL manual pipette to dislodge the
attached cells. Alternatively, cells can be dislodged with a sterile cell scraper. IMPORTANT: We do NOT
recommend enzymatic methods for passaging the hNP1 cells. Doing so reduces the long term viability of the cells
and can cause karyotypic abnormalities.
4. Plates should be observed to ensure that all cells have been removed. This is most easily accomplished by
working under a dissection microscope within a laminar flow hood, but can also be achieved by frequent
observation under a bright field or phase contrast microscope.
5. Transfer the dissociated cells to a 50 mL conical tube. Inspect the plate to ensure that all the cells have
been removed.
6. If necessary, count the cells and calculate the cell concentration. Cells can be centrifuged at 200 x g for
4 minutes in order to concentrate the cell suspension for higher plating densities.
7. Plate the cells at the desired density into the appropriately coated flasks, plates or wells in fully
supplemented AB2 Neural Medium. We recommend keeping the cells at a high cell density by passaging 1:2.
8. Incubate the cells at 37 °C in a 5% CO2 humidified incubator.
9. Exchange the medium with fresh fully supplemented AB2 Neural Medium 24 hours post plating. Exchange with
fresh medium every other day thereafter. Use caution not to dislodge the cells; do not pipette media directly
onto the cells but rather onto the side of the culture dish.
Plate Coating Protocol for Cell Migration Assay:
1. Thaw BD Matrigel at 2-8 °C overnight. Since it will gel rapidly at 22 °C to 35 °C, keep Matrigel on ice and
use pre-cooled pipettes, plates and tubes when preparing. Gelled Matrigel may re-liquefy if placed at 2-8 °C on
ice for 24 to 48 hours.
2. Handle using aseptic technique in a laminar flow hood.
3. Once the Matrigel is thawed, swirl vial to be sure that material is evenly dispersed.
4. Place thawed vial of Matrigel in sterile area, decontaminate the external surfaces with ethanol or
isopropanol and air dry. Matrigel may be gently pipetted using a pre-cooled pipette to ensure homogeneity.
5. Dilute Matrigel 1:200 with cooled AB2 Neural Culture Medium. Prepare 1 mL diluted Matrigel for each column
(8 wells) to be used. Keep on ice.
6. Add 100 µL of diluted Matrigel to each well intended for use in the 96 well plate.
7. Tap the plate gently to ensure the entire surface of the well is covered with diluted Matrigel.
8. Place dishes at 2-8 °C for 1-3 hours.
9. Remove the residual coating solution and rinse each well twice with 200 µL of PBS per well.
10. Remove PBS and insert the Oris Cell Seeding Stoppers into the coated wells of the 96-well plate.
11. Visually inspect to ensure that the Oris Cell Seeding Stoppers are firmly sealed.
Cell Migration Assay Protocol:
1. Harvest cells as described in steps 1-5 of section Subculture of hNP1 Neural Progenitor cells.
2. Count cells and adjust cell suspension volume to the following concentration: 600,000 cells/mL.
3. Plate 100 µL of suspended cells into each stoppered well for a cell density of 60,000 cells per well.
4. Incubate the cells at 37 °C in a 5% CO2 humidified incubator overnight (16-24 hours) to permit cell
attachment.
5. Using the Oris Stopper Tool, remove all stoppers, except for those in "no migration controls" which will
remain in place until time of staining.
6. Carefully remove the seeding media from the wells and add 200 µL medium containing the test compound per
well.
7. Briefly examine the wells by phase contrast microscopy to ensure continued adherence of the cells.
8. Incubate the cells at 37 °C/5% CO2 for 72 hours to permit cell migration.
9. After 72 hours, mix 5 µL Calcein AM, 5 µL Hoechst 33342, and 10 mL phenol red-free Neurobasal medium with
0.1% BSA.
10. Carefully remove stoppers from the "no migration controls".
11. Carefully remove the test medium from all wells and add 100 µL of diluted Calcein/Hoechst solution to each
well.
12. Incubate plate at 37 °C/5% CO2 for 30-60 minutes with the lid on and in the dark (the darkness of a standard
incubator will suffice).
13. For use with a fluorescence microplate reader, attach the Oris Detection Mask and read promptly for Calcein
fluorescence (ex 494 nm/em 517 nm).
14. For image analysis, photomicrograph wells using epifluorescence illumination with or without the Oris
Detection Mask. Images can then be analyzed using either area closure with the calcein stain or number of cells
(nuclei) using the Hoechst stain. ImageJ freeware available from the NIH (http://rsbweb.nih.gov/ij/) can be used
for migration data analysis as percent area closure or cellular enumeration.
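The seeding density and medium recipe above involve simple unit arithmetic that is easy to get wrong at the bench. The sketch below is illustrative only: the function and dictionary names are hypothetical, and the recipe volumes are copied from the "Supplementing the AB2 Basal Medium" step (note that the listed components sum to slightly more than a nominal 100 mL).

```python
# Illustrative helper names; volumes are taken from the protocol text above.

def cells_per_well(cells_per_ml: float, plated_volume_ul: float) -> float:
    """Number of cells delivered to a well for a given suspension and volume."""
    return cells_per_ml * plated_volume_ul / 1000.0  # 1000 uL per mL


# Component volumes (mL) for a nominal 100 mL batch of complete medium.
COMPLETE_MEDIUM_ML_PER_100_ML = {
    "AB2 Neural Medium": 96.0,
    "ANS Supplement": 2.0,
    "bFGF (50 ug/mL)": 0.040,   # 40 uL
    "LIF (10 ug/mL)": 0.100,    # 100 uL
    "L-Glutamine (200 mM)": 1.0,
    "Penicillin/Streptomycin": 1.0,
}


def scale_medium(target_ml: float) -> dict:
    """Scale the nominal 100 mL recipe linearly to another batch size."""
    factor = target_ml / 100.0
    return {name: vol * factor for name, vol in COMPLETE_MEDIUM_ML_PER_100_ML.items()}


if __name__ == "__main__":
    # 600,000 cells/mL plated at 100 uL per well, as in the migration protocol
    print(cells_per_well(600_000, 100))
    print(scale_medium(50))
```

Linear scaling is an assumption of convenience; the protocol itself only specifies the 100 mL batch.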
Baseline median absolute deviation for the assay (bmad): 9.289
Response cutoff threshold used to determine hit calls: 27.866
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: The ArunA migration assay measures growth and survival in human embryonic neuroprogenitor
(hNP) and human neural crest (hNC) cells by tracking the presence/absence of viable nuclei movement into a
defined circular area in each microplate well. These different measurements are assessed following 72 hour
incubations with test chemical to evaluate the potential to disrupt neural migration in developing human
embryos.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists: NA
Additionally, this assay was annotated to the intended target family of neurodevelopment.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 5
Target (nominal) number of replicates: 3
Standard minimum concentration tested: 1.2 nM
Standard maximum concentration tested: 100 nM
Key positive control: cytochalasin D
Neutral vehicle control: DMSO
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: The migration of neuroprogenitor and neural crest cells into the detection zone was assessed by
comparing the percent migration of proliferative Ki-67 cells to total migrating cells following 72 hours exposure.
This was accomplished by determining the percentage of total cells migrating into the detection zone, i.e. the
migration index (MI), compared to the percentage of migrating cells that expressed the Ki-67 proliferative
marker within the detection zone, i.e. the proliferative index (PI). Normalized response values for each assay
endpoint were calculated as resp = 100 x (rval-bval) / (pval-bval) where rval, bval, and pval correspond to the
raw value, the plate level DMSO control median, and the plate level positive/negative control median,
respectively. In the parallel viability assessment, normalized response was calculated as resp = log2(rval/bval).
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
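The two normalization formulas stated above can be written out directly. This is a minimal sketch with hypothetical function names, not the tcpl implementation; the example values reuse the neutral and negative control medians reported in Section 4.1 purely for illustration.

```python
import math


def resp_percent_of_control(rval: float, bval: float, pval: float) -> float:
    """resp = 100 x (rval - bval) / (pval - bval), as given in Section 3.2."""
    if pval == bval:
        raise ValueError("pval and bval must differ to define a percent of control")
    return 100.0 * (rval - bval) / (pval - bval)


def resp_log2_fold_change(rval: float, bval: float) -> float:
    """resp = log2(rval / bval), used for the parallel viability assessment."""
    return math.log2(rval / bval)


if __name__ == "__main__":
    # A raw value halfway between baseline (7708) and control (514) gives 50%.
    print(resp_percent_of_control(4111.0, 7708.0, 514.0))
    # A raw value at half of baseline is a log2 fold change of -1.
    print(resp_log2_fold_change(3854.0, 7708.0))
```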
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
5: resp.pc (Calculate the normalized response (resp) as a percent of control, i.e. the ratio of the difference
between the corrected (cval) and baseline (bval) values divided by the difference between the positive
control (pval) and baseline (bval) values multiplied by 100; resp = (cval-bval)/(pval-bval)*100.), 6:
resp.multneg1 (Multiply the normalized response value (resp) by -1; -1*resp.), 11: bval.apid.nwlls.med
(Calculate the baseline value (bval) as the plate-wise median, by assay plate ID (apid), of the corrected
values (cval) for neutral control wells (wllt = n).), 15: pval.apid.medncbyconc.min (Calculate the positive
control value (pval) as the plate-wise minimum, by assay plate ID (apid), of the medians of the corrected
values (cval) for gain-of-signal single- or multiple-concentration negative control wells (wllt = m or o) by
apid, well type, and concentration.)
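The Level 3 methods listed above can be read as a small plate-wise pipeline. The sketch below is an illustration in Python rather than the tcpl R code: the function name and the input layout (a list of well records with apid, wllt, conc, and cval keys) are assumptions, and so is the execution order (bval, then pval, then resp.pc, then resp.multneg1), since the list above is ordered by method ID.

```python
from collections import defaultdict
from statistics import median


def level3_normalize(wells):
    """Apply bval, pval, resp.pc, and resp.multneg1 to a list of well dicts.

    Each well is a dict with keys: apid, wllt, conc, cval (assumed layout).
    """
    neutral = defaultdict(list)    # apid -> cval of neutral control wells
    negative = defaultdict(list)   # (apid, wllt, conc) -> cval of negative controls
    for w in wells:
        if w["wllt"] == "n":
            neutral[w["apid"]].append(w["cval"])
        elif w["wllt"] in ("m", "o"):
            negative[(w["apid"], w["wllt"], w["conc"])].append(w["cval"])

    # bval.apid.nwlls.med: plate-wise median of neutral control wells
    bval = {apid: median(vals) for apid, vals in neutral.items()}

    # pval.apid.medncbyconc.min: plate-wise minimum of per-group medians
    pval = {}
    for (apid, _wllt, _conc), vals in negative.items():
        group_median = median(vals)
        pval[apid] = min(group_median, pval.get(apid, group_median))

    normalized = []
    for w in wells:
        b, p = bval[w["apid"]], pval[w["apid"]]
        resp = (w["cval"] - b) / (p - b) * 100.0   # resp.pc
        resp = -1.0 * resp                         # resp.multneg1
        normalized.append({**w, "bval": b, "pval": p, "resp": resp})
    return normalized
```

A plate without neutral or negative control wells raises a KeyError in this sketch; a production pipeline would need to handle that case explicitly.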
Level 4: Baseline and required tcplFit2 parameters defined by:
2: bmad.aeid.lowconc.nwells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for neutral control wells (wllt = n). Calculate one
standard deviation of the normalized response for neutral control wells (wllt = n); onesd = sqrt(sum((resp
- mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
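As a rough illustration of the Level 4 quantities, the sketch below computes a median absolute deviation and a sample standard deviation from neutral control responses. The function name is hypothetical. The scaling constant is an assumption: R's mad() multiplies by 1.4826 by default, and this document does not state whether the reported bmad includes that factor, so it is exposed as a parameter.

```python
from statistics import median, stdev


def bmad_and_onesd(neutral_resp, constant=1.4826):
    """Return (bmad, onesd) for normalized responses of neutral control wells.

    constant: MAD scaling factor (assumed; 1.4826 mirrors R's mad() default,
    use 1.0 for the unscaled median absolute deviation).
    """
    values = list(neutral_resp)
    center = median(values)
    bmad = constant * median([abs(x - center) for x in values])
    onesd = stdev(values)  # sqrt(sum((x - mean)^2) / (n - 1))
    return bmad, onesd
```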
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.), 27: ow_bidirectional_loss (Multiply winning model hitcall (hitc) by -1 for models fit in the
positive analysis direction. Typically used for endpoints where only negative responses are biologically
relevant.)
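The Level 5 cutoff can be checked against the values reported in Section 2.5: three times the reported bmad (9.289) is 27.867, which agrees with the reported cutoff of 27.866 to within rounding of the published bmad. The sketch below uses hypothetical function names, and the sign test on the fitted top parameter is an assumed reading of "models fit in the positive analysis direction".

```python
def bmad3_cutoff(bmad: float) -> float:
    """bmad3: cutoff of 3 multiplied by the baseline median absolute deviation."""
    return 3.0 * bmad


def select_cutoff(candidates):
    """Where several cutoff methods are assigned, the higher value is selected."""
    return max(candidates)


def ow_bidirectional_loss(hitc: float, top: float) -> float:
    """Negate the hit call for fits in the positive direction (assumed: top > 0)."""
    return -hitc if top > 0 else hitc


if __name__ == "__main__":
    print(bmad3_cutoff(9.289))                # close to the reported 27.866
    print(ow_bidirectional_loss(0.95, 12.0))  # gain-direction fit is overwritten
```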
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
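Several of the Level 6 flags reduce to simple comparisons on per-series summary values. The following sketch shows four of them (noise, low.nrep, low.nconc, ac50.lowconc) using the conditions quoted above. The function name and the dictionary keys are assumptions for illustration and do not reflect the invitrodb schema.

```python
def caution_flags(series: dict) -> list:
    """Return names of a subset of Level 6 flags raised for one series.

    Expected keys (assumed): rmse, coff, nrep, nconc, hitc, ac50, logc_min.
    """
    flags = []
    if series["rmse"] > series["coff"]:
        flags.append("noise")            # rmse > coff
    if series["nrep"] < 2:
        flags.append("low.nrep")         # average replicates per concentration < 2
    if series["nconc"] <= 4:
        flags.append("low.nconc")        # 4 concentrations or fewer
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")     # AC50 below lowest tested concentration
    return flags


if __name__ == "__main__":
    example = {"rmse": 30.0, "coff": 27.866, "nrep": 3, "nconc": 5,
               "hitc": 0.95, "ac50": 0.5, "logc_min": 0.0}
    print(caution_flags(example))
```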
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 63
Number of chemicals tested: 58
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 21
Inactive hit count (0 <= hitc < 0.9): 30
NA hit count (hitc < 0): 12
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 6
gain-loss (gnls) model: 4
power (pow) model: 3
linear-polynomial (poly1) model: 22
quadratic-polynomial (poly2) model: 5
exponential-2 (exp2) model: 0
exponential-3 (exp3) model: 1
exponential-4 (exp4) model: 18
exponential-5 (exp5) model: 4
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
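For a winning Hill model, the concentrations at a given fraction of the maximal response have a closed form. The sketch below assumes the Hill parameterization f(x) = tp / (1 + (ga/x)^p), with tp the top, ga the AC50, and p the Hill coefficient; that form and the function names are assumptions here and should be checked against the tcplfit2 documentation. Solving f(x) = q * tp gives x = ga * (q / (1 - q))^(1/p).

```python
def hill_conc_at_fraction(ga: float, p: float, fraction: float) -> float:
    """Concentration where an assumed Hill curve reaches `fraction` of its top."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return ga * (fraction / (1.0 - fraction)) ** (1.0 / p)


def hill_acc(tp: float, ga: float, p: float, coff: float) -> float:
    """Concentration where the assumed Hill curve crosses the efficacy cutoff."""
    return hill_conc_at_fraction(ga, p, coff / abs(tp))


if __name__ == "__main__":
    ga, p = 10.0, 1.0
    print(hill_conc_at_fraction(ga, p, 0.50))  # AC50 equals ga
    print(hill_conc_at_fraction(ga, p, 0.10))  # AC10
    print(hill_conc_at_fraction(ga, p, 0.05))  # AC5
```

If the cutoff is not smaller than the magnitude of the top, the curve never crosses it and hill_acc raises a ValueError.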
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
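The model selection rule described above (lowest AIC, ties broken in favor of the model with fewer parameters) can be sketched in a few lines. The function name, the input layout, and the AIC values and parameter counts in the usage example are hypothetical; the continuous hit call itself is not reproduced here.

```python
def select_winning_model(fits: dict) -> str:
    """Pick the winning model name from {name: (aic, n_params)}.

    Lowest AIC wins; on an exact AIC tie, the model with fewer parameters wins.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


if __name__ == "__main__":
    # Hypothetical fit summaries for one concentration-response series
    fits = {"hill": (100.0, 3), "poly1": (100.0, 1), "exp4": (101.2, 2)}
    print(select_winning_model(fits))
```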
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
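The binarization of the continuous hit call described above can be expressed as a small helper. The function name is hypothetical, and the 0.90 threshold is exposed as a parameter because the text notes that other thresholds may be used.

```python
def classify_hitc(hitc: float, active_threshold: float = 0.90) -> str:
    """Map a continuous hit call to 'active', 'inactive', or 'NA'."""
    if hitc < 0:
        return "NA"          # hitc less than 0 is not applicable
    if hitc >= active_threshold:
        return "active"      # hitc >= 0.90 by default
    return "inactive"        # 0 <= hitc < threshold


if __name__ == "__main__":
    for value in (0.95, 0.90, 0.42, 0.0, -0.95):
        print(value, classify_hitc(value))
```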
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using the
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 7708
Neutral control median absolute deviation, by plate: nmad 664.946
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 8.45%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed 514
Negative control well median absolute deviation value, by plate: mmad 237.957
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: -10.216
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
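The formulas above can be evaluated directly from the summary medians. The sketch below uses hypothetical function names and plugs in the reported nmed, nmad, mmed, and mmad. The results (CV about 8.63%, SSMD about -10.19) are close to but not identical with the reported 8.45% and -10.216; a plausible explanation, not confirmed by this document, is that the reported values are computed per plate and then aggregated rather than from the summary medians.

```python
import math


def cv_percent(nmad: float, nmed: float) -> float:
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100.0


def z_prime(cmed: float, cmad: float, nmed: float, nmad: float) -> float:
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed: float, cmad: float, nmed: float, nmad: float) -> float:
    """SSMD: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed: float, nmed: float, nmad: float) -> float:
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed: float, nmed: float) -> float:
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


if __name__ == "__main__":
    nmed, nmad = 7708.0, 664.946   # neutral control summary values
    mmed, mmad = 514.0, 237.957    # negative control summary values
    print(round(cv_percent(nmad, nmed), 3))
    print(round(z_prime(mmed, mmad, nmed, nmad), 4))
    print(round(ssmd(mmed, mmad, nmed, nmad), 3))
```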
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 4.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Zurlinden TJ, Saili KS, Baker NC, Toimela T, Heinonen T, Knudsen TB. A cross-platform approach to
characterize and screen potential neurovascular unit toxicants. Reprod Toxicol. 2020 Jun 24;96:300-315. doi:
10.1016/j.reprotox.2020.06.010. Epub ahead of print. PMID: 32590145.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1829
ArunA_Migration_hNC
1. General Information
1.1 Assay Title: ArunA Biomedical's Oris Neural Crest (hNC) Cell Migration Assay
1.2 Assay Summary: ArunA_Migration_hNC is a cell-based, single-readout assay that uses human H9-derived
embryonic neural crest stem cells (hNC). Measurements were taken 72 hours after chemical dosing in a 96-well
plate. ArunA_Migration_hNC is an assay component measured from the ArunA_Migration_hNC assay. It is
designed to make measurements of cell migration, a form of distribution reporter, as detected with
fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the assay component
ArunA_Migration_hNC was analyzed at the endpoint, ArunA_Migration_hNC, in the positive analysis fitting
direction relative to DMSO as the negative control and baseline of activity. Using a type of distribution reporter,
loss-of-signal activity can be used to understand the cell migration. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the neurodevelopment intended target family, where the
subfamily is migration.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ArunA Biomedical is a privately owned biotechnology company and Contract Research
Organization (CRO) formerly providing toxicology screening using neural stem cell-based assays.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: This protocol describes the use of ArunA Biomedical's hNP1 Neural Progenitor Cells in
conjunction with an Oris Cell Migration Assembly Kit-FLEX to measure the effect of neuroactive compounds and
biologics that modulate proliferation and migration of neural progenitor cells. Certain uses of these products
may be covered by U.S. Pat. No. 6,200,806; No. 7,531,354 B2 licensed to ARUNA and U.S. Pat. No. 7,842,499;
No. 7,018,838; No. 10/597,118; No. 11/342,413; No. 11/890,740; and No. 12/195,007 licensed to PLATYPUS.
1.9 Assay Throughput: 96-well plate. ArunA systems offer high throughput chemical screening in a 96-well format
for the human neuroprogenitor (hNP) and human neural crest (hNC) migration and cell titer endpoints.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes in fluorescence intensity related to Ki-67 expression are indicative of cell migration.
Chemical-induced perturbations to cellular key events across neurogenic outcomes, including migration
(neuroprogenitor and neural crest cells) and neural network formation (neurite length, neurites per neuron, and
branch points for neurites), in vitro can inform cell-based prioritization of neurodevelopmental hazard
potential.
2.2 Scientific Principles: During the development of the nervous system, many processes occur to give rise to a
functional and healthy neural network and hence nervous system. These important neurodevelopmental
processes may be disrupted by potential toxicants, resulting in developmental neurotoxicity. Among these
processes is migration of neuroprogenitors and neural crest cells (NCCs). Impaired neuroprogenitor and NCC
migration can lead to cerebral malformations and neurodevelopmental disorders, such as diencephalic-
mesencephalic dysplasia syndrome, cerebral palsy, cerebellar ataxia, and microcephaly.
2.3 Experimental System: An adherent hNC cell line was used. The hN2 cell line is derived from neuroepithelial cells of
WA09 hESC (Thomson et al., 1998) origin according to a previously described protocol (Shin et al., 2005, Shin et
al., 2006). Importantly, as opposed to other methods of deriving neural progenitors through three-dimensional
neurosphere and embryoid body formations (Reubinoff et al., 2001, Zhang et al., 2001), these adherent
monolayer cultures are uniformly exposed to growth factors and/or morphogens throughout their propagation.
Neurogenic lineages from human embryonic stem cell line WA09 were locked into three neural differentiation
states: neuroprogenitor (hNP1 - Cat no. 7009), neural crest (hNC - Cat no. 7029), and neural network (hNN -
Cat no. hNJL7014). Prior to differentiation into hN2 cells the population was confirmed karyotypically normal,
>95% nestin positive and <3% OCT-4 positive (Shin et al., 2006). The cells were produced in bulk by propagation
for an additional 2 weeks beyond the neuroepithelial stage by removal of bFGF from the media and
cryopreserved (ArunA Biomedical, Athens, GA) for end user applications. The hNP and hNC cell endpoints
consisted of cell titer and migratory measurements whereas hNN cell endpoints consisted of neuron count and
three neurite-specific metrics to assess network formation: neurite length, neurites per neuron, and branch
points for neurites. For this study, ArunA Bio extended the differentiation period of the hNN cells by
approximately two weeks more than in the original hN2 protocol. This allowed for increased neural network cell
axonation leading to better quantitation of network-specific endpoints. The utility of dissociated hN2 cultures
as an in vitro model for neurite outgrowth was assessed using automated high-content image analysis (HCA). In
addition, the molecular phenotype of these cells was examined using immunocytochemical staining.
2.4 Metabolic Competence: H9-derived cells are locked at different neuronal developmental states of interest to
DNT investigations of chemical exposures. Xenobiotic biotransformation potential has not been characterized.
2.5 Exposure Regime: To assess the hNP and hNC migration and cell titer endpoints, 60,000 cells per well were
plated onto Matrigel in basal growth medium with LIF and bFGF in a 96-well plate format. Plates were incubated
for 16 h at 37C followed by a 72 h exposure to chemical in the test medium. For the migration endpoints, cells
were seeded and incubated in presence of 'seeding stoppers' to prevent cell migration and growth into the
detection zone. At the beginning of chemical exposure, stoppers were removed, and growth medium was
replaced with test medium. In the case of the stopper control wells, stoppers remained in place following
replacement of growth medium with test medium. Following 72-h exposure to the test medium, cells were
stained at 37C for 30-60 min with calcein-AM. Cell viability in the detection zone was quantitated using a
Flexstation3 microplate reader (ex 494 nm/em 517 nm). Corresponding cell titer endpoints were assessed for
the hNP and hNC cells using the Promega CellTiter Aqueous One Solution Cell Proliferation Assay (Cat no. G3581;
CellTiter 96). Finally, to gain insight into the mechanisms by which cells migrate into the detection zone, Ki-67
expression was quantified for 10 additional chemicals in the hNP and hNC systems. Additionally, cytochalasin D
was used as a positive control to inhibit cell migration. Supplementing the AB2 Basal Medium: 1.
Decontaminate the external surfaces of all supplement vials and the medium bottle with ethanol or isopropanol.
2. Aseptically open each supplement vial and add the amount indicated below to the basal medium with a
pipette. To make 100 ml of complete medium: AB2 Neural Medium 96 mL, ANS Supplement 2 mL, bFGF (50
ug/mL) 40 uL, LIF (10 ug/mL) 100 uL, L-Glutamine (200 mM) 1 mL, Penicillin (5,000 U/mL)/Streptomycin (5,000
ug/mL) 1 mL. 3. Supplemented medium should be stored at 2-8C, protected from light. The complete medium
should be given a 2 week expiration date. Dispense the complete medium into aliquots to avoid repeated
heating prior to each use. Plate Coating Protocol for hNP1 Neural Progenitor Expansion: To coat dishes perform
the following steps: 1. Thaw BD Matrigel at 2-8C overnight. Matrix will gel rapidly at 22C to 35C. Keep Matrigel
on ice and use pre-cooled pipettes, plates and tubes when preparing. Gelled Matrigel may be re-liquified if
placed at 2-8C on ice for 24 to 48 hours. 2. Handle using aseptic technique in a laminar flow hood. 3. Once BD
Matrigel Matrix is thawed, swirl vial to be sure that material is evenly dispersed. 4. Place thawed vial of BD
Matrigel Matrix in sterile area, decontaminate the external surfaces with ethanol or isopropanol and air dry. BD
Matrigel Matrix may be gently pipetted using a pre-cooled pipette to ensure homogeneity. 5. Dilute Matrigel
1:200 with cooled Dulbecco's Modified Eagle's Medium. Keep on ice. 6. Add 2 mL diluted Matrigel to a 35-mm
dish. Swirl to ensure the entire surface of the 35-mm dish is covered with the Matrigel solution. 7. Place dishes
at 2-8C for 1-3 hours. 8. Rinse thoroughly with PBS. 9. Remove PBS and use immediately. Cell Thawing Protocol
for hNP1 Neural Progenitor Expansion: To plate the cells perform the following steps: 1. Do not thaw the cells
until the recommended medium and appropriately coated plasticware and/or glassware are on hand. 2. Remove
the vial from liquid nitrogen and incubate in a 37C water bath. Closely monitor until the cells are completely
thawed. Maximum cell viability is dependent on the rapid and complete thawing of frozen cells. IMPORTANT:
Do not vortex the cells. Breaking cells down to single cell suspensions will significantly increase cell death. 3. As
soon as the cells are completely thawed, disinfect the outside of the vial with 70% ethanol or isopropanol.
Proceed immediately to the next step. 4. In a laminar flow hood, use a 1 or 2 mL pipette to transfer the cells to
a sterile 15 mL conical tube. Be careful to not introduce any bubbles during the transfer process. 5. Using a 10
mL pipette, slowly add dropwise 9 mL of fully supplemented AB2 Neural Medium (pre-warmed to 37C) to the
15 mL conical tube. IMPORTANT: Do not add the whole volume of medium at once to the cells. This may result
in decreased cell viability due to osmotic shock. 6. Gently mix the cell suspension by slow pipetting up and down
twice. Be careful to not introduce any bubbles. IMPORTANT: Do not vortex the cells. Breaking cells down to
single cell suspensions will significantly increase cell death. 7. Centrifuge the tube at room temperature at 200
x g for 4 minutes to pellet the cells. 8. Aspirate as much of the supernatant as possible. Steps 4-8 are necessary
to remove residual cryopreservative (DMSO). 9. Resuspend the cells in a total volume of 2 mL of fully
supplemented AB2 Neural Medium (pre-warmed to 37C). 10. Plate the 2 mL cell suspension of hNP1 cells onto
a Matrigel-coated 35 mm dish. 11. Incubate the cells at 37C in a 5% CO2 humidified incubator. 12. Exchange the
medium with fresh fully supplemented AB2 Neural Medium 24 hours post plating. Exchange with fresh medium
every other day thereafter. Use caution not to dislodge the cells; do not pipette media directly onto the cells
but rather onto the side of the culture dish. 13. Once the hNP1 cells reach 100% confluence, they can be
dissociated manually for passaging (e.g., by cell scraping or by gentle and slow pipetting up and down to detach
the cells). The cells should be maintained at a high density at all times - the recommended passaging ratio is
1:2. Subculture of hNP1 Cells: 1. Once the hNP1 cells reach 100% confluence, carefully remove the medium
from the 35 mm dish. 2. Apply 2 mL fully supplemented AB2 Neural Medium (pre-warmed to 37C) to the cells
so that the cells can be harvested in fresh medium. 3. Using a pipette, manually detach the cells from the dish
by slow pipetting up and down the dish. Be careful to avoid introducing any bubbles. We recommend using a
200 uL or 1000 uL manual pipette to dislodge the attached cells. Alternatively, cells can be dislodged with a
sterile cell scraper. IMPORTANT: We do NOT recommend enzymatic methods for passaging the hNP1 cells.
Doing so reduces the long term viability of the cells and can cause karyotypic abnormalities. 4. Plates should be
observed to ensure that all cells have been removed. This is most easily accomplished by working under a
dissection microscope within a laminar flow hood, but can also be achieved by frequent observation under a
bright field or phase contrast microscope. 5. Transfer the dissociated cells to a 50 mL conical tube. Inspect the
plate to ensure that all the cells have been removed. 6. If necessary, count the cells and calculate the cell
concentration. Cells can be centrifuged at 200 x g for 4 minutes in order to concentrate the cell suspension for
higher plating densities. 7. Plate the cells at the desired density into the appropriately coated flasks, plates or
wells in fully supplemented AB2 Neural Medium. We recommend keeping the cells at a high cell density by
passaging 1:2. 8. Incubate the cells at 37C in a 5% CO2 humidified incubator. 9. Exchange the medium with fresh
fully supplemented AB2 Neural Medium 24 hours post plating. Exchange with fresh medium every other day
thereafter. Use caution not to dislodge the cells; do not pipette media directly onto the cells but rather onto
the side of the culture dish. Plate Coating Protocol for Cell Migration Assay: 1. Thaw BD Matrigel at 2-8C
overnight. Since it will gel rapidly at 22C to 35C, keep Matrigel on ice and use pre-cooled pipettes, plates and
tubes when preparing. Gelled Matrigel may re-liquefy if placed at 2-8C on ice for 24 to 48 hours. 2. Handle using
aseptic technique in a laminar flow hood. 3. Once the Matrigel is thawed, swirl vial to be sure that material is
evenly dispersed. 4. Place thawed vial of Matrigel in sterile area, decontaminate the external surfaces with
ethanol or isopropanol and air dry. Matrigel may be gently pipetted using a pre-cooled pipette to ensure
homogeneity. 5. Dilute Matrigel 1:200 with cooled AB2 Neural Culture Medium. Prepare 1 mL diluted Matrigel
for each column (8 wells) to be used. Keep on ice. 6. Add 100 uL of diluted Matrigel to each well intended for
use in the 96 well plate. 7. Tap the plate gently to ensure the entire surface of the well is covered with diluted
Matrigel. 8. Place dishes at 2-8C for 1-3 hours. 9. Remove the residual coating solution and rinse each well twice
with 200 uL of PBS per well. 10. Remove PBS and insert the Oris Cell Seeding Stoppers into the coated wells of
the 96-well plate. 11. Visually inspect to ensure that the Oris Cell Seeding Stoppers are firmly sealed. Cell
Migration Assay Protocol: 1. Harvest cells as described in steps 1-5 of section Subculture of hNP1 Neural
Progenitor cells. 2. Count cells and adjust cell suspension volume to the following concentration: 600,000
cells/mL. 3. Plate 100 uL of suspended cells into each stoppered well for a cell density of 60,000 cells per well. 4.
Incubate the cells at 37C in a 5% CO2 humidified incubator overnight (16-24 hours) to permit cell attachment.
5. Using the Oris Stopper Tool, remove all stoppers, except for those in "no migration controls" which will remain
in place until time of staining. 6. Carefully remove the seeding media from the wells and add 200 uL medium
containing the test compound per well. 7. Briefly examine the wells by phase contrast microscopy to ensure
continued adherence of the cells. 8. Incubate the cells at 37C/5% CO2 for 72 hours to permit cell migration. 9.
After 72 hours, mix 5 uL Calcein AM, 5 uL Hoechst 33342, and 10 mL phenol red-free Neurobasal medium with
0.1% BSA. 10. Carefully remove stoppers from the "no migration controls". 11. Carefully remove the test
medium from all wells and add 100 uL of diluted Calcein/Hoechst solution to each well. 12. Incubate plate at
37C/5% CO2 for 30-60 minutes with the lid on and in the dark (the darkness of a standard incubator will suffice).
13. For use with a fluorescence microplate reader, attach the Oris Detection Mask and read promptly for Calcein
fluorescence (ex 494 nm/ em 517 nm). 14. For image analysis, photomicrograph wells using epifluorescence
illumination with or without the Oris Detection mask. Images can then be analyzed using either area closure
with the calcein stain or number of cells (nuclei) using the Hoechst stain. ImageJ freeware available from the
NIH (http://rsbweb.nih.gov/ij/) can be used for migration data analysis as percent area closure or cellular
enumeration.
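The quantities in the protocol above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the final growth factor concentrations assume a nominal 100 mL batch of complete medium as stated in the recipe.

```python
def cells_per_well(cells_per_ml, volume_ul):
    """Cells delivered to a well from a suspension of known density."""
    return cells_per_ml * volume_ul / 1000.0

def final_conc_ng_per_ml(stock_ug_per_ml, added_ul, total_ml):
    """Final concentration after diluting a stock into a batch of medium.

    ug/mL multiplied by uL gives ng, so no further unit conversion is needed.
    """
    mass_ng = stock_ug_per_ml * added_ul
    return mass_ng / total_ml

# Seeding: 100 uL of a 600,000 cells/mL suspension per stoppered well.
seeded = cells_per_well(600_000, 100)

# Growth factors in a nominal 100 mL batch of complete medium.
bfgf = final_conc_ng_per_ml(50, 40, 100)   # bFGF stock 50 ug/mL, 40 uL added
lif = final_conc_ng_per_ml(10, 100, 100)   # LIF stock 10 ug/mL, 100 uL added

# Diluted Matrigel needed to coat one column (8 wells at 100 uL each).
matrigel_per_column_ul = 8 * 100

print(seeded, bfgf, lif, matrigel_per_column_ul)
```

The seeding result agrees with the 60,000 cells per well stated in the protocol, and the 800 uL needed per column fits within the 1 mL of diluted Matrigel prepared per column.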
Baseline median absolute deviation for the assay (bmad): 10.414
Response cutoff threshold used to determine hit calls: 31.243
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: The ArunA migration assay measures growth and survival in human embryonic neuroprogenitor
(hNP) and human neural crest (hNC) cells by tracking the presence/absence of viable nuclei movement into a
defined circular area in each microplate well. These different measurements are assessed following 72 hour
incubations with test chemical to evaluate the potential to disrupt neural migration in developing human
embryos.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists: NA
Additionally, this assay was annotated to the intended target family of neurodevelopment.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 5
Target (nominal) number of replicates: 3
Standard minimum concentration tested: 1.2 nM
Standard maximum concentration tested: 100 nM
Key positive control: cytochalasin D
Neutral vehicle control: DMSO
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: The migration of neuroprogenitor and neural crest cells into the detection zone was assessed by
comparing the percent migration of proliferative Ki-67 cells to total migrating cells following 72 hours exposure.
This was accomplished by determining the percentage of total cells migrating into the detection zone, i.e. the
migration index (MI), compared to the percentage of migrating cells that expressed the Ki-67 proliferative
marker within the detection zone, i.e. the proliferative index (PI). Normalized response values for each assay
endpoint were calculated as resp = 100 x (rval-bval) / (pval-bval) where rval, bval, and pval correspond to the
raw value, the plate level DMSO control median, and the plate level positive/negative control median,
respectively. In the parallel viability assessment, normalized response was calculated as resp = log2(rval/bval).
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization methods include:
5: resp.pc (Calculate the normalized response (resp) as a percent of control, i.e. the ratio of the difference
between the corrected (cval) and baseline (bval) values divided by the difference between the positive
control (pval) and baseline (bval) values multiplied by 100; resp = (cval-bval)/(pval-bval)*100.)
6: resp.multneg1 (Multiply the normalized response value (resp) by -1; -1*resp.)
11: bval.apid.nwlls.med (Calculate the baseline value (bval) as the plate-wise median, by assay plate ID (apid),
of the corrected values (cval) for neutral control wells (wllt = n).)
15: pval.apid.medncbyconc.min (Calculate the positive control value (pval) as the plate-wise minimum, by
assay plate ID (apid), of the medians of the corrected values (cval) for gain-of-signal single- or
multiple-concentration negative control wells (wllt = m or o) by apid, well type, and concentration.)
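The Level 3 methods listed above can be read as a single plate-wise calculation. The following Python sketch illustrates that reading; it is not the tcpl code (tcpl is an R package), and the function name, the dictionary layout of a well, and the example values are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

def normalize_plate(wells):
    """Normalize one assay plate (apid).

    wells: list of dicts with keys 'wllt' (well type), 'conc', and 'cval'.
    Returns copies of the wells with bval, pval, and resp added.
    """
    # bval: plate-wise median of cval for neutral control wells (wllt = n).
    bval = median(w["cval"] for w in wells if w["wllt"] == "n")

    # pval: minimum of the medians of cval for negative control wells
    # (wllt = m or o), grouped by well type and concentration.
    groups = defaultdict(list)
    for w in wells:
        if w["wllt"] in ("m", "o"):
            groups[(w["wllt"], w["conc"])].append(w["cval"])
    pval = min(median(vals) for vals in groups.values())

    out = []
    for w in wells:
        resp = (w["cval"] - bval) / (pval - bval) * 100.0  # percent of control
        resp = -1.0 * resp                                 # multiply by -1
        out.append({**w, "bval": bval, "pval": pval, "resp": resp})
    return out

# Hypothetical plate: three neutral wells, a control at two concentrations,
# and one test well (wllt = t).
plate = [
    {"wllt": "n", "conc": None, "cval": 1000.0},
    {"wllt": "n", "conc": None, "cval": 1000.0},
    {"wllt": "n", "conc": None, "cval": 1000.0},
    {"wllt": "m", "conc": 1.0, "cval": 600.0},
    {"wllt": "m", "conc": 1.0, "cval": 600.0},
    {"wllt": "m", "conc": 10.0, "cval": 200.0},
    {"wllt": "m", "conc": 10.0, "cval": 200.0},
    {"wllt": "t", "conc": 5.0, "cval": 600.0},
]
normalized = normalize_plate(plate)
print(normalized[-1]["resp"])
```

In this example the test well has lost half of the signal window between baseline and control, so its response is 50 percent of control before the sign flip and -50 after it.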
Level 4: Baseline and required tcplfit2 parameters defined by:
2: bmad.aeid.lowconc.nwells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for neutral control wells (wllt = n). Calculate one
standard deviation of the normalized response for neutral control wells (wllt = n); onesd = sqrt(sum((resp
- mean(resp))^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
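A minimal sketch of the two Level 4 baseline statistics follows. The function names and example responses are hypothetical. The scaling constant of 1.4826 is an assumption that mirrors the default of R's mad() function; the text above does not state which scaling is applied, so the constant is exposed as a parameter.

```python
from statistics import median, stdev

def baseline_mad(resp_neutral, constant=1.4826):
    """Median absolute deviation of neutral control responses.

    constant=1.4826 is an assumed scaling (the default in R's mad());
    pass constant=1.0 for the unscaled median absolute deviation.
    """
    center = median(resp_neutral)
    return constant * median(abs(r - center) for r in resp_neutral)

def one_sd(resp_neutral):
    """Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))."""
    return stdev(resp_neutral)

# Hypothetical normalized responses in neutral control wells.
resp_n = [-2.0, -1.0, 0.0, 1.0, 2.0]
bmad = baseline_mad(resp_n)
onesd = one_sd(resp_n)
print(bmad, onesd)
```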
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.)
27: ow_bidirectional_loss (Multiply winning model hitcall (hitc) by -1 for models fit in the
positive analysis direction. Typically used for endpoints where only negative responses are biologically
relevant.)
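As a worked check of the Level 5 methods, the sketch below recomputes the cutoff from the bmad reported in Section 2.5 and applies the hit call sign change. The function names are hypothetical, and treating "fit in the positive analysis direction" as top > 0 is an assumption made for illustration.

```python
def bmad3_cutoff(bmad):
    """Candidate cutoff: 3 times the baseline median absolute deviation."""
    return 3.0 * bmad

def select_cutoff(candidates):
    """When several cutoff methods are assigned, the highest value is used."""
    return max(candidates)

def overwrite_bidirectional_loss(hitc, top):
    """Multiply hitc by -1 when the winning model was fit in the positive
    direction (assumed here to mean top > 0)."""
    return -hitc if top > 0 else hitc

coff = select_cutoff([bmad3_cutoff(10.414)])
print(round(coff, 3))

print(overwrite_bidirectional_loss(0.95, top=60.0))
print(overwrite_bidirectional_loss(0.95, top=-60.0))
```

With the reported bmad of 10.414 the product is 31.242, which matches the reported response cutoff of 31.243 to within rounding of the bmad.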
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
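Several of the Level 6 flags reduce to simple comparisons on per-series summary values. The sketch below illustrates five of them (bmd.high, noise, low.nrep, low.nconc, and ac50.lowconc) exactly as worded above. It is not the tcpl implementation; the function name and argument names are hypothetical, chosen to echo the symbols used in the flag descriptions.

```python
def level6_flags(rmse, coff, nrep, nconc, bmd, ac50, logc_min, hitc):
    """Return the names of the simple threshold flags raised for one series.

    logc_min is the log10 of the lowest tested concentration, so the lowest
    concentration itself is 10 ** logc_min.
    """
    flags = []
    if bmd > ac50:
        flags.append("bmd.high")       # BMD greater than AC50
    if rmse > coff:
        flags.append("noise")          # fit error exceeds the cutoff
    if nrep < 2:
        flags.append("low.nrep")       # fewer than 2 replicates on average
    if nconc <= 4:
        flags.append("low.nconc")      # 4 or fewer concentrations tested
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")   # active, AC50 below lowest conc tested
    return flags

# Hypothetical series: well replicated but noisy.
noisy = level6_flags(rmse=40.0, coff=31.243, nrep=3, nconc=5,
                     bmd=2.0, ac50=5.0, logc_min=0.0, hitc=0.95)

# Hypothetical series: sparse design with an AC50 below the tested range.
sparse = level6_flags(rmse=10.0, coff=31.243, nrep=1.5, nconc=4,
                      bmd=8.0, ac50=0.5, logc_min=0.0, hitc=0.95)
print(noisy, sparse)
```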
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 63
Number of chemicals tested: 58
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 27
Inactive hit count (0 <= hitc < 0.9): 29
NA hit count (hitc < 0): 7
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 6
gain-loss (gnls) model: 4
power (pow) model: 5
linear-polynomial (poly1) model: 23
quadratic-polynomial (poly2) model: 5
exponential-2 (exp2) model: 0
exponential-3 (exp3) model: 0
exponential-4 (exp4) model: 10
exponential-5 (exp5) model: 10
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
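To make the point-of-departure estimates concrete, the sketch below inverts a Hill curve to find the concentration that produces a given response. It assumes the Hill parameterization f(x) = tp / (1 + (ga/x)^p), with top tp, AC50 ga, and Hill coefficient p; that form, the function name, and the parameter values are assumptions for illustration, since the model equations are not given in this document.

```python
def hill_conc_at_response(y, tp, ga, p):
    """Concentration at which an assumed Hill curve reaches response y.

    Solves y = tp / (1 + (ga / x) ** p) for x. The response y must lie
    strictly between 0 and tp (same sign as tp).
    """
    if not 0.0 < y / tp < 1.0:
        raise ValueError("y must lie strictly between 0 and tp")
    return ga / ((tp / y) - 1.0) ** (1.0 / p)

# Hypothetical winning-model parameters.
tp, ga, p = 100.0, 10.0, 1.0

ac50 = hill_conc_at_response(0.50 * tp, tp, ga, p)  # 50% of maximal response
ac10 = hill_conc_at_response(0.10 * tp, tp, ga, p)  # 10% of maximal response
ac5 = hill_conc_at_response(0.05 * tp, tp, ga, p)   # 5% of maximal response
acc = hill_conc_at_response(31.243, tp, ga, p)      # response equal to cutoff
print(ac50, ac10, ac5, acc)
```

Under this parameterization the AC50 equals ga by construction, and the benchmark dose (bmd) follows from the same inversion with y set to the benchmark response.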
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
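The model selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be sketched as follows. This is an illustration rather than the tcplfit2 code: the function names are hypothetical, and the AIC values and parameter counts in the example are made up.

```python
def select_winning_model(fits):
    """Pick the winning model name.

    fits: dict mapping model name -> (aic, n_params).
    Sorting on the (aic, n_params) tuple selects the lowest AIC and breaks
    exact AIC ties in favor of the model with fewer parameters.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

def beats_constant(fits, winner, constant="cnst"):
    """True if the winner's AIC is lower than the constant model's AIC."""
    return fits[winner][0] < fits[constant][0]

# Hypothetical fits for one concentration series.
fits = {
    "cnst": (120.0, 0),
    "hill": (100.0, 3),
    "poly1": (100.0, 1),
    "exp4": (105.0, 2),
}
winner = select_winning_model(fits)
print(winner, beats_constant(fits, winner))
```

Here hill and poly1 tie on AIC, so the simpler poly1 model is selected.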
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
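The binarization of hitc described above can be written as a small helper. This is a sketch under the thresholds stated in this document (0.90 as the default active threshold); the function name and return labels are hypothetical, and users may substitute a different threshold.

```python
def classify_hitc(hitc, active_threshold=0.90):
    """Map a continuous hit call to a category using the thresholds stated in this document."""
    if hitc < 0:
        return "not applicable"
    if hitc >= active_threshold:
        return "active"
    return "inactive"


for value in (0.95, 0.90, 0.42, 0.0, -1.0):
    print(value, classify_hitc(value))
```

Passing a different `active_threshold` changes only the boundary between active and inactive; negative values are always reported as not applicable.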
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 4067
Neutral control median absolute deviation, by plate: nmad 280.211
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 8.49%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed 165
Negative control well median absolute deviation value, by plate: mmad 85.991
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: -10.15
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
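The control metrics listed above follow directly from the plate-level medians and median absolute deviations. The sketch below restates those formulas in Python for a single plate; the function names and the example values are hypothetical. The values reported in this section appear to be summarized across plates, so plugging the summary medians into these formulas need not reproduce them exactly.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) versus the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control versus the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


# Hypothetical single-plate values for a loss-of-signal control.
nmed, nmad = 1000.0, 50.0
mmed, mmad = 200.0, 30.0
print(cv_percent(nmed, nmad))
print(z_prime(mmed, mmad, nmed, nmad))
print(ssmd(mmed, mmad, nmed, nmad))
print(signal_to_noise(mmed, nmed, nmad))
print(signal_to_background(mmed, nmed))
```

The same functions apply to positive control wells by passing pmed and pmad in place of mmed and mmad.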
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 10.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Zurlinden TJ, Saili KS, Baker NC, Toimela T, Heinonen T, Knudsen TB. A cross-platform approach to
characterize and screen potential neurovascular unit toxicants. Reprod Toxicol. 2020 Jun 24;96:300-315. doi:
10.1016/j.reprotox.2020.06.010. Epub ahead of print. PMID: 32590145.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast
Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1831
ArunA_NOG_NucleusCount
1. General Information
1.1 Assay Title: ArunA Biomedical's Neurite Outgrowth (NOG) Assay for Nucleus Count
1.2 Assay Summary: ArunA_NOG (Neurite Outgrowth) is a cell-based, image-based assay that uses human H9-
derived embryonic differentiated neurons (hNN). Measurements were taken 48 hours after chemical dosing in
a 96-well plate. ArunA_NOG_NucleusCount is an assay component measured from the ArunA_NOG assay. It is
designed to make measurements of viability related to the number of neurons, using a form of viability reporter,
as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the assay
component ArunA_NOG_NucleusCount was analyzed at the endpoint, ArunA_NOG_NucleusCount, in the
positive analysis fitting direction relative to DMSO as the negative control and baseline of activity. Using a type
of viability reporter, loss-of-signal activity can be used to understand viability. To generalize the intended target
to other relatable targets, this assay endpoint is annotated to the neurodevelopment intended target family,
where the subfamily is neurite outgrowth.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ArunA Biomedical is a privately owned biotechnology company and Contract Research
Organization (CRO) formerly providing toxicology screening using neural stem cell-based assays.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: hN2 cells and growth media were provided through Material Transfer Agreement #466-
08 between the U.S. EPA and ArunA Biomedical, Inc.
1.9 Assay Throughput: 96-well plate. ArunA systems offer high throughput chemical screening in a 96-well format
for the human neuroprogenitor (hNP) and human neural crest (hNC) migration and cell titer endpoints.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes in fluorescence intensity related to the number of Hoechst 33258-labelled nuclei are
indicative of the viability of the system.
Chemical-induced perturbations to cellular key events across neurogenic outcomes, including migration
(neuroprogenitor and neural crest cells) and neural network formation (neurite length, neurites per neuron, and
branch points for neurites), in vitro can inform on cell-based prioritization of neurodevelopmental hazard
potential.
2.2 Scientific Principles: During the development of the nervous systems, many processes occur to give rise to a
functional and healthy neural network and hence nervous system. These important neurodevelopmental
processes may be disrupted by potential toxicants, resulting in developmental neurotoxicity. Among these
processes is migration of neuroprogenitors and neural crest cells (NCCs). Impaired neuroprogenitor and NCC
migration can lead to cerebral malformations and neurodevelopmental disorders, such as diencephalic-
mesencephalic dysplasia syndrome, cerebral palsy, cerebellar ataxia, and microcephaly.
2.3 Experimental System: adherent hNN cell line used. The hN2 cell line is derived from neuroepithelial cells of
WA09 hESC (Thomson et al., 1998) origin according to a previously described protocol (Shin et al., 2005, Shin et
al., 2006). Importantly, as opposed to other methods of deriving neural progenitors through three-dimensional
neurosphere and embryoid body formations (Reubinoff et al., 2001, Zhang et al., 2001), these adherent
monolayer cultures are uniformly exposed to growth factors and/or morphogens throughout their propagation.
Neurogenic lineages from human embryonic stem cell line WA09 were locked into three neural differentiation
states: neuroprogenitor (hNP1 - Cat no. 7009), neural crest (hNC - Cat no. 7029), and neural network (hNN -
Cat no. hNJL7014). Prior to differentiation into hN2 cells the population was confirmed karyotypically normal,
>95% nestin positive and <3% OCT-4 positive (Shin et al., 2006). The cells were produced in bulk by propagation
for an additional 2 weeks beyond the neuroepithelial stage by removal of bFGF from the media and
cryopreserved (ArunA Biomedical, Athens, GA) for end user applications. The hNP and hNC cell endpoints
consisted of cell titer and migratory measurements whereas hNN cell endpoints consisted of neuron count and
three neurite-specific metrics to assess network formation: neurite length, neurites per neuron, and branch
points for neurites. For this study, ArunA Bio extended the differentiation period of the hNN cells by
approximately two weeks more than in the original hN2 protocol. This allowed for increased neural network cell
axonation leading to better quantitation of network-specific endpoints. The utility of dissociated hN2 cultures
as an in vitro model for neurite outgrowth was assessed using automated high-content image analysis (HCA). In
addition, the molecular phenotype of these cells was examined using immunocytochemical staining.
2.4 Metabolic Competence: H9-derived cells are locked at different neuronal developmental states of interest to
DNT investigations of chemical exposures. Xenobiotic biotransformation potential has not been characterized.
2.5 Exposure Regime: Chemical treatment: Differentiated hNN cells were seeded and immediately exposed to test
medium for 48 h. Following chemical exposure, cell bodies were stained with Hoechst 33258 to quantitate
viable neuron count and neurites were labeled with βIII-tubulin/DyLight 488. High content imaging assessed
the neurite outgrowth endpoints: neurite total length per neuron (µm), neurite count per neuron, and branch
points per neurite using the methods described in Harrill et al. (2010). Measurements of hN2 morphology:
βIII-tubulin stained cell cultures were allowed to warm to room temperature. Plates were then loaded into
a Cellomics ArrayScan VTI HCS reader high-content imaging system (ThermoFisher Scientific, Waltham, MA) for
automated image acquisition and morphometric analyses. This system consists of an epifluorescent microscope
with an EXFO X-cite 120 metal-halide arc lamp, motorized imaging objectives, stage and excitation/emission
filter wheel and a 12-bit high-resolution CCD camera connected to a Dell Intel Xeon computer terminal with 2
GHz processor. Image acquisition and storage was performed using the vHCS Scan software package, version
6.6.1.4. Matched fluorescent images of Hoechst-stained nuclei and βIII-tubulin/DyLight 488 immunolabeled
cells were acquired using 365/515 (channel 1) and 475/515 (channel 2) nm excitation/emission filter couplings,
respectively, with a 20x objective (Zeiss, Inc., Thornwood, NY). Fixed integration times for image acquisition in
each channel were determined by manual sampling of control-treated wells across multiple plates. A matching
pseudocolored composite image displayed Hoechst-stained nuclei (blue) and βIII-tubulin/DyLight 488 labeled cell
bodies and neurites (green). The Neural Profiling BioApplication performs automated image analysis in a
sequential manner as follows. Briefly, nuclei were identified in channel 1 as bright objects on a dark background.
Nuclei with size and intensity values outside of the ranges determined a priori for viable cells were identified in
the channel 1 image and rejected from further analyses. Spatial coordinates from the channel 1 image were
then superimposed on the matching channel 2 image. Cell body masks in channel 2 were then cast based on
positional data from channel 1 nuclei and a set of user-defined geometric and signal intensity-based parameters.
Cell bodies corresponding to valid neurons were then selected and invalid cell bodies rejected. Parameters for
valid cell body selection include the presence of exactly one nucleus within the cell body mask, a requirement
that the nucleus met the gating criteria imposed in channel 1, a requirement that at least 25% of the nucleus
perimeter is bounded by DyLight 488 labeled cytoplasm and a requirement that the total cell body area not
exceed 4000 µm². Neurites emerging from the selected cell bodies were then individually traced and measured.
For this study, neurites were defined as processes >10 µm in length. Neurites were separated from cell bodies
at points when the half-width of the labeled cytoplasm was less than 3.6 µm across. In the case of neurites with an
ambiguous origin (i.e. appearing to emerge from or contact multiple cell bodies) the Neural Profiling
BioApplication traced the neurite from all potential origin points and retained the longest neurite for
measurements of length and number of neurites per neuron. This effectively prevented repeated sampling of
the same neurite segment within each image. Morphometric data from high-content image analysis (HCA)
included measurements of the average number of neurites per neuron and total neurite length per neuron.
Data for both endpoints were collected on a cell-by-cell basis. The number of neurites and the cumulative length
of all neurites associated with each cell body (i.e. total neurite length) were calculated for each cell meeting the
selection criteria outlined above. Cell-level measurements were then averaged to obtain a mean measurement
for the average number of neurites per neuron and total neurite length per neuron for the cell populations
sampled within each well.
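The cell body selection criteria and the well-level averaging described above can be sketched as follows. This is a simplified illustration, not the Neural Profiling BioApplication: the `CellBody` container, its field names, and the example cells are hypothetical, and neurite tracing is assumed to have already produced a list of process lengths per cell.

```python
from dataclasses import dataclass, field
from statistics import mean

# Thresholds stated in Section 2.5.
MIN_PERIMETER_FRACTION = 0.25      # at least 25% of nucleus perimeter bounded by labeled cytoplasm
MAX_CELL_BODY_AREA_UM2 = 4000.0    # total cell body area must not exceed 4000 µm²
MIN_NEURITE_LENGTH_UM = 10.0       # neurites are processes longer than 10 µm


@dataclass
class CellBody:
    """Hypothetical per-cell record produced by upstream image segmentation."""
    nucleus_count: int
    nucleus_passed_gating: bool
    perimeter_fraction_labeled: float
    area_um2: float
    process_lengths_um: list = field(default_factory=list)


def is_valid_neuron(cell):
    """Apply the four cell body selection criteria."""
    return (
        cell.nucleus_count == 1
        and cell.nucleus_passed_gating
        and cell.perimeter_fraction_labeled >= MIN_PERIMETER_FRACTION
        and cell.area_um2 <= MAX_CELL_BODY_AREA_UM2
    )


def summarize_well(cells):
    """Average cell-level neurite measurements over the valid neurons in one well."""
    valid = [c for c in cells if is_valid_neuron(c)]
    if not valid:
        return None
    neurites = [
        [length for length in c.process_lengths_um if length > MIN_NEURITE_LENGTH_UM]
        for c in valid
    ]
    return {
        "neuron_count": len(valid),
        "neurites_per_neuron": mean(len(n) for n in neurites),
        "total_neurite_length_um": mean(sum(n) for n in neurites),
    }


cells = [
    CellBody(1, True, 0.6, 900.0, [12.0, 30.0, 8.0]),  # valid; the 8 µm process is not a neurite
    CellBody(1, True, 0.3, 1500.0, [15.0]),            # valid
    CellBody(2, True, 0.9, 1200.0, [40.0]),            # rejected: two nuclei in the mask
    CellBody(1, True, 0.1, 800.0, [25.0]),             # rejected: too little labeled perimeter
]
print(summarize_well(cells))
```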
Baseline median absolute deviation for the assay (bmad): 0.152
Response cutoff threshold used to determine hit calls: 0.456
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
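The reported cutoff is consistent with the bmad3 method listed under Level 5 in Section 3.2, i.e. three times the baseline median absolute deviation. A minimal check in Python, with hypothetical function names:

```python
def bmad3_cutoff(bmad):
    """Cutoff defined as 3 multiplied by the baseline median absolute deviation."""
    return 3 * bmad


def outside_cutoff_band(median_resp, coff):
    """True if a median response lies outside the band from -coff to +coff."""
    return abs(median_resp) > coff


coff = bmad3_cutoff(0.152)
print(round(coff, 3))  # 0.456
print(outside_cutoff_band(-0.9, coff))
```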
2.6 Response: The ArunA neurite outgrowth assay monitors changes in neurite length and number of branch points
(both total number of branch points and number formed per neuron) in human neural network cells (hNN)
derived from human embryonic stem cells. These different measurements are assessed following 48 hour
incubations with test chemical to help predict the potential to disrupt neural network formation in developing
human embryos.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: High content imaging assessed the neurite outgrowth endpoints: neurite total length per neuron
(µm), neurite count per neuron, and branch points per neurite. Plate-level raw data, provided by each assay
source, were received by EPA from each contractor and analyzed using the ToxCast Pipeline (tcpl). Normalized
response values for each assay endpoint were calculated as resp = log2(rval/bval) where rval, bval, and pval
correspond to the raw value, the plate level DMSO control median, and the plate level positive/negative control
median, respectively.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 5
Target (nominal) number of replicates: 4
Standard minimum concentration tested: 1.2 nM
Standard maximum concentration tested: 100 nM
Key positive control: NA
Neutral vehicle control: DMSO
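The normalization stated in Section 3.2, resp = log2(rval/bval) with bval taken as the plate-level median of the DMSO (neutral control) wells, can be sketched for a single plate as follows. This is an illustration in Python rather than the tcpl R code; the dictionary layout, the function names, the use of "t" to mark test wells, and the raw values are assumptions.

```python
import math
from statistics import median


def plate_baseline(wells):
    """Baseline value (bval): median raw value of neutral control wells (wllt == 'n') on one plate."""
    neutral = [w["rval"] for w in wells if w["wllt"] == "n"]
    if not neutral:
        raise ValueError("plate has no neutral control wells")
    return median(neutral)


def normalize_plate(wells):
    """Return copies of the wells with bval and resp = log2(rval / bval) added."""
    bval = plate_baseline(wells)
    return [dict(w, bval=bval, resp=math.log2(w["rval"] / bval)) for w in wells]


# Hypothetical raw values for one plate: three neutral control wells and two test wells.
plate = [
    {"wllt": "n", "rval": 1000.0},
    {"wllt": "n", "rval": 1200.0},
    {"wllt": "n", "rval": 800.0},
    {"wllt": "t", "rval": 500.0},
    {"wllt": "t", "rval": 2000.0},
]
for well in normalize_plate(plate):
    print(well["wllt"], well["rval"], well["resp"])
```

With a baseline of 1000, a test well at half the baseline gives resp = -1 and a test well at twice the baseline gives resp = 1, so a loss of nuclei appears as a negative response.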
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
7: resp.log2 (Transform the response values to log-scale (base 2).)
9: resp.fc (Calculate the normalized response (resp) as the fold change, i.e. the ratio of the corrected
(cval) and baseline (bval) values; resp = cval/bval.)
11: bval.apid.nwlls.med (Calculate the baseline value (bval) as the plate-wise median, by assay plate ID
(apid), of the corrected values (cval) for neutral control wells (wllt = n).)
Level 4: Baseline and required tcplfit2 parameters defined by:
2: bmad.aeid.lowconc.nwells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for neutral control wells (wllt = n). Calculate one
standard deviation of the normalized response for neutral control wells (wllt = n); onesd =
sqrt(sum((resp - mean(resp))^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required
for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.)
27: ow_bidirectional_loss (Multiply winning model hitcall (hitc) by -1 for models fit in the positive
analysis direction. Typically used for endpoints where only negative responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g.
top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
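Several of the Level 6 flags listed above are simple threshold checks on per-series summaries. The sketch below implements four of them (noise, low.nrep, low.nconc, and no.med.gt.3bmad) as described in the list. It is an illustration in Python, not the tcpl implementation; the function signature and the example series are hypothetical.

```python
def cautionary_flags(resp_medians, n_reps_per_conc, rmse, coff, bmad):
    """Return the names of a subset of Level 6 flags that apply to one concentration series.

    resp_medians    -- median response at each tested concentration
    n_reps_per_conc -- number of replicates at each tested concentration
    rmse            -- root mean square error of the winning model
    coff            -- efficacy cutoff for the endpoint
    bmad            -- baseline median absolute deviation for the endpoint
    """
    flags = []
    nconc = len(resp_medians)
    nrep = sum(n_reps_per_conc) / len(n_reps_per_conc)

    if rmse > coff:                      # 10: noise
        flags.append("noise")
    if nrep < 2:                         # 13: low.nrep
        flags.append("low.nrep")
    if nconc <= 4:                       # 14: low.nconc
        flags.append("low.nconc")

    # 20: no.med.gt.3bmad -- no median beyond 3*bmad in either direction
    nmed_gtbl_pos = sum(1 for m in resp_medians if m > 3 * bmad)
    nmed_gtbl_neg = sum(1 for m in resp_medians if m < -3 * bmad)
    if nmed_gtbl_pos == 0 and nmed_gtbl_neg == 0:
        flags.append("no.med.gt.3bmad")
    return flags


# Hypothetical series: four concentrations, four replicates each, all medians near baseline.
print(cautionary_flags(
    resp_medians=[0.05, -0.10, 0.20, -0.30],
    n_reps_per_conc=[4, 4, 4, 4],
    rmse=0.5,
    coff=0.456,
    bmad=0.152,
))
```

Flags are cautionary annotations rather than filters; how strictly to act on them is left to the user.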
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 60
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 14
Inactive hit count: 0
-------
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1589.5
Neutral control median absolute deviation, by plate: nmad 125.28
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 10.3%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 7.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Zurlinden TJ, Saili KS, Baker NC, Toimela T, Heinonen T, Knudsen TB. A cross-platform approach to
characterize and screen potential neurovascular unit toxicants. Reprod Toxicol. 2020 Jun 24;96:300-315. doi:
10.1016/j.reprotox.2020.06.010. Epub ahead of print. PMID: 32590145.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. See the tcpl vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1833
ArunA_NOG_NeuriteLength
1. General Information
1.1 Assay Title: ArunA Biomedical's Neurite Outgrowth (NOG) Assay for Neurite Length
1.2 Assay Summary: ArunA_NOG (Neurite Outgrowth) is a cell-based, image-based assay that uses human H9-
derived embryonic differentiated neurons (hNN). Measurements were taken 48 hours after chemical dosing in
a 96-well plate. ArunA_NOG_NeuriteLength is an assay component measured from the ArunA_NOG assay. It is
designed to make measurements of neurite outgrowth related to neurite length, using a form of morphology
reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging technology. Data from the
assay component ArunA_NOG_NeuriteLength was analyzed at the endpoint, ArunA_NOG_NeuriteLength, in the
positive analysis fitting direction relative to DMSO as the negative control and baseline of activity. Using a type
of morphology reporter, loss-of-signal activity can be used to understand developmental effects. To generalize
the intended target to other relatable targets, this assay endpoint is annotated to the neurodevelopment
intended target family, where the subfamily is neurite outgrowth.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ArunA Biomedical is a privately owned biotechnology company and Contract Research
Organization (CRO) formerly providing toxicology screening using neural stem cell-based assays.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: hN2 cells and growth media were provided through Material Transfer Agreement #466-
08 between the U.S. EPA and ArunA Biomedical, Inc.
1.9 Assay Throughput: 96-well plate. ArunA systems offer high throughput chemical screening in a 96-well format
for the human neuroprogenitor (hNP) and human neural crest (hNC) migration and cell titer endpoints.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes in fluorescence intensity related to βIII-tubulin/DyLight 488 antibody labelling are
indicative of neurite outgrowth.
Chemical-induced perturbations to cellular key events across neurogenic outcomes, including migration
(neuroprogenitor and neural crest cells) and neural network formation (neurite length, neurites per neuron, and
branch points for neurites), in vitro can inform cell-based prioritization of neurodevelopmental hazard
potential.
2.2 Scientific Principles: During the development of the nervous systems, many processes occur to give rise to a
functional and healthy neural network and hence nervous system. These important neurodevelopmental
processes may be disrupted by potential toxicants, resulting in developmental neurotoxicity. Among these
processes is migration of neuroprogenitors and neural crest cells (NCCs). Impaired neuroprogenitor and NCC
migration can lead to cerebral malformations and neurodevelopmental disorders, such as diencephalic-
mesencephalic dysplasia syndrome, cerebral palsy, cerebellar ataxia, and microcephaly.
-------
2.3 Experimental System: adherent hNN cell line used. The hN2 cell line is derived from neuroepithelial cells of
WA09 hESC (Thomson et al., 1998) origin according to a previously described protocol (Shin et al., 2005, Shin et
al., 2006). Importantly, as opposed to other methods of deriving neural progenitors through three-dimensional
neurosphere and embryoid body formations (Reubinoff et al., 2001, Zhang et al., 2001), these adherent
monolayer cultures are uniformly exposed to growth factors and/or morphogens throughout their propagation.
Neurogenic lineages from human embryonic stem cell line WA09 were locked into three neural differentiation
states: neuroprogenitor (hNP1 - Cat no. 7009), neural crest (hNC - Cat no. 7029), and neural network (hNN -
Cat no. hNJL7014). Prior to differentiation into hN2 cells the population was confirmed karyotypically normal,
>95% nestin positive and <3% OCT-4 positive (Shin et al., 2006). The cells were produced in bulk by propagation
for an additional 2 weeks beyond the neuroepithelial stage by removal of bFGF from the media and
cryopreserved (ArunA Biomedical, Athens, GA) for end user applications. The hNP and hNC cell endpoints
consisted of cell titer and migratory measurements whereas hNN cell endpoints consisted of neuron count and
three neurite-specific metrics to assess network formation: neurite length, neurites per neuron, and branch
points for neurites. For this study, ArunA Bio extended the differentiation period of the hNN cells by
approximately two weeks more than in the original hN2 protocol. This allowed for increased neural network cell
axonation leading to better quantitation of network-specific endpoints. The utility of dissociated hN2 cultures
as an in vitro model for neurite outgrowth was assessed using automated high-content image analysis (HCA). In
addition, the molecular phenotype of these cells was examined using immunocytochemical staining.
2.4 Metabolic Competence: H9-derived cells are locked at different neuronal developmental states of interest to
DNT investigations of chemical exposures. Xenobiotic biotransformation potential has not been characterized.
2.5 Exposure Regime: Chemical treatment: Differentiated hNN cells were seeded and immediately exposed to test
medium for 48 h. Following chemical exposure, cell bodies were stained with Hoechst 33258 to quantitate
viable neuron count, and neurites were labeled with βIII-tubulin/DyLight 488. High content imaging assessed
the neurite outgrowth endpoints: neurite total length per neuron (µm), neurite count per neuron, and branch
points per neurite using the methods described in Harrill et al. (2010). Measurements of hN2 morphology:
βIII-tubulin-stained cell cultures were allowed to warm to room temperature. Plates were then loaded into
a Cellomics ArrayScan VTI HCS reader high-content imaging system (ThermoFisher Scientific, Waltham, MA) for
automated image acquisition and morphometric analyses. This system consists of an epifluorescent microscope
with an EXFO X-cite 120 metal-halide arc lamp, motorized imaging objectives, stage and excitation/emission
filter wheel and a 12-bit high-resolution CCD camera connected to a Dell Intel Xeon computer terminal with 2
GHz processor. Image acquisition and storage was performed using the vHCS Scan software package, version
6.6.1.4. Matched fluorescent images of Hoechst-stained nuclei and βIII-tubulin/DyLight 488 immunolabeled
cells were acquired using 365/515 (channel 1) and 475/515 (channel 2) nm excitation/emission filter couplings,
respectively, with a 20x objective (Zeiss, Inc., Thornwood, NY). Fixed integration times for image acquisition in
each channel were determined by manual sampling of control-treated wells across multiple plates. A matching
pseudocolored composite image shows Hoechst-stained nuclei (blue) and βIII-tubulin/DyLight 488 labeled cell
bodies and neurites (green). The Neural Profiling BioApplication performs automated image analysis in a
sequential manner as follows. Briefly, nuclei were identified in channel 1 as bright objects on a dark background.
Nuclei with size and intensity values outside of the ranges determined a priori for viable cells were identified in
the channel 1 image and rejected from further analyses. Spatial coordinates from the channel 1 image were
then superimposed on the matching channel 2 image. Cell body masks in channel 2 were then cast based on
positional data from channel 1 nuclei and a set of user-defined geometric and signal intensity-based parameters.
Cell bodies corresponding to valid neurons were then selected and invalid cell bodies rejected. Parameters for
valid cell body selection include the presence of exactly one nucleus within the cell body mask, a requirement
that the nucleus met the gating criteria imposed in channel 1, a requirement that at least 25% of the nucleus
perimeter is bounded by DyLight 488 labeled cytoplasm and a requirement that the total cell body area not
exceed 4000 µm². Neurites emerging from the selected cell bodies were then individually traced and measured.
For this study, neurites were defined as processes >10 µm in length. Neurites were separated from cell bodies
at points where the half-width of the labeled cytoplasm was less than 3.6 µm across. In the case of neurites with an
ambiguous origin (i.e. appearing to emerge from or contact multiple cell bodies) the Neural Profiling
BioApplication traced the neurite from all potential origin points and retained the longest neurite for
measurements of length and number of neurites per neuron. This effectively prevented repeated sampling of
the same neurite segment within each image. Morphometric data from high-content image analysis (HCA)
included measurements of the average number of neurites per neuron and total neurite length per neuron.
Data for both endpoints were collected on cell-by-cell basis. The number of neurites and the cumulative length
of all neurites associated with each cell body (i.e. total neurite length) were calculated for each cell meeting the
selection criteria outlined above. Cell-level measurements were then averaged to obtain a mean measurement
for the average number of neurites per neuron and total neurite length per neuron for the cell populations
sampled within each well.
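The cell selection and well-level averaging described above can be sketched as follows. This is a minimal illustration only, not the Neural Profiling BioApplication itself: the `Cell` record, its field names, and the function names are hypothetical, and only the numeric criteria stated in the text (exactly one gated nucleus, at least 25% of the nucleus perimeter bounded by labeled cytoplasm, cell body area not exceeding 4000 µm², neurites longer than 10 µm) are taken from this document.

```python
from dataclasses import dataclass
from statistics import mean

# Thresholds quoted in Section 2.5; everything else here is an assumption.
MIN_NEURITE_UM = 10.0          # neurites are processes >10 um in length
MAX_BODY_AREA_UM2 = 4000.0     # total cell body area must not exceed this
MIN_BOUNDED_FRACTION = 0.25    # >=25% of nucleus perimeter bounded by label


@dataclass
class Cell:
    """Hypothetical per-cell record produced by image segmentation."""
    well: str
    n_nuclei: int                      # nuclei inside the cell body mask
    nucleus_valid: bool                # nucleus passed the channel 1 gating
    perimeter_fraction_bounded: float  # fraction of nucleus perimeter bounded
    body_area_um2: float
    neurite_lengths_um: list           # traced process lengths for this cell


def is_valid_neuron(cell):
    """Apply the cell body selection criteria listed in the text."""
    return (
        cell.n_nuclei == 1
        and cell.nucleus_valid
        and cell.perimeter_fraction_bounded >= MIN_BOUNDED_FRACTION
        and cell.body_area_um2 <= MAX_BODY_AREA_UM2
    )


def summarize_wells(cells):
    """Average cell-level measurements to one value per well."""
    per_well = {}
    for cell in cells:
        if not is_valid_neuron(cell):
            continue
        # Keep only processes long enough to count as neurites.
        neurites = [x for x in cell.neurite_lengths_um if x > MIN_NEURITE_UM]
        per_well.setdefault(cell.well, []).append((len(neurites), sum(neurites)))
    return {
        well: {
            "neurites_per_neuron": mean([n for n, _ in values]),
            "neurite_length_per_neuron_um": mean([length for _, length in values]),
        }
        for well, values in per_well.items()
    }
```

Cells that fail any criterion contribute nothing to the well mean, which mirrors the statement that measurements were calculated only for cells meeting the selection criteria.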
Baseline median absolute deviation for the assay (bmad): 0.175
Response cutoff threshold used to determine hit calls: 0.524
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: The ArunA neurite outgrowth assay monitors changes in neurite length and number of branch points
(both total number of branch points and number formed per neuron) in human neural network cells (hNN)
derived from human embryonic stem cells. These different measurements are assessed following 48 hour
incubations with test chemical to help predict the potential to disrupt neural network formation in developing
human embryos.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of neurodevelopment.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 5
Target (nominal) number of replicates: 4
Standard minimum concentration tested: 1.2 nM
Standard maximum concentration tested: 100 nM
Key positive control: NA
Neutral vehicle control: DMSO
NA
3.2 Data Analysis: High content imaging assessed the neurite outgrowth endpoints: neurite total length per neuron
(µm), neurite count per neuron, and branch points per neurite. Plate-level raw data, provided by each assay
source, were received by EPA from each contractor and analyzed using the ToxCast Pipeline (tcpl). Normalized
response values for each assay endpoint were calculated as resp = log2(rval/bval) where rval, bval, and pval
correspond to the raw value, the plate level DMSO control median, and the plate level positive/negative control
median, respectively.
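As an illustration of the normalization formula above, the following sketch computes resp = log2(rval/bval) for one plate, taking bval as the median raw value of the neutral (DMSO) control wells on that plate. It is written in Python for illustration rather than taken from tcpl (which is an R package); the dictionary layout and the function names are assumptions, while the well type code "n" for neutral control wells follows the usage in this document.

```python
import math
from statistics import median


def plate_baseline(wells):
    """Median raw value of neutral control wells (well type 'n') on one plate."""
    neutral = [w["rval"] for w in wells if w["wllt"] == "n"]
    if not neutral:
        raise ValueError("plate has no neutral control wells")
    return median(neutral)


def normalize_plate(wells):
    """Return copies of the wells with bval and resp = log2(rval / bval) added."""
    bval = plate_baseline(wells)
    return [
        {**w, "bval": bval, "resp": math.log2(w["rval"] / bval)}
        for w in wells
    ]
```

With this scaling, a well whose raw value equals the plate baseline has resp = 0, a halving of the raw value gives resp = -1, and a doubling gives resp = 1.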
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value(wllq) to 0 after identifying problems such a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
7: resp.log2 (Transform the response values to log-scale (base 2).), 9: resp.fc (Calculate the normalized
response (resp) as the fold change, i.e. the ratio of the corrected (cval) and baseline (bval) values; resp =
cval/bval.), 11: bval.apid.nwlls.med (Calculate the baseline value (bval) as the plate-wise median, by assay
plate ID (apid), of the corrected values (cval) for neutral control wells (wllt = n).)
Level 4: Baseline and required tcplFit2 parameters defined by:
2: bmad.aeid.lowconc.nwells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for neutral control wells (wllt = n). Calculate one
standard deviation of the normalized response for neutral control wells (wllt = n); onesd = sqrt(sum((resp
- mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.), 27: ow_bidirectional_loss (Multiply winning model hitcall (hitc) by -1 for models fit in the
positive analysis direction. Typically used for endpoints where only negative responses are biologically
relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.
), 7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
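A few of the Level 4 to Level 6 methods listed above are simple enough to sketch directly. The example below is a Python illustration, not tcpl code: the function names are hypothetical, and it assumes the unscaled median absolute deviation, since this document does not state whether a consistency constant is applied. It covers bmad and onesd from neutral control responses, the bmad3 cutoff, and three of the cautionary flags (noise, low.nrep, low.nconc).

```python
from statistics import median, stdev


def bmad(neutral_resp):
    """Median absolute deviation of neutral control responses (unscaled; an assumption)."""
    center = median(neutral_resp)
    return median([abs(r - center) for r in neutral_resp])


def onesd(neutral_resp):
    """Sample standard deviation (n - 1 denominator) of neutral control responses."""
    return stdev(neutral_resp)


def bmad3_cutoff(bmad_value):
    """Level 5 method bmad3: cutoff is 3 times the bmad."""
    return 3 * bmad_value


def caution_flags(rmse, coff, reps_per_conc):
    """Return the names of the simple Level 6 flags that apply to one series.

    reps_per_conc holds the number of replicates at each tested concentration.
    """
    flags = []
    if rmse > coff:                              # 10: noise
        flags.append("noise")
    nconc = len(reps_per_conc)
    if nconc and sum(reps_per_conc) / nconc < 2:  # 13: low.nrep
        flags.append("low.nrep")
    if nconc <= 4:                               # 14: low.nconc
        flags.append("low.nconc")
    return flags
```

For the values reported in Section 2.5, 3 × 0.175 = 0.525, which is close to the listed cutoff of 0.524; the small difference is plausibly due to rounding of the displayed bmad, though that is an inference rather than something stated here.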
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 60
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 14
Inactive hit count: 0
-------
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
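The binarization rule described above can be written as a small helper. This is a sketch with hypothetical function names; the 0.90 default follows the threshold used in this document, and a different threshold can be passed where more or less stringency is wanted.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to 'active', 'inactive', or 'not applicable'."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


def count_calls(hitc_values, threshold=0.90):
    """Tally the classifications for a collection of concentration-response series."""
    counts = {"active": 0, "inactive": 0, "not applicable": 0}
    for hitc in hitc_values:
        counts[classify_hitc(hitc, threshold)] += 1
    return counts
```

A user applying stricter filtering might combine this call with the cautionary flag count for each series before treating a curve as high confidence.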
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 21.125
Neutral control median absolute deviation, by plate (nmad): 3.626
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 12.47%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
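The control-based metrics above follow directly from the listed formulas. The sketch below is a Python illustration with hypothetical function names; `cmed` and `cmad` stand for the median and median absolute deviation of either the positive or the negative control wells. It returns `None` where a control is unavailable, which corresponds to the NA entries reported for this endpoint.

```python
import math


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3*(cmad + nmad) / abs(cmed - nmed)."""
    if cmed is None or cmad is None or cmed == nmed:
        return None
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    if cmed is None or cmad is None:
        return None
    denominator = math.sqrt(cmad ** 2 + nmad ** 2)
    return None if denominator == 0 else (cmed - nmed) / denominator


def signal_to_noise(cmed, nmed, nmad):
    """(cmed - nmed) / nmad."""
    if cmed is None or nmad == 0:
        return None
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """cmed / nmed."""
    if cmed is None or nmed == 0:
        return None
    return cmed / nmed
```

Because no positive or negative control is listed for this endpoint, every one of these functions would return `None` when given the values reported here.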
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 8.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Zurlinden TJ, Saili KS, Baker NC, Toimela T, Heinonen T, Knudsen TB. A cross-platform approach to
characterize and screen potential neurovascular unit toxicants. Reprod Toxicol. 2020 Jun 24;96:300-315. doi:
10.1016/j.reprotox.2020.06.010. Epub ahead of print. PMID: 32590145.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. See the tcpl vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1835
ArunA_NOG_NeuritesPerNeuron
1. General Information
1.1 Assay Title: ArunA Biomedical's Neurite Outgrowth (NOG) Assay for Neurites Per Neuron
1.2 Assay Summary: ArunA_NOG (Neurite Outgrowth) is a cell-based, image-based assay that uses human H9-
derived embryonic differentiated neurons (hNN). Measurements were taken 48 hours after chemical dosing in
a 96-well plate. ArunA_NOG_NeuritesPerNeuron is an assay component measured from the ArunA_NOG assay.
It is designed to make measurements of neurite outgrowth related to number of neurites per neuron, using a
form of morphology reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging
technology. Data from the assay component ArunA_NOG_NeuritesPerNeuron was analyzed at the endpoint,
ArunA_NOG_NeuritesPerNeuron, in the positive analysis fitting direction relative to DMSO as the negative
control and baseline of activity. Using a type of morphology reporter, loss-of-signal activity can be used to
understand developmental effects. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the neurodevelopment intended target family, where the subfamily is neurite
outgrowth.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ArunA Biomedical is a privately owned biotechnology company and Contract Research
Organization (CRO) formerly providing toxicology screening using neural stem cell-based assays.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: hN2 cells and growth media were provided through Material Transfer Agreement #466-
08 between the U.S. EPA and ArunA Biomedical, Inc.
1.9 Assay Throughput: 96-well plate. ArunA systems offer high throughput chemical screening in a 96-well format
for the human neuroprogenitor (hNP) and human neural crest (hNC) migration and cell titer endpoints.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes in fluorescence intensity related to βIII-tubulin/DyLight 488 antibody labelling are
indicative of neurite outgrowth.
Chemical-induced perturbations to cellular key events across neurogenic outcomes, including migration
(neuroprogenitor and neural crest cells) and neural network formation (neurite length, neurites per neuron, and
branch points for neurites), in vitro can inform cell-based prioritization of neurodevelopmental hazard
potential.
2.2 Scientific Principles: During the development of the nervous systems, many processes occur to give rise to a
functional and healthy neural network and hence nervous system. These important neurodevelopmental
processes may be disrupted by potential toxicants, resulting in developmental neurotoxicity. Among these
processes is migration of neuroprogenitors and neural crest cells (NCCs). Impaired neuroprogenitor and NCC
migration can lead to cerebral malformations and neurodevelopmental disorders, such as diencephalic-
mesencephalic dysplasia syndrome, cerebral palsy, cerebellar ataxia, and microcephaly.
-------
2.3 Experimental System: adherent hNN cell line used. The hN2 cell line is derived from neuroepithelial cells of
WA09 hESC (Thomson et al., 1998) origin according to a previously described protocol (Shin et al., 2005, Shin et
al., 2006). Importantly, as opposed to other methods of deriving neural progenitors through three-dimensional
neurosphere and embryoid body formations (Reubinoff et al., 2001, Zhang et al., 2001), these adherent
monolayer cultures are uniformly exposed to growth factors and/or morphogens throughout their propagation.
Neurogenic lineages from human embryonic stem cell line WA09 were locked into three neural differentiation
states: neuroprogenitor (hNP1 - Cat no. 7009), neural crest (hNC - Cat no. 7029), and neural network (hNN -
Cat no. hNJL7014). Prior to differentiation into hN2 cells the population was confirmed karyotypically normal,
>95% nestin positive and <3% OCT-4 positive (Shin et al., 2006). The cells were produced in bulk by propagation
for an additional 2 weeks beyond the neuroepithelial stage by removal of bFGF from the media and
cryopreserved (ArunA Biomedical, Athens, GA) for end user applications. The hNP and hNC cell endpoints
consisted of cell titer and migratory measurements whereas hNN cell endpoints consisted of neuron count and
three neurite-specific metrics to assess network formation: neurite length, neurites per neuron, and branch
points for neurites. For this study, ArunA Bio extended the differentiation period of the hNN cells by
approximately two weeks more than in the original hN2 protocol. This allowed for increased neural network cell
axonation leading to better quantitation of network-specific endpoints. The utility of dissociated hN2 cultures
as an in vitro model for neurite outgrowth was assessed using automated high-content image analysis (HCA). In
addition, the molecular phenotype of these cells was examined using immunocytochemical staining.
2.4 Metabolic Competence: H9-derived cells are locked at different neuronal developmental states of interest to
DNT investigations of chemical exposures. Xenobiotic biotransformation potential has not been characterized.
2.5 Exposure Regime: Chemical treatment: Differentiated hNN cells were seeded and immediately exposed to test
medium for 48 h. Following chemical exposure, cell bodies were stained with Hoechst 33258 to quantitate
viable neuron count and neurites were labeled with βIII-tubulin/DyLight 488. High content imaging assessed
the neurite outgrowth endpoints: neurite total length per neuron (µm), neurite count per neuron, and branch
points per neurite using the methods described in Harrill et al. (2010). Measurements of hN2 morphology:
βIII-tubulin stained cell cultures were allowed to warm to room temperature. Plates were then loaded into
a Cellomics ArrayScan VTI HCS reader high-content imaging system (ThermoFisher Scientific, Waltham, MA) for
automated image acquisition and morphometric analyses. This system consists of an epifluorescent microscope
with an EXFO X-cite 120 metal-halide arc lamp, motorized imaging objectives, stage and excitation/emission
filter wheel, and a 12-bit high-resolution CCD camera connected to a Dell Intel Xeon computer terminal with a 2
GHz processor. Image acquisition and storage were performed using the vHCS Scan software package, version
6.6.1.4. Matched fluorescent images of Hoechst-stained nuclei and βIII-tubulin/DyLight 488 immunolabeled
cells were acquired using 365/515 (channel 1) and 475/515 (channel 2) nm excitation/emission filter couplings,
respectively, with a 20x objective (Zeiss, Inc., Thornwood, NY). Fixed integration times for image acquisition in
each channel were determined by manual sampling of control-treated wells across multiple plates. In the matching
pseudocolored composite images, Hoechst-stained nuclei appear blue and βIII-tubulin/DyLight 488 labeled cell
bodies and neurites appear green. The Neural Profiling BioApplication performs automated image analysis in a
sequential manner as follows. Briefly, nuclei were identified in channel 1 as bright objects on a dark background.
Nuclei with size and intensity values outside of the ranges determined a priori for viable cells were identified in
the channel 1 image and rejected from further analyses. Spatial coordinates from the channel 1 image were
then superimposed on the matching channel 2 image. Cell body masks in channel 2 were then cast based on
positional data from channel 1 nuclei and a set of user-defined geometric and signal intensity-based parameters.
Cell bodies corresponding to valid neurons were then selected and invalid cell bodies rejected. Parameters for
valid cell body selection include the presence of exactly one nucleus within the cell body mask, a requirement
that the nucleus met the gating criteria imposed in channel 1, a requirement that at least 25% of the nucleus
perimeter is bounded by DyLight 488 labeled cytoplasm and a requirement that the total cell body area not
exceed 4000 µm². Neurites emerging from the selected cell bodies were then individually traced and measured.
For this study, neurites were defined as processes >10 µm in length. Neurites were separated from cell bodies
at points where the half-width of the labeled cytoplasm was less than 3.6 µm across. In the case of neurites with an
ambiguous origin (i.e. appearing to emerge from or contact multiple cell bodies) the Neural Profiling
BioApplication traced the neurite from all potential origin points and retained the longest neurite for
measurements of length and number of neurites per neuron. This effectively prevented repeated sampling of
the same neurite segment within each image. Morphometric data from high-content image analysis (HCA)
included measurements of the average number of neurites per neuron and total neurite length per neuron.
Data for both endpoints were collected on cell-by-cell basis. The number of neurites and the cumulative length
of all neurites associated with each cell body (i.e. total neurite length) were calculated for each cell meeting the
selection criteria outlined above. Cell-level measurements were then averaged to obtain a mean measurement
for the average number of neurites per neuron and total neurite length per neuron for the cell populations
sampled within each well.
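The cell selection and well-level averaging described above can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not the Neural Profiling BioApplication itself: the `Cell` record, its field names, and the functions `is_valid_neuron` and `well_summary` are hypothetical, and the sketch assumes that tracing and nucleus gating have already been done upstream.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

# Thresholds taken from the text above.
MIN_NEURITE_LENGTH_UM = 10.0      # neurites are processes > 10 um in length
MAX_CELL_BODY_AREA_UM2 = 4000.0   # total cell body area must not exceed 4000 um^2
MIN_PERIMETER_FRACTION = 0.25     # >= 25% of nucleus perimeter bounded by label


@dataclass
class Cell:
    """Hypothetical per-cell record produced by an upstream image analysis step."""
    nuclei_in_mask: int                 # nuclei found inside the cell body mask
    nucleus_passed_gating: bool         # nucleus met the channel 1 gating criteria
    perimeter_fraction_labeled: float   # fraction of nucleus perimeter with label
    cell_body_area_um2: float
    process_lengths_um: List[float]     # lengths of all traced processes


def is_valid_neuron(cell: Cell) -> bool:
    """Apply the four cell body selection criteria listed in the text."""
    return (
        cell.nuclei_in_mask == 1
        and cell.nucleus_passed_gating
        and cell.perimeter_fraction_labeled >= MIN_PERIMETER_FRACTION
        and cell.cell_body_area_um2 <= MAX_CELL_BODY_AREA_UM2
    )


def well_summary(cells: List[Cell]) -> Optional[dict]:
    """Average cell-level neurite counts and total lengths over valid neurons."""
    counts, total_lengths = [], []
    for cell in cells:
        if not is_valid_neuron(cell):
            continue
        neurites = [x for x in cell.process_lengths_um if x > MIN_NEURITE_LENGTH_UM]
        counts.append(len(neurites))
        total_lengths.append(sum(neurites))
    if not counts:
        return None  # no valid neurons were sampled in this well
    return {
        "valid_neurons": len(counts),
        "mean_neurites_per_neuron": mean(counts),
        "mean_total_neurite_length_um": mean(total_lengths),
    }
```

Rejected cells contribute nothing to either average, which matches the statement that measurements are calculated only for cells meeting the selection criteria.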
Baseline median absolute deviation for the assay (bmad): 0.138
Response cutoff threshold used to determine hit calls: 0.415
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
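The bmad and cutoff values reported above are consistent with the bmad3 method listed in Section 3.2, where the cutoff is 3 times the bmad. The Python sketch below illustrates that relationship. The function names `mad` and `cutoff_bmad3` are hypothetical and are not tcpl functions. R's `mad` applies a consistency constant of 1.4826 by default; whether the reported bmad includes that constant is not stated here, so it is exposed as a parameter and treated as an assumption.

```python
from statistics import median


def mad(values, scale=1.4826):
    """Median absolute deviation of `values`, multiplied by `scale`.

    scale=1.4826 mirrors the default of R's mad(); pass scale=1.0 for the
    unscaled median absolute deviation. Which one applies to the reported
    bmad is an assumption of this sketch.
    """
    values = list(values)
    center = median(values)
    return scale * median(abs(v - center) for v in values)


def cutoff_bmad3(bmad):
    """Cutoff as 3 times the baseline median absolute deviation (bmad3)."""
    return 3 * bmad


# Using the rounded bmad reported for this endpoint:
coff = cutoff_bmad3(0.138)
print(round(coff, 3))
```

Multiplying the rounded bmad (0.138) by 3 gives 0.414 rather than the reported 0.415. A difference of that size is what one would expect if the cutoff was computed from the unrounded bmad.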
2.6 Response: The ArunA neurite outgrowth assay monitors changes in neurite length and number of branch points
(both total number of branch points and number formed per neuron) in human neural network cells (hNN)
derived from human embryonic stem cells. These different measurements are assessed following 48 hour
incubations with test chemical to help predict the potential to disrupt neural network formation in developing
human embryos.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of neurodevelopment.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 5
Standard minimum concentration tested: 1.2 nM
Key positive control: NA
Target (nominal) number of replicates: 4
Standard maximum concentration tested: 100 nM
Neutral vehicle control: DMSO
NA
-------
3.2 Data Analysis: High content imaging assessed the neurite outgrowth endpoints: neurite total length per neuron
(µm), neurite count per neuron, and branch points per neurite. Plate-level raw data, provided by each assay
source, were received by EPA and analyzed using the ToxCast Data Analysis Pipeline (tcpl). Normalized
response values for each assay endpoint were calculated as resp = log2(rval/bval), where rval and bval
correspond to the raw value and the plate-level DMSO control median, respectively. The plate-level
positive/negative control median (pval) does not appear in this formula.
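As an illustration of the normalization above, the following Python sketch computes bval as the median of the neutral control wells on one plate and then resp = log2(rval/bval) for every well. It is not tcpl code. The function name `normalize_plate` and the dictionary layout are hypothetical, and excluding wells with wllq = 0 from the baseline is an assumption of this sketch.

```python
import math
from statistics import median


def normalize_plate(wells):
    """Return copies of `wells` with `bval` and `resp` added.

    Each well is a dict with keys "rval" (raw value, must be positive) and
    "wllt" (well type, "n" for neutral control), and optionally "wllq"
    (well quality, 1 = good, 0 = flagged). All wells belong to one plate.
    """
    neutral = [
        w["rval"]
        for w in wells
        if w["wllt"] == "n" and w.get("wllq", 1) == 1
    ]
    if not neutral:
        raise ValueError("plate has no usable neutral control wells")
    bval = median(neutral)  # plate-level DMSO control median
    return [
        {**w, "bval": bval, "resp": math.log2(w["rval"] / bval)}
        for w in wells
    ]


plate = [
    {"rval": 2.0, "wllt": "n"},
    {"rval": 2.0, "wllt": "n"},
    {"rval": 4.0, "wllt": "n"},
    {"rval": 1.0, "wllt": "t"},  # half the baseline
    {"rval": 8.0, "wllt": "t"},  # four times the baseline
]
for well in normalize_plate(plate):
    print(well["wllt"], well["resp"])
```

On the log2 scale a halving of the raw value gives resp = -1 and a doubling gives resp = 1, so losses and gains of the same fold size are symmetric around the baseline of 0.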
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization methods include:
7: resp.log2 (Transform the response values to log-scale (base 2).)
9: resp.fc (Calculate the normalized response (resp) as the fold change, i.e. the ratio of the corrected
(cval) and baseline (bval) values; resp = cval/bval.)
11: bval.apid.nwlls.med (Calculate the baseline value (bval) as the plate-wise median, by assay
plate ID (apid), of the corrected values (cval) for neutral control wells (wllt = n).)
Level 4: Baseline and required tcplFit2 parameters defined by:
2: bmad.aeid.lowconc.nwells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for neutral control wells (wllt = n). Calculate one
standard deviation of the normalized response for neutral control wells (wllt = n); onesd = sqrt(sum((resp
- mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.)
27: ow_bidirectional_loss (Multiply winning model hitcall (hitc) by -1 for models fit in the
positive analysis direction. Typically used for endpoints where only negative responses are biologically
relevant.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
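Several of the Level 6 flags above are simple threshold rules, and the Python sketch below shows how five of them (noise, low.nrep, low.nconc, ac50.lowconc, and no.med.gt.3bmad) could be evaluated for one concentration-response series. It is a simplified illustration and not the tcpl implementation. The function name `level6_flags` and its argument layout are hypothetical, and concentrations and AC50 are assumed to be on the same linear scale instead of the log10 scale implied by 10^logc_min.

```python
from statistics import mean, median


def level6_flags(resp_by_conc, rmse, hitc, ac50, coff, bmad):
    """Return the names of the flags raised for one series.

    resp_by_conc maps each tested concentration to its list of replicate
    normalized responses (resp). ac50 may be None when no model was fit.
    """
    flags = []
    nconc = len(resp_by_conc)
    nrep = mean(len(r) for r in resp_by_conc.values())
    medians = [median(r) for r in resp_by_conc.values()]

    if rmse > coff:                      # 10: noise
        flags.append("noise")
    if nrep < 2:                         # 13: low.nrep
        flags.append("low.nrep")
    if nconc <= 4:                       # 14: low.nconc
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 is not None and ac50 < min(resp_by_conc):
        flags.append("ac50.lowconc")     # 18: AC50 below lowest tested conc

    # 20: count medians beyond 3*bmad in either direction
    nmed_gtbl_pos = sum(m > 3 * bmad for m in medians)
    nmed_gtbl_neg = sum(m < -3 * bmad for m in medians)
    if nmed_gtbl_pos == 0 and nmed_gtbl_neg == 0:
        flags.append("no.med.gt.3bmad")
    return flags
```

Flags of this kind do not change the hit call. They mark series that may deserve a closer look before the result is used.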
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 60
Number of chemicals tested: 58
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
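The hit call logic described in this section can be illustrated with a short Python sketch. It takes the three weights as given inputs between 0 and 1 and does not show how tcpl or tcplfit2 estimates them. The function names `continuous_hitcall` and `classify_hitc` are hypothetical, and the 0.90 threshold is the default used in this documentation.

```python
def continuous_hitcall(w_median_outside_cutoff, w_top_exceeds_cutoff, w_beats_constant):
    """Continuous hit call as the product of three weights in [0, 1]."""
    weights = (w_median_outside_cutoff, w_top_exceeds_cutoff, w_beats_constant)
    for w in weights:
        if not 0.0 <= w <= 1.0:
            raise ValueError("each weight must lie between 0 and 1")
    return weights[0] * weights[1] * weights[2]


def classify_hitc(hitc, active_threshold=0.90):
    """Binarize a continuous hit call using the thresholds in the text."""
    if hitc < 0:
        return "not applicable"
    if hitc >= active_threshold:
        return "active"
    return "inactive"


hitc = continuous_hitcall(1.0, 0.95, 0.99)
print(classify_hitc(hitc))
```

Because the hit call is a product, a single low weight is enough to pull a series below the active threshold. Negative values arise, for example, when the ow_bidirectional_loss method listed in Section 3.2 multiplies hitc by -1.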
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 2.05
Neutral control median absolute deviation, by plate (nmad): 0.215
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 9.3%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
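The control-well metrics listed above are direct functions of the control medians and median absolute deviations. The Python sketch below writes them out. The function names are hypothetical, and `cmed`/`cmad` stand for either the positive (pmed, pmad) or negative (mmed, mmad) control values. Since this endpoint reports NA for both control types, the example uses made-up numbers purely to exercise the formulas.

```python
import math


def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells, in percent."""
    return (nmad / nmed) * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) versus neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference versus neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Hypothetical control values, not taken from this assay:
print(z_prime(cmed=10.0, cmad=0.5, nmed=2.0, nmad=0.5))
print(ssmd(cmed=5.0, cmad=3.0, nmed=2.0, nmad=4.0))
```

The reported statistics are described as by-plate values, so inserting the summary nmed and nmad from the table into `cv_percent` need not reproduce the reported CV exactly.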
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 5.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Zurlinden TJ, Saili KS, Baker NC, Toimela T, Heinonen T, Knudsen TB. A cross-platform approach to
characterize and screen potential neurovascular unit toxicants. Reprod Toxicol. 2020 Jun 24;96:300-315. doi:
10.1016/j.reprotox.2020.06.010. Epub ahead of print. PMID: 32590145.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1838
ArunA_NOG_BranchPointsPerNeurite
1. General Information
1.1 Assay Title: ArunA Biomedical's Neurite Outgrowth (NOG) Assay for Branch Points Per Neurite
1.2 Assay Summary: ArunA_NOG (Neurite Outgrowth) is a cell-based, image-based assay that uses human H9-
derived embryonic differentiated neurons (hNN). Measurements were taken 48 hours after chemical dosing in
a 96-well plate. ArunA_NOG_BranchPointsPerNeurite is an assay component measured from the ArunA_NOG
assay. It is designed to make measurements of neurite outgrowth related to branch points per neurite, using a
form of morphology reporter, as detected with fluorescence intensity signals by HCS Fluorescent Imaging
technology. Data from the assay component ArunA_NOG_BranchPointsPerNeurite was analyzed at the
endpoint, ArunA_NOG_BranchPointsPerNeurite, with bidirectional fitting relative to DMSO as the negative
control and baseline of activity. Using a type of morphology reporter, gain-of-signal activity can be used to
understand developmental effects. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the neurodevelopment intended target family, where the subfamily is neurite
outgrowth.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: ArunA Biomedical is a privately owned biotechnology company and Contract Research
Organization (CRO) formerly providing toxicology screening using neural stem cell-based assays.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: hN2 cells and growth media were provided through Material Transfer Agreement #466-
08 between the U.S. EPA and ArunA Biomedical, Inc.
1.9 Assay Throughput: 96-well plate. ArunA systems offer high throughput chemical screening in a 96-well format
for the human neuroprogenitor (hNP) and human neural crest (hNC) migration and cell titer endpoints.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes in fluorescence intensity related to βIII-tubulin/DyLight 488 antibody labelling are
indicative of neurite outgrowth.
Chemical-induced perturbations to cellular key events across neurogenic outcomes in vitro, including migration
(neuroprogenitor and neural crest cells) and neural network formation (neurite length, neurites per neuron, and
branch points for neurites), can inform cell-based prioritization of neurodevelopmental hazard
potential.
2.2 Scientific Principles: During the development of the nervous system, many processes occur to give rise to a
functional and healthy neural network and hence nervous system. These important neurodevelopmental
processes may be disrupted by potential toxicants, resulting in developmental neurotoxicity. Among these
processes is migration of neuroprogenitors and neural crest cells (NCCs). Impaired neuroprogenitor and NCC
migration can lead to cerebral malformations and neurodevelopmental disorders, such as diencephalic-
mesencephalic dysplasia syndrome, cerebral palsy, cerebellar ataxia, and microcephaly.
-------
2.3 Experimental System: adherent hNN cell line used. The hN2 cell line is derived from neuroepithelial cells of
WA09 hESC (Thomson et al., 1998) origin according to a previously described protocol (Shin et al., 2005, Shin et
al., 2006). Importantly, as opposed to other methods of deriving neural progenitors through three-dimensional
neurosphere and embryoid body formations (Reubinoff et al., 2001, Zhang et al., 2001), these adherent
monolayer cultures are uniformly exposed to growth factors and/or morphogens throughout their propagation.
Neurogenic lineages from human embryonic stem cell line WA09 were locked into three neural differentiation
states: neuroprogenitor (hNP1 - Cat no. 7009), neural crest (hNC - Cat no. 7029), and neural network (hNN -
Cat no. hNJL7014). Prior to differentiation into hN2 cells the population was confirmed karyotypically normal,
>95% nestin positive and <3% OCT-4 positive (Shin et al., 2006). The cells were produced in bulk by propagation
for an additional 2 weeks beyond the neuroepithelial stage by removal of bFGF from the media and
cryopreserved (ArunA Biomedical, Athens, GA) for end user applications. The hNP and hNC cell endpoints
consisted of cell titer and migratory measurements whereas hNN cell endpoints consisted of neuron count and
three neurite-specific metrics to assess network formation: neurite length, neurites per neuron, and branch
points for neurites. For this study, ArunA Bio extended the differentiation period of the hNN cells by
approximately two weeks more than in the original hN2 protocol. This allowed for increased neural network cell
axonation leading to better quantitation of network-specific endpoints. The utility of dissociated hN2 cultures
as an in vitro model for neurite outgrowth was assessed using automated high-content image analysis (HCA). In
addition, the molecular phenotype of these cells was examined using immunocytochemical staining.
2.4 Metabolic Competence: H9-derived cells are locked at different neuronal developmental states of interest to
DNT investigations of chemical exposures. Xenobiotic biotransformation potential has not been characterized.
2.5 Exposure Regime: Chemical treatment: Differentiated hNN cells were seeded and immediately exposed to test
medium for 48 h. Following chemical exposure, cell bodies were stained with Hoechst 33258 to quantitate
viable neuron count and neurites were labeled with βIII-tubulin/DyLight 488. High content imaging assessed
the neurite outgrowth endpoints: neurite total length per neuron (µm), neurite count per neuron, and branch
points per neurite using the methods described in Harrill et al. (2010). Measurements of hN2 morphology:
βIII-tubulin stained cell cultures were allowed to warm to room temperature. Plates were then loaded into
a Cellomics ArrayScan VTI HCS reader high-content imaging system (ThermoFisher Scientific, Waltham, MA) for
automated image acquisition and morphometric analyses. This system consists of an epifluorescent microscope
with an EXFO X-cite 120 metal-halide arc lamp, motorized imaging objectives, stage and excitation/emission
filter wheel, and a 12-bit high-resolution CCD camera connected to a Dell Intel Xeon computer terminal with a 2
GHz processor. Image acquisition and storage were performed using the vHCS Scan software package, version
6.6.1.4. Matched fluorescent images of Hoechst-stained nuclei and βIII-tubulin/DyLight 488 immunolabeled
cells were acquired using 365/515 (channel 1) and 475/515 (channel 2) nm excitation/emission filter couplings,
respectively, with a 20x objective (Zeiss, Inc., Thornwood, NY). Fixed integration times for image acquisition in
each channel were determined by manual sampling of control-treated wells across multiple plates. In the matching
pseudocolored composite images, Hoechst-stained nuclei appear blue and βIII-tubulin/DyLight 488 labeled cell
bodies and neurites appear green. The Neural Profiling BioApplication performs automated image analysis in a
sequential manner as follows. Briefly, nuclei were identified in channel 1 as bright objects on a dark background.
Nuclei with size and intensity values outside of the ranges determined a priori for viable cells were identified in
the channel 1 image and rejected from further analyses. Spatial coordinates from the channel 1 image were
then superimposed on the matching channel 2 image. Cell body masks in channel 2 were then cast based on
positional data from channel 1 nuclei and a set of user-defined geometric and signal intensity-based parameters.
Cell bodies corresponding to valid neurons were then selected and invalid cell bodies rejected. Parameters for
valid cell body selection include the presence of exactly one nucleus within the cell body mask, a requirement
that the nucleus met the gating criteria imposed in channel 1, a requirement that at least 25% of the nucleus
perimeter is bounded by DyLight 488 labeled cytoplasm and a requirement that the total cell body area not
exceed 4000 µm². Neurites emerging from the selected cell bodies were then individually traced and measured.
For this study, neurites were defined as processes >10 µm in length. Neurites were separated from cell bodies
at points where the half-width of the labeled cytoplasm was less than 3.6 µm across. In the case of neurites with an
ambiguous origin (i.e. appearing to emerge from or contact multiple cell bodies) the Neural Profiling
BioApplication traced the neurite from all potential origin points and retained the longest neurite for
measurements of length and number of neurites per neuron. This effectively prevented repeated sampling of
the same neurite segment within each image. Morphometric data from high-content image analysis (HCA)
included measurements of the average number of neurites per neuron and total neurite length per neuron.
Data for both endpoints were collected on a cell-by-cell basis. The number of neurites and the cumulative length
of all neurites associated with each cell body (i.e. total neurite length) were calculated for each cell meeting the
selection criteria outlined above. Cell-level measurements were then averaged to obtain a mean measurement
for the average number of neurites per neuron and total neurite length per neuron for the cell populations
sampled within each well.
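The cell selection and well-level averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Neural Profiling BioApplication itself: the record fields (`n_nuclei`, `nucleus_valid`, `perimeter_fraction`, `area_um2`, `neurite_lengths_um`) are hypothetical names chosen for this example, and only the numeric criteria quoted in the text are used.

```python
from statistics import mean

# Criteria quoted in the text; the field names below are hypothetical.
MIN_NEURITE_LENGTH_UM = 10.0     # neurites are processes > 10 um in length
MAX_CELL_BODY_AREA_UM2 = 4000.0  # total cell body area must not exceed this
MIN_PERIMETER_FRACTION = 0.25    # nucleus perimeter bounded by labeled cytoplasm


def is_valid_cell(cell):
    """Apply the valid cell body criteria to one cell record."""
    return (
        cell["n_nuclei"] == 1
        and cell["nucleus_valid"]
        and cell["perimeter_fraction"] >= MIN_PERIMETER_FRACTION
        and cell["area_um2"] <= MAX_CELL_BODY_AREA_UM2
    )


def well_summary(cells):
    """Average cell-level neurite count and total length over valid cells."""
    per_cell = []
    for cell in cells:
        if not is_valid_cell(cell):
            continue
        neurites = [x for x in cell["neurite_lengths_um"]
                    if x > MIN_NEURITE_LENGTH_UM]
        per_cell.append((len(neurites), sum(neurites)))
    if not per_cell:
        return None  # no valid neurons in this well
    return {
        "n_cells": len(per_cell),
        "neurites_per_neuron": mean(count for count, _ in per_cell),
        "total_neurite_length_per_neuron": mean(total for _, total in per_cell),
    }


example_well = [
    {"n_nuclei": 1, "nucleus_valid": True, "perimeter_fraction": 0.6,
     "area_um2": 900.0, "neurite_lengths_um": [12.0, 30.0, 5.0]},
    {"n_nuclei": 1, "nucleus_valid": True, "perimeter_fraction": 0.3,
     "area_um2": 1500.0, "neurite_lengths_um": [20.0]},
    {"n_nuclei": 2, "nucleus_valid": True, "perimeter_fraction": 0.9,
     "area_um2": 1200.0, "neurite_lengths_um": [40.0]},  # rejected: two nuclei
]
summary = well_summary(example_well)
```

In this example the third record is rejected and the 5 um process is not counted as a neurite, so the well-level means are taken over two neurons.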
Baseline median absolute deviation for the assay (bmad): 0.228
Response cutoff threshold used to determine hit calls: 0.683
Detection technology used: HCS Fluorescent Imaging (Fluorescence)
2.6 Response: The ArunA neurite outgrowth assay monitors changes in neurite length and number of branch points
(both total number of branch points and number formed per neuron) in human neural network cells (hNN)
derived from human embryonic stem cells. These different measurements are assessed following 48 hour
incubations with test chemical to help predict the potential to disrupt neural network formation in developing
human embryos.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Additionally, this assay was annotated to the intended target family of neurodevelopment.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 5
Standard minimum concentration tested: 1.2 nM
Key positive control: NA
Target (nominal) number of replicates: 4
Standard maximum concentration tested: 100 nM
Neutral vehicle control: DMSO
NA
-------
3.2 Data Analysis: High content imaging assessed the neurite outgrowth endpoints: neurite total length per neuron
(um), neurite count per neuron, and branch points per neurite. Plate-level raw data were received by EPA from
each assay source and analyzed using the ToxCast Pipeline (tcpl). Normalized
response values for each assay endpoint were calculated as resp = log2(rval/bval), where rval, bval, and pval
correspond to the raw value, the plate level DMSO control median, and the plate level positive/negative control
median, respectively.
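The normalization resp = log2(rval/bval) can be illustrated with a short sketch. It assumes each well is a record with an assay plate ID (`apid`), a well type (`wllt`, with "n" for neutral control and "t" for test wells) and a raw value (`rval`); the function name and the handling of non-positive values are assumptions made for this example and are not taken from tcpl.

```python
import math
from collections import defaultdict
from statistics import median


def normalize_log2_fold_change(wells):
    """Return copies of the well records with bval and resp added.

    bval is the plate-wise median of neutral control (wllt == "n") raw
    values; resp is log2(rval / bval). resp is None when the ratio is
    undefined (assumption: such wells are simply left unnormalized here).
    """
    controls = defaultdict(list)
    for well in wells:
        if well["wllt"] == "n":
            controls[well["apid"]].append(well["rval"])
    bval_by_plate = {apid: median(vals) for apid, vals in controls.items()}

    normalized = []
    for well in wells:
        bval = bval_by_plate.get(well["apid"])
        if bval is None or bval <= 0 or well["rval"] <= 0:
            resp = None
        else:
            resp = math.log2(well["rval"] / bval)
        normalized.append({**well, "bval": bval, "resp": resp})
    return normalized


plate = [
    {"apid": "P1", "wllt": "n", "rval": 10.0},
    {"apid": "P1", "wllt": "n", "rval": 20.0},
    {"apid": "P1", "wllt": "n", "rval": 30.0},
    {"apid": "P1", "wllt": "t", "rval": 40.0},  # twice the baseline
    {"apid": "P1", "wllt": "t", "rval": 10.0},  # half the baseline
]
result = normalize_log2_fold_change(plate)
```

A response of 1 therefore corresponds to a doubling relative to the plate's DMSO control median, and a response of -1 to a halving.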
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
1: none (Use corrected response value (cval) as is; cval = cval. No additional mc2 methods needed for
component-specific corrections.)
Level 3: Endpoint-specific normalization include:
7: resp.log2 (Transform the response values to log-scale (base 2).), 9: resp.fc (Calculate the normalized
response (resp) as the fold change, i.e. the ratio of the corrected (cval) and baseline (bval) values; resp =
cval/bval.), 11: bval.apid.nwlls.med (Calculate the baseline value (bval) as the plate-wise median, by assay
plate ID (apid), of the corrected values (cval) for neutral control wells (wllt = n).)
Level 4: Baseline and required tcplFit2 parameters defined by:
2: bmad.aeid.lowconc.nwells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for neutral control wells (wllt = n). Calculate one
standard deviation of the normalized response for neutral control wells (wllt = n); onesd = sqrt(sum((resp
- mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
1: bmad3 (Add a cutoff value of 3 multiplied by the baseline median absolute deviation (bmad) as defined
at Level 4.), 28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the
negative analysis direction. Typically used for endpoints where only positive responses are biologically
relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
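The Level 4 and Level 5 quantities listed above (bmad, onesd, and the cutoff taken as the highest candidate threshold) can be sketched as below. This is an illustrative sketch and not the tcpl implementation: the function names are hypothetical, and whether a consistency constant is applied to the median absolute deviation is not stated in this document, so the `scale` argument defaults to 1 as an assumption.

```python
import math
from statistics import median


def bmad(resp, scale=1.0):
    """Median absolute deviation of baseline responses (scale is an assumption)."""
    center = median(resp)
    return scale * median(abs(r - center) for r in resp)


def onesd(resp):
    """Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))."""
    n = len(resp)
    mean_resp = sum(resp) / n
    return math.sqrt(sum((r - mean_resp) ** 2 for r in resp) / (n - 1))


def cutoff(candidates):
    """The highest candidate threshold is selected as the endpoint cutoff."""
    return max(candidates)


# Hypothetical neutral control responses (wllt = n), for illustration only.
neutral_resp = [-0.2, -0.1, 0.0, 0.1, 0.2]
example_bmad = bmad(neutral_resp)
example_onesd = onesd(neutral_resp)

# bmad3 candidate using the rounded bmad reported for this endpoint.
reported_bmad = 0.228
coff = cutoff([3 * reported_bmad])
```

With the rounded bmad of 0.228 reported in Section 2.5, 3 * bmad evaluates to about 0.684, in line with the reported response cutoff of 0.683; the small difference is consistent with rounding of bmad.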
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 60 Number of chemicals tested: 58
ACTIVITY HIT CALLS
Active hit count: hitc>0.9 Inactive hit count: 0
-------
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than that of the constant model, i.e. the winning
model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
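As a rough illustration of how a user might apply the thresholds described above, the sketch below combines three weights into a continuous hit call, binarizes it at 0.90, and filters series by the number of cautionary flags. The function names, the record fields (`hitc`, `flags`), and the assumption that each weight lies between 0 and 1 are choices made for this example and are not part of tcpl.

```python
def continuous_hitc(w_median, w_top, w_aic):
    """Product of three proportional weights (assumed to lie in [0, 1])."""
    for weight in (w_median, w_top, w_aic):
        if not 0.0 <= weight <= 1.0:
            raise ValueError("each weight is expected to lie in [0, 1]")
    return w_median * w_top * w_aic


def classify_hitc(hitc, threshold=0.90):
    """Binarize hitc: >= threshold active, 0 to threshold inactive, < 0 n/a."""
    if hitc is None or hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


def high_confidence_actives(series, threshold=0.90, max_flags=0):
    """Keep active series carrying at most max_flags cautionary flags."""
    return [
        s for s in series
        if classify_hitc(s["hitc"], threshold) == "active"
        and len(s["flags"]) <= max_flags
    ]


# Hypothetical concentration-response series, for illustration only.
series = [
    {"id": "A", "hitc": 0.99, "flags": []},
    {"id": "B", "hitc": 0.95, "flags": ["singlept.hit.high"]},
    {"id": "C", "hitc": 0.40, "flags": []},
    {"id": "D", "hitc": -1.0, "flags": []},
]
kept = [s["id"] for s in high_confidence_actives(series)]
```

Raising `max_flags` or lowering `threshold` relaxes the filter; the appropriate stringency depends on the intended use of the data.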
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.16
Neutral control median absolute deviation, by plate: nmad 0.03
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 18.15%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
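The performance metrics listed above can be written as small functions. The sketch below is illustrative only: the function and argument names are chosen for this example, with `cmed` and `cmad` standing for the median and median absolute deviation of whichever control (positive or negative) is compared with the neutral control. The positive control values used in the usage lines are hypothetical, since none are reported for this endpoint.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """(cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """cmed / nmed."""
    return cmed / nmed


# Neutral control values as reported (rounded) for this endpoint.
nmed, nmad = 0.16, 0.03
cv = cv_percent(nmed, nmad)

# Hypothetical positive control values, for illustration only.
pmed, pmad = 1.0, 0.05
zp = z_prime(pmed, pmad, nmed, nmad)
```

With the rounded inputs shown (nmed 0.16, nmad 0.03) the CV evaluates to 18.75%; the reported 18.15% presumably reflects unrounded or plate-level inputs.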
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 2.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Zurlinden TJ, Saili KS, Baker NC, Toimela T, Heinonen T, Knudsen TB. A cross-platform approach to
characterize and screen potential neurovascular unit toxicants. Reprod Toxicol. 2020 Jun 24;96:300-315. doi:
10.1016/j.reprotox.2020.06.010. Epub ahead of print. PMID: 32590145.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 63
ATG_Ahr_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human Aryl Hydrocarbon Receptor (Ahr)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_Ahr_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_Ahr_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_Ahr_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene AHR. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is basic helix-loop-helix protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element AhRE, which is responsive to the endogenous human aryl hydrocarbon receptor
[GeneSymbol:AHR | GeneID:196 | Uniprot_SwissProt_Accession:P35869].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU) is
controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a TF
binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24 hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 C, 20 s at 68 C and 10 min at 72 C) and these products were
digested with 5U of HpaI (New England Biolabs) for 2 h at 37 C. The fragments were purified using Qiaquick PCR
columns (Qiagen), analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems) with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data was processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: 6-formylindolo carbazole
Baseline median absolute deviation for the assay (bmad): 0.197
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 300 nM
Neutral vehicle control: DMSO
Response cutoff threshold used to determine hit calls: 0.983
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
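Selecting the winning model by lowest AIC can be sketched as follows. This is a minimal illustration, assuming the standard definition AIC = 2k - 2 ln(L) for a model with k parameters and maximized likelihood L; the model labels and log-likelihood values are hypothetical and do not come from any ToxCast fit.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def winning_model(aic_by_model):
    """Return the name of the model with the lowest finite AIC, or None."""
    finite = {
        name: value
        for name, value in aic_by_model.items()
        if value is not None and math.isfinite(value)
    }
    if not finite:
        return None
    return min(finite, key=finite.get)


# Hypothetical fits: (maximized log-likelihood, number of parameters).
fits = {
    "constant": (-10.0, 1),
    "hill": (-5.0, 3),
    "gain-loss": (-4.9, 5),
}
aic_values = {name: aic(ll, k) for name, (ll, k) in fits.items()}
winner = winning_model(aic_values)
```

In this example the five-parameter model fits slightly better than the three-parameter model, but the penalty for the additional parameters outweighs the gain, so the three-parameter model has the lowest AIC.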
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
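The well-quality convention described above, together with the removal of negative and zero values and the log transformation noted at the start of this section, can be sketched as follows. This is an illustrative sketch rather than tcpl code: the function name is hypothetical, and the order of operations (flag unusable wells first, then log-transform the remaining wells) is an assumption made for this example.

```python
import math


def apply_component_corrections(wells):
    """Flag wells with negative or zero cval, then log2-transform the rest.

    Each well is a dict with a corrected value (cval) and a well quality
    (wllq, 1 = usable, 0 = excluded). Input records are not modified.
    """
    corrected = []
    for well in wells:
        well = dict(well)
        if well["cval"] <= 0:
            well["wllq"] = 0  # negative or zero value: exclude from analysis
        elif well["wllq"] == 1:
            well["cval"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected


# Hypothetical wells, for illustration only.
wells = [
    {"well": "A1", "cval": 8.0, "wllq": 1},
    {"well": "A2", "cval": 0.0, "wllq": 1},   # zero value
    {"well": "A3", "cval": -2.0, "wllq": 1},  # negative value
    {"well": "A4", "cval": 4.0, "wllq": 0},   # already excluded upstream
]
processed = apply_component_corrections(wells)
usable = [w["well"] for w in processed if w["wllq"] == 1]
```

Only wells with wllq equal to 1 after these steps would carry forward into normalization and curve fitting in this sketch.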
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This is indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested;if hitc >= 0.9 and ac50 < 10Alogc_min, then flag.)
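The directionality check among the Level 6 flags is easiest to follow as a pair of counts. The Python sketch below implements the rule exactly as worded in the flag description. The function name and argument layout are hypothetical, it is not the tcpl code, and the behavior for top equal to 0 (no flag) is an assumption because the text does not cover that case.

```python
def directionality_fail(resp, top, coff):
    """Return True if the series would receive the directionality flag.

    resp: normalized response values for one concentration series.
    top:  modeled top parameter of the winning model.
    coff: positive response cutoff for the endpoint.
    """
    above = sum(1 for r in resp if r > coff)    # count(resp > coff)
    below = sum(1 for r in resp if r < -coff)   # count(resp < -1*coff)
    if top < 0:
        # Loss was the winning direction.
        return below < 2 * above
    if top > 0:
        # Gain was the winning direction.
        return above < 2 * below
    # top == 0 is not addressed in the text; assumed here to be unflagged.
    return False
```

For example, a gain-direction fit (top > 0) with one response above the cutoff and one below the negative cutoff is flagged, because 1 < 2*1.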
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 774
Inactive hit count: 0
-------
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
412
537
1214
quadratic-polynomial (poly2) model: 680
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
47
477
809
16
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration-response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomial, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. the model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to the data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior based on relative activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
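The binarization of hitc described above can be sketched in a few lines of Python. The function names are hypothetical, and the 0.90 threshold is simply the default used in this documentation; users may pass a different value.

```python
from collections import Counter


def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to 'active', 'inactive', or 'not applicable'."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


def summarize_hit_calls(hitcs, threshold=0.90):
    """Count how many series fall into each category."""
    return Counter(classify_hitc(h, threshold) for h in hitcs)


example_summary = summarize_hit_calls([0.95, 0.90, 0.50, 0.0, -1.0])
```

Raising the threshold (for example to 0.95) moves borderline series from active to inactive, which is one way to apply a stricter filter before also reviewing fitc and the level 6 flags.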
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.313
Neutral control median absolute deviation, by plate (nmad): 0.129
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 41.21%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed) / nmad: NA
Positive control signal-to-background, pmed / nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed) / nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed / nmed: NA
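The robustness formulas listed above can be written out as a small Python sketch. The function names are hypothetical and this is not how tcpl or invitrodb compute the stored values. Each function returns None (reported as NA in this document) when a required control statistic is unavailable, which is the case for this endpoint's positive and negative controls.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return (nmad / nmed) * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    if cmed is None or cmad is None:
        return None  # control not available (NA)
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """SSMD: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    if cmed is None or cmad is None:
        return None
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return None if cmed is None else (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return None if cmed is None else cmed / nmed


# Neutral control values reported above for this endpoint.
example_cv = round(cv_percent(0.313, 0.129), 2)
```

With the rounded neutral control values reported above (nmed = 0.313, nmad = 0.129), `cv_percent` reproduces the listed CV of 41.21%.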
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 477.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PubMed PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 64
ATG_AP_1_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human AP-1 Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_AP_1_CIS is one of 52 assay
components measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse transcription
polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay component
ATG_AP_1_CIS were analyzed into 1 assay endpoint. This assay endpoint, ATG_AP_1_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain- or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor level as it relates to the genes FOS and JUN. Furthermore, this assay endpoint can be
referred to as a primary readout, because this assay has produced multiple assay endpoints where this one
serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the DNA binding intended target family, where the subfamily is basic leucine zipper.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene AP-1,
which is responsive to the endogenous human FBJ murine osteosarcoma viral oncogene homolog and jun
proto-oncogene [GeneSymbol: FOS & JUN | GeneID: 2353 & 3725 | Uniprot_SwissProt_Accession: P01100 & P05412].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS, activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.121
Response cutoff threshold used to determine hit calls: 0.604
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors.
This assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems, but it can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of DNA binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
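A minimal Python sketch of these Level 2 corrections follows, assuming that wells with zero or negative values are marked first and that only the remaining wells are log2-transformed, consistent with the Section 3.2 statement that negative and zero values are removed before the log transformation. The record layout and function name are hypothetical, and the actual execution order of methods in tcpl is set in the database, so this ordering is an assumption.

```python
import math


def apply_level2(wells):
    """Apply rmneg, rmzero, and log2 to a list of well records.

    wells: list of dicts with hypothetical keys 'cval' and 'wllq'.
    Returns new records; the input is left unchanged.
    """
    corrected = []
    for well in wells:
        out = dict(well)
        if out["cval"] <= 0:
            # rmneg / rmzero: downgrade well quality so the well is excluded.
            out["wllq"] = 0
            out["cval"] = None  # no log2 value exists for zero or negative input
        else:
            # log2: transform the corrected response value to base-2 log scale.
            out["cval"] = math.log2(out["cval"])
        corrected.append(out)
    return corrected


example_level2 = apply_level2([
    {"cval": 4.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -1.5, "wllq": 1},
])
```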
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
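Because the higher of the two candidate thresholds is selected, the cutoff for this endpoint can be checked with a short calculation. The Python sketch below uses a hypothetical function name and takes the rounded bmad of 0.121 from the assay design summary in Section 2.5.

```python
import math


def response_cutoff(bmad):
    """Return the larger of the two Level 5 candidates: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


example_cutoff = response_cutoff(0.121)
```

Here log2(1.2) is about 0.263 and 5 * 0.121 is 0.605, so the bmad-based threshold is the one selected. The small difference from the reported cutoff of 0.604 is presumably due to rounding of the listed bmad.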
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest concentration tested, where series is an active hit call (hitc >= 0.9)
with the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest concentration tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
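Several of the flags above reduce to simple threshold comparisons on values already stored for a concentration series. The Python sketch below covers five of them (noise, low.nrep, low.nconc, bmd.high, and ac50.lowconc) as worded in the list. The dictionary keys and function name are hypothetical, ac50 and 10^logc_min are assumed to be in the same concentration units, and the remaining flags are not covered.

```python
def simple_cautionary_flags(series):
    """Return the names of the threshold-style flags that apply to one series.

    series: dict with hypothetical keys 'hitc', 'rmse', 'coff', 'nrep',
    'nconc', 'bmd', 'ac50', and 'logc_min' (log10 of the lowest concentration).
    """
    flags = []
    if series["rmse"] > series["coff"]:
        flags.append("noise")          # rmse > coff
    if series["nrep"] < 2:
        flags.append("low.nrep")       # fewer than 2 replicates on average
    if series["nconc"] <= 4:
        flags.append("low.nconc")      # 4 or fewer concentrations tested
    if series["bmd"] > series["ac50"]:
        flags.append("bmd.high")       # BMD greater than AC50
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")   # active hit with AC50 below lowest conc
    return flags


example_flags = simple_cautionary_flags({
    "hitc": 0.95, "rmse": 0.2, "coff": 0.604, "nrep": 1,
    "nconc": 7, "bmd": 5.0, "ac50": 0.5, "logc_min": 0.0,
})
```

In this made-up series, the single replicate, the BMD above the AC50, and the AC50 below the lowest tested concentration (10^0 = 1) each raise a flag, while the fit noise and the number of concentrations do not.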
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 738
Inactive hit count (0 <= hitc < 0.9): 3776
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
142
328
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
366
1754
quadratic-polynomial (poly2) model: 849
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
94
20
578
331
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modi) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (the maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model fits better than a flat line.
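The model selection rule in this section can be illustrated with a short Python sketch. It is not the tcpl or tcplfit2 implementation: the `Fit` record, the function names, the AIC values, and the parameter counts are hypothetical, and the comparison against the constant model is reduced to a yes/no check, whereas the pipeline expresses it as a proportional weight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fit:
    name: str      # model label (hypothetical labels used below)
    aic: float     # Akaike Information Criterion of the fitted model
    n_params: int  # number of free parameters (illustrative counts)


def pick_winner(fits):
    """Lowest AIC wins; on an exact AIC tie the model with fewer parameters wins."""
    return min(fits, key=lambda f: (f.aic, f.n_params))


def beats_constant(winner, fits, constant_name="cnst"):
    """True when the winner's AIC is strictly below the constant model's AIC."""
    constant = next(f for f in fits if f.name == constant_name)
    return winner.aic < constant.aic


# Made-up fits for one concentration series; "hill" and "poly1" tie on AIC.
fits = [
    Fit("cnst", 12.0, 1),
    Fit("hill", 4.5, 3),
    Fit("poly1", 4.5, 1),
    Fit("gnls", 6.1, 5),
]
winner = pick_winner(fits)
print(winner.name, beats_constant(winner, fits))
```

In this made-up series the linear model is selected over the Hill model because both share the lowest AIC and the linear model has fewer parameters.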
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
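As a minimal sketch of the binarization described above, the following Python function maps a continuous hitc value to the three categories used in this documentation. The function name and the returned labels are hypothetical, and the 0.90 threshold is exposed as a parameter because other thresholds may be used.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the thresholds stated in this section."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


# Example values chosen only to exercise each branch.
for value in (0.97, 0.90, 0.42, -1.0):
    print(value, classify_hitc(value))
```

A stricter or looser screen can be expressed by passing a different `threshold`.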
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.451
Neutral control median absolute deviation, by plate: nmad 0.151
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 33.53%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
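The formulas listed in Section 4.1 can be written out as a small Python sketch. The function names are hypothetical, and the positive control values (pmed = 2.0, pmad = 0.2) are invented for illustration because this endpoint reports NA for its positive control. The neutral control inputs are the rounded values shown above, so the computed CV differs slightly from the reported 33.53%, which was presumably derived from unrounded values.

```python
import math


def cv_percent(med, mad):
    """Coefficient of variation in percent: (mad / med) * 100."""
    return mad / med * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


nmed, nmad = 0.451, 0.151  # neutral control values reported above (rounded)
pmed, pmad = 2.0, 0.2      # hypothetical positive control values

print(round(cv_percent(nmed, nmad), 2))
print(round(z_prime(pmed, pmad, nmed, nmad), 3))
print(round(ssmd(pmed, pmad, nmed, nmad), 2))
```

The same functions apply to the negative control by passing mmed and mmad in place of pmed and pmad.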
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 331.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS.
Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis
Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation
describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 65
ATG_AP_2_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human AP-2 Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_AP_2_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse transcription
polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay component
ATG_AP_2_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_AP_2_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the genes TFAP2A, TFAP2B, and TFAP2D. Furthermore, this
assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the dna binding intended target family, where the subfamily is basic
helix-turn-helix leucine zipper.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene AP-2,
which is responsive to the endogenous human transcription factor AP-2 alpha (activating enhancer binding
protein 2 alpha) and transcription factor AP-2 beta (activating enhancer binding protein 2 beta) and transcription
factor AP-2 delta (activating enhancer binding protein 2 delta) [GeneSymbol:TFAP2A & TFAP2B & TFAP2D |
GeneID:7020 & 7021 & 83741 | Uniprot_SwissProt_Accession:P05549 & Q92481 & Q7Z6R9].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Target (nominal) number of replicates: 1
Baseline median absolute deviation for the assay (bmad): 0.064
Response cutoff threshold used to determine hit calls: 0.321
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
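The removal of negative and zero values followed by the log2 transformation can be sketched in Python as follows. This is an illustration under stated assumptions, not the tcpl code: the function name and the tuple layout are hypothetical, and excluded wells are represented by a missing value paired with a well quality (wllq) of 0.

```python
import math


def log2_with_well_quality(cvals):
    """Return (log2 value, wllq) pairs; wells with cval <= 0 get (None, 0)."""
    results = []
    for cval in cvals:
        if cval <= 0:
            # Negative or zero readouts cannot be log-transformed,
            # so the well is excluded by setting its quality to 0.
            results.append((None, 0))
        else:
            results.append((math.log2(cval), 1))
    return results


# Made-up fold-induction values over the DMSO baseline.
print(log2_with_well_quality([2.0, 0.0, -1.0, 0.5]))
```

A fold-induction of 2.0 maps to 1.0 on the log2 scale and 0.5 maps to -1.0, so gains and losses of the same magnitude are symmetric around zero.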
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality (wllq);
if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well quality
(wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad
is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median
responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the
series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top
and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and
coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal
response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
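The Level 4 baseline statistics and the Level 5 cutoff selection listed above can be sketched in Python. This is not the tcpl implementation: the function names, the `(resp, wllt, cndx)` tuple layout, and the example wells are hypothetical, and the 1.4826 scaling constant is an assumption that mirrors the default of R's `mad` function. For this endpoint, 5 * 0.064 = 0.32 exceeds log2(1.2) (about 0.263), which appears consistent with the reported cutoff of 0.321 given rounding of bmad.

```python
import math
import statistics


def mad(values, constant=1.4826):
    """Median absolute deviation; the scaling constant is an assumption."""
    med = statistics.median(values)
    return constant * statistics.median([abs(v - med) for v in values])


def baseline_stats(wells):
    """Return (bmad, onesd) from test wells (wllt == 't') at cndx 1 or 2."""
    base = [resp for resp, wllt, cndx in wells if wllt == "t" and cndx in (1, 2)]
    # statistics.stdev uses the sample form, dividing by (sample size - 1).
    return mad(base), statistics.stdev(base)


def select_cutoff(bmad):
    """Take the higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


# Made-up wells: (normalized response, well type, concentration index).
wells = [
    (0.1, "t", 1), (-0.1, "t", 1), (0.0, "t", 2), (0.2, "t", 2),
    (5.0, "t", 7),   # higher concentration, excluded from the baseline
    (0.3, "n", 1),   # neutral control well, excluded from the baseline
]
bmad, onesd = baseline_stats(wells)
print(round(bmad, 5), round(onesd, 4))
print(round(select_cutoff(0.064), 3))
```

Restricting the baseline to the two lowest concentration indices assumes that test chemicals are unlikely to be active there, so the spread of those wells approximates assay noise.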
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 213
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1.29
Neutral control median absolute deviation, by plate: nmad 0.157
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 12.18%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 235.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS.
Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis
Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation
describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 66
ATG_BRE_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human BRE Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_BRE_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_BRE_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_BRE_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene SMAD1. Furthermore, this assay endpoint can be
referred to as a primary readout, because this assay has produced multiple assay endpoints where this one
serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the dna binding intended target family, where the subfamily is Smad protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element BRE, which is responsive to the endogenous human SMAD family member 1
[GeneSymbol:SMAD1 | GeneID:4086 | Uniprot_SwissProt_Accession:Q15797].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of
a TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding
sequence, CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data was processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.183
Response cutoff threshold used to determine hit calls: 0.916
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised when extrapolating these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
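The removal of non-positive values followed by the log2 transform can be illustrated with a minimal Python sketch. This is not tcpl code (tcpl is an R package); the function name `level2_correct` and the choice to apply the exclusions before the transform are assumptions made for illustration.

```python
import math


def level2_correct(cvals):
    """Exclude non-positive corrected responses, then log2-transform the rest.

    Returns (values, wllq): wllq is 1 for kept wells and 0 for excluded wells;
    excluded wells carry None instead of a transformed value.
    """
    values, wllq = [], []
    for cval in cvals:
        if cval <= 0:
            # Negative (cval < 0) or zero (cval == 0) response: set wllq to 0.
            values.append(None)
            wllq.append(0)
        else:
            values.append(math.log2(cval))
            wllq.append(1)
    return values, wllq


values, quality = level2_correct([2.0, 0.0, -0.5, 1.0, 0.5])
```

In this toy input, the zero and negative wells are flagged with wllq = 0 and the remaining fold-induction values become 1, 0, and -1 on the log2 scale.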
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
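The Level 4 baseline statistics and the Level 5 cutoff selection can be sketched in Python as follows. This is an illustrative sketch rather than tcpl code: the function names are hypothetical, and scaling the median absolute deviation by 1.4826 (the default of R's mad function) is an assumption about how bmad is computed.

```python
import math
import statistics


def mad(values, constant=1.4826):
    """Median absolute deviation; the 1.4826 scale factor is an assumption."""
    values = list(values)
    med = statistics.median(values)
    return constant * statistics.median([abs(v - med) for v in values])


def baseline_stats(resp_low_conc):
    """Return (bmad, onesd) for test-well responses at cndx 1 or 2."""
    bmad = mad(resp_low_conc)
    onesd = statistics.stdev(resp_low_conc)  # sample SD, divides by n - 1
    return bmad, onesd


def select_cutoff(bmad):
    """Select the higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


coff = select_cutoff(0.183)  # bmad as reported in the assay design summary
```

With the rounded bmad of 0.183 reported for this endpoint, 5 * bmad is 0.915, which exceeds log2(1.2) (about 0.263) and agrees with the reported cutoff of 0.916 up to rounding of bmad.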
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 456
Inactive hit count (0 <= hitc < 0.9): 4058
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 123
gain-loss (gnls) model: 455
power (pow) model: 383
linear-polynomial (poly1) model: 1487
quadratic-polynomial (poly2) model: 775
exponential-2 (exp2) model: 56
exponential-3 (exp3) model: 22
exponential-4 (exp4) model: 876
exponential-5 (exp5) model: 285
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
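As an illustration of how such estimates follow from a fitted curve, the sketch below inverts a Hill curve to find the concentration at a given response. It assumes the parameterization f(x) = tp / (1 + (ga / x)^p), with tp the top, ga the AC50, and p the Hill coefficient; the function names and the example parameter values are hypothetical, and other winning models would need their own inversion.

```python
def hill(conc, tp, ga, p):
    """Hill response at concentration conc (assumed parameterization)."""
    return tp / (1.0 + (ga / conc) ** p)


def hill_conc_at_response(y, tp, ga, p):
    """Concentration at which the Hill curve reaches response y.

    Solving y = tp / (1 + (ga / x)^p) for x gives x = ga * (tp / y - 1)^(-1 / p).
    Returns None when y is not strictly between 0 and tp.
    """
    if tp == 0 or not (0 < y / tp < 1):
        return None
    return ga * (tp / y - 1.0) ** (-1.0 / p)


tp, ga, p = 2.0, 10.0, 1.0   # hypothetical fitted parameters
coff = 0.916                 # response cutoff reported for this endpoint

ac50 = hill_conc_at_response(0.50 * tp, tp, ga, p)
ac10 = hill_conc_at_response(0.10 * tp, tp, ga, p)
ac5 = hill_conc_at_response(0.05 * tp, tp, ga, p)
acc = hill_conc_at_response(coff, tp, ga, p)
```

Under this parameterization ac50 equals ga by construction, and acc is undefined (None here) whenever the fitted top does not exceed the cutoff.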
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the
tcpl package to generate model parameters and confidence intervals. Each chemical concentration-response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each
concentration-response series is determined by calculating a continuous hit-call for the winning model, which is
the product of three proportional weights. The first weight reflects whether there is at least one median response
outside the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
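The selection rule (lowest AIC, with ties going to the model with fewer parameters) can be sketched in Python as below. This is not the tcplfit2 implementation; the function name, the AIC values, and the parameter counts are hypothetical and serve only to illustrate the rule.

```python
def pick_winning_model(fits):
    """Return the model name with the lowest AIC; ties go to fewer parameters.

    fits maps a model name to a tuple (aic, n_params).
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Hypothetical fit results for one concentration series.
fits = {
    "cnst": (12.0, 1),
    "hill": (5.0, 4),
    "poly1": (5.0, 2),
    "exp2": (7.5, 3),
}
winner = pick_winning_model(fits)
beats_constant = fits[winner][0] < fits["cnst"][0]
```

Here hill and poly1 tie on AIC, so the simpler poly1 is selected, and its AIC is lower than that of the constant model. Exact float equality is used for the tie; a real implementation might compare rounded values instead.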
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency, comparing the AC50 with the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to identify high-confidence curves.
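The binarization described above can be written as a short Python sketch. The function name and the example hitc values are hypothetical; the default threshold of 0.9 follows the convention stated in this section and can be changed by the user.

```python
from collections import Counter


def classify_hitc(hitc, threshold=0.9):
    """Map a continuous hit call to active, inactive, or not applicable."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


# Hypothetical continuous hit calls for five concentration series.
hitcs = [0.99, 0.90, 0.42, 0.0, -1.0]
labels = [classify_hitc(h) for h in hitcs]
counts = Counter(labels)
```

Raising the threshold makes the active category stricter, which is one way to vary stringency alongside filtering on fitc or on the level 6 flags.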
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.259
Neutral control median absolute deviation, by plate (nmad): 0.105
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 40.64%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise, ((pmed - nmed) / nmad): NA
Positive control signal-to-background, (pmed / nmed): NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed - nmed) / nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed / nmed)
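The control-well metrics listed above translate directly into code. The Python sketch below follows the formulas as written in this section; the function names are hypothetical, and the positive control values in the usage lines are invented for illustration because no positive control is reported for this endpoint.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference against the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


nmed, nmad = 0.259, 0.105   # neutral control values reported above
pmed, pmad = 2.0, 0.2       # hypothetical positive control values

cv = cv_percent(nmed, nmad)
zp = z_prime(pmed, pmad, nmed, nmad)
```

Applying the CV formula to the overall nmed and nmad gives roughly 40.5%; the reported 40.64% presumably reflects calculation by plate before summarizing, so the two need not match exactly.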
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 285.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res
Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 67
ATG_C_EBP_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human CCAAT/enhancer binding protein (C/EBP), beta
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_C_EBP_CIS is one of 52
assay component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_C_EBP_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_C_EBP_CIS, was
analyzed with bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a
type of inducible reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the
reporter gene at the transcription factor-level as they relate to the gene CEBPB. Furthermore, this assay
endpoint can be referred to as a primary readout, because this assay has produced multiple assay endpoints
where this one serves a reporter gene function. To generalize the intended target to other relatable targets,
this assay endpoint is annotated to the dna binding intended target family, where the subfamily is basic leucine
zipper.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene C/EBP,
which is responsive to the endogenous human CCAAT/enhancer binding protein (C/EBP), beta
[GeneSymbol:CEBPB | GeneID:1051 | Uniprot_SwissProt_Accession:P17676].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of
a TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding
sequence, CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS, activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the cDNA produced
is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.1
Response cutoff threshold used to determine hit calls: 0.502
-------
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
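The removal of non-positive values and the log2 transformation described above can be illustrated with a minimal
Python sketch. It is not the tcpl implementation (tcpl is an R package); the function name, the tuple layout, and
the order of operations (filter first, then transform) are assumptions made for illustration only.

```python
import math

def level2_correct(cvals):
    """Return (log2_cval, wllq) pairs for a list of corrected response values.

    Wells with cval < 0 (rmneg) or cval == 0 (rmzero) get wllq = 0 and no
    transformed value; all other wells are log2-transformed and keep wllq = 1.
    """
    corrected = []
    for cval in cvals:
        if cval <= 0:
            # Non-positive values cannot be log-transformed; mark the well as failed.
            corrected.append((None, 0))
        else:
            corrected.append((math.log2(cval), 1))
    return corrected

# Hypothetical fold-induction values for five wells.
wells = level2_correct([2.0, 1.0, 0.0, -0.5, 0.5])
print(wells)
```

A fold induction of 1.0 (no change relative to DMSO) maps to 0 on the log2 scale, so gains are positive and losses
are negative after the transformation.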
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2) / (sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was the winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
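The Level 4 baseline statistics and the Level 5 cutoff selection listed above can be sketched in Python as follows.
This is an illustration under stated assumptions, not tcpl code: the dictionary keys mirror the field names used
in the text (wllt, cndx, resp), the function names are hypothetical, and the 1.4826 scale factor is assumed on the
basis that R's default mad() applies it.

```python
import math
import statistics

R_MAD_CONSTANT = 1.4826  # assumption: scale factor used by R's default mad()

def baseline_wells(wells):
    """Keep resp for test-compound wells (wllt == 't') at concentration index 1 or 2."""
    return [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]

def bmad(resp, constant=R_MAD_CONSTANT):
    """Scaled median absolute deviation of the baseline responses."""
    med = statistics.median(resp)
    return constant * statistics.median([abs(r - med) for r in resp])

def onesd(resp):
    """Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))."""
    return statistics.stdev(resp)

def cutoff(bmad_value):
    """Level 5: the higher of log2(1.2) and 5 * bmad is used as the cutoff."""
    return max(math.log2(1.2), 5 * bmad_value)

# Hypothetical wells; only the first five qualify as baseline wells.
wells = [
    {"wllt": "t", "cndx": 1, "resp": -0.10},
    {"wllt": "t", "cndx": 1, "resp": 0.00},
    {"wllt": "t", "cndx": 2, "resp": 0.10},
    {"wllt": "t", "cndx": 2, "resp": 0.20},
    {"wllt": "t", "cndx": 1, "resp": 0.05},
    {"wllt": "t", "cndx": 7, "resp": 1.50},  # high concentration: excluded
    {"wllt": "n", "cndx": 1, "resp": 0.02},  # neutral control: excluded
]
low = baseline_wells(wells)
print(bmad(low), onesd(low), cutoff(bmad(low)))
```

With the bmad of 0.1 reported in the Assay Design Summary, 5 * bmad is 0.5, which exceeds log2(1.2) (about 0.263).
This is in line with the reported cutoff of 0.502 if the displayed bmad is a rounded value.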
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 268
Inactive hit count (0 <= hitc < 0.9): 4246
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 136
gain-loss (gnls) model: 602
power (pow) model: 342
linear-polynomial (poly1) model: 1411
quadratic-polynomial (poly2) model: 757
exponential-2 (exp2) model: 41
exponential-3 (exp3) model: 9
exponential-4 (exp4) model: 938
exponential-5 (exp5) model: 226
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
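As an illustration of how these estimates relate to a fitted curve, the sketch below inverts a Hill curve of the
form f(x) = tp / (1 + (ga/x)^p) to find the concentration at a given response level. It covers only the Hill
model in the gain direction; the parameter names (tp, ga, p), the function names, the dictionary labels, and the
example values are assumptions for illustration rather than output of tcpl or tcplfit2.

```python
def hill(conc, tp, ga, p):
    """Hill curve: response at concentration conc (top tp, AC50 ga, slope p)."""
    return tp / (1 + (ga / conc) ** p)

def hill_conc_at(y, tp, ga, p):
    """Invert the Hill curve; return None if response y is never reached."""
    if tp <= 0 or y <= 0 or y >= tp:
        return None
    return ga / (tp / y - 1) ** (1 / p)

def hill_pods(tp, ga, p, coff, bmr):
    """Point-of-departure estimates for a gain-direction Hill fit."""
    return {
        "bmd": hill_conc_at(bmr, tp, ga, p),         # concentration at the BMR
        "ac50": hill_conc_at(0.50 * tp, tp, ga, p),  # 50% of maximal response
        "acc": hill_conc_at(coff, tp, ga, p),        # concentration at the cutoff
        "ac10": hill_conc_at(0.10 * tp, tp, ga, p),  # 10% of maximal response
        "ac5": hill_conc_at(0.05 * tp, tp, ga, p),   # 5% of maximal response
    }

# Hypothetical fit: top = 2.0, AC50 = 10, slope = 1, cutoff = 0.5, BMR = 0.4.
pods = hill_pods(tp=2.0, ga=10.0, p=1.0, coff=0.5, bmr=0.4)
print(pods)
```

For the Hill model the ac50 equals the ga parameter by construction. An estimate is returned as None when the
curve never reaches the requested response, for example when the cutoff is above the modeled top.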
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
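The model selection rule described above (lowest AIC, with ties going to the model with fewer parameters) can be
sketched as follows. The function name, the input layout, and the AIC values and parameter counts in the example
are hypothetical and are not taken from tcplfit2.

```python
def select_winning_model(fits):
    """Pick the model with the lowest AIC; break ties by fewer parameters.

    fits maps a model name to a tuple (aic, n_params).
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

# Hypothetical fits: hill and poly1 tie on AIC, so the simpler poly1 wins.
fits = {
    "cnst": (12.0, 1),
    "hill": (10.0, 4),
    "poly1": (10.0, 2),
    "exp4": (10.5, 3),
}
winner = select_winning_model(fits)
print(winner)
```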
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
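The binarization thresholds given above, combined with a simple filter on cautionary flags, might be applied as in
the following sketch. The record layout (spid, hitc, flags), the function names, and the choice to require zero
flags are illustrative assumptions; users may prefer a different threshold or tolerate specific flags.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the thresholds stated in the text."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

def high_confidence_actives(series, threshold=0.90, max_flags=0):
    """Return sample IDs that are active and carry at most max_flags flags."""
    return [
        s["spid"]
        for s in series
        if classify_hitc(s["hitc"], threshold) == "active"
        and len(s["flags"]) <= max_flags
    ]

# Hypothetical concentration-response series.
series = [
    {"spid": "S1", "hitc": 0.98, "flags": []},
    {"spid": "S2", "hitc": 0.95, "flags": ["noise"]},
    {"spid": "S3", "hitc": 0.40, "flags": []},
    {"spid": "S4", "hitc": -1.0, "flags": []},
]
labels = [classify_hitc(s["hitc"]) for s in series]
confident = high_confidence_actives(series)
print(labels, confident)
```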
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 1.157
Neutral control median absolute deviation, by plate (nmad): 0.274
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 23.71%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
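The formulas in the table above can be written out as a short Python sketch. The function names are hypothetical,
and the positive control values in the example are invented purely to exercise the formulas, since this endpoint
reports NA for positive and negative controls.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3*(cmad + nmad) / |cmed - nmed|."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed

# Neutral control values reported for this endpoint.
cv = cv_percent(nmed=1.157, nmad=0.274)

# Hypothetical control values (not from this assay) to exercise the other formulas.
zp = z_prime(cmed=5.0, cmad=0.5, nmed=1.0, nmad=0.5)
sd = ssmd(cmed=5.0, cmad=0.5, nmed=1.0, nmad=0.5)
sn = signal_to_noise(cmed=5.0, nmed=1.0, nmad=0.5)
sb = signal_to_background(cmed=5.0, nmed=1.0)
print(round(cv, 2), zp, round(sd, 3), sn, sb)
```

Computed from the rounded nmed and nmad shown in the table, the CV is about 23.68%; the reported 23.71% was
presumably calculated from unrounded values.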
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 226.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 68
ATG_CMV_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human CMV Gene Activation (Internal Marker)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_CMV_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_CMV_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_CMV_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the background control
at the transcription factor-level as they relate to the gene . Furthermore, this assay endpoint can be referred
to as a secondary readout, because this assay has produced multiple assay endpoints where this one serves a
background control function. To generalize the intended target to other relatable targets, this assay endpoint
is annotated to the background measurement intended target family, where the subfamily is internal marker.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene CMV,
which is used as an internal marker.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of an RTU is determined by the presence of a TF binding site in the promoter.
-------
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS, activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the cDNA produced
is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.135
Response cutoff threshold used to determine hit calls: 0.675
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl uses generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters are defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was the winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in the baseline response, in excess of half of
the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 or fewer concentrations were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
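The Level 4 baseline calculation described above can be sketched as follows. This is an illustrative Python sketch, not tcpl's R implementation: the dictionary keys (`resp`, `wllt`, `cndx`) simply mirror the field names in the text, the example wells are made up, and the 1.4826 scale factor is an assumption that matches the default of R's `mad()` function.

```python
import statistics

# Assumption: same consistency constant as the default of R's mad()
MAD_SCALE = 1.4826


def baseline_stats(wells):
    """Return (bmad, onesd) from test-compound wells at the two lowest concentration indices."""
    low = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(low)
    # Median absolute deviation around the median of the low-concentration responses
    bmad = MAD_SCALE * statistics.median([abs(r - center) for r in low])
    # Sample standard deviation (denominator: sample size - 1)
    onesd = statistics.stdev(low)
    return bmad, onesd


# Hypothetical wells for illustration only
wells = [
    {"resp": 0.0, "wllt": "t", "cndx": 1},
    {"resp": 0.1, "wllt": "t", "cndx": 1},
    {"resp": -0.1, "wllt": "t", "cndx": 2},
    {"resp": 0.2, "wllt": "t", "cndx": 2},
    {"resp": -0.2, "wllt": "t", "cndx": 1},
    {"resp": 2.0, "wllt": "t", "cndx": 7},  # high concentration: ignored
    {"resp": 0.5, "wllt": "n", "cndx": 1},  # neutral control: ignored
]
bmad, onesd = baseline_stats(wells)
```

Restricting the baseline to the lowest two concentrations of test wells assumes that most chemicals are inactive at those concentrations, so their spread approximates assay noise.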
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 675
Inactive hit count (0 <= hitc < 0.9): 3839
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 145
gain-loss (gnls) model: 294
power (pow) model: 371
linear-polynomial (poly1) model: 1859
quadratic-polynomial (poly2) model: 798
exponential-2 (exp2) model: 63
exponential-3 (exp3) model: 321
exponential-4 (exp4) model: 596
exponential-5 (exp5) model: 15
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
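To make the relationship between these estimates concrete, the sketch below inverts a Hill curve analytically. It assumes a gain-direction Hill model of the form f(x) = tp / (1 + (ga/x)^p), where tp is the top, ga the AC50, and p the Hill coefficient; the function names and parameter values are illustrative and are not taken from tcpl or tcplfit2.

```python
def hill(x, tp, ga, p):
    """Assumed Hill form: response at concentration x."""
    return tp / (1.0 + (ga / x) ** p)


def conc_at_fraction(q, ga, p):
    """Concentration giving a fraction q (0 < q < 1) of the top, e.g. q = 0.5, 0.1, 0.05."""
    # tp / (1 + (ga/x)^p) = q * tp  =>  x = ga * (q / (1 - q))^(1/p)
    return ga * (q / (1.0 - q)) ** (1.0 / p)


def conc_at_response(y, tp, ga, p):
    """Concentration where the curve reaches an absolute response y (e.g. a cutoff or BMR)."""
    if not 0 < y < tp:
        return None  # the curve never reaches y
    return ga / (tp / y - 1.0) ** (1.0 / p)


# Hypothetical fit: top = 1.5 (log2 fold induction), AC50 = 10, Hill coefficient = 2
ac50 = conc_at_fraction(0.5, ga=10.0, p=2.0)
ac10 = conc_at_fraction(0.10, ga=10.0, p=2.0)
acc = conc_at_response(0.73, tp=1.5, ga=10.0, p=2.0)
```

Under this form, the fractional estimates depend only on ga and p, while the cutoff- and BMR-based estimates also depend on the top, which is why a low-efficacy curve may have an AC50 but no concentration at the cutoff.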
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration-response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomial, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. the model with fewer parameters) wins.
In invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
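The selection rule and the structure of the continuous hit call can be sketched as below. This is a simplified illustration rather than the tcplfit2 implementation: the AIC values and parameter counts are invented, and the three weights are passed in as already-computed numbers between 0 and 1 because the section does not specify how each weight is derived.

```python
def pick_winner(fits):
    """fits maps model name -> (aic, n_params); lowest AIC wins, ties go to fewer parameters."""
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


def continuous_hitc(w_median_outside_cutoff, w_top_exceeds_cutoff, w_beats_constant):
    """Continuous hit call as the product of the three weights described above."""
    return w_median_outside_cutoff * w_top_exceeds_cutoff * w_beats_constant


# Hypothetical fits: hill and poly1 tie on AIC, so the simpler poly1 wins
fits = {"hill": (10.0, 3), "poly1": (10.0, 1), "exp2": (12.0, 2)}
winner = pick_winner(fits)
hitc = continuous_hitc(1.0, 0.95, 0.99)
```

Because the hit call is a product, a weak value for any single weight pulls the overall hit call down, even when the other two are close to 1.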
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity,
efficacy, and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on
fitting were developed in previous versions of tcpl and have been stored at level 6. These flags are
programmatically generated and indicate characteristics of a curve that need extra attention or potential
anomalies in the curve or data. Users may review these filtered groupings to understand high-confidence curves.
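A minimal sketch of the binarization described above, using the thresholds stated in this section. The function name and the example hitc values are illustrative; the threshold is exposed as a parameter because the text notes that different thresholds may be used.

```python
from collections import Counter


def classify_hitc(hitc, threshold=0.9):
    """Binarize a continuous hit call using the thresholds stated in the text."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


# Hypothetical hit calls for five concentration series
example_hitc = [0.99, 0.90, 0.42, 0.0, -1.0]
counts = Counter(classify_hitc(h) for h in example_hitc)
```

Raising `threshold` gives a more stringent active set; combining the result with the level 6 flags narrows it further to higher-confidence curves.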
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.451
Neutral control median absolute deviation, by plate: nmad 0.138
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 30.57%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates:
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells:
((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed - nmed) / nmad) NA
Positive control signal-to-background: (pmed / nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median response value, by plate: mmed NA
Negative control well median absolute deviation, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates:
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells:
((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed - nmed) / nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed / nmed) NA
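The formulas listed above translate directly into code. The sketch below is illustrative only: the function names are not from tcpl, and the positive-control inputs are hypothetical because this endpoint reports NA for them. With the rounded neutral-control values shown above (nmed 0.451, nmad 0.138), the CV works out to about 30.6%, close to the reported 30.57%, which was presumably computed from unrounded values.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference between a control and the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


cv = cv_percent(nmed=0.451, nmad=0.138)  # rounded values reported for this endpoint
# Hypothetical positive-control summary, for illustration only
zp = z_prime(cmed=2.0, cmad=0.1, nmed=0.5, nmad=0.1)
```

A Z prime factor near 1 indicates wide separation between control distributions relative to their spread; values at or below 0 indicate overlapping controls.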
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
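As a sketch of how such accuracy statistics could be computed once reference chemical classifications exist, the example below tallies a confusion matrix from paired reference and assay calls. The input pairs are invented for illustration; no reference chemical results are available for this endpoint.

```python
def accuracy_stats(calls):
    """calls: list of (reference_is_positive, assay_is_active) boolean pairs."""
    tp = sum(1 for ref, hit in calls if ref and hit)
    fn = sum(1 for ref, hit in calls if ref and not hit)
    tn = sum(1 for ref, hit in calls if not ref and not hit)
    fp = sum(1 for ref, hit in calls if not ref and hit)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return {
        "sensitivity": sensitivity,  # true positive rate
        "specificity": specificity,  # true negative rate
        "false_negative_rate": None if sensitivity is None else 1 - sensitivity,
        "false_positive_rate": None if specificity is None else 1 - specificity,
    }


# Hypothetical reference set: 2 positives and 3 negatives
stats = accuracy_stats([(True, True), (True, False), (False, False), (False, False), (False, True)])
```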
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 321.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 69
ATG_CRE_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human cAMP responsive element binding protein 3
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_CRE_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse transcription
polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay component
ATG_CRE_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_CRE_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene CREB3. Furthermore, this assay endpoint can be
referred to as a primary readout, because this assay has produced multiple assay endpoints where this one
serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the dna binding intended target family, where the subfamily is basic leucine zipper.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element CRE, which is responsive to the endogenous human cAMP responsive element binding protein
3 [GeneSymbol:CREB3 | GeneID:10488 | Uniprot_SwissProt_Accession:O43889].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in
activities of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear
receptors, the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto
96-well plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was
isolated using TRIzol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer
and Mo-MLV reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the
produced cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific
primers. The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM)
5'-labeled reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these
products were digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using
QIAquick PCR columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak
positions identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data was processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: Forskolin
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.146
Response cutoff threshold used to determine hit calls: 0.73
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
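The reported cutoff is consistent with the Level 5 rule described in Section 3.2, where the larger of log2(1.2) and 5 times bmad is selected. The short check below is a sketch of that arithmetic, not tcpl code; the function name is illustrative.

```python
import math


def response_cutoff(bmad):
    """Larger of the two candidate cutoffs assigned to this endpoint."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # about 0.263
        "bmad5": 5 * bmad,
    }
    return max(candidates.values())


cutoff = response_cutoff(0.146)  # bmad reported in the summary above
```

For this endpoint, 5 × 0.146 = 0.73 exceeds log2(1.2), so the bmad-based threshold determines the cutoff.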
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
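The removal of non-positive values and the log transformation can be sketched as below. This is an illustrative Python sketch rather than tcpl's R code: the dictionary keys mirror the cval and wllq fields named in this section, the example values are made up, and flagging wells before transforming them is an assumption made so that the logarithm is never taken of a non-positive number.

```python
import math


def level2_correct(wells):
    """Flag non-positive wells (wllq = 0) and log2-transform the remaining cval values."""
    corrected = []
    for well in wells:
        well = dict(well)  # copy so the input is left unchanged
        if well["cval"] <= 0:
            # Covers both the negative-value and zero-value exclusions
            well["wllq"] = 0
            well["cval"] = None
        else:
            well["cval"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected


# Hypothetical fold-induction values over DMSO control
raw = [{"cval": 2.0, "wllq": 1}, {"cval": 1.0, "wllq": 1}, {"cval": 0.0, "wllq": 1}, {"cval": -0.3, "wllq": 1}]
processed = level2_correct(raw)
usable = [w for w in processed if w["wllq"] == 1]
```

On the log2 scale, a fold induction of 1 (no change from DMSO) maps to 0, so the baseline is centered at zero and gains of signal are positive.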
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl uses generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
-------
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute value (bmad) as the median
absolute deviation of normalized response values (rep) for test compound wells (wilt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wilt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)A2)/sample size - 1). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wilt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modi.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -l*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -l*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest cone tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest tested concentration tested.
), 7: singlept.hit.mid (Flag single-point hit that's not at the highest cone tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This is indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
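As an illustration only, a few of the simpler level 6 checks described above can be restated as plain predicates. The Python sketch below is not tcpl code (tcpl is an R package); the function name simple_flags and its argument names are hypothetical, and only the noise, low.nrep, low.nconc, and ac50.lowconc rules are reproduced, exactly as worded in the flag descriptions.

```python
def simple_flags(hitc, rmse, coff, nrep, nconc, ac50, logc_min):
    """Return the names of a few level 6 flags that apply to one series.

    Hypothetical helper: the thresholds restate the flag descriptions in
    the text and are not taken from the tcpl source code.
    """
    flags = []
    if rmse > coff:                            # 10: noise
        flags.append("noise")
    if nrep < 2:                               # 13: low.nrep
        flags.append("low.nrep")
    if nconc <= 4:                             # 14: low.nconc
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:  # 18: ac50.lowconc
        flags.append("ac50.lowconc")
    return flags
```

For example, an active series (hitc = 0.95) with one replicate per concentration and an AC50 below the lowest tested concentration would receive the low.nrep and ac50.lowconc flags under these rules.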
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 391
Inactive hit count (0 <= hitc < 0.9): 4123
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning model:
hill model: 126
gain-loss (gnls) model: 546
power (pow) model: 380
linear-polynomial (poly1) model: 1450
quadratic-polynomial (poly2) model: 688
exponential-2 (exp2) model: 61
exponential-3 (exp3) model: 23
exponential-4 (exp4) model: 925
exponential-5 (exp5) model: 263
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
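The selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be sketched in a few lines. This Python snippet is only an illustration and is not how tcplfit2 is implemented; the function name pick_winning_model, the AIC values, and the parameter counts are all invented for the example.

```python
def pick_winning_model(fits):
    """fits maps a model name to a tuple (aic, n_params).

    The key sorts by AIC first and by parameter count second, so two
    models with equal AIC are resolved in favor of the simpler one.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Hypothetical fit results for one concentration series.
example_fits = {
    "cnst": (12.0, 1),   # constant model (flat line)
    "poly1": (8.5, 2),
    "hill": (8.5, 4),    # same AIC as poly1 but more parameters
    "exp2": (9.1, 3),
}

winner = pick_winning_model(example_fits)
# Comparison that the third hit-call weight is described as reflecting.
beats_flat_line = example_fits[winner][0] < example_fits["cnst"][0]
```

Here poly1 and hill tie on AIC, so poly1 is chosen, and its AIC is lower than that of the constant model.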
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
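The binarization described above can be written as a small helper. This is a minimal Python sketch, assuming the 0.90 threshold used in this documentation; the function name classify_hitc is hypothetical, and the threshold is left as a parameter because the text notes that other thresholds may be used.

```python
def classify_hitc(hitc, threshold=0.9):
    """Map a continuous hit call to 'active', 'inactive', or 'NA'."""
    if hitc < 0:
        return "NA"          # hitc less than 0 is not applicable
    return "active" if hitc >= threshold else "inactive"
```

A stricter analysis could pass a higher threshold, for example classify_hitc(0.92, threshold=0.95), which would return "inactive".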
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.519
Neutral control median absolute deviation, by plate: nmad 0.172
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 33.14%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
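The formulas listed above can be checked with a short calculation. The Python sketch below restates them directly; the function names are hypothetical, and cmed/cmad stand for the median and median absolute deviation of whichever control (positive or negative) is being compared with the neutral control.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, as a percentage."""
    return (nmad / nmed) * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference against the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control values reported above for this endpoint.
cv = cv_percent(nmed=0.519, nmad=0.172)
print(round(cv, 2))  # 33.14
```

The positive and negative control metrics are reported as NA for this endpoint, so only the CV can be reproduced from the values given.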
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 263.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast
Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 70
ATG_DR4_LXR_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human LXRE Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_DR4_LXR_CIS is one of 52
assay component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_DR4_LXR_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_DR4_LXR_CIS,
was analyzed with bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using
a type of inducible reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the
reporter gene at the transcription factor-level as they relate to the genes NR1H2 and NR1H3. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element LXRE, which is responsive to the endogenous human nuclear receptor subfamily 1, group H,
member 2 and nuclear receptor subfamily 1, group H, member 3 [GeneSymbol:NR1H2 & NR1H3 | GeneID:7376
& 10062 | Uniprot_SwissProt_Accession:P55055 & Q13133].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit
(RTU) is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the
presence of a TF binding site in the promoter. Importantly, as all members of a TF family recognize the same
binding sequence, CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data is processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Target (nominal) number of replicates: 1
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.119
Response cutoff threshold used to determine hit calls: 0.597
-------
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
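The level 2, 4, and 5 methods assigned above can be sketched end to end for a single endpoint. The Python code below is an illustrative restatement, not the tcpl implementation: the function names and the (wllt, cndx, resp) tuple layout are hypothetical, wells that would receive wllq = 0 are simply dropped, and the median absolute deviation is computed without any scaling constant because the text does not state whether one is applied.

```python
import math
import statistics


def level2_log2(cvals):
    """rmneg, rmzero, log2: drop cval <= 0, then log2-transform the rest."""
    return [math.log2(v) for v in cvals if v > 0]


def mad(values):
    """Unscaled median absolute deviation about the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def baseline_stats(wells):
    """Return (bmad, onesd) from test wells (wllt == 't') at cndx 1 or 2.

    wells is an iterable of (wllt, cndx, resp) tuples.
    """
    low = [resp for wllt, cndx, resp in wells if wllt == "t" and cndx in (1, 2)]
    return mad(low), statistics.stdev(low)  # stdev divides by n - 1


def level5_cutoff(bmad):
    """Higher of the two assigned cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)
```

With the bmad of 0.119 reported for this endpoint, level5_cutoff(0.119) gives about 0.595, since 5 * bmad exceeds log2(1.2) (about 0.263). This is close to the reported cutoff of 0.597; the small difference would be consistent with the bmad being rounded for display, though that is an assumption.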
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 775
Inactive hit count (0 <= hitc < 0.9): 3739
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning model:
hill model: 145
gain-loss (gnls) model: 332
power (pow) model: 363
linear-polynomial (poly1) model: 1740
quadratic-polynomial (poly2) model: 860
exponential-2 (exp2) model: 71
exponential-3 (exp3) model: 318
exponential-4 (exp4) model: 615
exponential-5 (exp5) model: 18
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
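To make these estimates concrete, consider the case where the Hill model wins. The sketch below assumes the Hill form f(x) = tp / (1 + (ga/x)^p), with tp the top, ga the AC50, and p the Hill coefficient; this parameterization and all function names are assumptions for illustration and are not taken from the text. Solving f(x) = q * tp for x gives x = ga * (q / (1 - q))^(1/p), which yields the concentrations at 50%, 10%, and 5% of the maximal response. The same inversion applies at an absolute response level such as the efficacy cutoff.

```python
def hill(conc, tp, ga, p):
    """Assumed Hill form: response at concentration conc."""
    return tp / (1 + (ga / conc) ** p)


def hill_activity_conc(q, ga, p):
    """Concentration where the curve reaches fraction q (0 < q < 1) of tp."""
    return ga * (q / (1 - q)) ** (1 / p)


def hill_conc_at_response(y, tp, ga, p):
    """Concentration at absolute response y, valid when 0 < y / tp < 1."""
    return hill_activity_conc(y / tp, ga, p)


# Hypothetical fitted parameters, for illustration only.
tp, ga, p = 2.0, 10.0, 2.0
ac50 = hill_activity_conc(0.5, ga, p)   # equals ga by construction
ac10 = hill_activity_conc(0.10, ga, p)
ac5 = hill_activity_conc(0.05, ga, p)
acc = hill_conc_at_response(0.597, tp, ga, p)  # 0.597 is this endpoint's cutoff
```

For other winning models the inversion differs, so this should be read only as a worked case of how a fitted curve maps a target response to a concentration.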
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
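The binarization and filtering described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of tcpl: the function names and the record layout (a dict with "hitc" and "flags" keys) are hypothetical, and the 0.90 threshold is simply the one used herein.

```python
def classify_hitc(hitc, active_threshold=0.90):
    """Map a continuous hit call to a category using the thresholds stated in the text."""
    if hitc < 0:
        return "not applicable"
    if hitc >= active_threshold:
        return "active"
    return "inactive"


def high_confidence(records, active_threshold=0.90, max_flags=0):
    """Keep active series carrying no more than max_flags cautionary flags.

    Each record is a hypothetical dict: {"hitc": float, "flags": list of str}.
    """
    return [
        r for r in records
        if r["hitc"] >= active_threshold and len(r["flags"]) <= max_flags
    ]


# Hypothetical values, chosen only to exercise each branch.
labels = [classify_hitc(h) for h in (0.97, 0.90, 0.42, 0.0, -1.0)]
records = [
    {"hitc": 0.99, "flags": []},
    {"hitc": 0.95, "flags": ["noise", "border"]},
    {"hitc": 0.30, "flags": []},
]
kept = high_confidence(records)
```

Raising active_threshold or lowering max_flags makes the filter more stringent; the appropriate setting depends on the intended use of the data.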
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.996
Neutral control median absolute deviation, by plate: nmad 0.38
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 38.11%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
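The control metrics listed in this section can be computed from plate-level medians and median absolute deviations. The following Python sketch restates those formulas; the function names and the example values are hypothetical, and the per-plate aggregation (taking the median across plates) is left out for brevity. Where an endpoint has no positive or negative control wells, the corresponding metrics are reported as NA and these functions would simply not be called.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation (CV%) in neutral control wells."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) against neutral wells."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control against neutral wells."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Hypothetical plate summary values, chosen only for illustration.
pmed, pmad, nmed, nmad = 4.0, 0.3, 1.0, 0.2
metrics = {
    "cv_percent": cv_percent(nmed, nmad),
    "z_prime": z_prime(pmed, pmad, nmed, nmad),
    "ssmd": ssmd(pmed, pmad, nmed, nmad),
    "signal_to_noise": signal_to_noise(pmed, nmed, nmad),
    "signal_to_background": signal_to_background(pmed, nmed),
}
```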
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 318.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity
data-rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 71
ATG_DR5_RAR_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human RARE Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_DR5_RAR_CIS is one of 52
assay components measured or calculated from the ATG_CIS assay. It is designed to make measurements of
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse
transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay
component ATG_DR5_RAR_CIS were analyzed into 1 assay endpoint. This assay endpoint, ATG_DR5_CIS, was
analyzed with bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a
type of inducible reporter, measures of mRNA for gain- or loss-of-signal activity can be used to understand the
reporter gene at the transcription factor level as it relates to the genes RARA, RARB, and RARG.
Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has produced
multiple assay endpoints where this one serves a reporter gene function. To generalize the intended target
to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended target family, where
the subfamily is non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element RARE, which is responsive to the endogenous human retinoic acid receptor, alpha and retinoic
acid receptor, beta and retinoic acid receptor, gamma [GeneSymbol:RARA & RARB & RARG | GeneID:5914 &
5915 & 5916 | Uniprot_SwissProt_Accession:P10276 & P10826 & P13631].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: 9-cis-Retinoic acid
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.12
Response cutoff threshold used to determine hit calls: 0.599
-------
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2) / (sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was the winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
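The Level 2, 4, 5 and a few of the simpler Level 6 methods listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tcpl implementation: the function names are hypothetical; whether a scaling constant is applied to the median absolute deviation is not stated here, so it is exposed as a parameter defaulting to 1 (R's mad() applies a constant of 1.4826 by default); and the border check is read as a band between 0.8 and 1.2 times the cutoff, which is an interpretation of the condition as written.

```python
import math
import statistics


def level2(cvals):
    """rmneg, rmzero, log2: return (log2 value or None, wllq) for each well."""
    out = []
    for cval in cvals:
        if cval <= 0:                      # negative or zero: wllq = 0
            out.append((None, 0))
        else:
            out.append((math.log2(cval), 1))
    return out


def mad(values):
    """Unscaled median absolute deviation about the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def baseline_stats(resp_low_conc, constant=1.0):
    """bmad and onesd from test-well responses at concentration index 1 or 2."""
    bmad = constant * mad(resp_low_conc)
    onesd = statistics.stdev(resp_low_conc)   # sqrt(sum((x - mean)^2) / (n - 1))
    return bmad, onesd


def cutoff(bmad):
    """Level 5: the higher of log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


def simple_flags(rmse, top, coff, nrep, nconc):
    """A subset of the Level 6 flags that depend only on summary values."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:  # assumed reading of the border rule
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags


# Hypothetical inputs, chosen only for illustration.
wells = level2([4.0, 0.0, -1.0, 1.0])
bmad, onesd = baseline_stats([0.0, 0.1, -0.1, 0.2, -0.2])
coff = cutoff(0.12)
flags = simple_flags(rmse=0.7, top=0.65, coff=0.6, nrep=1, nconc=7)
```

With the rounded bmad of 0.12 reported for this endpoint, 5 * bmad is about 0.60 and exceeds log2(1.2) (about 0.26), which is consistent with the reported cutoff of 0.599.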
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 396
Inactive hit count (0 <= hitc < 0.9): 4118
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 146
gain-loss (gnls) model: 503
power (pow) model: 336
linear-polynomial (poly1) model: 1451
quadratic-polynomial (poly2) model: 896
exponential-2 (exp2) model: 29
exponential-3 (exp3) model: 7
exponential-4 (exp4) model: 759
exponential-5 (exp5) model: 335
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
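To make the POD definitions concrete, the sketch below derives them for a Hill curve. It assumes the common three-parameter form f(x) = tp / (1 + (ga / x)^p), with top tp, AC50 ga, and Hill coefficient p; this parameterization, the function names, and the numeric values are assumptions for illustration, and other winning models need their own inversions. Setting f(x) equal to a fraction q of tp and solving gives x = ga * (q / (1 - q))^(1/p); setting f(x) equal to a fixed response level r (the cutoff for acc, or the BMR for bmd) gives x = ga / (|tp| / r - 1)^(1/p).

```python
def hill(conc, tp, ga, p):
    """Assumed Hill form: response at a given concentration."""
    return tp / (1 + (ga / conc) ** p)


def hill_ac(frac, ga, p):
    """Concentration at which the response is frac * tp (0 < frac < 1)."""
    return ga * (frac / (1 - frac)) ** (1 / p)


def hill_conc_at(level, tp, ga, p):
    """Concentration at which |response| reaches a positive level.

    Returns None when the curve never reaches that level (|tp| <= level).
    Use level = cutoff for acc, or level = BMR for bmd.
    """
    if abs(tp) <= level:
        return None
    return ga / (abs(tp) / level - 1) ** (1 / p)


# Hypothetical fitted parameters and cutoff.
tp, ga, p, coff = 2.0, 10.0, 1.0, 0.6
ac50 = hill_ac(0.50, ga, p)
ac10 = hill_ac(0.10, ga, p)
ac5 = hill_ac(0.05, ga, p)
acc = hill_conc_at(coff, tp, ga, p)
```

Because ac5 < ac10 < ac50 for any increasing Hill curve, comparing these values with the tested concentration range is a quick check on whether a potency estimate is an extrapolation.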
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
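The model selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be written as a single ordered comparison. The sketch below is illustrative and does not reproduce tcplfit2: the record layout, AIC values, and parameter counts are hypothetical.

```python
def select_winner(fits):
    """Return the fit with the lowest AIC; ties go to the model with fewer parameters.

    Each fit is a hypothetical dict: {"model": str, "aic": float, "n_params": int}.
    """
    return min(fits, key=lambda f: (f["aic"], f["n_params"]))


def beats_constant(winner, fits):
    """True when the winner's AIC is below that of the constant (cnst) model."""
    cnst = next(f for f in fits if f["model"] == "cnst")
    return winner["aic"] < cnst["aic"]


# Hypothetical AIC values and parameter counts, for illustration only.
fits = [
    {"model": "cnst", "aic": 12.0, "n_params": 1},
    {"model": "hill", "aic": -3.5, "n_params": 4},
    {"model": "poly1", "aic": -3.5, "n_params": 2},
    {"model": "exp4", "aic": -1.0, "n_params": 3},
]
winner = select_winner(fits)
better_than_flat = beats_constant(winner, fits)
```

Here hill and poly1 tie on AIC, so the two-parameter poly1 model is selected. Exact ties are uncommon with real floating-point AIC values, so the tie-break matters mainly for degenerate fits.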
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.464
Neutral control median absolute deviation, by plate: nmad 0.104
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 22.37%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 335.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity
data-rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 72
ATG_E_Box_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human Ebox Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_E_Box_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_E_Box_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_E_Box_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene USF1. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is basic helix-loop-helix protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene Ebox,
which is responsive to the endogenous human upstream transcription factor 1 [GeneSymbol:USF1 |
GeneID:7391 | Uniprot_SwissProt_Accession:P22415].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of a RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C) and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.095
Response cutoff threshold used to determine hit calls: 0.477
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
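The removal of non-positive values followed by the log2 transformation can be illustrated with a short Python sketch. This is not the tcpl implementation (tcpl is an R package); the function name and the choice to return (value, wllq) pairs are assumptions made for illustration only.

```python
import math

def level2_transform(cvals):
    """Illustrative sketch: exclude wells with cval <= 0, log2-transform the rest.

    Returns a list of (value, wllq) pairs. Excluded wells get value None and
    wllq 0; retained wells get log2(cval) and wllq 1.
    """
    out = []
    for cval in cvals:
        if cval <= 0:
            # Negative or zero corrected response: well quality set to 0.
            out.append((None, 0))
        else:
            out.append((math.log2(cval), 1))
    return out

# Hypothetical corrected response values (fold-induction over DMSO).
wells = level2_transform([2.0, 0.0, -0.3, 1.0, 0.5])
```

On this scale a value of 0 corresponds to no change relative to the DMSO baseline, positive values to gain of signal, and negative values to loss of signal.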
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g.
top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
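The Level 4 to Level 6 logic described above can be sketched in Python as follows. This is a simplified illustration rather than the tcpl code: the function names and input layout are hypothetical, only three of the simpler flags are shown, and the `constant` argument is an assumption (R's mad() applies a scaling constant of 1.4826 by default, so it is exposed here as a parameter defaulting to 1.0).

```python
import math
import statistics

def bmad_lowconc(resp_by_cndx, constant=1.0):
    """Median absolute deviation of responses at concentration index 1 or 2.

    resp_by_cndx: list of (cndx, resp) pairs for test compound wells.
    """
    low = [resp for cndx, resp in resp_by_cndx if cndx in (1, 2)]
    med = statistics.median(low)
    return constant * statistics.median([abs(r - med) for r in low])

def cutoff(bmad):
    """Level 5: the higher of log2(1.2) and 5 * bmad is used as the cutoff."""
    return max(math.log2(1.2), 5 * bmad)

def simple_flags(rmse, coff, nconc, nrep):
    """Level 6: a subset of the cautionary flags (noise, low.nrep, low.nconc)."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags

# Using the rounded bmad reported in the assay design summary (0.095).
coff = cutoff(0.095)
# Hypothetical series: rmse of 0.6, 7 concentrations, 1 replicate.
flags = simple_flags(rmse=0.6, coff=0.477, nconc=7, nrep=1)
```

With the rounded bmad of 0.095, 5 * bmad is 0.475, which exceeds log2(1.2) (about 0.263) and is close to the reported cutoff of 0.477; the small difference is presumably due to rounding of the displayed bmad. With a nominal design of one replicate per concentration, the low.nrep flag would be expected for series in this endpoint.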
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 364
Inactive hit count (0 <= hitc < 0.9): 4150
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 106
gain-loss (gnls) model: 357
power (pow) model: 300
linear-polynomial (poly1) model: 1829
quadratic-polynomial (poly2) model: 820
exponential-2 (exp2) model: 49
exponential-3 (exp3) model: 8
exponential-4 (exp4) model: 723
exponential-5 (exp5) model: 270
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
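For a Hill-type winning model, the activity concentrations at a given fraction of the maximal response can be obtained in closed form. The sketch below assumes the Hill form f(x) = tp / (1 + (ga/x)^p), where tp is the top, ga is the AC50, and p is the Hill coefficient; this parameterization and the function names are assumptions for illustration and should be checked against the tcplfit2 documentation before use.

```python
def hill_response(conc, tp, ga, p):
    """Assumed Hill model: response at concentration conc."""
    return tp / (1.0 + (ga / conc) ** p)

def hill_ac(fraction, ga, p):
    """Concentration at which the assumed Hill model reaches fraction * tp.

    Solving tp / (1 + (ga/x)^p) = fraction * tp for x gives
    x = ga * (fraction / (1 - fraction)) ** (1 / p).
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return ga * (fraction / (1.0 - fraction)) ** (1.0 / p)

# Hypothetical parameters: top = 2.0, AC50 = 10.0, Hill coefficient = 1.5.
ac50 = hill_ac(0.50, ga=10.0, p=1.5)
ac10 = hill_ac(0.10, ga=10.0, p=1.5)
ac5 = hill_ac(0.05, ga=10.0, p=1.5)
```

Under this form the AC50 equals ga by construction, and the 10% and 5% activity concentrations fall progressively below it. The bmd and acc estimates depend on the BMR and cutoff rather than on a fixed fraction of the top and are not covered by this sketch.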
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
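The winning-model rule (lowest AIC, with ties going to the model with fewer parameters) can be sketched as below. The AIC values and the parameter counts in the example are hypothetical placeholders, not values taken from tcplfit2, and the function name is illustrative.

```python
def pick_winning_model(aic_by_model, n_params):
    """Return the model with the lowest AIC; break ties by fewer parameters."""
    return min(aic_by_model, key=lambda m: (aic_by_model[m], n_params[m]))

# Hypothetical AIC values for one concentration series.
aic = {"cnst": 12.0, "hill": 10.0, "poly1": 10.0, "exp2": 11.5}
# Hypothetical parameter counts, used only for tie-breaking.
n_params = {"cnst": 1, "poly1": 2, "exp2": 3, "hill": 4}

winner = pick_winning_model(aic, n_params)
# Third hit-call consideration: does the winner beat the constant (flat) model?
beats_constant = aic[winner] < aic["cnst"]
```

Here hill and poly1 tie on AIC, so the simpler poly1 model is selected. The continuous hit call itself is a product of three weights and is not reproduced in this sketch.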
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
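The binarization of the continuous hit call described above can be written as a small helper. The function name is illustrative; the default threshold of 0.9 follows the convention stated in this document, and users may substitute a different threshold.

```python
def binarize_hitc(hitc, threshold=0.9):
    """Map a continuous hit call to active, inactive, or not applicable."""
    if hitc < 0:
        return "not applicable"
    if hitc >= threshold:
        return "active"
    return "inactive"

# Hypothetical hit-call values.
calls = [binarize_hitc(h) for h in (0.97, 0.90, 0.42, 0.0, -1.0)]
```

A stricter analysis could pass a higher threshold (for example 0.95) and additionally exclude series that carry several cautionary flags.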
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.9
Neutral control median absolute deviation, by plate (nmad): 0.261
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 28.99%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
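The control-well metrics listed above follow directly from the stated formulas. The sketch below evaluates them in Python; the neutral control inputs are the rounded values reported here, while the positive control values are hypothetical, since no positive control is available for this endpoint.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference against the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

# Reported (rounded) neutral control values for this endpoint.
nmed, nmad = 0.9, 0.261
cv = cv_percent(nmed, nmad)

# Hypothetical positive control values, for illustration only.
pmed, pmad = 3.0, 0.3
zp = z_prime(pmed, pmad, nmed, nmad)
s = ssmd(pmed, pmad, nmed, nmad)
```

The CV computed from the rounded inputs is 29.0%, in line with the reported 28.99%, which was presumably calculated from unrounded values.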
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 270.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 73
ATG_E2F_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human E2F transcription factor 1 Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_E2F_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_E2F_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_E2F_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene E2F1. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is E2F transcription factor.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene E2F,
which is responsive to the endogenous human E2F transcription factor 1 [GeneSymbol:E2F1 | GeneID:1869 |
Uniprot_SwissProt_Accession:Q01094].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of a RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.097
Response cutoff threshold used to determine hit calls: 0.484
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
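The pre-modeling transformations described above can be illustrated with a minimal Python sketch. It assumes each well's corrected value (cval) is already expressed as fold-induction over the DMSO baseline; the function name preprocess_wells and the example values are hypothetical and are not part of tcpl.

```python
import math

def preprocess_wells(cvals):
    """Return one (resp, wllq) pair per well.

    Wells with zero or negative cval cannot be log-transformed, so they are
    excluded (resp = None) and their well quality (wllq) is set to 0.
    All other wells are log2-transformed and keep wllq = 1.
    """
    processed = []
    for cval in cvals:
        if cval <= 0:
            processed.append((None, 0))
        else:
            processed.append((math.log2(cval), 1))
    return processed

# Hypothetical fold-induction values for five wells.
wells = preprocess_wells([1.0, 2.0, 0.0, -0.5, 0.5])
```

In this sketch a fold-induction of 1 maps to a response of 0 (no change from baseline), while 2-fold induction and 2-fold reduction map to +1 and -1.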
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in the baseline response, exceeding
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
0.8(coff) <= |top| <= 1.2(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
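The Level 4 and Level 5 methods listed above can be sketched in Python as follows. This is an illustration only, not the tcpl implementation: the well records, the function names, and the default scaling constant of 1.4826 (the default used by R's mad function, assumed here because tcpl is written in R) are assumptions.

```python
import math
from statistics import median, stdev

def mad(values, constant=1.4826):
    """Median absolute deviation, scaled by `constant` (assumed R default)."""
    center = median(values)
    return constant * median(abs(v - center) for v in values)

def baseline_stats(wells, constant=1.4826):
    """bmad and onesd from test wells (wllt == 't') at cndx 1 or 2."""
    low = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    # statistics.stdev divides by (sample size - 1), matching onesd above.
    return mad(low, constant), stdev(low)

def cutoff(bmad):
    """Level 5: the higher of log2(1.2) and 5 * bmad is selected."""
    return max(math.log2(1.2), 5 * bmad)

# Hypothetical wells: five low-concentration test wells and one neutral well.
example_wells = [
    {"wllt": "t", "cndx": 1, "resp": 0.0},
    {"wllt": "t", "cndx": 1, "resp": 0.1},
    {"wllt": "t", "cndx": 2, "resp": -0.1},
    {"wllt": "t", "cndx": 2, "resp": 0.2},
    {"wllt": "t", "cndx": 1, "resp": -0.2},
    {"wllt": "n", "cndx": 1, "resp": 5.0},  # ignored: not a test well
]
bmad, onesd = baseline_stats(example_wells)

# With the bmad reported for this endpoint (0.097, rounded), 5 * bmad is
# about 0.485, which exceeds log2(1.2) (about 0.263).
coff = cutoff(0.097)
```

The result is close to the reported response cutoff of 0.484; the small difference would be expected if the reported bmad is rounded for display.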
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 95
Inactive hit count (0<hitc<0.9): 4419
Not applicable hit count (hitc<0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 53
gain-loss (gnls) model: 322
power (pow) model: 213
linear-polynomial (poly1) model: 2256
quadratic-polynomial (poly2) model: 849
exponential-2 (exp2) model: 29
exponential-3 (exp3) model: 200
exponential-4 (exp4) model: 539
exponential-5 (exp5) model: 1
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
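As an illustration of how these estimates relate to a fitted curve, the sketch below inverts a Hill model of the form tp / (1 + (ga / x)^p), where tp is the top, ga is the AC50, and p is the Hill coefficient. The parameterization, function names, and parameter values are assumptions made for this example, and only the gain direction (tp > 0) is handled; tcplfit2 should be consulted for the actual calculations.

```python
def hill_conc_at(response, tp, ga, p):
    """Concentration x at which tp / (1 + (ga / x) ** p) equals `response`.

    Returns None when the response level is never reached by the curve.
    """
    if tp <= 0 or not 0 < response < tp:
        return None
    return ga * (response / (tp - response)) ** (1.0 / p)

def pod_estimates(tp, ga, p, coff, bmr):
    """Point-of-departure estimates for a gain-direction Hill fit."""
    return {
        "bmd": hill_conc_at(bmr, tp, ga, p),          # at the benchmark response
        "ac50": hill_conc_at(0.50 * tp, tp, ga, p),   # at 50% of the top
        "acc": hill_conc_at(coff, tp, ga, p),         # at the efficacy cutoff
        "ac10": hill_conc_at(0.10 * tp, tp, ga, p),   # at 10% of the top
        "ac5": hill_conc_at(0.05 * tp, tp, ga, p),    # at 5% of the top
    }

# Hypothetical fit: top = 2 (log2 fold-induction), AC50 = 10, Hill slope = 1.
pods = pod_estimates(tp=2.0, ga=10.0, p=1.0, coff=0.5, bmr=0.25)
```

For this hypothetical curve the AC50 equals ga by construction, and the estimates tied to smaller response levels (ac5, ac10, bmd, acc) fall at progressively lower concentrations in proportion to how small the response level is.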
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
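The model selection rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched as follows. The AIC values and parameter counts are hypothetical and chosen only to demonstrate the tie-break; they are not taken from tcplfit2 or from this endpoint.

```python
def select_winning_model(fits):
    """Pick the model with the lowest AIC; break ties by fewer parameters."""
    return min(fits, key=lambda name: (fits[name]["aic"], fits[name]["n_params"]))

def beats_constant(fits, winner, constant="cnst"):
    """True if the winning model has a lower AIC than the constant model."""
    return fits[winner]["aic"] < fits[constant]["aic"]

# Hypothetical fit summary for one concentration series.
fits = {
    "cnst": {"aic": 120.0, "n_params": 1},
    "hill": {"aic": 95.2, "n_params": 4},
    "poly1": {"aic": 95.2, "n_params": 2},
    "exp2": {"aic": 101.7, "n_params": 3},
}
winner = select_winning_model(fits)
better_than_flat = beats_constant(fits, winner)
```

Here hill and poly1 tie on AIC, so the simpler poly1 model is selected, and it also fits better than the constant (flat line) model.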
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
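The binarization of the continuous hit call described above can be expressed as a short sketch. The function name classify_hitc and the sample values are hypothetical; the default threshold of 0.90 follows the convention stated in this documentation and can be changed by the user.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to a categorical label."""
    if hitc < 0:
        return "not applicable"
    if hitc >= threshold:
        return "active"
    return "inactive"

# Hypothetical continuous hit calls for four concentration series.
labels = [classify_hitc(h) for h in (0.97, 0.90, 0.42, -1)]
```

A more stringent analysis could pass a higher threshold, for example classify_hitc(0.92, threshold=0.95), which would label that series inactive.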
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.295
Neutral control median absolute deviation, by plate: nmad 0.03
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 10.05%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
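The control-well metrics defined above can be computed as in the following sketch. Because no positive or negative control values are reported for this endpoint, the input values are hypothetical and serve only to show the arithmetic; the function names are likewise illustrative.

```python
import math

def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells, in percent."""
    return (nmad / nmed) * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) vs. neutral wells."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference vs. neutral control wells."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    return cmed / nmed

# Hypothetical plate summary: neutral and positive control medians and MADs.
nmed, nmad = 1.0, 0.1
pmed, pmad = 3.0, 0.2

metrics = {
    "cv": cv_percent(nmad, nmed),
    "z_prime": z_prime(pmed, pmad, nmed, nmad),
    "ssmd": ssmd(pmed, pmad, nmed, nmad),
    "s2n": signal_to_noise(pmed, nmed, nmad),
    "s2b": signal_to_background(pmed, nmed),
}
```

The same functions apply to negative control wells by passing mmed and mmad in place of pmed and pmad.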
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 200.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 74
ATG_EGR_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human early growth response 1 (EGR1)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_EGR_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_EGR_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_EGR_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene EGR1. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is zinc finger.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene EGR,
which is responsive to the endogenous human early growth response 1 [GeneSymbol:EGR1 | GeneID:1958 |
Uniprot_SwissProt_Accession:P18146].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of a RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.14
Response cutoff threshold used to determine hit calls: 0.701
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-induction
over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
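The removal and transformation steps described above can be sketched in a few lines. This is a minimal Python illustration, not tcpl's R implementation; the function name `level2_log2` and the example values are hypothetical, and the ordering (exclude non-positive values first, then take log2) is an assumption based on the sentence above.

```python
import math


def level2_log2(cvals):
    """Return (log2 value, wllq) pairs for a list of raw fold-induction values.

    Assumption: wells with zero or negative values are excluded by setting
    well quality (wllq) to 0, and the remaining values are log2-transformed.
    """
    processed = []
    for cval in cvals:
        if cval <= 0:
            # Excluded well: no transformed value, well quality downgraded.
            processed.append((None, 0))
        else:
            processed.append((math.log2(cval), 1))
    return processed


# Hypothetical raw fold-induction values for five wells.
raw = [1.0, 2.0, 0.0, -0.5, 0.5]
processed = level2_log2(raw)
for cval, (value, wllq) in zip(raw, processed):
    print(cval, value, wllq)
```

A fold-induction of 1 (no change relative to DMSO) maps to 0 on the log2 scale, so values below the vehicle baseline become negative after transformation.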
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
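The Level 4 and Level 5 methods listed above reduce to a short calculation. The sketch below is written in Python for illustration only and is not the tcpl code; the well records and function names are hypothetical. It also assumes that the median absolute deviation carries the 1.4826 scaling constant that R's `mad()` applies by default, which should be checked against the tcpl source before relying on it.

```python
import math
import statistics


def baseline_stats(wells, mad_constant=1.4826):
    """Compute (bmad, onesd) from test wells at the two lowest concentration indices.

    wells: list of dicts with keys 'resp', 'wllt', and 'cndx'.
    mad_constant: assumed scaling constant (the default of R's mad()).
    """
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(base)
    bmad = mad_constant * statistics.median([abs(r - med) for r in base])
    onesd = statistics.stdev(base)  # sample standard deviation (divides by n - 1)
    return bmad, onesd


def cutoff(bmad):
    """Select the higher of the two assigned thresholds: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


# Hypothetical wells: five low-concentration test wells, one high-concentration
# test well, and one neutral control well (the last two are ignored).
wells = [
    {"resp": 0.0, "wllt": "t", "cndx": 1},
    {"resp": 0.1, "wllt": "t", "cndx": 1},
    {"resp": -0.1, "wllt": "t", "cndx": 2},
    {"resp": 0.2, "wllt": "t", "cndx": 2},
    {"resp": -0.2, "wllt": "t", "cndx": 1},
    {"resp": 1.5, "wllt": "t", "cndx": 7},
    {"resp": 0.05, "wllt": "n", "cndx": 1},
]
bmad, onesd = baseline_stats(wells)
print(bmad, onesd, cutoff(bmad))
```

With the rounded bmad of 0.14 reported in the Assay Design Summary, `cutoff(0.14)` gives approximately 0.70, which agrees with the reported response cutoff of 0.701 up to rounding of bmad.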
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 482
Inactive hit count (0 <= hitc < 0.9): 4032
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 101
gain-loss (gnls) model: 331
power (pow) model: 366
linear-polynomial (poly1) model: 1946
quadratic-polynomial (poly2) model: 692
exponential-2 (exp2) model: 61
exponential-3 (exp3) model: 289
exponential-4 (exp4) model: 661
exponential-5 (exp5) model: 15
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
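As a worked illustration of these estimates, the sketch below inverts a Hill curve to find each concentration. It is a Python sketch under stated assumptions, not the tcplfit2 implementation: the Hill form f(x) = tp / (1 + (ga/x)^p) is assumed, and the parameter values, cutoff, and BMR are hypothetical. For other winning models the inversion would differ.

```python
def hill(conc, tp, ga, p):
    """Assumed Hill form: top (tp), AC50 (ga), and Hill coefficient (p)."""
    return tp / (1.0 + (ga / conc) ** p)


def hill_conc_at(response, tp, ga, p):
    """Invert the Hill curve: concentration at which the curve reaches `response`."""
    if not 0 < response / tp < 1:
        raise ValueError("response must lie strictly between 0 and tp")
    return ga / (tp / response - 1.0) ** (1.0 / p)


# Hypothetical fitted parameters, efficacy cutoff, and benchmark response.
tp, ga, p = 2.0, 10.0, 1.5
coff, bmr = 0.7, 0.3

pods = {
    "bmd": hill_conc_at(bmr, tp, ga, p),         # concentration at the BMR
    "ac50": hill_conc_at(0.50 * tp, tp, ga, p),  # 50% of maximal response
    "acc": hill_conc_at(coff, tp, ga, p),        # concentration at the cutoff
    "ac10": hill_conc_at(0.10 * tp, tp, ga, p),  # 10% of maximal response
    "ac5": hill_conc_at(0.05 * tp, tp, ga, p),   # 5% of maximal response
}
for name, value in pods.items():
    print(name, round(value, 4))
```

Under this form the AC50 equals the fitted `ga` parameter, and lower response levels always map to lower concentrations because the curve is monotonic.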
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
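The selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be sketched as follows. This Python example is illustrative only and is not the tcplfit2 code; the log-likelihoods and parameter counts are hypothetical inputs, and the continuous hit-call calculation is not reproduced here.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def pick_winner(fits):
    """Return the name of the winning model.

    fits: list of (model_name, aic_value, n_params) tuples.
    The lowest AIC wins; for equal AIC values, the model with fewer
    parameters wins.
    """
    return min(fits, key=lambda fit: (fit[1], fit[2]))[0]


# Hypothetical fits for one concentration series: (name, log-likelihood, parameters).
candidates = [("cnst", -20.0, 1), ("hill", -9.0, 4), ("poly1", -12.5, 2)]
fits = [(name, aic(ll, k), k) for name, ll, k in candidates]
winner = pick_winner(fits)
print(fits, winner)
```

In this example the Hill fit has the lowest AIC (26.0) and wins; had two models tied on AIC, sorting on the parameter count as a second key would have selected the simpler one.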
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
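A short sketch shows how a user might apply the thresholds above when filtering results. It is written in Python for illustration and is not part of tcpl; the function names and example values are hypothetical. The three flags shown follow the rules stated for noise, low.nrep, and low.nconc in Section 3.2.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the thresholds described above."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


def simple_flags(rmse, coff, nrep, nconc):
    """Evaluate three of the cautionary flags from their stated rules."""
    flags = []
    if rmse > coff:
        flags.append("noise")      # rmse > coff
    if nrep < 2:
        flags.append("low.nrep")   # average replicates per concentration < 2
    if nconc <= 4:
        flags.append("low.nconc")  # 4 or fewer concentrations tested
    return flags


# Hypothetical series: an active call with one replicate per concentration.
call = classify_hitc(0.95)
flags = simple_flags(rmse=0.3, coff=0.701, nrep=1, nconc=7)
print(call, flags)
```

A stricter analysis could, for example, keep only series that are active and carry no flags, while a screening-level analysis might tolerate some flags; the choice of stringency is left to the user.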
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.362
Neutral control median absolute deviation, by plate (nmad): 0.076
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 20.89%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
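The formulas above can be evaluated directly once control well medians and median absolute deviations are available. The following Python sketch is illustrative only; the function names are hypothetical, and because this endpoint has no positive or negative control values (NA), the control values `cmed` and `cmad` used below are invented solely to exercise the formulas.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control values as reported above; control values are hypothetical.
nmed, nmad = 0.362, 0.076
cmed, cmad = 2.0, 0.15

print(round(cv_percent(nmed, nmad), 2))
print(round(z_prime(cmed, cmad, nmed, nmad), 3))
print(round(ssmd(cmed, cmad, nmed, nmad), 3))
print(round(signal_to_noise(cmed, nmed, nmad), 3))
print(round(signal_to_background(cmed, nmed), 3))
```

Recomputing the CV from the rounded nmed and nmad gives roughly 20.99%, slightly different from the reported 20.89%; the difference presumably reflects rounding or plate-level aggregation in the reported summary values.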
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 289.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian
M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of
multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb
24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM,
Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of
environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's
ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID:
20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck
KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data
can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette
for comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 75
ATG_ERE_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for Estrogen Response Element (ERE)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_ERE_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_ERE_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_ERE_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene ESR1. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the nuclear receptor intended target family, where the subfamily is steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element ERE, which is responsive to the endogenous human estrogen receptor 1 [GeneSymbol:ESR1
| GeneID:2099 | Uniprot_SwissProt_Accession:P03372].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of an RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 C, 20 s at 68 C and 10 min at 72 C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: 17β-Estradiol
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.101
Response cutoff threshold used to determine hit calls: 0.507
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
ToxCast ER Pathway Model: Estrogen receptor assays used in ToxCast ER Pathway model
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-induction
over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.), 18: resp.shiftneg.3bmad (Shift all
the normalized response values (resp) less than -3 multiplied by the baseline median absolute deviation
(bmad) to 0; if resp < -3*bmad, resp = 0.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
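The Level 2 and Level 4 steps listed above can be illustrated with a short Python sketch. It is not the tcpl implementation (tcpl is an R package); the function names are hypothetical, kept wells are assumed to start with wllq = 1, and the 1.4826 scaling constant is an assumption that bmad follows the default behavior of R's mad() function.

```python
import math
import statistics

# Assumption: bmad is a scaled MAD, as produced by R's mad() default constant.
R_MAD_CONSTANT = 1.4826


def level2_correct(cvals):
    """Apply rmneg and rmzero, then log2, returning (value, wllq) per well."""
    out = []
    for cval in cvals:
        if cval <= 0:
            # rmneg (cval < 0) and rmzero (cval = 0): well quality set to 0
            out.append((None, 0))
        else:
            # log2: transform the corrected value to log-scale (base 2)
            out.append((math.log2(cval), 1))
    return out


def bmad_and_onesd(resp_low):
    """bmad and onesd from test-well responses (wllt = t) at cndx 1 or 2."""
    med = statistics.median(resp_low)
    mad = statistics.median([abs(r - med) for r in resp_low])
    # onesd = sqrt(sum((resp - mean resp)^2) / (sample size - 1))
    onesd = statistics.stdev(resp_low)
    return R_MAD_CONSTANT * mad, onesd
```

Passing only the responses from the two lowest concentration indices mirrors the bmad.aeid.lowconc.twells description; selecting those wells from a real data set is left to the caller.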
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 1102
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.882
Neutral control median absolute deviation, by plate (nmad): 0.211
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 23.87%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed) / nmad: NA
Positive control signal-to-background, pmed / nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed) / nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed / nmed: NA
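As a worked illustration of the control metrics above, the following Python sketch evaluates the CV, Z prime factor, and SSMD formulas. The function names are hypothetical, and a missing control (reported as NA) is represented here by None. Using the rounded neutral control values reported for this endpoint (nmed 0.882, nmad 0.211) gives a CV of roughly 23.9%, close to the reported 23.87%; the small gap presumably reflects rounding of the displayed values.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    if cmed is None or cmad is None:
        return None  # no control wells of this type, reported as NA
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """SSMD: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    if cmed is None or cmad is None:
        return None  # no control wells of this type, reported as NA
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


# Neutral control values reported above; no positive control is available.
print(round(cv_percent(0.882, 0.211), 2))
print(z_prime(None, None, 0.882, 0.211))
```

Here cmed and cmad stand for either the positive (pmed, pmad) or negative (mmed, mmad) control statistics, since both sets of metrics share the same form.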
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 395.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 76
ATG_Ets_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human Ets Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_Ets_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse transcription
polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay component
ATG_Ets_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_Ets_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene ETS1. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the DNA binding intended target family, where the subfamily is winged helix-turn-helix.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene Ets,
which is responsive to the endogenous human v-ets avian erythroblastosis virus E26 oncogene homolog 1
[GeneSymbol:ETS1 | GeneID:2113 | Uniprot_SwissProt_Accession:P14921].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit
(RTU) is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the
presence of a TF binding site in the promoter. Importantly, as all members of a TF family recognize the same
binding sequence, CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 300 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.073
Response cutoff threshold used to determine hit calls: 0.364
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of DNA binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
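The Level 5 cutoff selection and a few of the simpler Level 6 flags can be sketched in Python as follows. This is an illustration under stated assumptions, not tcpl code: the function names and argument layout are hypothetical, and only four of the listed flags are shown. With the bmad of 0.073 reported in the assay design summary, 5*bmad is 0.365, which exceeds log2(1.2) (about 0.263) and agrees with the reported cutoff of 0.364 up to rounding of the displayed bmad.

```python
import math


def select_cutoff(bmad):
    """Level 5: take the highest of the candidate cutoffs log2(1.2) and 5*bmad."""
    return max(math.log2(1.2), 5 * bmad)


def simple_flags(hitc, coff, rmse, nconc, nrep, ac50, logc_min):
    """Return names of a subset of Level 6 flags triggered by one series."""
    flags = []
    if rmse > coff:
        flags.append("noise")          # rmse > coff
    if nrep < 2:
        flags.append("low.nrep")       # average replicates per conc < 2
    if nconc <= 4:
        flags.append("low.nconc")      # 4 concentrations or fewer tested
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")   # active, AC50 below lowest conc tested
    return flags


print(round(select_cutoff(0.073), 3))
```

The remaining flags (for example modl.directionality.fail or border) need the per-concentration responses or fitted parameters and are omitted from this sketch.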
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 180
Inactive hit count: 4334
Not applicable hit count (hitc<0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with each winning model:
hill model: 66
gain-loss (gnls) model: 311
power (pow) model: 254
linear-polynomial (poly1) model: 2226
quadratic-polynomial (poly2) model: 758
exponential-2 (exp2) model: 37
exponential-3 (exp3) model: 2
exponential-4 (exp4) model: 616
exponential-5 (exp5) model: 192
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
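For a concrete sense of how these POD estimates relate to a fitted curve, the sketch below inverts a gain-direction Hill model analytically. The Hill form tp / (1 + (ga/conc)^p), the parameter names tp, ga, and p, and the dictionary keys ac10 and ac5 are assumptions made for this illustration; the BMR is passed in as a response level rather than derived from onesd, and the other model families are not covered.

```python
def hill(conc, tp, ga, p):
    """Assumed Hill form: response = tp / (1 + (ga / conc)^p)."""
    return tp / (1 + (ga / conc) ** p)


def hill_conc_at_response(y, tp, ga, p):
    """Concentration at which the Hill curve reaches response y, else None."""
    ratio = y / tp
    if not 0 < ratio < 1:
        return None  # the curve never reaches this response level
    return ga / (1 / ratio - 1) ** (1 / p)


def hill_pods(tp, ga, p, coff, bmr):
    """Point-of-departure estimates for one gain-direction Hill fit."""
    return {
        "bmd": hill_conc_at_response(bmr, tp, ga, p),         # at the BMR
        "ac50": hill_conc_at_response(0.5 * tp, tp, ga, p),   # 50% of top
        "acc": hill_conc_at_response(coff, tp, ga, p),        # at the cutoff
        "ac10": hill_conc_at_response(0.1 * tp, tp, ga, p),   # 10% of top
        "ac5": hill_conc_at_response(0.05 * tp, tp, ga, p),   # 5% of top
    }


# Hypothetical fit: top 1.0, AC50 10, Hill slope 2, cutoff 0.364, BMR 0.2
pods = hill_pods(tp=1.0, ga=10.0, p=2.0, coff=0.364, bmr=0.2)
print(pods["ac50"])
```

Under this form the AC50 equals the ga parameter, and any POD whose target response lies above the modeled top is returned as None.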
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
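The winning-model rule described in this section (lowest AIC, with ties going to the model with fewer parameters) can be sketched in Python as below. The dictionary layout, the function names, and the AIC values and parameter counts in the example are hypothetical; the continuous hit call itself is not reproduced here.

```python
def pick_winning_model(fits):
    """Pick the model with the lowest AIC; ties go to fewer parameters.

    fits maps a model name to a tuple (aic, number_of_parameters).
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


def beats_constant(fits, winner, constant="cnst"):
    """True if the winning model's AIC is below the constant model's AIC."""
    return fits[winner][0] < fits[constant][0]


# Hypothetical AIC values and parameter counts for one concentration series
fits = {
    "cnst": (15.0, 1),
    "hill": (10.0, 4),
    "poly1": (10.0, 2),
    "exp4": (12.0, 3),
}
winner = pick_winning_model(fits)
print(winner, beats_constant(fits, winner))
```

Sorting on the pair (AIC, parameter count) makes the tie-break explicit: the Hill and linear fits share an AIC in the example, so the simpler linear model is selected.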
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
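The binarization of the continuous hit call described above can be written as a small Python helper. The function name and the returned labels are illustrative choices; the 0.90 default follows the threshold used in this documentation and can be changed by the user.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the thresholds stated above."""
    if hitc < 0:
        return "not applicable"
    if hitc >= threshold:
        return "active"
    return "inactive"


for value in (0.95, 0.90, 0.45, -1.0):
    print(value, classify_hitc(value))
```

A stricter or looser threshold can be supplied through the threshold argument when a different level of stringency is required.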
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.707
Neutral control median absolute deviation, by plate: nmad 0.07
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 9.86%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 192.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 77
ATG_FoxA2_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human forkhead box A2 (FOXA2)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_FoxA2_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse transcription
polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay component
ATG_FoxA2_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_FoxA2_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene FOXA2. Furthermore, this assay endpoint can be
referred to as a primary readout, because this assay has produced multiple assay endpoints where this one
serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the dna binding intended target family, where the subfamily is forkhead box protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene FoxA,
which is responsive to the endogenous human forkhead box A2 [GeneSymbol:FOXA2 | GeneID:3170 |
Uniprot_SwissProt_Accession:Q9Y261].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter.
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
the activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated into 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.097
Response cutoff threshold used to determine hit calls: 0.486
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
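The pre-modeling clean-up described above (drop non-positive values, then log-transform) can be sketched as follows.
This is a minimal Python illustration, not the tcpl implementation (tcpl is an R package). The record layout and the
function name preprocess_wells are hypothetical, and cval is assumed to already hold the fold-induction value.

```python
import math

def preprocess_wells(wells):
    """Sketch of rmneg/rmzero followed by log2 (hypothetical helper).

    Each well is a dict with 'cval' (corrected response) and 'wllq'
    (well quality: 1 = usable, 0 = excluded). Returns new records.
    """
    out = []
    for well in wells:
        rec = dict(well)
        if rec["cval"] <= 0:
            # Negative or zero values cannot be log-transformed:
            # downgrade well quality and drop the value.
            rec["wllq"] = 0
            rec["cval"] = None
        else:
            # Remaining values are moved to the log2 scale.
            rec["cval"] = math.log2(rec["cval"])
        out.append(rec)
    return out

wells = [
    {"cval": 2.0, "wllq": 1},   # two-fold induction -> log2 = 1
    {"cval": 0.0, "wllq": 1},   # zero -> excluded
    {"cval": -0.5, "wllq": 1},  # negative -> excluded
    {"cval": 1.0, "wllq": 1},   # no change -> log2 = 0
]
processed = preprocess_wells(wells)
```

On the log2 scale a value of 0 means no change from baseline, which is why the cutoffs discussed later are expressed
as log2 fold changes.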
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality (wllq);
if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well quality
(wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad
is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was the winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
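The Level 4 to Level 6 methods listed above can be illustrated with a short sketch. This is a Python approximation
for orientation only, not tcpl code: the function names and record layout are hypothetical, the MAD is computed
without any consistency constant (whether tcpl applies one is not stated here, so treat that as an assumption), and
the border flag is read as requiring |top| to fall between 0.8 and 1.2 times the cutoff.

```python
import math
import statistics

def baseline_stats(wells):
    """Level 4 sketch: bmad and onesd from test wells (wllt == 't') at cndx 1 or 2."""
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(base)
    # Unscaled median absolute deviation (assumption: no consistency constant).
    bmad = statistics.median([abs(r - med) for r in base])
    # Sample standard deviation, i.e. the (sample size - 1) denominator.
    onesd = statistics.stdev(base)
    return bmad, onesd

def cutoff(bmad):
    """Level 5 sketch: the larger of log2(1.2) and 5 * bmad is used."""
    return max(math.log2(1.2), 5 * bmad)

def simple_flags(top, rmse, coff, nrep, nconc):
    """Level 6 sketch: four of the simpler cautionary flags."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags

# bmad as reported in the assay design summary; top and rmse are made-up values.
coff = cutoff(0.097)
flags = simple_flags(top=0.5, rmse=0.2, coff=coff, nrep=1, nconc=7)
```

With the reported bmad of 0.097, 5 * bmad = 0.485 exceeds log2(1.2) (about 0.263), so the bmad5 method sets the
cutoff; the small difference from the reported 0.486 is presumably due to rounding of bmad. Because this assay
targets one replicate per concentration, the low.nrep flag would be expected on every series.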
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 98
Inactive hit count (0 <= hitc < 0.9): 4416
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 70
gain-loss (gnls) model: 452
power (pow) model: 238
linear-polynomial (poly1) model: 1865
quadratic-polynomial (poly2) model: 682
exponential-2 (exp2) model: 33
exponential-3 (exp3) model: 208
exponential-4 (exp4) model: 913
exponential-5 (exp5) model: 1
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
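As an illustration of how these estimates relate to a fitted curve, the sketch below inverts a Hill model to find the
concentration at a given response level. It is a hedged Python example, not the tcplfit2 routine: it assumes a
gain-direction Hill curve f(x) = tp / (1 + (ga/x)^p), and the parameter values, cutoff, and BMR passed in are
made up for demonstration.

```python
def hill(conc, tp, ga, p):
    """Hill response at concentration conc (top tp, AC50 ga, slope p)."""
    return tp / (1.0 + (ga / conc) ** p)

def hill_conc_at(y, tp, ga, p):
    """Concentration at which the Hill curve reaches response y (0 < y/tp < 1)."""
    frac = y / tp
    if not 0.0 < frac < 1.0:
        raise ValueError("response must lie strictly between 0 and top")
    return ga / (1.0 / frac - 1.0) ** (1.0 / p)

def pod_estimates(tp, ga, p, coff, bmr):
    """Point-of-departure sketch for a winning Hill model (gain direction)."""
    return {
        "ac5": hill_conc_at(0.05 * tp, tp, ga, p),   # 5% of maximal response
        "ac10": hill_conc_at(0.10 * tp, tp, ga, p),  # 10% of maximal response
        "ac50": hill_conc_at(0.50 * tp, tp, ga, p),  # 50% of maximal response
        "acc": hill_conc_at(coff, tp, ga, p),        # response equals the cutoff
        "bmd": hill_conc_at(bmr, tp, ga, p),         # response equals the BMR
    }

# Hypothetical fit: top = 1.0, AC50 = 10, slope = 1; illustrative cutoff and BMR.
pods = pod_estimates(tp=1.0, ga=10.0, p=1.0, coff=0.486, bmr=0.2)
```

For a Hill model the ac50 equals the fitted ga parameter by construction; acc and bmd are only defined when the
cutoff and BMR lie below the fitted top.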
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the constant
model, i.e. whether the winning model is a better fit than a flat line.
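The selection rule and the structure of the continuous hit call can be sketched as below. This Python example is
illustrative only: the AIC values and parameter counts are invented, the helper names are hypothetical, and the three
weights are taken as given inputs because their calculation inside tcplfit2 is not described in this document.

```python
def select_winning_model(fits):
    """Pick the model with the lowest AIC; ties go to the model with fewer parameters.

    fits maps a model name to a tuple (aic, n_params).
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

def beats_constant(fits, winner):
    """True when the winning model has a lower AIC than the constant (flat) model."""
    return fits[winner][0] < fits["cnst"][0]

def continuous_hitc(w_median_outside, w_top_over_cutoff, w_better_than_flat):
    """Continuous hit call as the product of three weights in [0, 1]."""
    return w_median_outside * w_top_over_cutoff * w_better_than_flat

# Invented example: hill and poly1 tie on AIC, so the simpler poly1 wins.
fits = {
    "cnst": (10.0, 1),
    "hill": (4.0, 4),
    "poly1": (4.0, 2),
    "exp4": (6.5, 3),
}
winner = select_winning_model(fits)
hitc = continuous_hitc(1.0, 0.98, 0.95)
```

Because the hit call is a product, a single weak weight is enough to pull hitc below a 0.90 threshold even when the
other two weights are close to 1.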
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
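A user-side filter following the thresholds stated above might look like the sketch below. It is a Python
illustration with hypothetical function names and made-up records; the 0.90 threshold is the one used herein, and
the maximum number of tolerated flags is an arbitrary choice the user would set.

```python
def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the thresholds described in the text."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

def high_confidence_actives(series, threshold=0.90, max_flags=0):
    """Keep active series carrying no more than max_flags cautionary flags."""
    return [
        s for s in series
        if classify_hitc(s["hitc"], threshold) == "active" and len(s["flags"]) <= max_flags
    ]

# Made-up records: sample identifier, continuous hit call, level 6 flag names.
series = [
    {"spid": "S1", "hitc": 0.97, "flags": []},
    {"spid": "S2", "hitc": 0.95, "flags": ["singlept.hit.high", "low.nrep"]},
    {"spid": "S3", "hitc": 0.40, "flags": []},
    {"spid": "S4", "hitc": -1.0, "flags": []},
]
kept = [s["spid"] for s in high_confidence_actives(series)]
```

Raising max_flags or lowering the threshold loosens the filter; the appropriate stringency depends on the intended
use of the data.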
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.798
Neutral control median absolute deviation, by plate: nmad 0.107
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 13.38%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
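The control-well metrics in this table follow directly from the listed medians and MADs. The sketch below is a
Python illustration with hypothetical function names; it returns None where a control type was not run, mirroring
the NA entries. The positive-control numbers in the second example are invented purely to show the arithmetic.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3 * (cmad + nmad) / abs(cmed - nmed); None if no control data."""
    if cmed is None or cmad is None:
        return None
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference; None if no control data."""
    if cmed is None or cmad is None:
        return None
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

# Neutral control values reported above for this endpoint.
cv = cv_percent(nmed=0.798, nmad=0.107)

# No positive control was run, so the derived metrics stay undefined.
zp_missing = z_prime(None, None, 0.798, 0.107)

# Invented positive control, for illustration only.
zp_example = z_prime(cmed=2.0, cmad=0.1, nmed=1.0, nmad=0.1)
ssmd_example = ssmd(cmed=2.0, cmad=0.1, nmed=1.0, nmad=0.1)
```

Computed from the rounded values shown, the CV is about 13.41%; the reported 13.38% presumably reflects unrounded
inputs.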
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 208.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 78
ATG_FoxO_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human FoxO Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_FoxO_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_FoxO_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_FoxO_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the genes FOXO1 and FOXO3. Furthermore, this assay endpoint
can be referred to as a primary readout, because this assay has produced multiple assay endpoints where this
one serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the dna binding intended target family, where the subfamily is forkhead box protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene FoxO,
which is responsive to the endogenous human forkhead box O1 and forkhead box O3 [GeneSymbol:FOXO1 &
FOXO3 | GeneID:2308 & 2309 | Uniprot_SwissProt_Accession:Q12778 & O43524].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of an RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. The HepG2 cell line is a permanent cell culture isolated
from the liver tumor lobectomy of a 15-yr-old Caucasian male from Argentina in 1975 (Aden et al. 1979), which
has been cloned and transfected with a library of multiple reporter transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2
(Attagene, personal communication). The parental HepG2 cell line has been shown by others to retain the
potential for Phase I and Phase II metabolic responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6,
2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 (Westerink and Schoonen 2007a) with CYP1A2, CYP2C9, CYP2D6, CYP2E1
and CYP3A activities reported at levels similar to human hepatocytes although variable depending on source
and culture conditions (Hewitt and Hewitt 2004); some enzymes (e.g., CYP2W1) have even been observed at
higher rates than in primary hepatocytes (Guo et al. 2010). Phase II enzyme activities identified in HepG2 cells
include SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 (Hart et al. 2010, Walle et al. 2000,
Westerink and Schoonen 2007b) and UGTs (1A1, 1A6 and 2B7) (Hart et al. 2010). In addition, HepG2 cells can
potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme et al. 2010) and
Nrf2, a transcription factor which regulates genes containing antioxidant response element (ARE) sequences in
their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding cassette (ABC)
xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated in part by Nrf2
TF DNA-binding) (Adachi et al. 2007).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Target (nominal) number of replicates: 1
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.094
Response cutoff threshold used to determine hit calls: 0.471
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
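The removal and log-transform step described above can be sketched in a few lines. This is only an illustration in Python, not the tcpl implementation (tcpl is an R package): the function name `level2_correct` and the dict-based well records are hypothetical, and the ordering (exclusion first, then log2) follows the wording of this section rather than any confirmed execution order.

```python
import math


def level2_correct(wells):
    """Return copies of the wells with rmneg/rmzero and log2 applied.

    Each well is a dict with 'cval' (corrected value) and 'wllq'
    (well quality, 1 = usable). The record layout is hypothetical.
    """
    corrected = []
    for well in wells:
        w = dict(well)  # copy so the caller's data is left untouched
        if w["cval"] <= 0:
            # rmneg / rmzero: keep the row but mark the well unusable.
            w["wllq"] = 0
            w["resp"] = None
        else:
            # log2: transform the corrected value to log scale (base 2).
            w["cval"] = math.log2(w["cval"])
            # Level 3 method "none": the response is the corrected value.
            w["resp"] = w["cval"]
        corrected.append(w)
    return corrected
```

A well with a raw value of 2.0 ends up with a response of 1.0, while wells at or below zero are kept in the table but carry wllq = 0 and no response.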
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp - mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction were opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was the winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response, in excess of half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
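The level 4 baseline statistics, the level 5 cutoff choice, and three of the simpler level 6 flags listed above can be sketched as follows. This is a hedged Python illustration, not tcpl code: the function names and the dict-based well records are hypothetical, and the 1.4826 scaling of the median absolute deviation is an assumption borrowed from the default of R's `mad()` function.

```python
import math
import statistics

MAD_SCALE = 1.4826  # assumption: consistency constant used by R's mad()


def bmad_and_onesd(wells):
    """Baseline MAD and sample SD from test wells at the two lowest concentrations.

    wells: iterable of dicts with 'wllt' (well type), 'cndx' (concentration
    index) and 'resp' (normalized response). The layout is hypothetical.
    """
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(base)
    bmad = MAD_SCALE * statistics.median([abs(r - med) for r in base])
    onesd = statistics.stdev(base)  # sample SD, denominator n - 1
    return bmad, onesd


def cutoff(bmad):
    """Level 5: take the higher of the log2(1.2) and 5*bmad candidates."""
    return max(math.log2(1.2), 5 * bmad)


def simple_flags(rmse, coff, nrep, nconc):
    """Evaluate the noise, low.nrep and low.nconc conditions as worded above."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags
```

With the reported bmad of 0.094, `cutoff` returns about 0.470 because 5*bmad exceeds log2(1.2) (about 0.263); this is in line with the reported cutoff of 0.471, with the small difference presumably due to rounding of the published bmad.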
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Inactive hit count: 0
-------
Active hit count: hitc>0.9
159
4355
0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
45
266
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
212
2410
quadratic-polynomial (poly2) model: 776
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
42
532
1
178
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
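To make the point-of-departure estimates listed above concrete, the sketch below inverts a Hill curve to find the concentration at which a given response level is reached. It is an illustration only: it assumes the winning model is a Hill function of the form f(x) = tp / (1 + (ga/x)^p), the parameter names tp, ga and p are assumptions, the helper names are hypothetical, and the benchmark response (bmr) is simply passed in by the caller rather than derived here.

```python
def hill(x, tp, ga, p):
    """Hill response at concentration x (tp: top, ga: AC50, p: Hill power)."""
    return tp / (1.0 + (ga / x) ** p)


def hill_conc_at(y, tp, ga, p):
    """Concentration where the Hill response equals y, or None if never reached."""
    if tp == 0:
        return None
    ratio = y / tp
    if ratio <= 0 or ratio >= 1:
        return None  # y has the wrong sign or lies at/beyond the top
    return ga / (1.0 / ratio - 1.0) ** (1.0 / p)


def pod_estimates(tp, ga, p, coff, bmr):
    """Collect the five estimates for a fitted Hill curve."""
    sign = 1.0 if tp > 0 else -1.0  # follow the direction of the fitted curve
    return {
        "bmd": hill_conc_at(sign * bmr, tp, ga, p),
        "ac50": hill_conc_at(0.50 * tp, tp, ga, p),
        "acc": hill_conc_at(sign * coff, tp, ga, p),
        "ac10": hill_conc_at(0.10 * tp, tp, ga, p),
        "ac5": hill_conc_at(0.05 * tp, tp, ga, p),
    }
```

For a curve whose top does not reach the cutoff, the acc entry comes back as None, which mirrors the idea that an efficacy-cutoff concentration only exists when the modeled response crosses the cutoff.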
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
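The winning-model rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched as a small selection function. This is a Python illustration rather than the tcplfit2 implementation; the function names, the tuple layout, and the example AIC values and parameter counts are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def select_winner(fits):
    """Pick the winning model name.

    fits: list of (model_name, aic_value, n_params) tuples. Sorting on the
    (aic, n_params) pair means the lowest AIC wins and an exact AIC tie is
    broken in favor of the model with fewer parameters.
    """
    return min(fits, key=lambda fit: (fit[1], fit[2]))[0]
```

For example, `select_winner([("hill", 10.0, 4), ("poly1", 10.0, 2), ("cnst", 12.0, 1)])` returns `"poly1"`, since the two best models tie on AIC and the linear model is the simpler one.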
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
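The binarization of the continuous hit call described above can be sketched as follows. The thresholds are the ones stated in this section (active at hitc >= 0.90, inactive from 0 up to 0.90, not applicable below 0); the function names are hypothetical, and the threshold is left as a parameter because the text notes that users may choose a different stringency.

```python
from collections import Counter


def classify_hitc(hitc, threshold=0.9):
    """Map a continuous hit call to active / inactive / not applicable."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


def tally(hitcs, threshold=0.9):
    """Count how many series fall into each category."""
    return Counter(classify_hitc(h, threshold) for h in hitcs)
```

Raising the threshold (for example to 0.95) moves borderline series from the active to the inactive group without touching the underlying fits.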
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using the
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.385
Neutral control median absolute deviation, by plate: nmad 0.031
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 8.09%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed - nmed) / nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed - nmed) / nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
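The control-well metrics in the table above follow directly from per-plate medians and median absolute deviations. The sketch below writes them out in Python; the function names are hypothetical, and the arguments `cmed` and `cmad` stand for whichever control (positive or negative) is being compared with the neutral control.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor between a control and the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference versus the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Control signal relative to the neutral control spread."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Control median relative to the neutral control median."""
    return cmed / nmed
```

Using the rounded values reported for this endpoint, `cv_percent(0.385, 0.031)` gives roughly 8.05%, close to the tabulated 8.09%; the gap is presumably due to rounding of nmed and nmad. The positive and negative control metrics cannot be computed for this endpoint because those control values are reported as NA.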
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 178.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS.
Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep
26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 79
ATG_GATA_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human GATA Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_GATA_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_GATA_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_GATA_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene GATA1. Furthermore, this assay endpoint can be
referred to as a primary readout, because this assay has produced multiple assay endpoints where this one
serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the dna binding intended target family, where the subfamily is GATA proteins.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene GATA,
which is responsive to the endogenous human GATA binding protein 1 (globin transcription factor 1)
[GeneSymbol:GATA1 | GeneID:2623 | Uniprot_SwissProt_Accession:P15976].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of an RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.066
Response cutoff threshold used to determine hit calls: 0.329
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp - mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
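The Level 2 through Level 6 methods listed above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the method text, not the tcpl implementation (tcpl is an R package). The function names are hypothetical, the scale constant 1.4826 is an assumption borrowed from the default of R's mad(), and only a subset of the Level 6 flags is shown.

```python
import math
import statistics

MAD_SCALE = 1.4826  # assumption: consistency constant used by R's mad()


def level2(cvals):
    """rmneg / rmzero / log2: return (value, wllq) pairs; excluded wells get wllq = 0."""
    out = []
    for cval in cvals:
        if cval <= 0:
            out.append((None, 0))             # negative or zero: excluded
        else:
            out.append((math.log2(cval), 1))  # log2-transformed, well kept
    return out


def baseline_stats(resp_lowconc):
    """bmad and onesd from test-well responses at concentration index 1 or 2."""
    med = statistics.median(resp_lowconc)
    bmad = MAD_SCALE * statistics.median([abs(r - med) for r in resp_lowconc])
    onesd = statistics.stdev(resp_lowconc)    # sample SD, divides by n - 1
    return bmad, onesd


def cutoff(bmad):
    """Level 5: the higher of log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


def caution_flags(medians, concs, nrep, rmse, coff, bmad, hitc, ac50):
    """A subset of the Level 6 flags, written directly from their descriptions."""
    found = []
    if rmse > coff:
        found.append("noise")
    if nrep < 2:
        found.append("low.nrep")
    if len(concs) <= 4:
        found.append("low.nconc")
    if hitc >= 0.9 and ac50 is not None and ac50 < min(concs):
        found.append("ac50.lowconc")
    if not any(abs(m) > 3 * bmad for m in medians):
        found.append("no.med.gt.3bmad")
    return found


# With the rounded bmad reported for this endpoint, 5 * 0.066 = 0.33,
# which exceeds log2(1.2) and is close to the reported cutoff of 0.329.
print(cutoff(0.066))
```

Because the nominal number of replicates for this assay is 1, a series processed this way would be expected to carry the low.nrep flag regardless of its activity.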
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Inactive hit count: 0
Active hit count: hitc>0.9
114
4400
0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
37
304
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
2436
193
quadratic-polynomial (poly2) model: 753
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
0
31
549
159
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
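As a worked illustration of these estimates, the sketch below inverts a Hill curve to find the concentration at a given fraction of the maximal response or at a given response level such as the cutoff. It assumes the Hill form f(x) = tp / (1 + (ga/x)^p) with a gain-direction curve (tp > 0); the function names and the parameter values used here are hypothetical, and the other winning models would need their own inversions.

```python
def hill(x, tp, ga, p):
    """Hill response at concentration x (assumed form: tp / (1 + (ga / x) ** p))."""
    return tp / (1 + (ga / x) ** p)


def conc_at_fraction(q, ga, p):
    """Concentration where the Hill response reaches fraction q of tp (0 < q < 1).

    Solving tp / (1 + (ga / x) ** p) = q * tp gives x = ga * (q / (1 - q)) ** (1 / p).
    """
    return ga * (q / (1 - q)) ** (1 / p)


def conc_at_response(y, tp, ga, p):
    """Concentration where the Hill response equals y, or None if y is never reached."""
    if not 0 < y < tp:
        return None
    return ga * (y / (tp - y)) ** (1 / p)


# Hypothetical fitted parameters, for illustration only.
tp, ga, p = 1.0, 10.0, 2.0
ac50 = conc_at_fraction(0.50, ga, p)        # equals ga for the Hill model
ac10 = conc_at_fraction(0.10, ga, p)
ac5 = conc_at_fraction(0.05, ga, p)
acc = conc_at_response(0.329, tp, ga, p)    # concentration at the efficacy cutoff
print(ac50, ac10, ac5, acc)
```

The same conc_at_response helper would give a bmd if it were passed the benchmark response level instead of the cutoff.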
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response)
is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. whether the winning model is a better fit than a flat line.
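The model selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be sketched as follows. This is a minimal illustration rather than the tcplfit2 code; the function names, the AIC values, and the parameter counts in the example are hypothetical.

```python
def winning_model(fits):
    """Pick the winning model from a dict of name -> (aic, n_params).

    Tuples compare element by element, so the lowest AIC wins and an exact
    AIC tie is broken in favor of the model with fewer parameters.
    """
    return min(fits, key=lambda name: fits[name])


def beats_constant(fits, winner, constant="cnst"):
    """True if the winner's AIC is strictly lower than the constant model's."""
    return fits[winner][0] < fits[constant][0]


# Hypothetical AIC values and parameter counts, for illustration only.
fits = {
    "cnst": (10.0, 1),
    "hill": (5.0, 4),
    "poly1": (5.0, 2),
    "exp4": (7.5, 3),
}
winner = winning_model(fits)
print(winner, beats_constant(fits, winner))
```

In this example hill and poly1 tie on AIC, so the simpler poly1 model is selected.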
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
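The binarization of the continuous hit call described above is simple enough to state in code. The sketch below uses the 0.90 threshold given in this documentation as a default; the function name is hypothetical, and users applying a different stringency would pass their own threshold.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to the categories used in this documentation."""
    if hitc < 0:
        return "not applicable"
    if hitc >= threshold:
        return "active"
    return "inactive"


for value in (0.95, 0.90, 0.42, 0.0, -1.0):
    print(value, classify_hitc(value))
```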
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.737
Neutral control median absolute deviation, by plate: nmad 0.049
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 6.64%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed - nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed - nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
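The performance metrics in this table follow directly from the formulas shown beside each label. The Python sketch below restates them; the function names are hypothetical, and the positive control values in the example are invented for illustration because this assay reports NA for its positive and negative controls.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) against neutral wells."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control against neutral wells."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control values reported for this endpoint (rounded as shown above).
nmed, nmad = 0.737, 0.049
print(round(cv_percent(nmed, nmad), 2))

# Hypothetical positive control values, for illustration only.
pmed, pmad = 2.0, 0.1
print(z_prime(pmed, pmad, nmed, nmad), ssmd(pmed, pmad, nmed, nmad))
```

Computed from the rounded nmed and nmad, the CV comes out near 6.65%; the reported 6.64% presumably reflects unrounded inputs.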
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 159.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 80
ATG_GLI_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human GLI Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_GLI_CIS is one of 52 assay
components measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse transcription
polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay component
ATG_GLI_CIS were analyzed into 1 assay endpoint. This assay endpoint, ATG_GLI_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene GLI1. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is zinc finger.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene GLI,
which is responsive to the endogenous human GLI family zinc finger 1 [GeneSymbol:GLI1 | GeneID:2735 |
Uniprot_SwissProt_Accession:P08151].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of an RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.077
Response cutoff threshold used to determine hit calls: 0.385
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of DNA binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-induction
over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
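The removal and log transformation steps described above can be sketched as follows. This is an illustration only: it assumes each well is a simple record with a corrected value (cval) and a well quality (wllq), and that excluded wells keep their record but lose their value. The function name level2_correct is hypothetical; in practice tcpl applies the assigned rmneg, rmzero and log2 methods itself.

```python
import math


def level2_correct(wells):
    """Apply rmneg, rmzero and log2 to a list of well records.

    Each record is a dict with 'cval' (corrected value) and 'wllq'
    (well quality, 1 = usable, 0 = excluded). Input records are not modified.
    """
    corrected = []
    for well in wells:
        record = dict(well)
        if record["cval"] <= 0:
            # Negative or zero values cannot be log transformed: exclude the well.
            record["wllq"] = 0
            record["cval"] = None
        else:
            record["cval"] = math.log2(record["cval"])
        corrected.append(record)
    return corrected


wells = [
    {"cval": 2.0, "wllq": 1},   # twofold induction, becomes 1.0 on the log2 scale
    {"cval": 0.0, "wllq": 1},   # zero value, excluded
    {"cval": -0.5, "wllq": 1},  # negative value, excluded
]
for record in level2_correct(wells):
    print(record)
```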
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data are loaded into the database, tcpl uses generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple-concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value is selected for the endpoint, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2*coff and |top| >= 0.8*coff.), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or fewer were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
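The Level 4 baseline statistics and a few of the simpler Level 6 flags listed above can be illustrated with the sketch below. It is not the tcpl implementation. Assumptions: the median absolute deviation is scaled by 1.4826 (the default of R's mad function), onesd is the sample standard deviation, and the border flag is read as |top| lying within 20% of the cutoff. The function names and example values are hypothetical.

```python
import statistics


def baseline_stats(resp_low_conc):
    """Return (bmad, onesd) for test-well responses at concentration index 1 or 2."""
    values = list(resp_low_conc)
    med = statistics.median(values)
    # Scaled MAD; the 1.4826 constant is an assumption mirroring R's mad() default.
    bmad = 1.4826 * statistics.median(abs(r - med) for r in values)
    onesd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return bmad, onesd


def caution_flags(top, coff, rmse, nconc, nrep):
    """Return the names of a few simple cautionary flags that apply to a series."""
    flags = []
    if rmse > coff:
        flags.append("noise")       # fit error exceeds the cutoff
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")      # modeled top is close to the cutoff
    if nrep < 2:
        flags.append("low.nrep")    # fewer than 2 replicates per concentration
    if nconc <= 4:
        flags.append("low.nconc")   # 4 or fewer concentrations tested
    return flags


bmad, onesd = baseline_stats([0.0, 0.1, -0.1, 0.2, -0.2])
print(round(bmad, 5), round(onesd, 5))
print(caution_flags(top=0.40, coff=0.385, rmse=0.10, nconc=7, nrep=1))
```

With a nominal design of one replicate per concentration, as in this assay, the low.nrep flag would be expected on most series.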
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 203
Inactive hit count (0 <= hitc < 0.9): 4311
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
48
307
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
2415
178
quadratic-polynomial (poly2) model: 814
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
492
0
56
152
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
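For a concrete sense of how these point-of-departure estimates relate to a fitted curve, the sketch below works through a gain-direction Hill model. It assumes the curve has the form f(x) = tp / (1 + (ga / x)^p) and that the benchmark response is 1.349 × onesd; both are assumptions of this example, as are the parameter values and function names. tcplfit2 computes these quantities for whichever model wins.

```python
def hill(conc, tp, ga, p):
    """Hill response at a concentration (tp = top, ga = AC50, p = slope)."""
    return tp / (1.0 + (ga / conc) ** p)


def conc_at_fraction(q, ga, p):
    """Concentration giving a fraction q (0 < q < 1) of the top; q = 0.5 gives the AC50."""
    return ga * (q / (1.0 - q)) ** (1.0 / p)


def conc_at_response(y, tp, ga, p):
    """Concentration at which the curve reaches response y, or None if it never does."""
    if not 0 < y < tp:
        return None
    return ga / (tp / y - 1.0) ** (1.0 / p)


# Hypothetical fit: top of 1.0 (log2 fold induction), AC50 of 10, slope of 1.
tp, ga, p = 1.0, 10.0, 1.0
cutoff = 0.385        # efficacy cutoff reported for this endpoint
bmr = 1.349 * 0.1     # assumed BMR scaling times a hypothetical onesd of 0.1

print("ac50:", conc_at_fraction(0.50, ga, p))
print("ac10:", conc_at_fraction(0.10, ga, p))
print("ac5: ", conc_at_fraction(0.05, ga, p))
print("acc: ", conc_at_response(cutoff, tp, ga, p))
print("bmd: ", conc_at_response(bmr, tp, ga, p))
```

When the fitted top is below the cutoff, the curve never crosses it and no acc exists, which the sketch signals by returning None.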
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration-response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomial, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. the model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to the data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each
concentration-response series is determined by calculating a continuous hit call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
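The selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be written as a short sketch. The AIC values and parameter counts below are invented for illustration, and pick_winner is a hypothetical name rather than a tcpl or tcplfit2 function.

```python
def pick_winner(fits):
    """Return the winning model name.

    fits maps a model name to a tuple (aic, n_params). The lowest AIC wins;
    if AIC values are equal, the model with fewer parameters wins.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Hypothetical fit results for one concentration series.
fits = {
    "cnst": (12.0, 1),
    "hill": (5.2, 4),
    "poly1": (5.2, 2),
    "exp4": (6.0, 3),
}
print(pick_winner(fits))  # poly1: ties with hill on AIC but has fewer parameters
```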
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and are stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to identify high-confidence curves.
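The binarization used in this document can be expressed as a small helper. The 0.9 default follows the thresholds stated above; the function name classify_hitc is hypothetical, and users applying a different stringency would pass a different threshold.

```python
def classify_hitc(hitc, threshold=0.9):
    """Bin a continuous hit call into active, inactive, or not applicable."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


for value in (0.95, 0.90, 0.42, 0.0, -1.0):
    print(value, classify_hitc(value))
```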
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.529
Neutral control median absolute deviation, by plate: nmad 0.043
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 8.13%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed - nmed) / nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed / nmed)
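The control-well metrics above can be reproduced with a few one-line functions, sketched here as a reading aid. The neutral control values are the ones reported for this endpoint; the positive control values are hypothetical, since no positive control is reported (NA). Function names are illustrative and not part of tcpl.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) versus neutral wells."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference versus neutral wells."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


nmed, nmad = 0.529, 0.043           # reported neutral control values
print(round(cv_percent(nmed, nmad), 2))  # 8.13, matching the reported CV%

pmed, pmad = 2.0, 0.1               # hypothetical positive control values
print(z_prime(pmed, pmad, nmed, nmad))
print(ssmd(pmed, pmad, nmed, nmad))
```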
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
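Once reference chemical classifications exist, sensitivity and specificity could be estimated along the lines of the sketch below. The hit calls and reference labels shown are invented purely for illustration; no reference chemical data are reported for this endpoint.

```python
def sensitivity_specificity(calls, truth):
    """Return (sensitivity, specificity) from paired boolean lists.

    calls: assay hit calls (True = active); truth: reference labels
    (True = known active). A rate is None when its denominator is zero.
    """
    tp = sum(c and t for c, t in zip(calls, truth))
    fn = sum((not c) and t for c, t in zip(calls, truth))
    tn = sum((not c) and (not t) for c, t in zip(calls, truth))
    fp = sum(c and (not t) for c, t in zip(calls, truth))
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical example: five reference chemicals.
calls = [True, True, False, False, True]
truth = [True, False, False, False, True]
print(sensitivity_specificity(calls, truth))
```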
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 152.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity
data-rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 81
ATG_GRE_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human GRE Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_GRE_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_GRE_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_GRE_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene NR3C1. Furthermore, this assay endpoint can be
referred to as a primary readout, because this assay has produced multiple assay endpoints where this one
serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the nuclear receptor intended target family, where the subfamily is steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element GRE, which is responsive to the endogenous human nuclear receptor subfamily 3, group C,
member 1 (glucocorticoid receptor) [GeneSymbol:NR3C1 | GeneID:2908 |
Uniprot_SwissProt_Accession:P04150].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS, activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C) and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Target (nominal) number of replicates: 1
Key positive control: Dexamethasone
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.083
Response cutoff threshold used to determine hit calls: 0.416
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-induction
over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
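To make "log2 fold-induction over DMSO controls" concrete, the sketch below assumes the baseline is the median signal of the DMSO wells on the same plate. That choice of baseline, the example signal values, and the function name are assumptions for illustration; the vendor's actual normalization may differ.

```python
import math
import statistics


def log2_fold_induction(signal, dmso_signals):
    """Return log2(signal / median DMSO signal), or None if it is undefined."""
    baseline = statistics.median(dmso_signals)
    if signal <= 0 or baseline <= 0:
        # Non-positive values cannot be log transformed.
        return None
    return math.log2(signal / baseline)


dmso = [95.0, 100.0, 105.0]                 # hypothetical DMSO well signals
print(log2_fold_induction(200.0, dmso))     # 1.0, i.e. a twofold induction
print(log2_fold_induction(100.0, dmso))     # 0.0, i.e. no change from baseline
```

On this scale, the log2(1.2) cutoff candidate listed under Level 5 corresponds to a 1.2-fold induction over the DMSO baseline.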
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data are loaded into the database, tcpl uses generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple-concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value is selected for the endpoint, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This is indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested;if hitc >= 0.9 and ac50 < 10Alogc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
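Several of the Level 6 rules above reduce to simple comparisons, and a minimal Python sketch may make them easier to check by hand. It is not the tcpl implementation: the function name and argument layout are hypothetical, only six of the listed flags are covered, and the border rule is read as |top| lying between 0.8 and 1.2 times the cutoff. That reading is an assumption, since the two printed inequalities joined by "or" would hold for every series.

```python
def level6_flags(resp, top, coff, rmse, hitc, ac50, logc_min, nrep, nconc):
    """Return the names of a few Level 6 flags raised for one concentration series.

    Hypothetical helper for illustration only; argument names follow the
    field names used in the flag descriptions.
    """
    flags = []
    n_up = sum(r > coff for r in resp)     # responses above +coff
    n_down = sum(r < -coff for r in resp)  # responses below -coff

    # Directionality: loss winner (top < 0) or gain winner (top > 0).
    if (top < 0 and n_down < 2 * n_up) or (top > 0 and n_up < 2 * n_down):
        flags.append("modl.directionality.fail")
    # Noise: fit error larger than the cutoff.
    if rmse > coff:
        flags.append("noise")
    # Border: assumed to mean |top| within 20 percent of the cutoff.
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    # Active hit whose AC50 falls below the lowest tested concentration.
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")
    return flags


example = level6_flags(
    resp=[0.1, -0.2, 0.3, 0.9, 1.0], top=0.85, coff=0.773,
    rmse=0.3, hitc=0.95, ac50=50.0, logc_min=0.09, nrep=1, nconc=5,
)
print(example)
```

In this example the series has a single replicate per concentration and a top close to the cutoff, so only the border and low.nrep flags are raised.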
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 147
Inactive hit count (0 <= hitc < 0.9): 4367
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model: 43
gain-loss (gnls) model: 289
power (pow) model: 240
linear-polynomial (poly1) model: 2388
quadratic-polynomial (poly2) model: 781
exponential-2 (exp2) model: 45
exponential-3 (exp3) model: 2
exponential-4 (exp4) model: 501
exponential-5 (exp5) model: 173
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
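For a gain-direction Hill curve, each of these estimates is the concentration at which the fitted curve reaches a target response, so they can all be obtained by inverting the model. The sketch below assumes the parameterization f(x) = tp / (1 + (ga / x)^p) with tp > 0; the function names are hypothetical. The BMR is passed in as a number because its derivation is not spelled out in this section.

```python
def hill_conc_at(y, tp, ga, p):
    """Concentration where the Hill curve tp / (1 + (ga / x)**p) equals y.

    Returns None when y is not strictly between 0 and tp, i.e. when the
    curve never reaches that response.
    """
    ratio = y / tp
    if not 0.0 < ratio < 1.0:
        return None
    # Solve y = tp / (1 + (ga / x)**p) for x.
    return ga * (ratio / (1.0 - ratio)) ** (1.0 / p)


def hill_pods(tp, ga, p, coff, bmr):
    """Collect the five point-of-departure estimates for one Hill fit."""
    return {
        "bmd": hill_conc_at(bmr, tp, ga, p),         # concentration at the BMR
        "ac50": hill_conc_at(0.50 * tp, tp, ga, p),  # equals ga for this model
        "acc": hill_conc_at(coff, tp, ga, p),        # concentration at the cutoff
        "ac10": hill_conc_at(0.10 * tp, tp, ga, p),
        "ac5": hill_conc_at(0.05 * tp, tp, ga, p),
    }


pods = hill_pods(tp=2.0, ga=10.0, p=1.0, coff=0.5, bmr=0.4)
for name, value in pods.items():
    print(name, round(value, 4))
```

Other winning models would need their own inverse, or a numerical root finder, but the definitions of the estimates stay the same.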
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
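The selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be sketched in a few lines of Python. The function name, AIC values, and parameter counts below are hypothetical and only illustrate the rule; they are not taken from tcplfit2.

```python
def pick_winner(fits):
    """Return the winning model name.

    `fits` maps a model name to a tuple (aic, n_params). Sorting on the
    tuple picks the lowest AIC and breaks exact ties by fewer parameters.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


fits = {
    "hill": (10.0, 3),   # illustrative AIC and parameter count
    "poly1": (10.0, 1),  # same AIC, simpler model
    "exp4": (12.5, 2),
}
print(pick_winner(fits))
```

Here hill and poly1 share the lowest AIC, so the simpler poly1 is selected.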
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
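The binarization used in this documentation can be written as a small helper. This is a sketch with a hypothetical function name; the 0.9 default matches the threshold stated above and can be changed by users who need a different stringency.

```python
def hit_label(hitc, threshold=0.9):
    """Map a continuous hit call to 'active', 'inactive', or 'NA'."""
    if hitc < 0:
        return "NA"  # hitc below zero is treated as not applicable
    return "active" if hitc >= threshold else "inactive"


for value in (0.95, 0.9, 0.5, -1):
    print(value, hit_label(value))
```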
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.651
Neutral control median absolute deviation, by plate (nmad): 0.108
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 16.63%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
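The formulas in this table translate directly into code. The following Python sketch uses hypothetical function names; cmed and cmad stand for the median and median absolute deviation of whichever control (positive or negative) is compared with the neutral control.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control compared with the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control summary values reported in the table above.
print(round(cv_percent(0.651, 0.108), 2))
```

Using the rounded summary values gives a CV of roughly 16.6%, close to the reported 16.63%. The small difference would be consistent with the reported figure being derived from unrounded or per-plate values, although the document does not say so.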
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 173.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple
transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24.
PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res
Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data
can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The
ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 82
ATG_HIF1a_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human hypoxia inducible factor 1 (HIF1A)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_HIF1a_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_HIF1a_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_HIF1a_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene HIF1A. Furthermore, this assay endpoint can be
referred to as a primary readout, because this assay has produced multiple assay endpoints where this one
serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the DNA binding intended target family, where the subfamily is basic helix-loop-helix
protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene HIF1a,
which is responsive to the endogenous human hypoxia inducible factor 1, alpha subunit (basic helix-loop-helix
transcription factor) [GeneSymbol:HIF1A | GeneID:3091 | Uniprot_SwissProt_Accession:Q16665].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Within the CIS-FACTORIAL version, reporter transcription unit (RTU) transcription is
controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS, activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 300 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.155
Response cutoff threshold used to determine hit calls: 0.773
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of DNA binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 or fewer concentrations were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
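The Level 2 through Level 5 steps listed above can be strung together in a short Python sketch. It is a simplified illustration, not tcpl code. The function names and the dict-based well records are hypothetical. Wells with zero or negative values are excluded before the log2 transform, following the order given in Section 3.2. The 1.4826 scaling of the median absolute deviation is an assumption that mirrors the default of R's mad() function.

```python
import math
import statistics


def level2_and_3(wells):
    """Apply rmneg/rmzero, then log2, then copy cval to resp (Level 3 'none')."""
    out = []
    for well in wells:
        well = dict(well)  # copy so the input is left untouched
        if well["cval"] <= 0:
            well["wllq"] = 0       # excluded: negative or zero response
            well["resp"] = None
        else:
            well["cval"] = math.log2(well["cval"])
            well["resp"] = well["cval"]
        out.append(well)
    return out


def baseline_stats(wells, mad_constant=1.4826):
    """Level 4: bmad and onesd from good-quality test wells with cndx 1 or 2."""
    base = [w["resp"] for w in wells
            if w["wllq"] == 1 and w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(base)
    bmad = mad_constant * statistics.median([abs(r - center) for r in base])
    onesd = statistics.stdev(base)  # sample SD, divisor n - 1
    return bmad, onesd


def select_cutoff(bmad):
    """Level 5: keep the larger of the two candidate cutoffs."""
    candidates = {"log2_1.2": math.log2(1.2), "bmad5": 5 * bmad}
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


print(select_cutoff(0.155))
```

With the rounded bmad of 0.155 from the Assay Design Summary, the bmad5 candidate (about 0.775) exceeds log2(1.2) (about 0.263) and is selected. This is close to the reported cutoff of 0.773; the small gap would be consistent with the reported bmad being rounded for display.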
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 455
Inactive hit count (0 <= hitc < 0.9): 4059
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model: 104
gain-loss (gnls) model: 339
power (pow) model: 301
linear-polynomial (poly1) model: 1727
quadratic-polynomial (poly2) model: 957
exponential-2 (exp2) model: 42
exponential-3 (exp3) model: 6
exponential-4 (exp4) model: 599
exponential-5 (exp5) model: 387
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. Second, the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
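The selection rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched as follows. This is a minimal illustration rather than the tcplfit2 implementation: the function name, the AIC values, and the parameter counts are all hypothetical.

```python
def select_winning_model(fits):
    """Pick the winning model from {name: (aic, n_params)}.

    The lowest AIC wins; if AIC values are equal, the model with
    fewer parameters wins.
    """
    if not fits:
        raise ValueError("no model fits supplied")
    # Tuples compare element by element: AIC first, parameter count second.
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Hypothetical AIC values and parameter counts for one concentration series.
fits = {
    "hill": (12.4, 3),
    "poly1": (10.1, 1),
    "exp2": (10.1, 2),
    "pow": (15.0, 2),
}
constant_aic = 18.7  # hypothetical AIC of the constant (flat line) model

winner = select_winning_model(fits)
beats_flat_line = fits[winner][0] < constant_aic
print(winner, beats_flat_line)
```

Here poly1 and exp2 tie on AIC, so the one-parameter poly1 is selected. The comparison against the constant model is kept separate because it feeds the hit call rather than the selection itself.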
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.422
Neutral control median absolute deviation, by plate: nmad 0.082
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 19.32%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 387.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 83
ATG_HNF6_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human HNF6 Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_HNF6_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_HNF6_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_HNF6_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene ONECUT1. Furthermore, this assay endpoint can be
referred to as a primary readout, because this assay has produced multiple assay endpoints where this one
serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the DNA binding intended target family, where the subfamily is homeobox protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene HNF6,
which is responsive to the endogenous human one cut homeobox 1 [GeneSymbol:ONECUT1 | GeneID:3175 |
Uniprot_SwissProt_Accession:Q9UBC0].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a TF
binding site in the promoter.
-------
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
the activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TRIzol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.072
Response cutoff threshold used to determine hit calls: 0.36
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of DNA binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality (wllq);
if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well quality
(wllq); if cval = 0, wllq = 0.)
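A minimal sketch of these three corrections, assuming each well is a simple record with cval and wllq fields. The function name and the example wells are hypothetical, and clearing cval for excluded wells is a simplification made here so that the log transform is only applied to positive values.

```python
import math


def apply_level2(wells):
    """Apply rmneg, rmzero, and log2 to a list of well records.

    Each well is a dict with 'cval' (corrected response) and 'wllq'
    (well quality, 1 = usable, 0 = excluded). Returns new records.
    """
    processed = []
    for well in wells:
        well = dict(well)  # copy so the input records stay untouched
        if well["cval"] < 0 or well["cval"] == 0:
            # rmneg / rmzero: downgrade well quality instead of deleting the row.
            well["wllq"] = 0
            well["cval"] = None  # simplification: no log value for excluded wells
        else:
            # log2: transform the corrected response to log scale (base 2).
            well["cval"] = math.log2(well["cval"])
        processed.append(well)
    return processed


# Hypothetical wells: one valid, one zero, one negative.
wells = [
    {"well": "A1", "cval": 4.0, "wllq": 1},
    {"well": "A2", "cval": 0.0, "wllq": 1},
    {"well": "A3", "cval": -2.0, "wllq": 1},
]
result = apply_level2(wells)
for row in result:
    print(row)
```

Excluded wells stay in the data with wllq = 0, which mirrors the idea that poor-quality wells are filtered out later rather than dropped.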
Level 3: Endpoint-specific normalization includes:
-------
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2) / (sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
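The baseline calculation above can be sketched as follows, assuming rows with wllt, cndx, and resp fields. The function names and example rows are hypothetical. Whether a scaling constant is applied to the median absolute deviation (R's mad() uses 1.4826 by default) is left as a parameter, since the text does not say.

```python
import statistics


def median_abs_deviation(values, constant=1.0):
    """Median absolute deviation, optionally scaled by `constant`."""
    center = statistics.median(values)
    return constant * statistics.median([abs(v - center) for v in values])


def baseline_stats(rows, constant=1.0):
    """Return (bmad, onesd) from test wells at the two lowest concentrations."""
    low = [r["resp"] for r in rows if r["wllt"] == "t" and r["cndx"] in (1, 2)]
    bmad = median_abs_deviation(low, constant)
    onesd = statistics.stdev(low)  # sample standard deviation, n - 1 denominator
    return bmad, onesd


# Hypothetical normalized responses.
rows = [
    {"wllt": "t", "cndx": 1, "resp": 0.0},
    {"wllt": "t", "cndx": 1, "resp": 0.1},
    {"wllt": "t", "cndx": 2, "resp": -0.1},
    {"wllt": "t", "cndx": 2, "resp": 0.2},
    {"wllt": "t", "cndx": 3, "resp": 1.5},   # ignored: higher concentration
    {"wllt": "n", "cndx": 1, "resp": 0.05},  # ignored: neutral control well
]
bmad, onesd = baseline_stats(rows)
print(round(bmad, 3), round(onesd, 3))
```

Using only the lowest concentration indices estimates baseline noise from wells where little or no chemical effect is expected.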
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is
calculated using test compound wells (wllt = t) for the endpoint.)
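As a worked check of this rule, the sketch below picks the larger of the two candidate cutoffs using the bmad reported for this endpoint in the assay design summary (0.072). The function name is illustrative only.

```python
import math


def select_cutoff(bmad):
    """Return (method name, value) for the largest candidate cutoff."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # fold-change based cutoff
        "bmad5": 5 * bmad,           # noise based cutoff
    }
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


method, coff = select_cutoff(0.072)
print(method, round(coff, 2))
```

With bmad = 0.072, the bmad5 candidate (0.36) exceeds log2(1.2) (about 0.26), which agrees with the response cutoff of 0.36 listed in the assay design summary.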
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was the winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that is only at the highest conc tested, where the series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that is not at the highest conc tested, where the series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where the series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if the modeled benchmark dose (BMD) is greater than the AC50 (concentration at 50
percent maximal response). This indicates high variability in the baseline response, in excess of
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the
series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on the modeled top parameter (top) relative to
the cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or fewer were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where the winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if the series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and
coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or
max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if the AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times
the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
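Several of these flags reduce to simple comparisons on per-series summary values. The sketch below applies five of them (bmd.high, noise, low.nrep, low.nconc, ac50.lowconc) to a dictionary of values. The record layout and the example numbers are hypothetical, and this is not the tcpl implementation.

```python
def cautionary_flags(series):
    """Return the names of the simple threshold flags raised for one series."""
    flags = []
    if series["bmd"] > series["ac50"]:
        flags.append("bmd.high")
    if series["rmse"] > series["coff"]:
        flags.append("noise")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    # logc_min is the log10 of the lowest concentration tested.
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags


# Hypothetical series summaries.
clean_single_replicate = {
    "hitc": 0.95, "rmse": 0.10, "coff": 0.36, "nrep": 1, "nconc": 7,
    "bmd": 2.0, "ac50": 15.0, "logc_min": 0.09,
}
noisy_low_potency = {
    "hitc": 0.95, "rmse": 0.50, "coff": 0.36, "nrep": 2, "nconc": 7,
    "bmd": 20.0, "ac50": 0.5, "logc_min": 0.09,
}

print(cautionary_flags(clean_single_replicate))
print(cautionary_flags(noisy_low_potency))
```

Because this assay targets one replicate per concentration, a low.nrep flag on an otherwise clean curve is expected and can be weighed accordingly when filtering.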
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 158
Inactive hit count (0 <= hitc < 0.9): 4356
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 34
gain-loss (gnls) model: 285
power (pow) model: 205
linear-polynomial (poly1) model: 2357
quadratic-polynomial (poly2) model: 877
exponential-2 (exp2) model: 489
exponential-3 (exp3) model: 50
exponential-4 (exp4) model: 2
exponential-5 (exp5) model: 163
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration-response series
is fit to ten predictive models, implemented in the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomial, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. the model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to the data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-response
series is determined by calculating a continuous hit call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response)
is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. whether the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
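The binarization described above can be sketched as a small helper. The default threshold of 0.90 follows this document, while the function names and example values are hypothetical.

```python
from collections import Counter


def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to 'active', 'inactive', or 'NA'."""
    if hitc < 0:
        return "NA"
    return "active" if hitc >= threshold else "inactive"


def tally_hit_calls(hitc_values, threshold=0.90):
    """Count active, inactive, and NA calls for a list of hitc values."""
    counts = Counter(classify_hitc(h, threshold) for h in hitc_values)
    return {label: counts.get(label, 0) for label in ("active", "inactive", "NA")}


# Hypothetical continuous hit calls for five concentration series.
example = [0.99, 0.90, 0.42, 0.0, -1.0]
summary = tally_hit_calls(example)
stricter = tally_hit_calls(example, threshold=0.95)
print(summary)
print(stricter)
```

Raising the threshold moves borderline series from active to inactive, which is the stringency trade-off left to the user.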
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.721
Neutral control median absolute deviation, by plate: nmad 0.053
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 7.4%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
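The formulas listed above can be evaluated directly. In the sketch below the neutral control values (nmed = 0.721, nmad = 0.053) are the ones reported for this endpoint, whereas the positive control values are hypothetical, since none are available (NA) for this assay. Function names are illustrative only.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor between a control (c) and the neutral control (n)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference, control versus neutral."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


nmed, nmad = 0.721, 0.053  # reported neutral control values
pmed, pmad = 2.0, 0.1      # hypothetical positive control values

cv = cv_percent(nmed, nmad)
zp = z_prime(pmed, pmad, nmed, nmad)
print(f"CV%: {cv:.1f}")
print(f"Z prime: {zp:.2f}")
print(f"SSMD: {ssmd(pmed, pmad, nmed, nmad):.2f}")
print(f"S/N: {signal_to_noise(pmed, nmed, nmad):.2f}")
print(f"S/B: {signal_to_background(pmed, nmed):.2f}")
```

The CV computed from the rounded neutral control values comes out at about 7.4%, in line with the value reported above. The same functions apply to negative control wells by passing mmed and mmad in place of pmed and pmad.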
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 163.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 84
ATG_HSE_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human heat shock transcription factor 1 (HSE)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_HSE_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_HSE_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_HSE_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene HSF1. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is heat shock protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene HSE,
which is responsive to the endogenous human heat shock transcription factor 1 [GeneSymbol:HSF1 |
GeneID:3297 | Uniprot_SwissProt_Accession:Q00613].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of
a TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding
sequence, CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: The adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: Geldanamycin
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.096
Response cutoff threshold used to determine hit calls: 0.478
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcriptional activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts following transcription factor (TF) interaction with promoter
sequences, as measured by reverse transcription-polymerase chain reaction (RT-PCR) and capillary
electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2) / (sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g.
top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
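The Level 2, 4, and 5 methods listed above can be sketched in a few lines. This is an illustrative Python outline, not the tcpl implementation (tcpl is an R package): the function names are hypothetical, the sketch filters non-positive values before the log2 transform as described in Section 3.2, and the 1.4826 scale factor is the default of R's mad() function, assumed here to apply to bmad.

```python
import math
import statistics

def level2_correct(cvals):
    """Level 2 sketch: drop negative and zero cval (rmneg, rmzero), then log2-transform."""
    return [math.log2(v) for v in cvals if v > 0]

def mad(values, constant=1.4826):
    """Median absolute deviation; 1.4826 is the default scale factor of R's mad()."""
    med = statistics.median(values)
    return constant * statistics.median([abs(v - med) for v in values])

def baseline_stats(low_conc_resp):
    """Level 4 sketch: bmad and onesd from test-well responses at cndx 1 or 2."""
    return mad(low_conc_resp), statistics.stdev(low_conc_resp)  # stdev divides by n - 1

def cutoff(bmad):
    """Level 5 sketch: the higher of log2(1.2) and 5 * bmad becomes the cutoff."""
    return max(math.log2(1.2), 5 * bmad)

# Using the rounded bmad reported in the Assay Design Summary (0.096)
print(round(cutoff(0.096), 3))
```

With the rounded bmad of 0.096, 5 * bmad is 0.48, which exceeds log2(1.2) (about 0.263) and is close to the reported cutoff of 0.478; the small difference presumably reflects rounding of bmad in the summary table.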
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 513
Inactive hit count (0 <= hitc < 0.9): 4001
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 59
gain-loss (gnls) model: 294
power (pow) model: 246
linear-polynomial (poly1) model: 2055
quadratic-polynomial (poly2) model: 881
exponential-2 (exp2) model: 218
exponential-3 (exp3) model: 3
exponential-4 (exp4) model: 594
exponential-5 (exp5) model: 112
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
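To make the point-of-departure estimates concrete, the sketch below inverts a Hill curve to find the concentration at a given response level. It is a hedged illustration only: it assumes a gain-direction Hill model of the form resp = tp / (1 + (ga / conc)^p), and the function names, the "ac10"/"ac5" labels, and the parameter values used in any call are hypothetical rather than taken from tcpl or this endpoint's data.

```python
def hill(conc, tp, ga, p):
    """Hill curve: response at conc (tp = top, ga = AC50, p = Hill slope)."""
    return tp / (1.0 + (ga / conc) ** p)

def conc_at_response(y, tp, ga, p):
    """Invert the Hill curve for a gain-direction fit; requires 0 < y < tp."""
    if not 0.0 < y < tp:
        raise ValueError("target response must lie strictly between 0 and top")
    return ga / (tp / y - 1.0) ** (1.0 / p)

def pod_estimates(tp, ga, p, coff, bmr):
    """Concentrations at the BMR, at fractions of top, and at the efficacy cutoff."""
    return {
        "bmd": conc_at_response(bmr, tp, ga, p),
        "ac50": conc_at_response(0.50 * tp, tp, ga, p),
        "acc": conc_at_response(coff, tp, ga, p),
        "ac10": conc_at_response(0.10 * tp, tp, ga, p),
        "ac5": conc_at_response(0.05 * tp, tp, ga, p),
    }
```

For a Hill fit the AC50 returned this way equals ga by construction, and the estimates at 5% and 10% of top always fall below the AC50. Other winning models would need their own inversion, which this sketch does not cover.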
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each
concentration-response series is determined by calculating a continuous hit-call for the winning model, which is
the product of three proportional weights. The first weight reflects whether there is at least one median
response outside the efficacy cutoff band. The second reflects whether the top (or maximal change in the
predicted response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is
less than that of the constant model, i.e. the winning model is a better fit than a flat line.
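The winning-model rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched as follows. This is an illustrative Python outline rather than the tcplfit2 code; the tuple layout, function names, and the AIC values and parameter counts in any example call are hypothetical.

```python
def pick_winning_model(fits):
    """Select the winning fit from (model_name, aic, n_params) tuples.

    The lowest AIC wins; when AIC values are equal, the fit with fewer
    parameters wins.
    """
    return min(fits, key=lambda fit: (fit[1], fit[2]))

def beats_constant(winner_aic, constant_aic):
    """True when the winning model has a lower AIC than the constant (flat-line) model."""
    return winner_aic < constant_aic
```

Sorting on the pair (AIC, parameter count) keeps the tie-break explicit. The continuous hit call itself combines three weights and is not reproduced here.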
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
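The binarization and filtering described above might look like the following in practice. This is a minimal sketch assuming each concentration series is available as a dictionary with a "hitc" value and a list of cautionary "flags"; those field names and the helper functions are hypothetical and are not part of tcpl or invitrodb.

```python
def classify_hitc(hitc, threshold=0.9):
    """Binarize a continuous hit call using the thresholds stated in this document."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

def high_confidence_actives(series, threshold=0.9):
    """Keep series that are active at the chosen threshold and carry no cautionary flags."""
    return [
        s for s in series
        if classify_hitc(s["hitc"], threshold) == "active" and not s["flags"]
    ]
```

Requiring zero flags is a deliberately strict choice for illustration; a user could instead exclude only selected flags or raise the hitc threshold.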
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.582
Neutral control median absolute deviation, by plate: nmad 0.065
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 11.21%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
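The control-well formulas above translate directly into code. The sketch below is an illustration in Python with hypothetical function names; "cmed" and "cmad" stand for the median and median absolute deviation of whichever control is compared against the neutral control (pmed/pmad or mmed/mmad).

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation (%) in neutral control wells."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control versus the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference versus the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    return cmed / nmed

# Neutral control values reported for this endpoint (rounded in the table)
print(round(cv_percent(0.582, 0.065), 2))
```

Applied to the rounded nmed and nmad above, the CV comes out near 11.2%, in line with the reported 11.21%; the remaining difference presumably comes from rounding in the table. The positive and negative control metrics are NA for this endpoint, so no values are computed for them here.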
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
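Once reference chemicals are available, accuracy statistics of this kind could be computed from paired assay calls and reference classifications. The following is a generic sketch with hypothetical names and inputs; it does not reflect any curated reference set for this endpoint.

```python
def accuracy_stats(assay_active, reference_active):
    """Sensitivity, specificity, and error rates from paired boolean lists."""
    pairs = list(zip(assay_active, reference_active))
    tp = sum(1 for a, r in pairs if a and r)          # true positives
    tn = sum(1 for a, r in pairs if not a and not r)  # true negatives
    fp = sum(1 for a, r in pairs if a and not r)      # false positives
    fn = sum(1 for a, r in pairs if not a and r)      # false negatives
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": None if sensitivity is None else 1 - sensitivity,
        "false_positive_rate": None if specificity is None else 1 - specificity,
    }
```

The function returns None for any statistic whose denominator is zero, for example when no reference positives exist.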
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 218.
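For readers unfamiliar with the statistic, RMSE for a single fitted series is the square root of the mean squared difference between observed and model-predicted responses, and the Level 6 "noise" flag compares it with the cutoff. A minimal sketch with hypothetical function names:

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and model-predicted responses."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def is_noisy(rmse_value, coff):
    """Level 6 'noise' flag condition: rmse > coff."""
    return rmse_value > coff
```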
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 85
ATG_IR1_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human IR1 Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_IR1_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_IR1_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_IR1_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene NR1H4. Furthermore, this assay endpoint can be
referred to as a primary readout, because this assay has produced multiple assay endpoints where this one
serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the nuclear receptor intended target family, where the subfamily is non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene IR1,
which is responsive to the endogenous human nuclear receptor subfamily 1, group H, member 4
[GeneSymbol:NR1H4 | GeneID:9971 | Uniprot_SwissProt_Accession:Q96RI1].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of a RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 C, 20 s at 68 C and 10 min at 72 C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Target (nominal) number of replicates: 1
Baseline median absolute deviation for the assay (bmad): 0.099
Response cutoff threshold used to determine hit calls: 0.493
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems, but it can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute value (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/sample size - 1). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
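The level 2, 4, and 5 methods listed above can be illustrated with a short sketch. This is a minimal Python
illustration under stated assumptions, not the tcpl implementation (tcpl is an R package): the function names and
the scale argument are hypothetical. A scale of 1.0 gives the raw median absolute deviation; R's mad function
defaults to a consistency constant of 1.4826, and whether that constant applies here is not stated in this
document.

```python
import math
import statistics

def level2_correct(cvals):
    """rmneg + rmzero + log2: keep only positive values, then log2-transform.

    In the pipeline, wells with cval <= 0 are kept but get wllq = 0;
    this sketch simply drops them.
    """
    return [math.log2(v) for v in cvals if v > 0]

def baseline_stats(resp_lowconc, scale=1.0):
    """Return (bmad, onesd) for responses from test wells with cndx 1 or 2.

    bmad  = scale * median(|resp - median(resp)|)
    onesd = sample standard deviation (n - 1 in the denominator)
    `scale` is an assumption of this sketch, see the lead-in.
    """
    med = statistics.median(resp_lowconc)
    bmad = scale * statistics.median([abs(r - med) for r in resp_lowconc])
    onesd = statistics.stdev(resp_lowconc)
    return bmad, onesd

def response_cutoff(bmad):
    """Level 5: the higher of log2(1.2) and 5 * bmad is used as the cutoff."""
    return max(math.log2(1.2), 5.0 * bmad)
```

With the bmad of 0.099 reported for this endpoint, 5*bmad is about 0.495 and exceeds log2(1.2) (about 0.263), so
the bmad5 method sets the cutoff. That agrees with the reported cutoff of 0.493 to within the rounding of the
displayed bmad.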
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 394
Inactive hit count (hitc between 0 and 0.9): 4120
Not applicable hit count (hitc less than 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 100
gain-loss (gnls) model: 367
power (pow) model: 289
linear-polynomial (poly1) model: 1791
quadratic-polynomial (poly2) model: 874
exponential-2 (exp2) model: 43
exponential-3 (exp3) model: 6
exponential-4 (exp4) model: 704
exponential-5 (exp5) model: 288
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
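To make the point-of-departure estimates concrete, the sketch below works them out for a single model family. It
assumes a three-parameter Hill curve of the form top / (1 + (ac50/conc)^slope); the function names are
hypothetical, and the other model families would each need their own inversion. It is an illustration of the
arithmetic, not the tcplfit2 code.

```python
def hill(conc, top, ac50, slope):
    """Hill response at concentration `conc` (assumed model form)."""
    return top / (1.0 + (ac50 / conc) ** slope)

def activity_conc(frac, ac50, slope):
    """Concentration at which the Hill curve reaches `frac` of its top.

    Solving top / (1 + (ac50/x)^slope) = frac * top for x gives
    x = ac50 * (frac / (1 - frac))^(1/slope).
    frac = 0.5, 0.1, 0.05 correspond to AC50, AC10, AC5.
    """
    if not 0.0 < frac < 1.0:
        raise ValueError("frac must be strictly between 0 and 1")
    return ac50 * (frac / (1.0 - frac)) ** (1.0 / slope)

def conc_at_response(level, top, ac50, slope):
    """Concentration at which the curve crosses an absolute response level.

    Use level = cutoff for the ACC or level = BMR for the BMD.
    Returns None when the curve never reaches that level.
    """
    if level * top <= 0 or abs(level) >= abs(top):
        return None
    return activity_conc(level / top, ac50, slope)
```

For example, conc_at_response(0.493, top, ac50, slope) would give an ACC against this endpoint's reported cutoff
of 0.493 for a hypothetical fitted curve.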
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
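The selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be sketched
as follows. This is a hypothetical Python illustration; the data layout, function names, and example AIC values
are assumptions for the example, and the calculation of the three hit-call weights themselves is not reproduced
here.

```python
def pick_winning_model(fits):
    """Return the name of the winning model.

    `fits` maps model name -> (aic, n_params). The model with the lowest AIC
    wins; if AIC values are equal, the model with fewer parameters wins.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

def beats_constant(fits, winner, constant_name="cnst"):
    """True when the winning model's AIC is lower than the constant model's."""
    return fits[winner][0] < fits[constant_name][0]

# Made-up AIC values for illustration only
example_fits = {
    "cnst": (12.4, 1),
    "hill": (7.9, 3),
    "poly1": (7.9, 1),
    "exp4": (9.1, 2),
}
winner = pick_winning_model(example_fits)
```

In this example hill and poly1 tie on AIC, so the one-parameter poly1 model is selected, and it fits better than
the constant model.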
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
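The binarization described above can be sketched in a few lines. This is a hypothetical Python helper, not part
of tcpl; the 0.90 threshold is the one used in this documentation and is exposed as a parameter because other
thresholds may be chosen.

```python
from collections import Counter

def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to a category.

    hitc >= threshold      -> "active"
    0 <= hitc < threshold  -> "inactive"
    hitc < 0               -> "not applicable"
    """
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

def summarize_hit_calls(hitcs, threshold=0.90):
    """Count how many series fall into each category."""
    return Counter(classify_hitc(h, threshold) for h in hitcs)
```

A stricter analysis could pass a higher threshold, or additionally exclude series that carry particular level 6
cautionary flags.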
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.605
Neutral control median absolute deviation, by plate: nmad 0.17
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 28.18%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
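The control-well formulas in the table above translate directly into code. The sketch below is a minimal Python
illustration with hypothetical function names; cmed and cmad stand for the median and median absolute deviation
of whichever control is being compared with the neutral control (positive or negative).

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100.0

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """SSMD: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed
```

Applied to the rounded values shown for this endpoint (nmed 0.605, nmad 0.17), cv_percent gives roughly 28.1%,
close to the reported 28.18%; the small difference presumably comes from rounding or from aggregating per-plate
values. The positive and negative control metrics are NA here because no such control wells are reported.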
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 288.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 86
ATG_ISRE_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human interferon regulatory factor 1
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_ISRE_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_ISRE_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_ISRE_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene IRF1. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is interferon regulatory factors.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element ISRE, which is responsive to the endogenous human interferon regulatory factor 1
[GeneSymbol:IRF1 | GeneID:3659 | Uniprot_SwissProt_Accession:P10914].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of a RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.142
Response cutoff threshold used to determine hit calls: 0.712
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of DNA binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-induction
over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and the remaining raw values are log2-transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
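The removal of non-positive values followed by the log2 transformation can be illustrated with a minimal Python sketch. This is not tcpl code: the function name level2_preprocess and the input values are hypothetical, and the sketch assumes that excluded wells are kept in the record with a well quality (wllq) of 0 rather than dropped.

```python
import math

def level2_preprocess(cvals):
    """Return (log2 value, wllq) pairs for a list of fold-induction values.

    Hypothetical helper: wells with negative or zero values are not
    transformed and receive wllq = 0; all other wells receive wllq = 1.
    """
    processed = []
    for cval in cvals:
        if cval <= 0:
            # Negative or zero values cannot be log-transformed, so the
            # well is excluded from analysis by setting wllq to 0.
            processed.append((None, 0))
        else:
            processed.append((math.log2(cval), 1))
    return processed

# Hypothetical fold-induction values relative to DMSO controls.
wells = level2_preprocess([1.0, 2.0, 0.0, -0.5, 1.2])
for value, wllq in wells:
    print(value, wllq)
```

A fold-induction of 1.0 (no change from DMSO) maps to 0 on the log2 scale, which is why the baseline of activity sits at zero after this step.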
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters are defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation (onesd) of the normalized
response for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2;
onesd = sqrt(sum((resp - mean(resp))^2)/(sample size - 1)). Onesd is used to establish the BMR and is
therefore required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that is only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that is not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of half of the
maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
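The Level 4 baseline statistics and the Level 5 cutoff selection described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tcpl implementation: the function names and the example responses are hypothetical, and the scaling constant applied to the median absolute deviation is left as a parameter (R's mad() function applies 1.4826 by default; the sketch defaults to 1.0).

```python
import math
import statistics

def mad(values):
    """Unscaled median absolute deviation about the median."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)

def baseline_stats(resp_low_conc, constant=1.0):
    """bmad and onesd from test-well responses at concentration index 1 or 2.

    `constant` is an assumption of this sketch; set it to 1.4826 to mimic
    the default scaling of R's mad().
    """
    bmad = constant * mad(resp_low_conc)
    # statistics.stdev divides by (sample size - 1), matching the onesd formula.
    onesd = statistics.stdev(resp_low_conc)
    return bmad, onesd

def cutoff(bmad):
    """Select the higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)

# Hypothetical normalized responses (log2 fold-induction) at the two lowest concentrations.
example_resp = [-0.10, 0.00, 0.05, 0.10, -0.05, 0.20]
bmad, onesd = baseline_stats(example_resp)
coff = cutoff(bmad)
print(round(bmad, 3), round(onesd, 3), round(coff, 3))
```

For the bmad of 0.142 reported in the Assay Design Summary, 5 * bmad is 0.710, which exceeds log2(1.2) (about 0.263) and is close to the reported cutoff of 0.712; the small difference is plausibly due to rounding of the reported bmad.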
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Inactive hit count: 0
Active hit count: hitc>0.9
490
4024
0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 84
gain-loss (gnls) model: 309
power (pow) model: 320
linear-polynomial (poly1) model: 2028
quadratic-polynomial (poly2) model: 847
exponential-2 (exp2) model: 80
exponential-3 (exp3) model: 237
exponential-4 (exp4) model: 544
exponential-5 (exp5) model: 13
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
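As an illustration of how such estimates follow from a fitted curve, the sketch below inverts a Hill model to obtain the concentration at a given fraction of the maximal response or at an absolute response level. It assumes the Hill form f(x) = tp / (1 + (ga/x)^p) with a positive top (tp), AC50 (ga), and Hill coefficient (p); the function names and parameter values are hypothetical, and other winning models require their own inversion.

```python
def hill_response(conc, tp, ga, p):
    """Hill model response at concentration conc (assumed form)."""
    return tp / (1.0 + (ga / conc) ** p)

def hill_conc_at_fraction(frac, ga, p):
    """Concentration at which the response equals frac * tp, for 0 < frac < 1.

    Solving tp / (1 + (ga/x)^p) = frac * tp gives
    x = ga * (frac / (1 - frac))^(1/p).
    """
    if not 0.0 < frac < 1.0:
        raise ValueError("frac must lie strictly between 0 and 1")
    return ga * (frac / (1.0 - frac)) ** (1.0 / p)

def hill_conc_at_response(level, tp, ga, p):
    """Concentration at an absolute response level (e.g. a cutoff or BMR).

    Returns None when the curve never reaches the level.
    """
    if not 0.0 < level < tp:
        return None
    return hill_conc_at_fraction(level / tp, ga, p)

# Hypothetical fitted parameters and the cutoff from the Assay Design Summary.
tp, ga, p = 1.5, 10.0, 2.0
coff = 0.712
ac50 = hill_conc_at_fraction(0.50, ga, p)
ac10 = hill_conc_at_fraction(0.10, ga, p)
ac5 = hill_conc_at_fraction(0.05, ga, p)
acc = hill_conc_at_response(coff, tp, ga, p)
print(ac50, ac10, ac5, acc)
```

Under this parameterization the AC50 is simply ga, while the concentration at the cutoff depends on how far the top lies above the cutoff.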
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
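The model selection rule (lowest AIC, with ties going to the model with fewer parameters) can be sketched as below. The AIC values and parameter counts are hypothetical inputs chosen for illustration, and the function is not part of tcpl or tcplfit2.

```python
def select_winning_model(fits):
    """Pick the winning model from a dict of name -> (aic, n_params).

    The sort key orders by AIC first and by parameter count second, so a
    tie in AIC is resolved in favor of the simpler model.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

# Hypothetical fit summaries for one concentration-response series.
fits = {
    "cnst": (12.0, 1),
    "poly1": (8.5, 2),
    "hill": (8.5, 4),
    "exp2": (9.1, 3),
}
winner = select_winning_model(fits)
print(winner)
```

In this example the linear polynomial and Hill fits share the lowest AIC, so the linear polynomial wins because it has fewer parameters.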
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
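A user-side filter based on these conventions might look like the following sketch. The 0.90 threshold and the three flag rules (noise, low.nrep, low.nconc) are taken from the text of this document; the function names and the example inputs are hypothetical, and only a subset of the Level 6 flags is shown.

```python
def classify_hitc(hitc, threshold=0.9):
    """Binarize a continuous hit call using the thresholds described in this document."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

def caution_flags(rmse, coff, nconc, nrep):
    """Evaluate three of the Level 6 flag rules for one series."""
    flags = []
    if rmse > coff:
        flags.append("noise")      # rmse > coff
    if nrep < 2:
        flags.append("low.nrep")   # fewer than 2 replicates per concentration
    if nconc <= 4:
        flags.append("low.nconc")  # 4 or fewer concentrations tested
    return flags

# Hypothetical series using the nominal design of 7 concentrations and 1 replicate.
status = classify_hitc(0.95)
flags = caution_flags(rmse=0.2, coff=0.712, nconc=7, nrep=1)
print(status, flags)
```

A stricter analysis could raise the threshold or exclude any series carrying one or more flags; the appropriate choice depends on the stringency required by the user.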
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.313
Neutral control median absolute deviation, by plate: nmad 0.099
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 31.74%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed - nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed - nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
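The control-based metrics above can be written out directly from their formulas, as in the sketch below. The input values are hypothetical, because no positive or negative control is reported for this endpoint, and the sketch evaluates a single set of medians rather than aggregating by plate.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100.0

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control against the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    return cmed / nmed

# Hypothetical neutral and positive control summaries.
nmed, nmad = 0.3, 0.1
pmed, pmad = 2.3, 0.2
print(round(cv_percent(nmed, nmad), 2))
print(round(z_prime(pmed, pmad, nmed, nmad), 2))
print(round(ssmd(pmed, pmad, nmed, nmad), 2))
print(round(signal_to_noise(pmed, nmed, nmad), 2))
print(round(signal_to_background(pmed, nmed), 2))
```

The same functions apply to negative control wells by passing mmed and mmad in place of pmed and pmad.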
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 237.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 87
ATG_M_06_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for M06 Gene Activation (Internal Marker)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_M_06_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_M_06_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_M_06_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the background control
at the transcription factor-level as they relate to the gene . Furthermore, this assay endpoint can be referred
to as a secondary readout, because this assay has produced multiple assay endpoints where this one serves a
background control function. To generalize the intended target to other relatable targets, this assay endpoint
is annotated to the background measurement intended target family, where the subfamily is internal marker.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene M_06,
which is used as an internal marker.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of an RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: The adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.024
Response cutoff threshold used to determine hit calls: 0.263
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-induction
over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and the remaining raw values are log2-transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or fewer were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
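As a rough illustration of the Level 2 through Level 4 steps listed above, the following Python sketch applies the same rules to a small, made-up set of wells. It is not tcpl code (tcpl is an R package). The Well record, the function names, the example values, and the choice to leave the median absolute deviation unscaled (mad_scale = 1.0) are all assumptions made for this example; only the field abbreviations (cval, resp, wllq, wllt, cndx) come from the method descriptions.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class Well:
    cval: float                   # corrected response value
    wllt: str                     # well type; "t" marks a test compound well
    cndx: int                     # concentration index (1 = lowest concentration)
    wllq: int = 1                 # well quality; 0 means the well is excluded
    resp: Optional[float] = None  # normalized response, filled in below


def correct_and_normalize(wells):
    """Level 2 (rmneg, rmzero, log2) followed by Level 3 method 'none'."""
    for w in wells:
        if w.cval <= 0:          # rmneg / rmzero: downgrade well quality
            w.wllq = 0
            continue
        w.cval = math.log2(w.cval)
        w.resp = w.cval          # Level 3 'none': resp is simply cval
    return wells


def baseline_stats(wells, mad_scale=1.0):
    """Level 4: bmad and onesd from test wells at the two lowest concentrations.

    mad_scale is a hypothetical knob; whether the pipeline applies a scale
    constant to the MAD is not stated in this document.
    """
    base = [w.resp for w in wells
            if w.wllq == 1 and w.wllt == "t" and w.cndx in (1, 2)]
    med = statistics.median(base)
    bmad = mad_scale * statistics.median([abs(r - med) for r in base])
    onesd = statistics.stdev(base)   # sample standard deviation (n - 1)
    return bmad, onesd


# Made-up wells: four usable baseline wells, one higher concentration,
# two wells removed by rmzero/rmneg, and one neutral control well.
wells = [
    Well(1.0, "t", 1), Well(2.0, "t", 1), Well(0.5, "t", 2), Well(4.0, "t", 2),
    Well(8.0, "t", 3),
    Well(0.0, "t", 1), Well(-1.0, "t", 2),
    Well(1.0, "n", 1),
]
bmad, onesd = baseline_stats(correct_and_normalize(wells))
print(bmad, round(onesd, 4))
```

Only the four usable test wells at cndx 1 or 2 contribute to the baseline; the excluded wells and the neutral control well do not.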
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Inactive hit count: 0
Active hit count: hitc>0.9
14
4500
0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
18
339
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
200
2448
quadratic-polynomial (poly2) model: 610
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
24
676
1
146
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
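The selection rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched in a few lines of Python. This is an illustration only and not the tcplfit2 implementation; the function name, the AIC values, and the parameter counts are hypothetical.

```python
def pick_winning_model(fits):
    """Return the model name with the lowest AIC; ties go to fewer parameters.

    fits maps a model name to a tuple (aic, number_of_parameters).
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Hypothetical fits for one concentration series.
fits = {
    "cnst":  (12.4, 1),
    "hill":  (-3.1, 4),
    "poly1": (-3.1, 2),   # same AIC as hill, but simpler
    "exp4":  (0.7, 3),
}
winner = pick_winning_model(fits)
print(winner)  # poly1
```

Sorting on the pair (AIC, parameter count) makes the tie-break explicit. A real pipeline would also need to decide how to treat AIC values that differ only by floating-point noise, which this sketch does not address.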
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity,
efficacy, and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
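The binarization described above is simple enough to state as code. The sketch below uses the thresholds given in this section (0.90 and 0) and exposes the active threshold as a parameter, since the text notes that other thresholds may be used. The function name and the example hitc values are illustrative.

```python
from collections import Counter


def classify_hitc(hitc, active_threshold=0.9):
    """Map a continuous hit call to 'active', 'inactive', or 'not applicable'."""
    if hitc < 0:
        return "not applicable"
    if hitc >= active_threshold:
        return "active"
    return "inactive"


# Hypothetical hit calls for five concentration series.
hitcs = [0.97, 0.42, 0.0, 0.90, -1.0]
labels = [classify_hitc(h) for h in hitcs]
counts = Counter(labels)
print(labels)
print(dict(counts))
```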
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1.047
Neutral control median absolute deviation, by plate: nmad 0.027
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 2.55%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
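The control-well metrics listed above are all simple functions of a control median and median absolute deviation (MAD) compared against the neutral control. A minimal Python sketch of those formulas follows. The function names and the example numbers are hypothetical and are not taken from this assay, for which positive and negative control metrics are reported as NA.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, as a percentage."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) versus neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control versus neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Hypothetical plate summary: neutral control and positive control.
nmed, nmad = 1.0, 0.05
pmed, pmad = 3.0, 0.15

print(round(cv_percent(nmed, nmad), 2))
print(round(z_prime(pmed, pmad, nmed, nmad), 3))
print(round(ssmd(pmed, pmad, nmed, nmad), 3))
print(round(signal_to_noise(pmed, nmed, nmad), 1))
print(round(signal_to_background(pmed, nmed), 1))
```

The same functions apply to negative control wells by passing mmed and mmad in place of pmed and pmad.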
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 146.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 88
ATG_M_19_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for M19 Gene Activation (Internal Marker)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_M_19_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse transcription
polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay component
ATG_M_19_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_M_19_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the background control
at the transcription factor-level as they relate to the gene . Furthermore, this assay endpoint can be referred
to as a secondary readout, because this assay has produced multiple assay endpoints where this one serves a
background control function. To generalize the intended target to other relatable targets, this assay endpoint
is annotated to the background measurement intended target family, where the subfamily is internal marker.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene M_19,
which is used as an internal marker.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU) is
controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a TF
binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: The adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS, activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data was processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 300 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.055
Response cutoff threshold used to determine hit calls: 0.277
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-induction
over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
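To make the second manual transformation concrete, the sketch below sets wllq to 0 for every well in one plate row and records the justification alongside the change. It is written in Python purely for illustration; the actual Level 0 scripts are written in R and are vendor/dataset-specific. The dictionary keys (apid for plate, rowi for row, coli for column), the plate ID, and the function name are assumptions for this example.

```python
def flag_plate_row(wells, apid, rowi, reason, change_log):
    """Set wllq = 0 for all wells in one row of one plate and log the reason.

    Returns the number of wells that were downgraded.
    """
    n_changed = 0
    for well in wells:
        if well["apid"] == apid and well["rowi"] == rowi:
            well["wllq"] = 0
            n_changed += 1
    change_log.append(
        {"apid": apid, "rowi": rowi, "n_wells": n_changed, "reason": reason}
    )
    return n_changed


# Hypothetical raw data: two rows on one plate.
wells = [
    {"apid": "plate_01", "rowi": 1, "coli": 1, "rval": 1.02, "wllq": 1},
    {"apid": "plate_01", "rowi": 1, "coli": 2, "rval": 0.98, "wllq": 1},
    {"apid": "plate_01", "rowi": 2, "coli": 1, "rval": 0.01, "wllq": 1},
    {"apid": "plate_01", "rowi": 2, "coli": 2, "rval": 0.02, "wllq": 1},
]
change_log = []
n = flag_plate_row(wells, "plate_01", 2, "row missing assay reagent", change_log)
print(n, change_log)
```

Keeping the reason in a log next to the data change is one way to satisfy the requirement that manual transformations be documented with justification.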
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or fewer were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
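The Level 5 rule and a few of the simpler Level 6 flags can be checked by hand. The Python sketch below is illustrative only and is not tcpl code; the function names and the example series values (top, rmse) are hypothetical. With the bmad of 0.055 reported in the Assay Design Summary, 5*bmad is 0.275, which exceeds log2(1.2) (about 0.263) and is close to the reported cutoff of 0.277. The small difference would be consistent with bmad having been rounded for display, though that is an inference and not something this document states. The border check is written as a band between 0.8 and 1.2 times the cutoff, which is this sketch's reading of the rule.

```python
import math


def level5_cutoff(bmad):
    """Pick the higher of the two assigned cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


def simple_level6_flags(top, coff, rmse, nconc, nrep):
    """Evaluate four of the Level 6 flags that depend only on summary values."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags


coff = level5_cutoff(0.055)   # bmad as reported in the Assay Design Summary
print(round(coff, 3))

# Hypothetical series: top and rmse are invented; nconc = 7 and nrep = 1
# follow the nominal design values listed for this assay.
flags = simple_level6_flags(top=0.30, coff=coff, rmse=0.10, nconc=7, nrep=1)
print(flags)
```

Because the nominal design uses a single replicate per concentration, a series tested exactly as designed would be expected to carry the low.nrep flag under this reading.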
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 57
Inactive hit count (0 <= hitc < 0.9): 4457
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
22
355
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
2484
187
quadratic-polynomial (poly2) model: 585
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
32
650
1
146
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity concentration at
5% of the maximal response (ac5).
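To make the potency estimates above concrete, the sketch below inverts a Hill curve at several response levels. It assumes the common three-parameter Hill form resp = tp / (1 + (ga / conc)^p); the function names, the parameter values, and the bmr value are hypothetical and chosen only for illustration, and other winning models would need their own inversion.

```python
def hill_response(conc, tp, ga, p):
    """Hill model response at concentration conc: tp / (1 + (ga / conc)^p)."""
    return tp / (1.0 + (ga / conc) ** p)


def hill_conc_at(level, tp, ga, p):
    """Concentration at which the Hill curve reaches the response `level`.

    `level` must have the same sign as tp and a smaller magnitude.
    """
    ratio = level / tp
    if not 0.0 < ratio < 1.0:
        raise ValueError("level must lie strictly between 0 and tp")
    return ga / (1.0 / ratio - 1.0) ** (1.0 / p)


# Hypothetical fitted parameters and thresholds (not taken from any endpoint).
tp, ga, p = 1.0, 10.0, 1.5
coff, bmr = 0.265, 0.15

pods = {
    "bmd": hill_conc_at(bmr, tp, ga, p),         # concentration at the benchmark response
    "ac50": hill_conc_at(0.50 * tp, tp, ga, p),  # 50% of maximal response (equals ga)
    "acc": hill_conc_at(coff, tp, ga, p),        # concentration at the efficacy cutoff
    "ac10": hill_conc_at(0.10 * tp, tp, ga, p),  # 10% of maximal response
    "ac5": hill_conc_at(0.05 * tp, tp, ga, p),   # 5% of maximal response
}
for name, value in pods.items():
    print(f"{name}: {value:.3f}")
```

Because the curve is monotonic, a lower response level always maps to a lower concentration, so ac5 < ac10 < ac50 for any Hill fit.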
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is larger than
the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the constant model, i.e. the winning
model is a better fit than a flat line.
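The model selection rule described above (lowest AIC, ties broken in favor of fewer parameters) can be sketched as follows. This is an illustrative Python fragment, not the tcplfit2 code: the function name pick_winning_model, the AIC values, and the parameter counts are all hypothetical.

```python
def pick_winning_model(fits):
    """Pick the model with the lowest AIC; on equal AIC, prefer fewer parameters.

    fits maps a model name to a tuple (aic, n_params).
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Hypothetical fit results for one concentration series.
fits = {
    "cnst": (25.1, 1),
    "hill": (12.4, 4),
    "poly1": (12.4, 2),
    "exp2": (13.0, 3),
}

winner = pick_winning_model(fits)
# Third hit-call ingredient: is the winner a better fit than a flat line?
beats_constant = fits[winner][0] < fits["cnst"][0]
print(winner, beats_constant)
```

Here hill and poly1 share the lowest AIC, so the simpler poly1 model is selected.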
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
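The binarization of the continuous hit call described above can be written as a small helper. This is a sketch with a hypothetical function name; the 0.9 default mirrors the threshold used in this documentation and can be changed to suit the stringency required.

```python
def classify_hitc(hitc, threshold=0.9):
    """Map a continuous hit call to 'active', 'inactive', or 'not applicable'."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


for value in (0.95, 0.90, 0.42, -1.0):
    print(value, classify_hitc(value))
```

A stricter screen could, for example, call classify_hitc(hitc, threshold=0.95) and additionally exclude series carrying cautionary flags.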
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.906
Neutral control median absolute deviation, by plate: nmad 0.052
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 5.73%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
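The control-well formulas above translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the neutral control inputs are the rounded values reported in this section, and the positive control median and MAD are made-up numbers because this endpoint reports NA for them.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100.0


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3 * (cmad + nmad) / abs(cmed - nmed)."""
    return 1.0 - 3.0 * (cmad + nmad) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


nmed, nmad = 0.906, 0.052  # neutral control values reported above (rounded)
pmed, pmad = 2.0, 0.1      # hypothetical positive control values

cv = cv_percent(nmed, nmad)
zp = z_prime(pmed, pmad, nmed, nmad)
sd = ssmd(pmed, pmad, nmed, nmad)
print(f"CV% = {cv:.2f}, Z' = {zp:.3f}, SSMD = {sd:.2f}")
```

With the rounded inputs the CV comes out near 5.74%, close to the 5.73% reported; the small difference presumably reflects rounding of nmed and nmad in the table.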
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 146.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 89
ATG_M_32_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for M32 Gene Activation (Internal Marker)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_M_32_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse transcription
polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay component
ATG_M_32_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_M_32_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the background control
at the transcription factor-level as they relate to the gene . Furthermore, this assay endpoint can be referred
to as a secondary readout, because this assay has produced multiple assay endpoints where this one serves a
background control function. To generalize the intended target to other relatable targets, this assay endpoint
is annotated to the background measurement intended target family, where the subfamily is internal marker.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene M_32,
which is used as an internal marker.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU) is controlled by a cis-regulating
element (promoter). The specificity of an RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C) and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 300 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.053
Response cutoff threshold used to determine hit calls: 0.265
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
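The removal of non-positive values and the log2 transformation described above can be illustrated with a short sketch. It is written in Python for illustration rather than taken from tcpl; the function name to_log2 and the dictionary layout are hypothetical, while cval (corrected value) and wllq (well quality) follow the field names used in this documentation.

```python
import math


def to_log2(cvals):
    """Set wllq to 0 for wells with zero or negative cval; log2-transform the rest."""
    rows = []
    for cval in cvals:
        if cval <= 0:
            # Zero and negative values cannot be log-transformed; mark the well as poor quality.
            rows.append({"cval": cval, "wllq": 0, "resp": None})
        else:
            rows.append({"cval": cval, "wllq": 1, "resp": math.log2(cval)})
    return rows


# Hypothetical fold-induction values relative to DMSO controls.
rows = to_log2([2.0, 1.0, 0.0, -0.5, 0.5])
for row in rows:
    print(row)
```

On this scale a two-fold induction becomes 1, no change becomes 0, and a two-fold reduction becomes -1, so gains and losses are symmetric around the baseline.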
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest tested concentration.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
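The Level 4 baseline statistics and the Level 5 cutoff selection listed above can be sketched as follows. This is a hedged Python illustration, not tcpl code: the function names are hypothetical, and the 1.4826 scale factor is an assumption that mirrors the default of R's mad function; whether tcpl applies that constant is not stated in this document.

```python
import math
import statistics


def baseline_mad(resps, scale=1.4826):
    """Median absolute deviation of baseline responses (scale factor is an assumption)."""
    med = statistics.median(resps)
    return scale * statistics.median([abs(r - med) for r in resps])


def one_sd(resps):
    """Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))."""
    return statistics.stdev(resps)


def cutoff(bmad):
    """Select the higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5.0 * bmad)


# Hypothetical responses from test wells at the two lowest concentrations.
low_conc_resps = [-0.1, 0.0, 0.1]
print(baseline_mad(low_conc_resps, scale=1.0), one_sd(low_conc_resps))

# Using the bmad reported for this endpoint.
print(round(cutoff(0.053), 3))
```

With the bmad of 0.053 reported for this endpoint, 5*bmad = 0.265 exceeds log2(1.2), which is about 0.263, in agreement with the reported response cutoff of 0.265.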
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 70
Inactive hit count (0 <= hitc < 0.9): 4444
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
58
428
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
2226
189
quadratic-polynomial (poly2) model: 583
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
803
3
17
155
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity concentration at
5% of the maximal response (ac5).
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is larger than
the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the constant model, i.e. the winning
model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 3.87
Neutral control median absolute deviation, by plate: nmad 0.213
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 5.52%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
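The control-well formulas listed above can be written out as a short sketch. This Python version is illustrative only: the function names and the example medians and MADs are hypothetical, and this endpoint reports NA for the positive and negative control metrics, so no values from the table are reproduced here.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100.0


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


# Hypothetical plate-level medians and MADs (cmed/cmad stand for pmed/pmad or mmed/mmad).
nmed, nmad = 1.0, 0.05
pmed, pmad = 3.0, 0.15
print(cv_percent(nmed, nmad), z_prime(pmed, pmad, nmed, nmad), ssmd(pmed, pmad, nmed, nmad))
```

The same functions apply to negative control wells by passing mmed and mmad in place of pmed and pmad.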
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 155.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 90
ATG_M_61_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for M61 Gene Activation (Internal Marker)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_M_61_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_M_61_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_M_61_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the background control
at the transcription factor-level as they relate to the gene . Furthermore, this assay endpoint can be referred
to as a secondary readout, because this assay has produced multiple assay endpoints where this one serves a
background control function. To generalize the intended target to other relatable targets, this assay endpoint
is annotated to the background measurement intended target family, where the subfamily is internal marker.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene M_61,
which is used as an internal marker.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 300 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.024
Response cutoff threshold used to determine hit calls: 0.263
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality (wllq);
if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well quality
(wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad
is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of half of the
maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
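The Level 2 through Level 5 methods assigned above can be sketched in a few lines. The Python below is a simplified illustration and not the tcpl implementation: the function names and example wells are hypothetical, removal of non-positive values before the log2 transform follows the wording of Section 3.2, and the 1.4826 scaling of the median absolute deviation and the exclusion of wllq = 0 wells from the baseline are assumptions.

```python
import math
import statistics

MAD_SCALE = 1.4826  # assumption: normal-consistency constant, as in R's mad()


def level2(wells):
    """rmneg / rmzero / log2: set wllq = 0 where cval <= 0, log2-transform the rest."""
    out = []
    for well in wells:
        well = dict(well)  # copy so the input rows are left untouched
        if well["cval"] <= 0:
            well["wllq"] = 0
        else:
            well["cval"] = math.log2(well["cval"])
        out.append(well)
    return out


def level3(wells):
    """'none' normalization: resp is simply cval."""
    return [dict(w, resp=w["cval"]) for w in wells]


def level4(wells):
    """bmad and onesd from test wells (wllt = t) with cndx 1 or 2 (assumed wllq = 1 only)."""
    base = [w["resp"] for w in wells
            if w["wllt"] == "t" and w["cndx"] in (1, 2) and w["wllq"] == 1]
    center = statistics.median(base)
    bmad = MAD_SCALE * statistics.median([abs(r - center) for r in base])
    onesd = statistics.stdev(base)  # sample standard deviation (n - 1 denominator)
    return bmad, onesd


def level5_cutoff(bmad):
    """Higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5.0 * bmad)


# Hypothetical wells, not ToxCast data.
wells = [
    {"cval": 1.00, "wllt": "t", "cndx": 1, "wllq": 1},
    {"cval": 1.04, "wllt": "t", "cndx": 1, "wllq": 1},
    {"cval": 0.97, "wllt": "t", "cndx": 2, "wllq": 1},
    {"cval": 0.00, "wllt": "t", "cndx": 2, "wllq": 1},  # flagged by rmzero
    {"cval": 2.10, "wllt": "t", "cndx": 7, "wllq": 1},
]
processed = level3(level2(wells))
bmad, onesd = level4(processed)
print(round(level5_cutoff(0.024), 3))  # 0.263, using the bmad reported for this endpoint
```

With the bmad of 0.024 reported in the Assay Design Summary, 5*bmad is 0.12, which is below log2(1.2) (about 0.263), so the log2_1.2 method supplies the response cutoff of 0.263 listed there.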
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 14
Inactive hit count (0 <= hitc < 0.9): 4500
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 18
gain-loss (gnls) model: 340
power (pow) model: 200
linear-polynomial (poly1) model: 2440
quadratic-polynomial (poly2) model: 610
exponential-2 (exp2) model: 24
exponential-3 (exp3) model: 680
exponential-4 (exp4) model: 1
exponential-5 (exp5) model: 149
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
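As an illustration of how these estimates relate to a fitted curve, the sketch below inverts a Hill curve of the form f(x) = tp / (1 + (ga/x)^p) to find the concentration at a given response level. This is a minimal Python example, not tcplfit2 code: the parameter names tp, ga, and p, the function names, and all numeric values other than the 0.263 cutoff are assumptions, and the BMR is simply passed in rather than derived from onesd.

```python
def hill(conc, tp, ga, p):
    """Hill response at a concentration: tp / (1 + (ga / conc) ** p)."""
    return tp / (1.0 + (ga / conc) ** p)


def hill_inverse(resp, tp, ga, p):
    """Concentration at which the Hill curve reaches resp, or None if it never does."""
    ratio = resp / tp
    if not 0.0 < ratio < 1.0:
        return None  # response level lies outside the range of the curve
    return ga / (1.0 / ratio - 1.0) ** (1.0 / p)


def hill_pods(tp, ga, p, coff, bmr):
    """Point-of-departure estimates for a winning Hill model."""
    return {
        "bmd": hill_inverse(bmr, tp, ga, p),        # concentration at the BMR
        "ac50": hill_inverse(0.50 * tp, tp, ga, p),  # 50% of maximal response
        "acc": hill_inverse(coff, tp, ga, p),        # concentration at the cutoff
        "ac10": hill_inverse(0.10 * tp, tp, ga, p),  # 10% of maximal response
        "ac5": hill_inverse(0.05 * tp, tp, ga, p),   # 5% of maximal response
    }


# Hypothetical fit parameters; coff is the cutoff reported for this endpoint.
pods = hill_pods(tp=1.0, ga=10.0, p=1.0, coff=0.263, bmr=0.05)
print(pods)
```

For the Hill form the AC50 equals the ga parameter by construction, and an estimate is returned as None when the requested response level is never reached, for example when the cutoff exceeds the modeled top.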
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. Second, the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
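The model selection rule described above (lowest AIC, with ties going to the simpler model) can be sketched as follows. This Python example is illustrative only and is not how tcplfit2 is implemented; the dictionary layout, the "cnst" label for the constant model, and the AIC values and parameter counts are all hypothetical.

```python
def select_winning_model(fits):
    """Return the fit with the lowest AIC; ties go to the fit with fewer parameters."""
    return min(fits, key=lambda fit: (fit["aic"], fit["n_params"]))


def beats_constant(winner, fits):
    """True when the winner's AIC is below that of the constant (flat line) model."""
    constant = next(fit for fit in fits if fit["model"] == "cnst")
    return winner["aic"] < constant["aic"]


# Hypothetical AIC values and parameter counts, for illustration only.
fits = [
    {"model": "cnst", "aic": 12.0, "n_params": 1},
    {"model": "hill", "aic": 8.5, "n_params": 4},
    {"model": "poly1", "aic": 8.5, "n_params": 2},
    {"model": "exp2", "aic": 9.1, "n_params": 3},
]
winner = select_winning_model(fits)
print(winner["model"], beats_constant(winner, fits))
```

Here the Hill and linear fits share the lowest AIC, so the linear model wins on parameter count. The comparison against the constant model corresponds to the third weight of the continuous hit call, although the actual weight is a proportion rather than a true/false value.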
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity,
efficacy, and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
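A few of the simpler level 6 flags listed in Section 3.2 can be expressed directly as threshold checks. The Python sketch below is illustrative and not the tcpl implementation: the function name and example inputs are hypothetical, and the border rule is read here as the modeled top lying between 0.8 and 1.2 times the cutoff, which is an interpretation of the listed definition.

```python
def caution_flags(rmse, top, coff, nrep, nconc):
    """Return the names of the simple cautionary flags raised for one series."""
    flags = []
    if rmse > coff:
        flags.append("noise")        # fit error exceeds the cutoff
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")       # top is within 20% of the cutoff (interpretation)
    if nrep < 2:
        flags.append("low.nrep")     # fewer than 2 replicates per concentration on average
    if nconc <= 4:
        flags.append("low.nconc")    # 4 or fewer concentrations tested
    return flags


# Hypothetical series; coff is the cutoff reported for this endpoint.
print(caution_flags(rmse=0.30, top=0.28, coff=0.263, nrep=1, nconc=7))
```

Because the Assay Design Summary lists a target of 1 replicate and 7 concentrations, a series run to that design would be expected to raise low.nrep but not low.nconc under these rules.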
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1.047
Neutral control median absolute deviation, by plate: nmad 0.027
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 2.55%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 149.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 91
ATG_MRE_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human MRE Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_MRE_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_MRE_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_MRE_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene MTF1. Furthermore, this assay endpoint can be
referred to as a primary readout, because this assay has produced multiple assay endpoints where this one
serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the dna binding intended target family, where the subfamily is zinc finger.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element MRE, which is responsive to the endogenous human metal-regulatory transcription factor 1
[GeneSymbol:MTF1 | GeneID:4520 | Uniprot_SwissProt_Accession:Q14872].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of
a TF binding site in the promoter. Importantly, because all members of a TF family recognize the same binding
sequence, CIS-FACTORIAL evaluates the activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.121
Response cutoff threshold used to determine hit calls: 0.604
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
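The pre-fitting steps described above (drop negative and zero values, then log2-transform) can be sketched in a few lines. This is a minimal Python illustration, not the tcpl implementation: the function name `level2_corrections` and the choice to return `None` for excluded wells are assumptions made here for clarity, and the sketch follows the order given in the prose (exclusion before log transformation).

```python
import math


def level2_corrections(cvals):
    """Apply rmneg, rmzero, then log2 to corrected response values (cval).

    Returns one (value, wllq) pair per well. Excluded wells get wllq = 0;
    storing None as their value is a simplification made for this sketch.
    """
    processed = []
    for cval in cvals:
        if cval <= 0:
            # rmneg (cval < 0) and rmzero (cval == 0): downgrade well quality
            processed.append((None, 0))
        else:
            # log2: transform the remaining values to log scale (base 2)
            processed.append((math.log2(cval), 1))
    return processed


# Hypothetical fold-induction values for five wells
wells = [2.0, 0.0, -0.5, 1.0, 4.0]
processed = level2_corrections(wells)
```

Wells with wllq = 0 would then be ignored by downstream fitting, while a fold-induction of 1.0 (no change relative to DMSO) maps to a log2 response of 0.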
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
  2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
  3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality
  (wllq); if cval < 0, wllq = 0.)
  4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well
  quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
  1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
  additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
  1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute value (bmad) as the median
  absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
  concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
  for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
  mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
  processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
  3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
  5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
  bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
  5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
  direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
  directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
  directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
  6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
  hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
  concentration tested.)
  7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
  hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
  concentration and not the highest concentration tested.)
  8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
  median responses observed above baseline.)
  9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
  percent maximal response). This indicates high variability in baseline response in excess of more than
  half of the maximal response.)
  10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
  the series is greater than the cutoff (coff); rmse > coff.)
  11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
  to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
  13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
  14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
  15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
  minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
  17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
  (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
  hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
  when top < log2(1.5) or max_med < log2(1.5).)
  18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
  maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
  then flag.)
  20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
  by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
  where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
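To make the Level 4 to Level 6 definitions concrete, the sketch below computes bmad and onesd from low-concentration test wells, picks the cutoff as the larger of log2(1.2) and 5*bmad, and evaluates three of the simpler flags (noise, low.nrep, low.nconc). It is a minimal Python illustration rather than tcpl code. The scaling constant 1.4826 is an assumption (it is the default used by R's `mad()`), and the function names and the well dictionary layout are hypothetical.

```python
import math
import statistics

MAD_SCALE = 1.4826  # assumption: consistency constant used by R's mad() default


def baseline_stats(wells):
    """Return (bmad, onesd) from test wells (wllt == 't') with cndx 1 or 2."""
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(base)
    bmad = MAD_SCALE * statistics.median(abs(r - med) for r in base)
    onesd = statistics.stdev(base)  # sample standard deviation, divisor n - 1
    return bmad, onesd


def cutoff(bmad):
    """Level 5: the higher of the log2_1.2 and bmad5 thresholds is selected."""
    return max(math.log2(1.2), 5 * bmad)


def simple_flags(rmse, coff, nconc, nrep):
    """Evaluate three Level 6 flags exactly as worded in the method list."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags


# Hypothetical wells: four low-concentration test wells plus two others
wells = [
    {"resp": 0.0, "wllt": "t", "cndx": 1},
    {"resp": 0.1, "wllt": "t", "cndx": 1},
    {"resp": -0.1, "wllt": "t", "cndx": 2},
    {"resp": 0.2, "wllt": "t", "cndx": 2},
    {"resp": 1.5, "wllt": "t", "cndx": 7},   # ignored: high concentration
    {"resp": 0.05, "wllt": "n", "cndx": 1},  # ignored: neutral control well
]
bmad, onesd = baseline_stats(wells)
coff = cutoff(0.121)  # bmad reported for this endpoint
```

With the reported bmad of 0.121, 5*bmad is about 0.605 and exceeds log2(1.2) (about 0.263), so the bmad5 threshold is the one selected; the small difference from the reported cutoff of 0.604 is presumably due to rounding of the displayed bmad. Because the nominal design uses one replicate per concentration, a series from this endpoint would be expected to carry the low.nrep flag.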
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 793
Inactive hit count (0 <= hitc < 0.9): 3721
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 128
gain-loss (gnls) model: 299
power (pow) model: 371
linear-polynomial (poly1) model: 1850
quadratic-polynomial (poly2) model: 707
exponential-2 (exp2) model: 44
exponential-3 (exp3) model: 293
exponential-4 (exp4) model: 627
exponential-5 (exp5) model: 143
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
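As a worked illustration of how such estimates follow from a fitted curve, consider only the Hill model and assume the parameterization f(x) = tp / (1 + (ga/x)^p), where tp is the top, ga the AC50, and p the Hill coefficient (this form is an assumption of the sketch; other winning models need their own inversions). Setting f(x) = q*tp and solving gives x = ga * (q/(1-q))^(1/p), so the concentrations at 50%, 10%, and 5% of maximal response come from one formula. The same inversion with q = y/tp gives the concentration at a fixed response level y, such as the efficacy cutoff. Function names are hypothetical.

```python
def hill(x, tp, ga, p):
    """Hill response at concentration x (assumed parameterization)."""
    return tp / (1.0 + (ga / x) ** p)


def conc_at_fraction(q, ga, p):
    """Concentration at which the Hill curve reaches fraction q of its top."""
    return ga * (q / (1.0 - q)) ** (1.0 / p)


def conc_at_response(y, tp, ga, p):
    """Concentration at which the Hill curve reaches response y.

    Returns None when y is not strictly between 0 and the top, because the
    curve never reaches such a level.
    """
    q = y / tp
    if not 0.0 < q < 1.0:
        return None
    return conc_at_fraction(q, ga, p)


# Hypothetical fitted parameters: top 1.5 (log2 fold-induction), AC50 10, slope 2
tp, ga, p = 1.5, 10.0, 2.0
ac50 = conc_at_fraction(0.50, ga, p)
ac10 = conc_at_fraction(0.10, ga, p)
ac5 = conc_at_fraction(0.05, ga, p)
acc = conc_at_response(0.604, tp, ga, p)  # 0.604 is this endpoint's cutoff
```

By construction ac5 < ac10 < ac50, and ac50 equals ga. A benchmark dose would be obtained the same way once the benchmark response level is known.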
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
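The model selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be expressed as a single ordered comparison. The sketch below is a minimal Python illustration; the AIC values and parameter counts are hypothetical, and the function name is not part of tcpl or tcplfit2.

```python
def select_winning_model(fits):
    """Pick the winning model from {name: (aic, n_params)}.

    Tuples compare element by element, so the lowest AIC wins and an exact
    AIC tie is broken in favor of the model with fewer parameters.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Hypothetical fit results for one concentration series
fits = {
    "cnst": (12.0, 1),
    "hill": (5.0, 4),
    "poly1": (5.0, 2),
    "exp2": (7.5, 3),
}
winner = select_winning_model(fits)

# Ingredient of the third hit-call weight: does the winner beat the flat line?
beats_constant = fits[winner][0] < fits["cnst"][0]
```

Here hill and poly1 tie on AIC, so the simpler poly1 model is selected. The continuous hit call itself requires the likelihood-based weights and is not reproduced in this sketch.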
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency, comparing the AC50 with the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
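The binarization described above can be sketched as follows, using the thresholds stated in this document (hitc >= 0.90 active, 0 <= hitc < 0.90 inactive, hitc < 0 not applicable). The function name and the example hitc values are hypothetical; the threshold is exposed as a parameter because users may choose a different stringency.

```python
from collections import Counter


def classify_hitc(hitc, threshold=0.9):
    """Binarize a continuous hit call into 'active', 'inactive', or 'NA'."""
    if hitc < 0:
        return "NA"
    return "active" if hitc >= threshold else "inactive"


# Hypothetical continuous hit calls for six concentration series
hitcs = [0.99, 0.90, 0.45, 0.0, -1.0, 0.89]
counts = Counter(classify_hitc(h) for h in hitcs)
```

Lowering `threshold` trades specificity for sensitivity, which is why the cautionary flags and fit category are worth reviewing alongside the binarized call.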
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.533
Neutral control median absolute deviation, by plate (nmad): 0.104
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 19.47%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
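The control-well metrics listed above are simple functions of medians and median absolute deviations. The sketch below writes them out in Python using the formulas as given in this section; the function names are hypothetical, and the positive-control values in the example are invented purely for illustration because this endpoint reports NA for them.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3*(cmad + nmad) / abs(cmed - nmed)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference between control and neutral wells."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control values reported for this endpoint
nmed, nmad = 0.533, 0.104
cv = cv_percent(nmed, nmad)

# Hypothetical positive control values (not reported for this endpoint)
pmed, pmad = 2.0, 0.2
zp = z_prime(pmed, pmad, nmed, nmad)
```

Computed from the rounded values shown, the CV comes out near 19.5%; the reported 19.47% was presumably calculated from unrounded plate-level values. The same functions apply to negative control wells by passing mmed and mmad in place of pmed and pmad.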
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 293.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast
Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 92
ATG_Myb_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human Myb Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_Myb_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_Myb_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_Myb_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene MYB. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is MYB proteins.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene Myb,
which is responsive to the endogenous human v-myb avian myeloblastosis viral oncogene homolog
[GeneSymbol:MYB | GeneID:4602 | Uniprot_SwissProt_Accession:P10242].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of
a TF binding site in the promoter. Importantly, because all members of a TF family recognize the same binding
sequence, CIS-FACTORIAL evaluates the activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 300 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.059
Response cutoff threshold used to determine hit calls: 0.296
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well
quality (wllq); if cval = 0, wllq = 0.)
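The Level 2 corrections above can be illustrated with a minimal Python sketch. It is not the tcpl implementation: the function name and the list-of-tuples layout are hypothetical, and the sketch assumes that non-positive wells are flagged before the log2 transform is taken so that the logarithm is only applied to positive values.

```python
import math

def apply_level2(cvals):
    """Return (value, wllq) pairs after rmneg, rmzero, and log2 (illustrative only)."""
    processed = []
    for cval in cvals:
        if cval <= 0:
            # rmneg / rmzero: keep the raw value but downgrade well quality to 0
            processed.append((cval, 0))
        else:
            # log2: transform usable wells to log-scale (base 2)
            processed.append((math.log2(cval), 1))
    return processed

# Hypothetical corrected response values for five wells
wells = [1.0, 2.0, 0.0, -0.5, 0.5]
result = apply_level2(wells)
```

Wells with wllq = 0 would then be left out of downstream fitting, while a fold-induction of 1.0 maps to a log2 response of 0.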
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
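As a rough illustration of the Level 4 method above, the following Python sketch computes bmad and onesd from the two lowest concentration indices of test compound wells. The dictionary layout and function name are hypothetical. The sketch uses the plain (unscaled) median absolute deviation; whether tcpl applies a scaling constant is not stated here, so that choice is an assumption.

```python
import statistics

def baseline_stats(wells):
    """Return (bmad, onesd) from test wells (wllt == 't') with cndx 1 or 2."""
    resp = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(resp)
    # Unscaled median absolute deviation around the median (assumption)
    bmad = statistics.median(abs(r - med) for r in resp)
    # Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))
    onesd = statistics.stdev(resp)
    return bmad, onesd

# Hypothetical wells: only the first four qualify (test wells, cndx 1 or 2)
example_wells = [
    {"resp": 0.0, "wllt": "t", "cndx": 1},
    {"resp": 0.1, "wllt": "t", "cndx": 1},
    {"resp": -0.1, "wllt": "t", "cndx": 2},
    {"resp": 0.2, "wllt": "t", "cndx": 2},
    {"resp": 5.0, "wllt": "t", "cndx": 3},
    {"resp": 9.0, "wllt": "n", "cndx": 1},
]
bmad, onesd = baseline_stats(example_wells)
```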
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
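The cutoff selection can be checked with a short calculation. The sketch below is illustrative only (the function name is hypothetical) and simply takes the larger of the two candidate cutoffs listed for this endpoint.

```python
import math

def response_cutoff(bmad):
    """Return the higher of log2(1.2) and 5 * bmad (illustrative sketch)."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # about 0.263
        "bmad5": 5 * bmad,
    }
    return max(candidates.values())

# Using the rounded bmad of 0.059 reported in the Assay Design Summary
cutoff = response_cutoff(0.059)
```

With the rounded bmad of 0.059 this gives approximately 0.295, so bmad5 is the larger candidate. The small difference from the reported cutoff of 0.296 is presumably due to rounding of the reported bmad, which is an assumption rather than something stated in this record.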
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then
flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
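A few of the simpler Level 6 flags can be expressed as plain comparisons. The Python sketch below covers only noise, border, low.nrep, and low.nconc; it is not the tcpl code, and the function name and argument list are hypothetical. For the border flag it assumes the intended meaning is that |top| falls within 0.8 to 1.2 times the cutoff, which is one reading of the description above.

```python
def simple_flags(top, coff, rmse, nrep, nconc):
    """Return names of a subset of cautionary flags raised for one series."""
    flags = []
    if rmse > coff:
        flags.append("noise")          # fit error exceeds the cutoff
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")         # assumed band reading of the border rule
    if nrep < 2:
        flags.append("low.nrep")       # fewer than 2 replicates on average
    if nconc <= 4:
        flags.append("low.nconc")      # 4 or fewer concentrations tested
    return flags

# Hypothetical series: top just above a cutoff of 0.296, single replicate, 7 concentrations
flags = simple_flags(top=0.30, coff=0.296, rmse=0.10, nrep=1, nconc=7)
```

Since this assay has a nominal replicate count of 1, a rule like low.nrep would be expected to apply broadly to series in this endpoint.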
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 126
Inactive hit count (0 <= hitc < 0.9): 4388
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
29
285
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
2590
165
quadratic-polynomial (poly2) model: 736
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
471
32
2
152
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
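To make these point-of-departure estimates concrete, the sketch below solves for them under a Hill model in the gain direction, written here as f(x) = tp / (1 + (ga/x)^p). The parameterization, function names, and example parameter values are assumptions for illustration; the actual estimates come from tcplfit2 and depend on the winning model.

```python
def hill(conc, tp, ga, p):
    """Hill response at a concentration (assumed parameterization)."""
    return tp / (1 + (ga / conc) ** p)

def conc_at_fraction(ga, p, fraction):
    """Concentration giving a fraction of the top, e.g. 0.5 for AC50, 0.1 for AC10."""
    return ga * (fraction / (1 - fraction)) ** (1 / p)

def conc_at_response(tp, ga, p, level):
    """Concentration where the curve reaches a response level (e.g. cutoff or BMR)."""
    if not 0 < level < tp:
        return None  # the curve never reaches this level
    return ga * (level / (tp - level)) ** (1 / p)

# Hypothetical fitted parameters
tp, ga, p = 1.5, 10.0, 2.0
ac50 = conc_at_fraction(ga, p, 0.50)
ac10 = conc_at_fraction(ga, p, 0.10)
ac5 = conc_at_fraction(ga, p, 0.05)
acc = conc_at_response(tp, ga, p, 0.296)  # using the cutoff reported for this endpoint
```

Under this parameterization the AC50 equals ga, and the ACC is undefined when the modeled top does not exceed the cutoff.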
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. whether the winning model is a better fit than a flat line.
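The winning-model rule described above (lowest AIC, ties broken in favor of the model with fewer parameters) can be sketched as a single comparison. The function name, AIC values, and parameter counts below are hypothetical and serve only to show the tie-break; this is not the tcplfit2 implementation.

```python
def pick_winning_model(fits):
    """fits maps model name -> (aic, n_params); return the winning model name."""
    # Sort key: lowest AIC first, then fewest parameters on equal AIC
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

# Hypothetical fit summaries for one concentration series
fits = {
    "hill": (10.0, 3),
    "poly1": (10.0, 1),
    "exp2": (11.5, 2),
}
winner = pick_winning_model(fits)
```

Here hill and poly1 tie on AIC, so the simpler poly1 model is selected.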
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
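The binarization described above can be written as a small helper. The sketch below uses the 0.90 threshold from this documentation as a default; the function name is hypothetical, and users may substitute a different threshold.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to 'active', 'inactive', or 'not applicable'."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

# Hypothetical continuous hit calls
labels = [classify_hitc(h) for h in (0.95, 0.90, 0.42, 0.0, -1)]
```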
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.898
Neutral control median absolute deviation, by plate (nmad): 0.049
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 5.45%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
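The control metrics listed above follow directly from the plate medians and median absolute deviations. The Python sketch below implements the stated formulas; the function names are hypothetical, and the positive control values in the example are invented for illustration because this assay reports no positive or negative control.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) versus neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference versus neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

# Neutral control values reported for this endpoint
cv = cv_percent(nmed=0.898, nmad=0.049)

# Hypothetical positive control values, for illustration only
zp = z_prime(cmed=2.0, cmad=0.1, nmed=1.0, nmad=0.1)
s = ssmd(cmed=2.0, cmad=0.1, nmed=1.0, nmad=0.1)
```

With the rounded nmed and nmad shown above, the CV comes out near 5.46%, close to the reported 5.45%; the reported figure was presumably computed from unrounded values.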
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 152.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay EndpointID: 93
ATG_Myc_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human Myc Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_Myc_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_Myc_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_Myc_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene MYC. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is basic helix-loop-helix leucine zipper.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene Myc,
which is responsive to the endogenous human v-myc avian myelocytomatosis viral oncogene homolog
[GeneSymbol:MYC | GeneID:4609 | Uniprot_SwissProt_Accession:P01106].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of an RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 300 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.098
Response cutoff threshold used to determine hit calls: 0.489
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
-------
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 or fewer concentrations were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
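As a rough illustration of the Level 4 and Level 5 arithmetic described above, the following Python sketch computes bmad and onesd from low-concentration test wells and then picks the larger of the two candidate cutoffs. It is not tcpl code (tcpl is an R package); the function names, the dictionary layout of a well, and the mad_scale argument are assumptions made for this example. In particular, the text does not say whether the median absolute deviation is rescaled, so the scale factor is left as a parameter.

```python
import math
import statistics


def baseline_stats(wells, mad_scale=1.0):
    """Return (bmad, onesd) for one endpoint.

    wells: list of dicts with keys 'resp', 'wllt', 'cndx' (hypothetical layout).
    mad_scale: multiplier applied to the raw MAD (an assumption, see lead-in).
    """
    # Keep test-compound wells (wllt == "t") at the two lowest concentration indices.
    resp = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    if len(resp) < 2:
        raise ValueError("need at least two baseline wells")
    med = statistics.median(resp)
    bmad = mad_scale * statistics.median([abs(r - med) for r in resp])
    # Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1)).
    onesd = statistics.stdev(resp)
    return bmad, onesd


def response_cutoff(bmad, fold_change=1.2, bmad_multiplier=5.0):
    """Return (method_name, cutoff), choosing the higher candidate cutoff."""
    candidates = {
        "log2_1.2": math.log2(fold_change),
        "bmad5": bmad_multiplier * bmad,
    }
    name = max(candidates, key=candidates.get)
    return name, candidates[name]
```

With the bmad of 0.098 reported for this endpoint, response_cutoff(0.098) selects bmad5 and returns about 0.49, in line with the reported cutoff of 0.489; the small difference presumably reflects rounding of the reported bmad.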
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 195
Inactive hit count (0 <= hitc < 0.9): 4319
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 54
gain-loss (gnls) model: 330
power (pow) model: 270
linear-polynomial (poly1) model: 2111
quadratic-polynomial (poly2) model: 808
exponential-2 (exp2) model: 39
exponential-3 (exp3) model: 3
exponential-4 (exp4) model: 642
exponential-5 (exp5) model: 205
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration-response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomial, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. the model with fewer parameters) wins.
In invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each
concentration-response series is determined by calculating a continuous hit call for the winning model, which
is the product of three proportional weights. The first weight reflects whether there is at least one median
response outside the efficacy cutoff band. The second reflects whether the top (or maximal change in the
predicted response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is
less than that of the constant model, i.e. whether the winning model is a better fit than a flat line.
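The selection rule above (lowest AIC, with ties going to the model with fewer parameters) can be sketched in a few lines of Python. This is an illustration only, not the tcplfit2 implementation; the function name, the input layout, and the AIC values and parameter counts in the usage example are hypothetical.

```python
def pick_winning_model(fits):
    """Return the name of the winning model.

    fits: dict mapping model name -> (aic, n_params); hypothetical layout.
    """
    if not fits:
        raise ValueError("no model fits supplied")
    # Compare on AIC first, then on parameter count, so that an AIC tie
    # is resolved in favor of the simpler model.
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Hypothetical fits: hill and poly1 tie on AIC, so the simpler poly1 wins.
example_fits = {"hill": (10.0, 3), "poly1": (10.0, 1), "exp4": (12.5, 2)}
winner = pick_winning_model(example_fits)
```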
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
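A minimal Python sketch of how a user might apply these filters, assuming a hypothetical record per concentration series with the fields named in the comments. The 0.9 threshold and the four flag rules are taken from the definitions in this document; the function names and record layout are assumptions, and only a subset of the level 6 flags is shown.

```python
def classify_hitc(hitc, threshold=0.9):
    """Binarize a continuous hit call using the thresholds stated in this document."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


def caution_flags(series):
    """Return names of a few level 6 flags that apply to one series.

    series: dict with keys rmse, coff, nrep, nconc, hitc, ac50, logc_min
    (hypothetical layout; logc_min is log10 of the lowest tested concentration).
    """
    flags = []
    if series["rmse"] > series["coff"]:
        flags.append("noise")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags


# Hypothetical series: an active call with one replicate and an AC50 below
# the lowest tested concentration (10**0 = 1).
example = {"rmse": 0.2, "coff": 0.489, "nrep": 1, "nconc": 7,
           "hitc": 0.95, "ac50": 0.5, "logc_min": 0.0}
example_call = classify_hitc(example["hitc"])
example_flags = caution_flags(example)
```

A stricter or looser review can be obtained by changing the threshold argument or by choosing which flags disqualify a curve.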
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 1.005
Neutral control median absolute deviation, by plate (nmad): 0.169
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 16.82%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
((pmed - nmed) / sqrt(pmad^2 + nmad^2)): NA
Positive control signal-to-noise, ((pmed - nmed)/nmad): NA
Positive control signal-to-background, (pmed/nmed): NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
((mmed - nmed) / sqrt(mmad^2 + nmad^2)): NA
Signal-to-noise (median across all plates, using negative control wells),
((mmed - nmed)/nmad): NA
Signal-to-background (median across all plates, using negative control wells),
(mmed/nmed): NA
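The control-well formulas listed above translate directly into code. The Python sketch below is illustrative only; the function and argument names are assumptions, with cmed and cmad standing for the median and median absolute deviation of whichever control (positive or negative) is compared against the neutral control.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    if cmed == nmed:
        raise ValueError("control and neutral medians are equal")
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed
```

Using the neutral control values reported for this endpoint, cv_percent(1.005, 0.169) rounds to 16.82, matching the CV% shown above. The remaining metrics are NA here because no positive or negative control values are reported.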
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 205.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
-------
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 94
ATG_NF_kB_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human NF-kB Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_NF_kB_CIS is one of 52
assay component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse
transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay
component ATG_NF_kB_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_NF_kB_CIS, was
analyzed with bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a
type of inducible reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the
reporter gene at the transcription factor-level as they relate to the gene NFKB1. Furthermore, this assay
endpoint can be referred to as a primary readout, because this assay has produced multiple assay endpoints
where this one serves a reporter gene function. To generalize the intended target to other relatable targets,
this assay endpoint is annotated to the dna binding intended target family, where the subfamily is NF-kappa B.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene NF-kB,
which is responsive to the endogenous human nuclear factor of kappa light polypeptide gene enhancer in
B-cells 1 [GeneSymbol:NFKB1 | GeneID:4790 | Uniprot_SwissProt_Accession:P19838].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit
(RTU) is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the
presence of a TF binding site in the promoter.
-------
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data was processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.128
Response cutoff threshold used to determine hit calls: 0.642
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
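The removal and transformation steps can be sketched in Python as follows. This is an illustration rather than tcpl code: the function name and well layout are assumptions, and the sketch assumes that wells with non-positive cval are flagged (wllq = 0) and left untransformed, since the text does not spell out the order in which the rmneg, rmzero, and log2 methods are applied.

```python
import math


def level2_correct(wells):
    """Apply rmneg, rmzero, and log2 to a list of wells.

    wells: list of dicts with keys 'cval' and 'wllq' (hypothetical layout).
    Returns new dicts; the input list is not modified.
    """
    corrected = []
    for well in wells:
        out = dict(well)
        if out["cval"] <= 0:
            # rmneg / rmzero: downgrade well quality so the well is excluded.
            out["wllq"] = 0
        else:
            # log2: transform the corrected response value to base-2 log scale.
            out["cval"] = math.log2(out["cval"])
        corrected.append(out)
    return corrected


# Hypothetical wells: a fourfold induction, a zero, and a negative value.
example_wells = [{"cval": 4.0, "wllq": 1}, {"cval": 0.0, "wllq": 1}, {"cval": -2.0, "wllq": 1}]
example_out = level2_correct(example_wells)
```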
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
-------
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
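The Level 4 baseline calculation described above can be sketched in a few lines. This is an illustrative Python sketch, not the tcpl implementation (tcpl itself is an R package): the function name, the tuple layout, and the example wells are hypothetical. The `scale` argument is also an assumption; R's `mad()` multiplies by 1.4826 by default, and the text here does not say whether that constant is applied, so it is left at 1.0.

```python
import math
import statistics

def baseline_stats(wells, scale=1.0):
    """Return (bmad, onesd) from wells given as (wllt, cndx, resp) tuples.

    Only test compound wells (wllt == "t") at the two lowest
    concentration indices (cndx 1 or 2) contribute to the baseline.
    """
    base = [resp for wllt, cndx, resp in wells if wllt == "t" and cndx in (1, 2)]
    med = statistics.median(base)
    # Median absolute deviation about the median; `scale` is an assumed knob.
    bmad = scale * statistics.median([abs(r - med) for r in base])
    # Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1)).
    mean = sum(base) / len(base)
    onesd = math.sqrt(sum((r - mean) ** 2 for r in base) / (len(base) - 1))
    return bmad, onesd

# Hypothetical wells: the last two are ignored (high cndx, neutral control).
wells = [("t", 1, 0.0), ("t", 1, 0.1), ("t", 2, -0.1), ("t", 2, 0.2),
         ("t", 1, -0.2), ("t", 5, 3.0), ("n", 1, 9.0)]
bmad, onesd = baseline_stats(wells)
```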
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Inactive hit count: 0
Active hit count: hitc>0.9
354
4160
0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 131
gain-loss (gnls) model: 377
power (pow) model: 350
linear-polynomial (poly1) model: 1666
quadratic-polynomial (poly2) model: 845
exponential-2 (exp2) model: 51
exponential-3 (exp3) model: 9
exponential-4 (exp4) model: 744
exponential-5 (exp5) model: 289
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration-response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each
concentration-response series is determined by calculating a continuous hit-call for the winning model, which is
the product of three proportional weights. The first weight reflects whether there is at least one median response
outside the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
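The selection rule stated above (lowest AIC wins, ties go to the model with fewer parameters) can be illustrated with a short Python sketch. It is not the tcplfit2 code; the function name, the AIC values, and the parameter counts below are hypothetical and chosen only to show the tie-break.

```python
def pick_winning_model(fits):
    """Pick the winning model from a dict of name -> (aic, n_params).

    The lowest AIC wins; when AIC values are equal, the model with
    fewer parameters (the simpler model) is preferred.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

# Hypothetical fit results for one concentration series.
fits = {
    "cnst": (12.0, 1),
    "hill": (-3.2, 4),
    "poly1": (-3.2, 2),  # same AIC as hill, but simpler
    "exp4": (-1.0, 3),
}
winner = pick_winning_model(fits)
```

Sorting on the pair (AIC, parameter count) keeps both rules in a single comparison, so the tie-break cannot be applied before the AIC comparison by mistake.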
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
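The binarization described above can be written as a small helper. This is a minimal Python sketch with a hypothetical function name; the default threshold of 0.90 is the one used in this documentation, and it is exposed as an argument because other thresholds may be used.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to 'active', 'inactive', or 'not applicable'."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

# Hypothetical hit call values.
labels = [classify_hitc(h) for h in (0.95, 0.90, 0.42, 0.0, -1.0)]
```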
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.522
Neutral control median absolute deviation, by plate: nmad 0.166
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 31.81%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed - nmed) / nmad) NA
Positive control signal-to-background: (pmed / nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed - nmed) / nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed / nmed) NA
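The control-well formulas listed above translate directly into code. The Python sketch below is illustrative only: the function names are hypothetical, and because this endpoint reports NA for positive and negative controls, the positive control values used in the example (pmed = 2.0, pmad = 0.2) are invented purely to exercise the formulas. The neutral control values are the summary figures reported above; since the reported metrics are computed by plate, applying the formulas to these rounded summaries reproduces them only approximately.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference against the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    return cmed / nmed

nmed, nmad = 0.522, 0.166   # neutral control summary values reported above
pmed, pmad = 2.0, 0.2       # hypothetical positive control, for illustration

cv = cv_percent(nmed, nmad)            # close to the reported CV of about 31.8%
zp = z_prime(pmed, pmad, nmed, nmad)
sd = ssmd(pmed, pmad, nmed, nmad)
```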
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 289.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 95
ATG_NFI_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human NFI Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_NFI_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_NFI_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_NFI_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene NFIA. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is nuclear factor I.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene NFI,
which is responsive to the endogenous human nuclear factor I/A [GeneSymbol:NFIA | GeneID:4774 |
Uniprot_SwissProt_Accession:Q12857].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.085
Response cutoff threshold used to determine hit calls: 0.425
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
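The reported cutoff can be checked against the reported bmad. The Python sketch below assumes the cutoff is the larger of log2(1.2) and 5 times bmad, as the Level 5 methods assigned to this endpoint in Section 3.2 describe; the function name is hypothetical and this is not the tcpl implementation.

```python
import math

def response_cutoff(bmad):
    """Return the larger of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)

# bmad reported for this endpoint in the assay design summary.
coff = response_cutoff(0.085)
```

Here 5 * 0.085 = 0.425 exceeds log2(1.2) (about 0.263), so the bmad-based cutoff is selected, which agrees with the reported response cutoff threshold of 0.425.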
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-induction
over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
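A few of the simpler Level 6 flags listed above reduce to one-line comparisons. The Python sketch below covers only bmd.high, noise, low.nrep, low.nconc, and ac50.lowconc, using the conditions as written in the list; it is not the tcpl implementation, and the function name, dictionary keys, and example values are hypothetical.

```python
def cautionary_flags(series):
    """Return the names of the flags (from a small subset) raised by one series.

    `series` is a dict with keys hitc, ac50, bmd, rmse, coff, nrep,
    nconc and logc_min (log10 of the lowest tested concentration).
    """
    s = series
    flags = []
    if s["bmd"] > s["ac50"]:
        flags.append("bmd.high")
    if s["rmse"] > s["coff"]:
        flags.append("noise")
    if s["nrep"] < 2:
        flags.append("low.nrep")
    if s["nconc"] <= 4:
        flags.append("low.nconc")
    if s["hitc"] >= 0.9 and s["ac50"] < 10 ** s["logc_min"]:
        flags.append("ac50.lowconc")
    return flags

# Hypothetical series: one replicate, seven concentrations, noisy fit,
# and an AC50 below the lowest tested concentration (10^0 = 1).
example = {"hitc": 0.95, "ac50": 0.5, "bmd": 0.2, "rmse": 0.5,
           "coff": 0.425, "nrep": 1, "nconc": 7, "logc_min": 0.0}
flags = cautionary_flags(example)
```

With a nominal design of one replicate per concentration, as in the assay design summary for this endpoint, low.nrep would be expected on most series, so this flag is better read as a property of the design than of any individual curve.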
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Inactive hit count: 0
Active hit count: hitc>0.9
283
4231
0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 61
gain-loss (gnls) model: 323
power (pow) model: 294
linear-polynomial (poly1) model: 1993
quadratic-polynomial (poly2) model: 867
exponential-2 (exp2) model: 4
exponential-3 (exp3) model: 60
exponential-4 (exp4) model: 634
exponential-5 (exp5) model: 226
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
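To make these estimates concrete, the Python sketch below works them out for one case: a winning Hill model, assumed here to take the form f(x) = tp / (1 + (ga / x)^p). Solving f(x) = q * tp for x gives x = ga * (q / (1 - q))^(1/p), and solving f(x) = y gives x = ga * (y / (tp - y))^(1/p). The parameter values are hypothetical, and other winning models would need their own inversions.

```python
def hill(x, tp, ga, p):
    """Assumed Hill form: top tp, AC50 ga, Hill coefficient p."""
    return tp / (1 + (ga / x) ** p)

def conc_at_fraction(q, ga, p):
    """Concentration where the Hill response reaches fraction q of tp (0 < q < 1)."""
    return ga * (q / (1 - q)) ** (1 / p)

def conc_at_response(y, tp, ga, p):
    """Concentration where the Hill response equals y (0 < y < tp)."""
    return ga * (y / (tp - y)) ** (1 / p)

# Hypothetical fitted parameters; the cutoff is the one reported for this endpoint.
tp, ga, p, coff = 1.0, 10.0, 2.0, 0.425

ac50 = conc_at_fraction(0.50, ga, p)      # equals ga for this model form
ac10 = conc_at_fraction(0.10, ga, p)
ac5 = conc_at_fraction(0.05, ga, p)
acc = conc_at_response(coff, tp, ga, p)   # concentration at the efficacy cutoff
```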
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modi) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response)
is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. whether the winning model is a better fit than a flat line.
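The model-selection rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched in a few lines of Python. This is an illustrative sketch only and is not tcplfit2 code: the `FitResult` record, the AIC values, and the parameter counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitResult:
    """Hypothetical summary of one fitted model (not a tcplfit2 structure)."""
    name: str       # model label, e.g. "hill" or "poly1"
    aic: float      # Akaike Information Criterion of the fit
    n_params: int   # number of estimated parameters (assumed counts)


def select_winning_model(fits):
    """Return the fit with the lowest AIC; ties go to the simpler model."""
    if not fits:
        raise ValueError("at least one fitted model is required")
    # Sort key: AIC first, then parameter count as the tie-breaker.
    return min(fits, key=lambda f: (f.aic, f.n_params))


# Made-up values for illustration only.
example_fits = [
    FitResult("cnst", aic=41.7, n_params=1),
    FitResult("hill", aic=35.2, n_params=4),
    FitResult("poly1", aic=35.2, n_params=2),
    FitResult("exp4", aic=36.9, n_params=3),
]
winner = select_winning_model(example_fits)
print(winner.name)  # poly1: same AIC as hill, but fewer parameters
```

Using a tuple as the sort key keeps the tie-break rule explicit and avoids a second pass over the fits.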
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
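The binarization of the continuous hit call described above can be written as a small helper. This is a minimal sketch, assuming the thresholds stated in this document (0.90 for active, values below 0 treated as not applicable); the function name and the adjustable `active_threshold` parameter are hypothetical and not part of tcpl.

```python
def classify_hitc(hitc, active_threshold=0.90):
    """Map a continuous hit call to 'active', 'inactive', or 'not applicable'."""
    if hitc < 0:
        return "not applicable"
    if hitc >= active_threshold:
        return "active"
    return "inactive"


# Made-up hit-call values for illustration only.
labels = [classify_hitc(h) for h in (0.97, 0.90, 0.42, 0.0, -1.0)]
print(labels)
# ['active', 'active', 'inactive', 'inactive', 'not applicable']
```

A stricter or looser analysis can pass a different `active_threshold` without changing the rest of the logic.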
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.872
Neutral control median absolute deviation, by plate: nmad 0.142
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 16.32%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed - nmed) / nmad) NA
Positive control signal-to-background: (pmed / nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed - nmed) / nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed / nmed) NA
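The robustness formulas listed above translate directly into code. The following Python sketch is illustrative only and is not taken from tcpl: the function names are hypothetical, and the control values in the second usage line are made up because this endpoint reports NA for positive and negative controls. Recomputing CV% from the rounded nmed and nmad shown above gives about 16.28%; the small difference from the reported 16.32% is presumably due to rounding of the displayed inputs.

```python
import math


def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100.0


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) versus the neutral control."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference versus the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Control signal relative to neutral control variability."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Control median relative to neutral control median."""
    return cmed / nmed


# Neutral control summary values as displayed in this section.
print(round(cv_percent(nmad=0.142, nmed=0.872), 2))

# Hypothetical positive-control values, used only to exercise the formulas.
print(z_prime(cmed=3.0, cmad=0.2, nmed=0.872, nmad=0.142))
```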
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 226.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline
(tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive documentation describing
ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 96
ATG_NRF1_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human NRF1 Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_NRF1_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_NRF1_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_NRF1_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene NRF1. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is nuclear respiratory factors.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene NRF1,
which is responsive to the endogenous human nuclear respiratory factor 1 [GeneSymbol:NRF1 | GeneID:4899 |
Uniprot_SwissProt_Accession:Q16656].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: The adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.081
Response cutoff threshold used to determine hit calls: 0.403
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
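The pre-modeling transformation described above (exclude zero and negative values, then take log2 of the remaining values) can be sketched as follows. This is a minimal illustration under stated assumptions and is not tcpl code: the input values are assumed to already be fold-induction ratios over DMSO, the function name is hypothetical, and excluded wells are represented by a well quality of 0 with no response value.

```python
import math


def log2_fold_induction(cvals):
    """Return (resp, wllq) pairs for a list of corrected response values.

    Wells with zero or negative values are excluded (wllq = 0, resp = None);
    all other wells are log2-transformed and keep wllq = 1.
    """
    out = []
    for cval in cvals:
        if cval <= 0:
            out.append((None, 0))            # cannot be log-transformed
        else:
            out.append((math.log2(cval), 1))
    return out


# Made-up fold-induction values for illustration only.
print(log2_fold_induction([1.0, 2.0, 0.0, -0.3, 0.5]))
# [(0.0, 1), (1.0, 1), (None, 0), (None, 0), (-1.0, 1)]
```

On this scale a value of 0 means no change relative to the DMSO baseline, and a doubling corresponds to 1.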
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality (wllq);
if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well quality
(wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad
is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then
flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
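The baseline, cutoff, and two of the simpler flag rules listed above can be sketched in Python. This is an illustration under stated assumptions and not the tcpl implementation: all function names are hypothetical, the `scale` argument is an assumption (R's `mad()` multiplies by 1.4826 by default, and this document does not state whether that factor is applied), and the border rule is read here as "top within 20 percent of the cutoff". With the rounded bmad of 0.081 shown in the assay design summary, the larger of the two candidate cutoffs is 5 * bmad = 0.405, close to the reported 0.403, which was presumably computed from an unrounded bmad.

```python
import math
import statistics


def bmad_and_onesd(baseline_resp, scale=1.0):
    """Baseline MAD and sample standard deviation of low-concentration wells.

    baseline_resp: normalized responses of test wells at the two lowest
    concentration indices. scale=1.0 gives the raw MAD (assumed convention).
    """
    med = statistics.median(baseline_resp)
    bmad = scale * statistics.median([abs(r - med) for r in baseline_resp])
    onesd = statistics.stdev(baseline_resp)  # divides by (sample size - 1)
    return bmad, onesd


def response_cutoff(bmad):
    """Pick the higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5.0 * bmad)


def border_flag(top, coff):
    """Borderline activity: |top| lies between 0.8 * coff and 1.2 * coff."""
    return 0.8 * coff <= abs(top) <= 1.2 * coff


def noise_flag(rmse, coff):
    """Noisy series: root mean square error of the fit exceeds the cutoff."""
    return rmse > coff


coff = response_cutoff(0.081)   # bmad as displayed for this endpoint
print(round(coff, 3))           # 0.405
print(border_flag(top=0.45, coff=coff), noise_flag(rmse=0.2, coff=coff))
```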
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 198
Inactive hit count (0 <= hitc < 0.9): 4316
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 65
gain-loss (gnls) model: 351
power (pow) model: 245
linear-polynomial (poly1) model: 2082
quadratic-polynomial (poly2) model: 816
exponential-2 (exp2) model: 47
exponential-3 (exp3) model: 3
exponential-4 (exp4) model: 653
exponential-5 (exp5) model: 200
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
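For a concrete sense of how these point-of-departure estimates relate to a fitted curve, the sketch below derives them in closed form for a Hill model. It is an illustration under stated assumptions, not tcplfit2 code: the Hill parameterization f(x) = tp / (1 + (ga / x)^p), the parameter values, and the benchmark response value are all hypothetical, and only the gain direction (tp > 0) is handled.

```python
def hill_response(conc, tp, ga, p):
    """Hill model: tp = top, ga = AC50, p = Hill coefficient (assumed form)."""
    return tp / (1.0 + (ga / conc) ** p)


def conc_at_fraction(frac, ga, p):
    """Concentration where the response equals frac * tp, for 0 < frac < 1."""
    if not 0.0 < frac < 1.0:
        raise ValueError("frac must be strictly between 0 and 1")
    return ga * (frac / (1.0 - frac)) ** (1.0 / p)


def conc_at_response(level, tp, ga, p):
    """Concentration where the response equals `level` (cutoff or BMR).

    Returns None if the curve never reaches that level.
    """
    if not 0.0 < level < tp:
        return None
    return ga / (tp / level - 1.0) ** (1.0 / p)


# Hypothetical fitted parameters and thresholds, for illustration only.
tp, ga, p = 1.2, 10.0, 1.5
coff, bmr = 0.403, 0.25

pods = {
    "ac50": conc_at_fraction(0.50, ga, p),
    "ac10": conc_at_fraction(0.10, ga, p),
    "ac5": conc_at_fraction(0.05, ga, p),
    "acc": conc_at_response(coff, tp, ga, p),
    "bmd": conc_at_response(bmr, tp, ga, p),
}
for name, value in pods.items():
    print(name, value)
```

For other winning models the same idea applies, but the inversion generally has a different closed form or must be solved numerically.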
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration-response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response)
is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. whether the winning model is a better fit than a flat line.
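The product structure of the hit call can be illustrated with a deliberately simplified stand-in. In the pipeline each of the three factors is a proportional weight; the sketch below collapses each one to 0 or 1 purely to show how the three conditions combine, so its output is not comparable to a real hitc value. The function name, arguments, and example numbers are hypothetical.

```python
def simplified_hit_call(median_resps, top, coff, aic_winner, aic_constant):
    """Binary stand-in for the three-factor hit call (illustration only).

    median_resps: median response at each tested concentration
    top: maximal change in the predicted response of the winning model
    coff: efficacy cutoff (the cutoff band is [-coff, coff])
    """
    # Factor 1: at least one median response lies outside the cutoff band.
    median_outside = any(abs(m) > coff for m in median_resps)
    # Factor 2: the modeled top exceeds the cutoff in magnitude.
    top_exceeds = abs(top) > coff
    # Factor 3: the winning model fits better than the constant (flat) model.
    better_than_flat = aic_winner < aic_constant
    return float(median_outside) * float(top_exceeds) * float(better_than_flat)


# Made-up series for illustration only.
print(simplified_hit_call([0.05, 0.10, 0.30, 0.62], top=0.70, coff=0.403,
                          aic_winner=-12.4, aic_constant=-3.1))   # 1.0
print(simplified_hit_call([0.05, 0.10, 0.12, 0.20], top=0.70, coff=0.403,
                          aic_winner=-12.4, aic_constant=-3.1))   # 0.0
```

Because the result is a product, any single factor near zero pulls the whole hit call toward zero, which is also true of the continuous version.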
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
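One way a user might combine the hit call and cautionary flags when filtering results is sketched below. This is only an example of the kind of stringency choice described above: the record layout, the sample identifiers, and the `max_flags` threshold are hypothetical and are not recommendations from this document.

```python
def high_confidence(records, min_hitc=0.90, max_flags=2):
    """Keep series with an active hit call and at most `max_flags` flags."""
    return [
        r for r in records
        if r["hitc"] >= min_hitc and len(r["flags"]) <= max_flags
    ]


# Made-up series summaries for illustration only.
series = [
    {"sample": "S1", "hitc": 0.99, "flags": []},
    {"sample": "S2", "hitc": 0.95, "flags": ["noise", "border", "low.nrep"]},
    {"sample": "S3", "hitc": 0.40, "flags": ["multipoint.neg"]},
    {"sample": "S4", "hitc": 0.92, "flags": ["low.nrep"]},
]
kept = [r["sample"] for r in high_confidence(series)]
print(kept)  # ['S1', 'S4']
```

Tightening `min_hitc` or lowering `max_flags` trades coverage for confidence; the appropriate setting depends on the intended use of the data.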
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.753
Neutral control median absolute deviation, by plate: nmad 0.098
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 12.99%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed - nmed) / nmad) NA
Positive control signal-to-background: (pmed / nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed - nmed) / nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed / nmed) NA
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 200.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 97
ATG_NRF2_ARE_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human ARE Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_NRF2_ARE_CIS is one of 52
assay component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_NRF2_ARE_CIS was analyzed into 1 assay endpoint. This assay endpoint,
ATG_NRF2_ARE_CIS, was analyzed with bidirectional fitting relative to DMSO as the negative control and
baseline of activity. Using a type of inducible reporter, measures of mRNA for gain or loss-of-signal activity
can be used to understand the reporter gene at the transcription factor-level as they relate to the gene NFE2L2.
Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has produced
multiple assay endpoints where this one serves a reporter gene function. To generalize the intended target
to other relatable targets, this assay endpoint is annotated to the dna binding intended target family, where the
subfamily is basic leucine zipper.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element ARE, which is responsive to the endogenous human nuclear factor, erythroid 2-like 2
[GeneSymbol:NFE2L2 | GeneID:4780 | Uniprot_SwissProt_Accession:Q16236].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 300 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.156
Response cutoff threshold used to determine hit calls: 0.78
-------
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
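The Level 2, 4 and 5 methods listed above can be illustrated with a minimal Python sketch. This is not the tcpl implementation: the function names are hypothetical, the wells are dropped rather than assigned wllq = 0, and the MAD scaling constant of 1.4826 (the default of R's mad() function) is an assumption. With the bmad of 0.156 reported in the assay design summary, the higher of the two candidate cutoffs is 5 * bmad = 0.78, which agrees with the reported response cutoff.

```python
import math
import statistics

# Assumption: consistency constant used by R's mad() by default.
MAD_SCALE = 1.4826


def level2_log2(cvals):
    """Drop negative and zero values (rmneg, rmzero), then log2-transform.

    Simplification: tcpl sets wllq = 0 for these wells instead of dropping them.
    """
    return [math.log2(v) for v in cvals if v > 0]


def bmad_and_onesd(resp_lowconc):
    """Baseline MAD and sample standard deviation of low-concentration wells.

    resp_lowconc: normalized responses of test wells with cndx equal to 1 or 2.
    """
    med = statistics.median(resp_lowconc)
    bmad = MAD_SCALE * statistics.median(abs(r - med) for r in resp_lowconc)
    onesd = statistics.stdev(resp_lowconc)  # divides by (sample size - 1)
    return bmad, onesd


def level5_cutoff(bmad):
    """Select the higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


if __name__ == "__main__":
    print(level2_log2([4, 0, -1, 1]))
    print(level5_cutoff(0.156))
```

Because log2(1.2) is roughly 0.263, the bmad-based cutoff is the one selected whenever bmad exceeds about 0.053.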
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 1559
Inactive hit count (0 <= hitc < 0.9): 2955
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 267
gain-loss (gnls) model: 259
power (pow) model: 480
linear-polynomial (poly1) model: 1323
quadratic-polynomial (poly2) model: 879
exponential-2 (exp2) model: 64
exponential-3 (exp3) model: 51
exponential-4 (exp4) model: 629
exponential-5 (exp5) model: 510
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
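As an illustration of how these POD estimates relate to a fitted curve, the following Python sketch inverts a gain-direction Hill model. It rests on stated assumptions rather than on the tcplfit2 source: the parameterization tp / (1 + (ga / conc)^p), the BMR defined as 1.349 times onesd, and all function names and parameter values are illustrative.

```python
def hill_response(conc, tp, ga, p):
    """Hill model response (assumed form): tp / (1 + (ga / conc)^p)."""
    return tp / (1.0 + (ga / conc) ** p)


def hill_conc_at(y, tp, ga, p):
    """Concentration at which a gain-direction Hill curve reaches response y.

    Returns None when y is not strictly between 0 and tp, i.e. the curve
    never reaches that response level.
    """
    if tp <= 0 or not (0 < y < tp):
        return None
    return ga / ((tp / y) - 1.0) ** (1.0 / p)


def hill_pods(tp, ga, p, coff, onesd, bmr_scale=1.349):
    """Collect the five POD estimates for one fitted Hill curve.

    bmr_scale = 1.349 is an assumed default for defining the BMR from onesd.
    """
    return {
        "bmd": hill_conc_at(bmr_scale * onesd, tp, ga, p),
        "ac50": hill_conc_at(0.50 * tp, tp, ga, p),
        "acc": hill_conc_at(coff, tp, ga, p),
        "ac10": hill_conc_at(0.10 * tp, tp, ga, p),
        "ac5": hill_conc_at(0.05 * tp, tp, ga, p),
    }


if __name__ == "__main__":
    # Hypothetical fit: top = 2.0, AC50 = 10, Hill slope = 1, cutoff = 0.78.
    for name, value in hill_pods(2.0, 10.0, 1.0, coff=0.78, onesd=0.2).items():
        print(name, value)
```

For the Hill model the ac50 equals the ga parameter by construction, and the acc is undefined (None here) whenever the modeled top does not exceed the cutoff.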
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
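The model selection rule described above (lowest AIC wins, with ties going to the model with fewer parameters) can be sketched in Python as follows. The log-likelihoods and parameter counts are hypothetical illustration values, not output from tcplfit2, and the function names are assumptions.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def pick_winning_model(fits):
    """Return the name of the winning model.

    fits maps a model name to (log_likelihood, n_params). The lowest AIC
    wins; when AIC values are equal, the model with fewer parameters wins.
    """
    return min(fits, key=lambda name: (aic(*fits[name]), fits[name][1]))


if __name__ == "__main__":
    # Hypothetical fits for one concentration series.
    fits = {"cnst": (-12.0, 1), "poly1": (-8.0, 2), "hill": (-5.0, 4)}
    print(pick_winning_model(fits))
```

Sorting on the pair (AIC, number of parameters) encodes the tie-break directly, so no separate comparison step is needed.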
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity,
efficacy, and potency (comparing the AC50 with the concentration range screened). Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
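The binarization of the continuous hit call described above can be sketched in Python. The 0.90 threshold is the one used in this documentation; the function name and the example hitc values are hypothetical, and a user may pass a different threshold.

```python
from collections import Counter


def classify_hitc(hitc, active_threshold=0.90):
    """Binarize a continuous hit call.

    hitc < 0 is not applicable, hitc >= threshold is active, and anything
    in between (including 0) is inactive.
    """
    if hitc < 0:
        return "NA"
    return "active" if hitc >= active_threshold else "inactive"


if __name__ == "__main__":
    # Hypothetical continuous hit calls for a handful of concentration series.
    example = [0.99, 0.90, 0.45, 0.0, -1.0]
    print(Counter(classify_hitc(h) for h in example))
```

Applied to this endpoint's data, a tally of this kind corresponds to the active, inactive and NA hit counts reported in the aggregate endpoint summary.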
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.254
Neutral control median absolute deviation, by plate (nmad): 0.073
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 28.6%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells),
(mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells),
mmed/nmed: NA
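The robustness metrics above are simple functions of control-well medians and median absolute deviations. A minimal Python sketch, with hypothetical function names, is given below. The positive control values used in the demonstration are invented for illustration, since this assay reports NA for positive and negative controls.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, as a percentage."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) versus neutral wells."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference versus neutral wells."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


if __name__ == "__main__":
    # Neutral control values as reported (rounded) for this endpoint.
    print(round(cv_percent(0.254, 0.073), 1))
    # Hypothetical positive control values, for illustration only.
    print(z_prime(2.25, 0.04, 0.25, 0.03), ssmd(2.25, 0.04, 0.25, 0.03))
```

Recomputing the CV from the rounded nmed and nmad gives approximately 28.7%, slightly different from the reported 28.6%, which was presumably derived from unrounded values.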
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 510.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 98
ATG_Oct_MLP_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human Oct Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_Oct_MLP_CIS is one of 52
assay component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_Oct_MLP_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_Oct_MLP_CIS,
was analyzed with bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using
a type of inducible reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the
reporter gene at the transcription factor-level as they relate to the gene POU2F1. Furthermore, this assay
endpoint can be referred to as a primary readout, because this assay has produced multiple assay endpoints
where this one serves a reporter gene function. To generalize the intended target to other relatable targets,
this assay endpoint is annotated to the dna binding intended target family, where the subfamily is POU domain
protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene Oct,
which is responsive to the endogenous human POU class 2 homeobox 1 [GeneSymbol:POU2F1 | GeneID:5451
| Uniprot_SwissProt_Accession:P14859].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, because all members of a TF family recognize the same binding
sequence, CIS-FACTORIAL evaluates the activities of TF families.
2.3 Experimental System: The adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.136
Response cutoff threshold used to determine hit calls: 0.68
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
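The response cutoff listed in the Assay Design Summary is consistent with the Level 5 methods assigned in
Section 3.2, where the larger of log2(1.2) and 5 times bmad is selected. The following is a minimal Python sketch
of that arithmetic using the rounded bmad from the summary; the function name select_cutoff is illustrative
only and is not part of tcpl.

```python
import math

def select_cutoff(bmad: float) -> float:
    """Return the larger of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # fixed fold-change threshold
        "bmad5": 5 * bmad,           # noise-based threshold
    }
    return max(candidates.values())

# bmad as reported (rounded) in the Assay Design Summary
cutoff = select_cutoff(0.136)
print(round(cutoff, 3))
```

For this endpoint the noise-based candidate dominates, since log2(1.2) is well below 5 times bmad.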
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
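As a rough illustration of the transformation described in this paragraph, the Python sketch below excludes
non-positive values and log2-transforms the remainder. It is a simplified stand-in for the corresponding tcpl
methods, not their implementation; the function name and the input values (hypothetical fold-induction ratios)
are assumptions made for the example.

```python
import math

def log2_transform(cvals):
    """Drop non-positive values, then log2-transform the remainder.

    Each well is returned as (value, wllq). Excluded wells get wllq = 0
    and no transformed value; retained wells get wllq = 1.
    """
    out = []
    for cval in cvals:
        if cval <= 0:
            out.append((None, 0))             # negative or zero: excluded
        else:
            out.append((math.log2(cval), 1))  # log2 fold-induction
    return out

# Hypothetical fold-induction ratios relative to DMSO
wells = log2_transform([2.0, 1.0, 0.0, -0.3, 0.5])
```

A ratio of 1.0 (no change relative to DMSO) maps to 0 on the log2 scale, so gain and loss of signal appear
symmetrically around the baseline.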
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing a well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
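To make the Level 4 baseline calculation and a few of the simpler Level 6 flags concrete, the Python sketch
below computes bmad and onesd from test wells at concentration indices 1 and 2 and evaluates the noise,
low.nrep, and low.nconc conditions as they are worded in the method list. It is an illustration under stated
assumptions rather than the tcpl code: the function names and well values are hypothetical, and the MAD scaling
constant of 1.4826 (the default of R's mad function) is an assumption about how bmad is scaled.

```python
import math
import statistics

def baseline_stats(wells, scale=1.4826):
    """Compute (bmad, onesd) from test wells at the two lowest concentration indices.

    wells: iterable of (resp, wllt, cndx) tuples.
    scale: consistency constant applied to the MAD (assumed, see lead-in).
    """
    base = [resp for resp, wllt, cndx in wells if wllt == "t" and cndx in (1, 2)]
    med = statistics.median(base)
    bmad = scale * statistics.median(abs(r - med) for r in base)
    onesd = statistics.stdev(base)  # sample standard deviation, denominator n - 1
    return bmad, onesd

def simple_flags(rmse, coff, nrep, nconc):
    """Return the names of three cautionary flags whose conditions are met."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags

# Hypothetical wells: (resp, wllt, cndx)
wells = [(0.1, "t", 1), (-0.1, "t", 1), (0.2, "t", 2), (0.0, "t", 2),
         (1.5, "t", 7), (0.05, "n", 1)]
bmad, onesd = baseline_stats(wells)

# Nominal design of this assay: 1 replicate, 7 concentrations
flags = simple_flags(rmse=0.3, coff=0.68, nrep=1, nconc=7)
```

With a nominal design of one replicate per concentration, the low.nrep condition would be met for every
series, which is worth keeping in mind when counting flags per curve.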
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 594
Inactive hit count (0 <= hitc < 0.9): 3920
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 122
gain-loss (gnls) model: 429
power (pow) model: 419
linear-polynomial (poly1) model: 1485
quadratic-polynomial (poly2) model: 784
exponential-2 (exp2) model: 95
exponential-3 (exp3) model: 27
exponential-4 (exp4) model: 788
exponential-5 (exp5) model: 313
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
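For a Hill model these activity concentrations have a closed form. Assuming the gain-direction Hill form
f(x) = tp / (1 + (ga / x)^p), the concentration at which the curve reaches a given response level is
x = ga * (level / (tp - level))^(1/p). The Python sketch below applies this to the AC50, AC10, AC5, and ACC;
the parameter values are hypothetical and the helper name is illustrative, so this is a worked example rather
than the tcplfit2 implementation.

```python
def hill_activity_concentration(tp, ga, p, level):
    """Concentration at which tp / (1 + (ga / x) ** p) equals `level`.

    Requires 0 < level < tp for a gain-direction curve.
    """
    if not 0 < level < tp:
        raise ValueError("level must lie strictly between 0 and tp")
    return ga * (level / (tp - level)) ** (1.0 / p)

tp, ga, p = 2.0, 10.0, 1.5  # hypothetical top, AC50, and Hill slope
coff = 0.68                 # response cutoff from the Assay Design Summary

ac50 = hill_activity_concentration(tp, ga, p, 0.50 * tp)
ac10 = hill_activity_concentration(tp, ga, p, 0.10 * tp)
ac5 = hill_activity_concentration(tp, ga, p, 0.05 * tp)
acc = hill_activity_concentration(tp, ga, p, coff)
```

In this form the AC50 equals the ga parameter by construction, and the ACC is only defined when the modeled top
exceeds the cutoff.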
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
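The model selection rule described in this section can be sketched as follows, using the standard definition
AIC = 2k - 2 ln(L) for a model with k parameters and maximized likelihood L. The log-likelihoods below are
hypothetical, the parameter counts are assumptions for illustration, and the function is not the tcplfit2
implementation.

```python
def select_winning_model(fits):
    """Pick the model with the lowest AIC; on ties prefer fewer parameters.

    fits: dict mapping model name -> (log_likelihood, n_parameters).
    Returns (winning model name, its AIC).
    """
    def aic(loglik, k):
        return 2 * k - 2 * loglik

    # Tuples compare element-wise, so equal AIC falls back to parameter count.
    scored = {name: (aic(ll, k), k) for name, (ll, k) in fits.items()}
    winner = min(scored, key=lambda name: scored[name])
    return winner, scored[winner][0]

# Hypothetical fits: (log-likelihood, number of parameters)
fits = {
    "cnst": (-12.0, 1),
    "hill": (-5.0, 4),
    "poly1": (-6.0, 2),
    "exp4": (-5.0, 3),
}
winner, winner_aic = select_winning_model(fits)

# The winner should also beat the flat-line (constant) model on AIC.
cnst_aic = 2 * fits["cnst"][1] - 2 * fits["cnst"][0]
beats_constant = winner_aic < cnst_aic
```

Here poly1 and exp4 tie on AIC, and the tie is broken in favor of poly1 because it has fewer parameters.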
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
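The binarization described in this section can be expressed as a short Python sketch. The thresholds follow the
text (active at hitc of 0.90 or above, inactive from 0 up to 0.90, not applicable below 0); the series values,
the flag counts, and the "at most one flag" filter are hypothetical choices made for illustration.

```python
from collections import Counter

def classify_hitc(hitc, threshold=0.90):
    """Binarize a continuous hit call using the thresholds stated in the text."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

# Hypothetical series: (sample id, hitc, number of level 6 flags)
series = [
    ("S1", 0.98, 0),
    ("S2", 0.95, 3),
    ("S3", 0.42, 1),
    ("S4", 0.00, 0),
    ("S5", -1.0, 0),
]

calls = Counter(classify_hitc(hitc) for _, hitc, _ in series)

# One possible stringency filter: active calls carrying at most one flag.
high_confidence = [sid for sid, hitc, nflags in series
                   if classify_hitc(hitc) == "active" and nflags <= 1]
```

Raising the threshold or lowering the permitted number of flags makes the filter more stringent; the
appropriate choice depends on the intended use of the data.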
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.711
Neutral control median absolute deviation, by plate (nmad): 0.325
Coefficient of variation (CV%) in neutral control wells, (nmad / nmed) * 100: 45.67%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed) / nmad: NA
Positive control signal-to-background, pmed / nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed) / nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed / nmed: NA
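The robustness formulas listed in Section 4.1 can be written out as a short Python sketch. The function names
are illustrative, and the positive control values in the last example are hypothetical, since no positive
control is reported for this endpoint; a missing control is represented by None and yields None, mirroring the
NA entries.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    if cmed is None or cmad is None:
        return None  # control not available
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference against the neutral control."""
    if cmed is None or cmad is None:
        return None
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

nmed, nmad = 0.711, 0.325          # neutral control values reported for this endpoint
cv = cv_percent(nmed, nmad)
no_control = z_prime(None, None, nmed, nmad)

# Hypothetical positive control, for illustration only
zp = z_prime(3.0, 0.2, nmed, nmad)
s = ssmd(3.0, 0.2, nmed, nmad)
```

With the rounded values reported for this endpoint, the coefficient of variation evaluates to roughly 45.7%,
close to the reported 45.67%; the small difference is presumably due to rounding of the tabulated inputs.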
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 313.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay EndpointID: 99
ATG_p53_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human p53 Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_p53_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_p53_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_p53_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene TP53. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is tumor suppressor.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene p53,
which is responsive to the endogenous human tumor protein p53 [GeneSymbol:TP53 | GeneID:7157 |
Uniprot_SwissProt_Accession:P04637].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, because all members of a TF family recognize the same binding
sequence, CIS-FACTORIAL evaluates the activities of TF families.
2.3 Experimental System: The adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.128
Response cutoff threshold used to determine hit calls: 0.641
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
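The removal of non-positive values and the log transformation described above can be sketched as follows. This is a minimal Python illustration, not tcpl's R implementation: the function name `apply_level2` and the convention of returning `None` for excluded wells are assumptions made for the example, and the order in which tcpl applies these methods is not restated here.

```python
import math

def apply_level2(cvals):
    """Return one (value, wllq) pair per well.

    Wells with cval <= 0 are excluded by setting wllq to 0 (value None);
    all other wells are log2-transformed and keep wllq = 1.
    """
    processed = []
    for cval in cvals:
        if cval <= 0:
            # Negative or zero corrected response: downgrade well quality.
            processed.append((None, 0))
        else:
            processed.append((math.log2(cval), 1))
    return processed

# Hypothetical corrected response values (fold induction over DMSO).
processed = apply_level2([2.0, 0.0, -0.5, 1.0])
```

A fold induction of 1.0 (no change from the DMSO baseline) maps to 0 on the log2 scale, which is why the baseline of the modeled response sits at zero.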
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
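The Level 4 to Level 6 methods listed above can be sketched in a few lines. The following is a simplified Python illustration under stated assumptions, not the tcpl R code: the function names and the example wells are hypothetical, the median absolute deviation is scaled by 1.4826 (the default of R's `mad` function; whether tcpl applies this scaling is assumed here), and the border flag is implemented as a band between 0.8 and 1.2 times the cutoff, which is one reading of the description above. Only four of the listed flags are shown.

```python
import math
import statistics

MAD_SCALE = 1.4826  # assumption: consistency constant used by R's mad() by default

def baseline_stats(wells):
    """Level 4 sketch: bmad and onesd from test wells (wllt == 't') at cndx 1 or 2."""
    low = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(low)
    bmad = MAD_SCALE * statistics.median([abs(r - center) for r in low])
    onesd = statistics.stdev(low)  # sample standard deviation (n - 1 denominator)
    return bmad, onesd

def level5_cutoff(bmad):
    """Level 5 sketch: keep the higher of log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)

def caution_flags(top, coff, rmse, nconc, nrep):
    """Level 6 sketch: a subset of the cautionary flags."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags

# Hypothetical wells: only the first four contribute to the baseline.
wells = [
    {"resp": 0.0, "wllt": "t", "cndx": 1},
    {"resp": 0.1, "wllt": "t", "cndx": 1},
    {"resp": -0.1, "wllt": "t", "cndx": 2},
    {"resp": 0.2, "wllt": "t", "cndx": 2},
    {"resp": 1.5, "wllt": "t", "cndx": 7},   # ignored: not a low concentration
    {"resp": 0.05, "wllt": "n", "cndx": 1},  # ignored: not a test compound well
]
bmad, onesd = baseline_stats(wells)

coff = level5_cutoff(0.128)  # bmad reported for this endpoint
flags = caution_flags(top=0.70, coff=coff, rmse=0.20, nconc=7, nrep=1)
```

With the reported bmad of 0.128, 5 * bmad is 0.64, which exceeds log2(1.2) (about 0.263), so the bmad-based value is the one selected. The small difference from the listed cutoff of 0.641 presumably reflects rounding of the reported bmad. Because the nominal design uses a single replicate, a series like the hypothetical one above would carry the low.nrep flag.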
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 403
Inactive hit count (0 <= hitc < 0.9): 4111
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 105
gain-loss (gnls) model: 472
power (pow) model: 321
linear-polynomial (poly1) model: 1613
quadratic-polynomial (poly2) model: 954
exponential-2 (exp2) model: 71
exponential-3 (exp3) model: 5
exponential-4 (exp4) model: 663
exponential-5 (exp5) model: 258
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
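As an illustration of how such estimates relate to a fitted curve, the sketch below inverts a Hill curve of the form tp / (1 + (ga / conc)^p) to find the concentration at a given response level. It is written in Python under stated assumptions: the three-parameter Hill form, the gain direction (tp > 0), and the parameter values are chosen for illustration and are not taken from this endpoint's fits.

```python
def hill_response(conc, tp, ga, p):
    """Hill curve: tp is the top, ga the AC50, p the Hill coefficient."""
    return tp / (1.0 + (ga / conc) ** p)

def hill_conc_at(level, tp, ga, p):
    """Concentration at which the Hill curve reaches `level`.

    Returns None when the level is never reached (level <= 0 or level >= tp).
    """
    if not 0.0 < level < tp:
        return None
    return ga / (tp / level - 1.0) ** (1.0 / p)

# Hypothetical winning-model parameters and cutoff.
tp, ga, p, coff = 2.0, 10.0, 1.0, 0.64

ac50 = hill_conc_at(0.50 * tp, tp, ga, p)  # equals ga by construction
ac10 = hill_conc_at(0.10 * tp, tp, ga, p)
ac5 = hill_conc_at(0.05 * tp, tp, ga, p)
acc = hill_conc_at(coff, tp, ga, p)        # concentration at the efficacy cutoff
```

The same inversion applied at the benchmark response level would give the bmd. Estimates at lower response levels (ac5, ac10) fall at lower concentrations than the ac50, and the acc moves with the cutoff rather than with a fixed fraction of the top.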
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
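The selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be sketched as below. This Python fragment is illustrative only: the function name, the AIC values, and the parameter counts are hypothetical and are not output from tcplfit2.

```python
def pick_winning_model(fits):
    """Pick the model with the lowest AIC; break ties by fewer parameters.

    fits maps a model name to a tuple (aic, number_of_parameters).
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

# Hypothetical fit results for one concentration series.
fits = {
    "cnst": (50.0, 1),
    "poly1": (42.0, 2),
    "hill": (42.0, 4),
    "exp4": (45.3, 3),
}
winner = pick_winning_model(fits)

# The winning model is also compared against the constant (flat line) model.
beats_flat_line = fits[winner][0] < fits["cnst"][0]
```

Here poly1 and hill tie on AIC, so the simpler poly1 model is selected, and its AIC is lower than that of the constant model.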
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
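The binarization of the continuous hit call described above can be written as a small helper. This is a Python sketch, not part of tcpl; the function name is hypothetical, and the 0.90 default mirrors the threshold used in this documentation while remaining adjustable.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to active, inactive, or not applicable."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

labels = [classify_hitc(h) for h in (0.95, 0.90, 0.45, 0.0, -1.0)]
```

Raising the threshold makes the active call more stringent; a user who also wants to exclude flagged curves would apply the level 6 flags as an additional filter.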
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.429
Neutral control median absolute deviation, by plate (nmad): 0.154
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 35.94%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
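The formulas in the table above translate directly into code. The Python sketch below uses hypothetical function names; the neutral control values are those reported above, while the positive control values are invented purely to exercise the formulas, since no positive control is available for this assay.

```python
import math

def cv_percent(mad, med):
    """Coefficient of variation, (mad / med) * 100."""
    return mad / med * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference against the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

nmed, nmad = 0.429, 0.154          # reported neutral control values
cv = cv_percent(nmad, nmed)

# Hypothetical positive control values, for illustration only.
pmed, pmad = 2.0, 0.2
zp = z_prime(pmed, pmad, nmed, nmad)
effect = ssmd(pmed, pmad, nmed, nmad)
```

Computed from the rounded nmed and nmad, the CV comes out near 35.9%; the reported 35.94% was presumably calculated from unrounded values.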
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 258.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis
Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 100
ATG_Pax6_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human paired box 6 (PAX6)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_Pax6_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_Pax6_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_Pax6_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene PAX6. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is paired box protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene Pax,
which is responsive to the endogenous human paired box 6 [GeneSymbol:PAX6 | GeneID:5080 |
Uniprot_SwissProt_Accession:P26367].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of a RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS the activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.128
Response cutoff threshold used to determine hit calls: 0.64
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
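As a worked check on the design values above, the sketch below reconstructs a concentration series from the nominal number of concentrations and the standard minimum and maximum. It assumes evenly spaced concentrations on a log scale (a constant dilution factor), which the summary does not state; the function name is hypothetical and the values are in the same units as the table.

```python
def dilution_series(c_max, c_min, n_conc):
    """Return (dilution_factor, concentrations) for a geometric series.

    Assumes a constant fold change between adjacent concentrations.
    """
    factor = (c_max / c_min) ** (1.0 / (n_conc - 1))
    return factor, [c_max / factor ** i for i in range(n_conc)]

factor, series = dilution_series(c_max=300.0, c_min=1.23, n_conc=7)
```

Under that assumption the implied dilution factor is very close to 2.5, so the seven nominal concentrations would step down from 300 by roughly 2.5-fold at each level.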
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
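The Level 4 baseline statistics and the Level 5 cutoff rule described above can be illustrated with a short sketch. It is written in Python for illustration only (tcpl itself is an R package), and the function names and sample responses are hypothetical. This document does not state whether a scaling constant is applied to the median absolute deviation, so the constant is exposed as a parameter and defaults to 1 here.

```python
import math
import statistics


def mad(values, constant=1.0):
    """Median absolute deviation; `constant` is an assumed, optional scale factor."""
    values = list(values)
    med = statistics.median(values)
    return constant * statistics.median([abs(v - med) for v in values])


def baseline_stats(resp_low_conc):
    """bmad and onesd from test-well responses at concentration index 1 or 2."""
    bmad = mad(resp_low_conc)
    onesd = statistics.stdev(resp_low_conc)  # sample SD, (n - 1) denominator
    return bmad, onesd


def response_cutoff(bmad):
    """Level 5: take the higher of the log2(1.2) and 5 * bmad candidates."""
    return max(math.log2(1.2), 5.0 * bmad)


# Hypothetical normalized responses from the two lowest concentrations.
resp = [0.0, 0.1, -0.1, 0.2, -0.2]
bmad, onesd = baseline_stats(resp)
coff = response_cutoff(bmad)
print(f"bmad={bmad:.3f} onesd={onesd:.3f} coff={coff:.3f}")
```

With a bmad of 0.096, for example, 5 * bmad = 0.48 exceeds log2(1.2) (about 0.263), so the bmad-based candidate would be the one selected.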
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 443
Inactive hit count (0 <= hitc < 0.9): 4071
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 105
gain-loss (gnls) model: 356
power (pow) model: 336
linear-polynomial (poly1) model: 2053
quadratic-polynomial (poly2) model: 628
exponential-2 (exp2) model: 81
exponential-3 (exp3) model: 242
exponential-4 (exp4) model: 650
exponential-5 (exp5) model: 11
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
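As an illustration of how these point-of-departure estimates relate to a fitted curve, the sketch below inverts a Hill model to find the concentration at a given response level. It is a minimal Python sketch, not the tcpl implementation. It assumes the three-parameter Hill form tp / (1 + (ga / x)^p); the function names, the dictionary keys ac10 and ac5, and the parameter values are hypothetical. The benchmark response is passed in as a plain number because its derivation is not detailed here.

```python
def hill(conc, tp, ga, p):
    """Hill model response at concentration `conc` (assumed parameterization)."""
    return tp / (1.0 + (ga / conc) ** p)


def conc_at_response(y, tp, ga, p):
    """Concentration where the Hill curve reaches response y, or None if it never does."""
    frac = y / tp
    if not 0.0 < frac < 1.0:
        return None  # y is outside the range the curve can reach
    return ga * (frac / (1.0 - frac)) ** (1.0 / p)


def pod_estimates(tp, ga, p, coff, bmr):
    """Collect the five estimates named in the text for one fitted Hill curve."""
    return {
        "bmd": conc_at_response(bmr, tp, ga, p),        # at the benchmark response
        "ac50": conc_at_response(0.50 * tp, tp, ga, p),  # at 50% of maximal response
        "acc": conc_at_response(coff, tp, ga, p),        # at the efficacy cutoff
        "ac10": conc_at_response(0.10 * tp, tp, ga, p),  # at 10% of maximal response
        "ac5": conc_at_response(0.05 * tp, tp, ga, p),   # at 5% of maximal response
    }


# Hypothetical fit: top = 2.0, AC50 = 10.0, Hill slope = 1.0.
for name, value in pod_estimates(tp=2.0, ga=10.0, p=1.0, coff=0.5, bmr=0.2).items():
    print(name, value)
```

Returning None when the requested response lies outside the curve's range reflects that, for example, a curve whose top is below the cutoff has no concentration at the cutoff.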
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than that of the constant model, i.e. the winning
model is better fit than a flat line.
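The selection rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched as follows. This is an illustrative Python sketch rather than the tcplfit2 code; the AIC values and parameter counts are hypothetical, and the final comparison against the constant model is only a yes/no stand-in for the third weight, not the continuous calculation.

```python
def select_winning_model(fits):
    """Return the model name with the lowest AIC; ties go to fewer parameters.

    `fits` maps a model name to an (aic, n_params) tuple.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Hypothetical AIC values and parameter counts for one concentration series.
fits = {
    "cnst": (-10.2, 1),
    "hill": (-25.7, 4),
    "poly1": (-25.7, 2),
    "exp4": (-24.9, 3),
}
winner = select_winning_model(fits)
beats_constant = fits[winner][0] < fits["cnst"][0]
print(winner, beats_constant)
```

Sorting on the (AIC, parameter count) pair keeps the tie-break in one place, so the simpler model is preferred only when the AIC values are exactly equal.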
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency, based on comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
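The binarization used in this document can be written as a small helper. The Python sketch below is illustrative only; the function name and the sample hit-call values are hypothetical, and the 0.90 threshold is exposed as a parameter because different thresholds may be used.

```python
from collections import Counter


def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to 'active', 'inactive' or 'NA'."""
    if hitc < 0:
        return "NA"  # hitc less than 0 is treated as not applicable
    return "active" if hitc >= threshold else "inactive"


# Hypothetical continuous hit calls for five concentration series.
hit_calls = [0.99, 0.90, 0.45, 0.0, -1.0]
counts = Counter(classify_hitc(h) for h in hit_calls)
print(dict(counts))
```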
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more information on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.482
Neutral control median absolute deviation, by plate: nmad 0.096
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 19.99%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
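The control-based metrics above follow directly from the plate-level medians and median absolute deviations. The Python sketch below restates the listed formulas as functions; it is illustrative only, the function names are hypothetical, and the control values are invented for the example because positive and negative control statistics are reported as NA for this endpoint.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return (nmad / nmed) * 100.0


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (c) against the neutral control (n)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference of a control against neutral."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Hypothetical plate-level statistics (not taken from this assay).
pmed, pmad, nmed, nmad = 2.0, 0.1, 0.5, 0.1
print(f"CV%   = {cv_percent(nmed, nmad):.1f}")
print(f"Z'    = {z_prime(pmed, pmad, nmed, nmad):.2f}")
print(f"SSMD  = {ssmd(pmed, pmad, nmed, nmad):.2f}")
print(f"S/N   = {signal_to_noise(pmed, nmed, nmad):.1f}")
print(f"S/B   = {signal_to_background(pmed, nmed):.1f}")
```

The same functions apply to the negative control by passing mmed and mmad in place of pmed and pmad.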
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 242.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 101
ATG_PBREM_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human PBREM Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_PBREM_CIS is one of 52
assay component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse
transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay
component ATG_PBREM_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_PBREM_CIS, was
analyzed with bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a
type of inducible reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the
reporter gene at the transcription factor-level as they relate to the gene NR1I3. Furthermore, this assay
endpoint can be referred to as a primary readout, because this assay has produced multiple assay endpoints
where this one serves a reporter gene function. To generalize the intended target to other relatable targets,
this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is non-
steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene PBREM,
which is responsive to the endogenous human nuclear receptor subfamily 1, group I, member 3
[GeneSymbol:NR1I3 | GeneID:9970 | Uniprot_SwissProt_Accession:Q14994].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU) is
controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a TF
binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data is processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Target (nominal) number of replicates: 1
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.096
Response cutoff threshold used to determine hit calls: 0.481
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
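The removal of non-positive values and the log2 transformation described in this paragraph can be sketched as below. This is an illustrative Python sketch, not the tcpl R code; the function name and well records are hypothetical. It assumes that wells are screened for non-positive values first and that only the remaining wells are log-transformed, with excluded wells given no transformed value.

```python
import math


def apply_level2_corrections(wells):
    """Return copies of `wells` with negative/zero removal and log2 applied.

    Each well is a dict with a corrected value `cval` and a well quality `wllq`
    (1 = usable, 0 = excluded).
    """
    corrected = []
    for well in wells:
        well = dict(well)  # work on a copy, leave the input untouched
        if well["cval"] <= 0:
            # Negative or zero value: no log2 value exists, so exclude the well.
            well["wllq"] = 0
            well["cval"] = None
        else:
            # Positive value: express the readout on a log2 scale.
            well["cval"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected


# Hypothetical wells: one usable, one zero, one negative.
wells = [
    {"cval": 4.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -2.5, "wllq": 1},
]
for well in apply_level2_corrections(wells):
    print(well)
```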
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
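Four of the simpler Level 6 flags listed above (noise, low.nrep, low.nconc and ac50.lowconc) reduce to single comparisons on per-series summary values. The Python sketch below is illustrative only and is not the tcpl implementation; the function name, the dictionary layout and the example series are hypothetical, while the comparisons follow the flag descriptions given above.

```python
def cautionary_flags(series):
    """Return the names of the flags raised for one concentration series.

    `series` holds hitc, rmse, coff, nrep, nconc, ac50 and logc_min
    (log10 of the lowest concentration tested).
    """
    flags = []
    active = series["hitc"] >= 0.9
    if series["rmse"] > series["coff"]:
        flags.append("noise")            # fit error exceeds the cutoff
    if series["nrep"] < 2:
        flags.append("low.nrep")         # fewer than 2 replicates on average
    if series["nconc"] <= 4:
        flags.append("low.nconc")        # 4 or fewer concentrations tested
    if active and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")     # AC50 below the lowest tested conc
    return flags


# Hypothetical active series with a single replicate per concentration.
example = {
    "hitc": 0.95, "rmse": 0.20, "coff": 0.48,
    "nrep": 1, "nconc": 7, "ac50": 0.5, "logc_min": 0.0,
}
print(cautionary_flags(example))
```

Because the flags are independent checks, a series can carry several at once, and none of them changes the hit call itself.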
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 353
Inactive hit count (0 <= hitc < 0.9): 4161
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 133
gain-loss (gnls) model: 457
power (pow) model: 341
linear-polynomial (poly1) model: 1569
quadratic-polynomial (poly2) model: 813
exponential-2 (exp2) model: 34
exponential-3 (exp3) model: 9
exponential-4 (exp4) model: 805
exponential-5 (exp5) model: 301
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
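For a Hill-type winning model, the fraction-of-maximum estimates (ac50, ac10, ac5) and the cutoff crossing (acc) have closed forms. The Python sketch below assumes the Hill parameterization resp = tp / (1 + (ga/conc)^p), with tp the top, ga the AC50, and p the Hill coefficient; that form, the function names, and the parameter values are assumptions made for illustration and are not taken from the pipeline. The bmd is omitted because it additionally depends on the benchmark response.

```python
def hill(conc, tp, ga, p):
    """Hill response at `conc` (assumed form: tp / (1 + (ga / conc) ** p))."""
    return tp / (1.0 + (ga / conc) ** p)


def conc_at_fraction(q, ga, p):
    """Concentration giving a fraction q (0 < q < 1) of the top for the form above."""
    return ga * (q / (1.0 - q)) ** (1.0 / p)


def conc_at_cutoff(coff, tp, ga, p):
    """Concentration where the curve crosses the cutoff (acc); None if |tp| <= coff."""
    if abs(tp) <= coff:
        return None
    return ga / (abs(tp) / coff - 1.0) ** (1.0 / p)


# Illustrative parameters only (not from any ToxCast fit).
tp, ga, p, coff = 2.0, 10.0, 1.0, 0.9
ac50 = conc_at_fraction(0.50, ga, p)
ac10 = conc_at_fraction(0.10, ga, p)
ac5 = conc_at_fraction(0.05, ga, p)
acc = conc_at_cutoff(coff, tp, ga, p)
print(ac5, ac10, acc, ac50)
```

With these illustrative parameters the estimates are ordered ac5 < ac10 < acc < ac50, which shows why the choice of POD matters when comparing potencies across chemicals.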
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration-response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomial, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. the model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. whether the winning model is a better fit than a flat line.
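The selection rule described above (lowest AIC, ties broken in favor of fewer parameters) can be sketched in a few lines of Python. This is not tcplfit2 code; the function name `pick_winning_model`, the AIC values, and the parameter counts are hypothetical and chosen only to show the tie-break.

```python
def pick_winning_model(fits):
    """Pick the model with the lowest AIC; on a tie, prefer fewer parameters.

    `fits` maps a model name to a (aic, n_params) tuple.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Hypothetical fit summaries for one concentration series.
fits = {
    "cnst": (12.4, 1),
    "hill": (3.1, 4),
    "poly1": (3.1, 2),  # same AIC as hill, but simpler
    "exp2": (5.0, 3),
}
winner = pick_winning_model(fits)
# Comparison behind the third hit-call weight: is the winner better than a flat line?
beats_constant = fits[winner][0] < fits["cnst"][0]
print(winner, beats_constant)
```

Sorting on the tuple (AIC, parameter count) keeps the tie-break explicit without a second pass over the candidate models.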
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of relative activity, efficacy,
and potency, based on comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
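The binarization described above can be expressed as a small Python helper. The 0.90 threshold is the one used in this documentation; the function name `classify_hitc` and the example values are hypothetical, and a user may pass a different threshold.

```python
from collections import Counter


def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to 'active', 'inactive', or 'not applicable'."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


# Illustrative hit-call values only.
hitcs = [0.95, 0.42, -1.0, 0.90, 0.0]
calls = [classify_hitc(h) for h in hitcs]
tally = Counter(calls)
print(calls)
print(tally["active"], tally["inactive"], tally["not applicable"])
```

Tallying the labels in this way gives counts of the same kind as the active, inactive, and NA hit counts reported in the endpoint summary.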
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.51
Neutral control median absolute deviation, by plate: nmad 0.071
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 13.95%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
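The performance metrics above are simple functions of the control medians and median absolute deviations. The Python sketch below restates the formulas as written in this section; the function names are hypothetical, and because positive and negative control values are NA for this endpoint, the control values passed to the Z prime and SSMD functions are invented purely for illustration.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100.0


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control against the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control values as reported above (rounded).
nmed, nmad = 0.51, 0.071
print(round(cv_percent(nmed, nmad), 2))

# Hypothetical positive control values, for illustration only.
pmed, pmad = 2.0, 0.1
print(z_prime(pmed, pmad, nmed, nmad), ssmd(pmed, pmad, nmed, nmad))
```

With the rounded inputs shown above, the CV comes out at about 13.9%, slightly below the reported 13.95%, which was presumably computed from unrounded values.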
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 301.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data
can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The
ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 102
ATG_PPRE_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human Peroxisome Proliferator-activated Response
Element
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_PPRE_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_PPRE_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_PPRE_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor level as it relates to the genes PPARA, PPARD, and PPARG. Furthermore, this assay
endpoint can be referred to as a primary readout, because this assay has produced multiple assay endpoints
where this one serves a reporter gene function. To generalize the intended target to other relatable targets,
this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is non-
steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element PPRE, which is responsive to the endogenous human peroxisome proliferator-activated
receptor alpha, peroxisome proliferator-activated receptor delta, and peroxisome proliferator-activated
receptor gamma [GeneSymbol:PPARA & PPARD & PPARG | GeneID:5465 & 5467 & 5468 |
Uniprot_SwissProt_Accession:Q07869 & Q03181 & P37231].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of
a TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding
sequence, CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in
activities of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear
receptors, the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto
96-well plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated
using TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: Rosiglitazone
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.18
Response cutoff threshold used to determine hit calls: 0.9
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.
), 7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
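The level 2, 4, and 5 methods assigned above can be traced with a short Python sketch. It is not the tcpl implementation: the function names and the well dictionaries are hypothetical, wells are filtered before the log2 transform as described in Section 3.2, and the `mad_constant` default of 1.4826 mirrors the default scaling of R's `mad()` function, which is an assumption here about how bmad is scaled.

```python
import math
import statistics


def level2_log2(cvals):
    """Apply rmneg/rmzero, then log2. Returns (value, wllq) pairs.

    Wells with cval <= 0 are excluded by setting wllq to 0 (value None).
    """
    out = []
    for cval in cvals:
        if cval <= 0:
            out.append((None, 0))
        else:
            out.append((math.log2(cval), 1))
    return out


def baseline_stats(wells, mad_constant=1.4826):
    """bmad and onesd from test wells (wllt == "t") with cndx equal to 1 or 2."""
    resp = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(resp)
    bmad = mad_constant * statistics.median([abs(r - med) for r in resp])
    onesd = statistics.stdev(resp)  # sample standard deviation (n - 1 denominator)
    return bmad, onesd


def efficacy_cutoff(bmad):
    """Level 5: the larger of log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5.0 * bmad)


# Illustrative wells only: three low-concentration test wells plus two that are ignored.
wells = [
    {"wllt": "t", "cndx": 1, "resp": 0.0},
    {"wllt": "t", "cndx": 2, "resp": 0.2},
    {"wllt": "t", "cndx": 1, "resp": -0.2},
    {"wllt": "t", "cndx": 5, "resp": 3.0},
    {"wllt": "n", "cndx": 1, "resp": 9.0},
]
print(level2_log2([4.0, 0.0, -1.0]))
print(baseline_stats(wells, mad_constant=1.0))
print(efficacy_cutoff(0.18))
```

With the bmad of 0.18 reported for this assay, 5 * bmad is about 0.9, which exceeds log2(1.2) (about 0.26) and agrees with the response cutoff of 0.9 listed in the assay design summary.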
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 613
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1.24
Neutral control median absolute deviation, by plate: nmad 0.676
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 54.52%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 291.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-
forecasting. The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-
research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 103
ATG_PXRE_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human PXRE Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_PXRE_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_PXRE_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_PXRE_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene NR1I2. Furthermore, this assay endpoint can be
referred to as a primary readout, because this assay has produced multiple assay endpoints where this one
serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the nuclear receptor intended target family, where the subfamily is non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element PXRE, which is responsive to the endogenous human nuclear receptor subfamily 1, group I,
member 2 [GeneSymbol:NR1I2 | GeneID:8856 | Uniprot_SwissProt_Accession:O75469].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of a RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: Rifampicin
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.135
Response cutoff threshold used to determine hit calls: 0.675
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
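The transformation described above can be sketched in a few lines. This is a minimal Python illustration, not tcpl code: the function name and the input readings are hypothetical, and it only shows the stated idea of excluding zero and negative values before taking log2.

```python
import math

def log2_positive(values):
    """Drop zero and negative values, then log2-transform the rest.

    Follows the description in the text (non-positive readings are
    excluded before the log transform); not the tcpl implementation.
    """
    return [math.log2(v) for v in values if v > 0]

# Hypothetical fold-induction readings relative to the DMSO baseline.
fold_induction = [1.0, 2.0, 0.0, -0.3, 0.5, 4.0]
log2_values = log2_positive(fold_induction)
print(log2_values)  # [0.0, 1.0, -1.0, 2.0]
```

On the log2 scale a value of 0 corresponds to no change from baseline, positive values to gain of signal, and negative values to loss of signal.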
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp - mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
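As a worked illustration of the Level 5 and Level 6 rules listed above, the sketch below picks the higher of the two candidate cutoffs and evaluates three of the simpler threshold-based flags. It is a minimal Python sketch under stated assumptions, not the tcpl implementation: the function names and the example rmse, nconc, and nrep values are hypothetical, while the bmad of 0.135 is the value reported for this endpoint.

```python
import math

def response_cutoff(bmad):
    """Return the higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)

def simple_flags(rmse, coff, nconc, nrep):
    """Evaluate three threshold-based cautionary flags for one series."""
    flags = []
    if rmse > coff:      # noise: fit error exceeds the cutoff
        flags.append("noise")
    if nrep < 2:         # low.nrep: fewer than 2 replicates per concentration
        flags.append("low.nrep")
    if nconc <= 4:       # low.nconc: 4 or fewer concentrations tested
        flags.append("low.nconc")
    return flags

coff = response_cutoff(0.135)  # bmad reported for this endpoint
print(round(coff, 3))          # 0.675
print(simple_flags(rmse=0.2, coff=coff, nconc=7, nrep=1))  # ['low.nrep']
```

With bmad = 0.135, 5 * bmad (0.675) exceeds log2(1.2) (about 0.263), which agrees with the response cutoff of 0.675 listed in the assay design summary.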
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Inactive hit count (0 <= hitc < 0.9): 2349
Active hit count (hitc >= 0.9): 2165
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 324
gain-loss (gnls) model: 645
power (pow) model: 281
linear-polynomial (poly1) model: 896
quadratic-polynomial (poly2) model: 1185
exponential-2 (exp2) model: 49
exponential-3 (exp3) model: 476
exponential-4 (exp4) model: 594
exponential-5 (exp5) model: 12
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
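To make the activity-concentration estimates concrete, the sketch below inverts a Hill curve of the common form f(x) = tp / (1 + (ga / x)^p) to find the concentration at which the curve reaches a given response level. This is an illustrative Python sketch only: the Hill form is assumed here, the parameter values are hypothetical, and tcpl/tcplfit2 compute these estimates for whichever model wins, not only for Hill fits.

```python
def hill_activity_concentration(tp, ga, p, level):
    """Concentration at which tp / (1 + (ga / x)**p) equals `level`.

    Gain direction only: returns None unless 0 < level < tp, because the
    curve never reaches a level at or above its top.
    """
    if not 0 < level < tp:
        return None
    return ga / (tp / level - 1) ** (1 / p)

# Hypothetical fitted parameters: top, AC50 (ga), and Hill coefficient.
tp, ga, p = 2.0, 10.0, 1.0
coff = 0.675  # response cutoff reported for this endpoint

ac50 = hill_activity_concentration(tp, ga, p, 0.50 * tp)
ac10 = hill_activity_concentration(tp, ga, p, 0.10 * tp)
ac5 = hill_activity_concentration(tp, ga, p, 0.05 * tp)
acc = hill_activity_concentration(tp, ga, p, coff)
print(round(ac50, 3), round(ac10, 3), round(ac5, 3), round(acc, 3))
```

The same inversion applies to the benchmark dose by passing the benchmark response as the level. For these hypothetical parameters the AC50 equals ga by construction, and the lower response levels map to lower concentrations.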
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
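The model selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be sketched as follows. This is a hedged Python illustration rather than tcplfit2 code: the AIC values and parameter counts are hypothetical, and the continuous hit-call weights themselves are not reproduced here.

```python
def select_winning_model(fits):
    """Pick the winning model from {name: (aic, n_params)}.

    The lowest AIC wins; if AIC values are equal, the model with
    fewer parameters wins.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

# Hypothetical AIC values and parameter counts for one concentration series.
fits = {
    "cnst": (25.1, 1),
    "hill": (12.4, 4),
    "poly1": (12.4, 2),
    "exp2": (14.0, 3),
}
winner = select_winning_model(fits)
beats_constant = fits[winner][0] < fits["cnst"][0]
print(winner, beats_constant)  # poly1 True
```

Here hill and poly1 tie on AIC, so the simpler poly1 model is selected, and its AIC is lower than that of the constant model.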
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
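The binarization of the continuous hit call described above can be written as a small helper. This is a minimal Python sketch with a hypothetical function name; the 0.9 threshold is the one used in this documentation and is exposed as a parameter because other thresholds may be chosen.

```python
def classify_hitc(hitc, threshold=0.9):
    """Map a continuous hit call to a category.

    hitc >= threshold      -> "active"
    0 <= hitc < threshold  -> "inactive"
    hitc < 0               -> "not applicable"
    """
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

labels = [classify_hitc(h) for h in (0.95, 0.90, 0.42, 0.0, -1.0)]
print(labels)
# ['active', 'active', 'inactive', 'inactive', 'not applicable']
```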
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.449
Neutral control median absolute deviation, by plate (nmad): 0.15
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 33.35%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
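The control-well metrics in this table follow directly from the plate medians and median absolute deviations. The Python sketch below applies the formulas as given in the table; the function names are hypothetical, and because positive and negative control values are NA for this endpoint, the positive control inputs used here (pmed = 2.0, pmad = 0.2) are invented purely for illustration.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3 * (cmad + nmad) / abs(cmed - nmed)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

nmed, nmad = 0.449, 0.15  # neutral control values reported for this endpoint
pmed, pmad = 2.0, 0.2     # hypothetical positive control values

print(round(cv_percent(nmed, nmad), 1))           # 33.4
print(round(z_prime(pmed, pmad, nmed, nmad), 3))
print(round(ssmd(pmed, pmad, nmed, nmad), 3))
```

Using the rounded nmed and nmad gives a CV of about 33.4%, close to the reported 33.35%; the small difference presumably reflects rounding of the tabulated inputs.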
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 594.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-
forecasting. The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-
research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 104
ATG_RORE_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human RORE Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_RORE_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_RORE_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_RORE_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the genes RORA, RORB, and RORC. Furthermore, this assay
endpoint can be referred to as a primary readout, because this assay has produced multiple assay endpoints
where this one serves a reporter gene function. To generalize the intended target to other relatable targets,
this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element RORE, which is responsive to the endogenous human RAR-related orphan receptor A and
RAR-related orphan receptor B and RAR-related orphan receptor C [GeneSymbol:RORA & RORB & RORC |
GeneID:6095 & 6096 & 6097 | Uniprot_SwissProt_Accession:P35398 & Q92753 & P51449].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of a RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.138
Response cutoff threshold used to determine hit calls: 0.688
-------
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
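The pre-modeling transformation described above can be sketched as follows. This is a minimal illustration in Python, not the tcpl implementation (tcpl is an R package); the function name level2_transform and the example well values are hypothetical, and the order of operations (filter, then log2) is an assumption based on the description.

```python
import math

def level2_transform(cvals):
    """Sketch of the rmneg, rmzero and log2 steps described in the text.

    Returns a list of (value, wllq) pairs. Wells with cval <= 0 are
    excluded by setting wllq = 0 and carry no transformed value;
    all other wells keep wllq = 1 and receive log2(cval).
    """
    out = []
    for cval in cvals:
        if cval <= 0:
            # Negative or zero response: downgrade well quality.
            out.append((None, 0))
        else:
            # Fold-induction value moved to log2 scale.
            out.append((math.log2(cval), 1))
    return out

# Hypothetical fold-induction values for five wells.
wells = [2.0, 1.0, 0.0, -0.5, 4.0]
result = level2_transform(wells)
```

On the log2 scale a well equal to the DMSO baseline (fold induction of 1) maps to 0, which is why gain of signal appears as positive response values.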
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp - mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that is only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that is not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in the baseline response, in excess of half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
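The Level 4 and Level 5 methods above can be illustrated with a short Python sketch. It is not the tcpl code; the function names are hypothetical, and the 1.4826 scaling constant is an assumption borrowed from the default of R's mad() function, exposed here as a parameter so it can be changed.

```python
import math
import statistics

def lowconc_baseline(resp, cndx, wllt, constant=1.4826):
    """Return (bmad, onesd) from test wells at the two lowest concentrations.

    resp: normalized responses; cndx: concentration indices;
    wllt: well types ('t' marks a test compound well).
    constant: MAD scaling factor (assumed; R's mad() default).
    """
    base = [r for r, c, w in zip(resp, cndx, wllt) if w == "t" and c in (1, 2)]
    med = statistics.median(base)
    mad = statistics.median([abs(r - med) for r in base])
    onesd = statistics.stdev(base)  # sample SD, divisor n - 1
    return constant * mad, onesd

def select_cutoff(bmad):
    """Pick the higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5.0 * bmad)

# Hypothetical responses: four baseline wells and one high-concentration well.
bmad, onesd = lowconc_baseline(
    resp=[0.0, 0.1, -0.1, 0.2, 5.0],
    cndx=[1, 1, 2, 2, 7],
    wllt=["t", "t", "t", "t", "t"],
)
coff = select_cutoff(0.138)  # bmad value reported for this endpoint
```

With the rounded bmad of 0.138 reported for this endpoint, 5 * bmad is about 0.69, which exceeds log2(1.2) (about 0.263) and is consistent, up to rounding of bmad, with the reported cutoff of 0.688.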
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 363
Inactive hit count (0 <= hitc < 0.9): 4151
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 123
gain-loss (gnls) model: 395
power (pow) model: 390
linear-polynomial (poly1) model: 1717
quadratic-polynomial (poly2) model: 687
exponential-2 (exp2) model: 36
exponential-3 (exp3) model: 8
exponential-4 (exp4) model: 814
exponential-5 (exp5) model: 292
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
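As an illustration of how activity concentrations relate to a fitted curve, the sketch below works through the Hill model only. The parameterization f(x) = tp / (1 + (ga / x)^p) and the names tp, ga and p are assumptions made for this example; the parameter values are hypothetical and are not taken from invitrodb.

```python
def hill(conc, tp, ga, p):
    """Hill response at concentration conc (assumed parameterization)."""
    return tp / (1.0 + (ga / conc) ** p)

def hill_ac(frac, ga, p):
    """Concentration at which the Hill curve reaches frac * tp.

    Solving tp / (1 + (ga / x)^p) = frac * tp for x gives
    x = ga * (frac / (1 - frac))^(1 / p).
    """
    if not 0.0 < frac < 1.0:
        raise ValueError("frac must lie strictly between 0 and 1")
    return ga * (frac / (1.0 - frac)) ** (1.0 / p)

def hill_acc(coff, tp, ga, p):
    """Concentration at which the curve crosses the efficacy cutoff coff."""
    return hill_ac(coff / tp, ga, p)

# Hypothetical fitted parameters for one concentration series.
tp, ga, p = 1.5, 10.0, 2.0
ac50 = hill_ac(0.50, ga, p)
ac10 = hill_ac(0.10, ga, p)
ac5 = hill_ac(0.05, ga, p)
acc = hill_acc(0.688, tp, ga, p)  # cutoff reported for this endpoint
```

Under this parameterization the AC50 equals ga, and the AC10 and AC5 fall at progressively lower concentrations. The other models need their own inversions, often numerical.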
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
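The model selection rule and the structure of the continuous hit call can be sketched as follows. This is an illustration of the logic described above, not the tcplfit2 implementation: the Fit class, the function names and the example AIC values are hypothetical, and the three weights are taken as given inputs rather than computed from data.

```python
from dataclasses import dataclass

@dataclass
class Fit:
    name: str      # model label, e.g. "hill" or "poly1"
    aic: float     # Akaike Information Criterion of the fit
    n_params: int  # number of fitted parameters

def pick_winner(fits):
    """Lowest AIC wins; on an exact AIC tie, fewer parameters wins."""
    return min(fits, key=lambda f: (f.aic, f.n_params))

def continuous_hitc(w_median, w_top, w_aic):
    """Product of three weights, each expected to lie in [0, 1]."""
    for w in (w_median, w_top, w_aic):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
    return w_median * w_top * w_aic

# Hypothetical fits for one concentration series.
fits = [Fit("hill", 12.0, 3), Fit("poly1", 12.0, 1), Fit("exp4", 15.5, 2)]
winner = pick_winner(fits)
hitc = continuous_hitc(1.0, 0.95, 0.98)
```

Because the hit call is a product, a single weak weight is enough to pull the series below the 0.9 activity threshold used in this documentation.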
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
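The binarization described above is simple enough to state in a few lines. The sketch below is illustrative only; the function name classify_hitc is hypothetical, and the 0.90 default mirrors the threshold used in this documentation, which users may change.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to a category label.

    hitc >= threshold      -> "active"
    0 <= hitc < threshold  -> "inactive"
    hitc < 0               -> "not applicable"
    """
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"

# Hypothetical hit-call values for four concentration series.
labels = [classify_hitc(h) for h in (0.97, 0.90, 0.42, -1.0)]
```

A stricter analysis could raise the threshold or additionally drop series that carry particular level 6 flags.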
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.358
Neutral control median absolute deviation, by plate: nmad 0.087
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 24.43%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed - nmed) / nmad) NA
Positive control signal-to-background: (pmed / nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed - nmed) / nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed / nmed) NA
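The formulas in the table above can be written out as small functions. This is a sketch for readers who want to recompute the metrics from plate-level medians; the function names are hypothetical, and the positive-control values in the example are invented, since this endpoint reports NA for its positive and negative controls.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100.0

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against neutral control."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference against neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    return cmed / nmed

# Neutral control summary values reported for this endpoint.
nmed, nmad = 0.358, 0.087
cv = cv_percent(nmed, nmad)

# Hypothetical positive control values, for illustration only.
pmed, pmad = 2.0, 0.1
zp = z_prime(pmed, pmad, nmed, nmad)
```

Computing the CV from the rounded summary values gives about 24.3%, close to but not identical to the reported 24.43%; the difference presumably reflects how per-plate values were aggregated before rounding.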
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 292.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline
(tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive documentation describing
ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 105
ATG_Sox_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human SOX Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_Sox_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_Sox_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_Sox_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene SOX1. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is HMG box protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
ToxCast: US EPA's Toxicity Forecaster Program
tcpl: ToxCast Data Analysis Pipeline R Package
SSMD: Strictly Standardized Mean Difference
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene SOX,
which is responsive to the endogenous human SRY (sex determining region Y)-box 1 [GeneSymbol:SOX1 |
GeneID:6656 | Uniprot_SwissProt_Accession:O00570].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of a RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.088
Response cutoff threshold used to determine hit calls: 0.441
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised when extrapolating these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems, but it can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log2-transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data are loaded into the database, tcpl uses its generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp - mean(resp))^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
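The Level 2 through Level 5 steps listed above can be sketched in a few lines. This is an illustrative Python sketch, not tcpl's R implementation: the function names are hypothetical, wells failing rmneg/rmzero are simply dropped rather than having wllq set to 0, and the 1.4826 scale factor in the median absolute deviation is an assumption (it is the default of R's mad()).

```python
import math
import statistics


def level2_log2(cvals):
    """rmneg/rmzero then log2: drop wells with cval <= 0, log2-transform the rest.

    Simplification: tcpl sets wllq = 0 on such wells; here they are just dropped.
    """
    return [math.log2(c) for c in cvals if c > 0]


def mad(values, constant=1.4826):
    """Median absolute deviation. The 1.4826 scale factor is an assumption
    (the default of R's mad()); pass constant=1.0 for the unscaled value."""
    med = statistics.median(values)
    return constant * statistics.median([abs(v - med) for v in values])


def level4_baseline(resp_low_conc):
    """bmad and onesd from responses of test wells at cndx 1 or 2."""
    bmad = mad(resp_low_conc)
    onesd = statistics.stdev(resp_low_conc)  # sample SD, n - 1 denominator
    return bmad, onesd


def level5_cutoff(bmad):
    """Higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


if __name__ == "__main__":
    print(level2_log2([2.0, 0.0, -1.0, 1.0]))
    print(round(level5_cutoff(0.088), 3))
```

With the rounded bmad of 0.088 reported in the assay design summary, 5 * bmad is 0.44, which exceeds log2(1.2) (about 0.263), so the bmad5 threshold is the one selected. The reported cutoff of 0.441 presumably reflects the unrounded bmad.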
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
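A few of the simpler Level 6 flags can be illustrated with a short Python sketch. The function name and argument list are hypothetical and this is not the tcpl implementation. The border rule is read here as the band 0.8*coff <= |top| <= 1.2*coff, which is an interpretation of the wording above rather than a confirmed detail.

```python
def cautionary_flags(top, coff, rmse, nrep, nconc):
    """Return the names of the simple Level 6 flags raised for one series.

    Only noise, border, low.nrep and low.nconc are sketched; the other
    flags need per-concentration medians or additional model parameters.
    """
    flags = []
    if rmse > coff:  # noise: fit error exceeds the cutoff
        flags.append("noise")
    # border (assumed reading): modeled top within 20 percent of the cutoff
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:  # low.nrep: fewer than 2 replicates per concentration on average
        flags.append("low.nrep")
    if nconc <= 4:  # low.nconc: 4 or fewer concentrations tested
        flags.append("low.nconc")
    return flags


if __name__ == "__main__":
    print(cautionary_flags(top=0.45, coff=0.441, rmse=0.1, nrep=1, nconc=7))
```

Because the assay design summary lists a nominal single replicate per concentration, a series screened under that design would be expected to carry the low.nrep flag under this reading.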
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Inactive hit count: 0
Active hit count: hitc>0.9
218
4296
0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 60
gain-loss (gnls) model: 327
power (pow) model: 219
linear-polynomial (poly1) model: 2258
quadratic-polynomial (poly2) model: 815
exponential-2 (exp2) model: 66
exponential-3 (exp3) model: 3
exponential-4 (exp4) model: 526
exponential-5 (exp5) model: 188
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
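As a worked illustration of these estimates, the sketch below inverts a Hill curve to find the concentration at which a given response level is reached. It assumes the Hill form f(x) = tp / (1 + (ga/x)^p), with tp the top, ga the AC50 and p the Hill power; this parameterization, the function names and the example parameter values are assumptions for illustration and are not taken from this document. Other winning models would need their own inversion.

```python
def hill(conc, tp, ga, p):
    """Assumed Hill model: response at concentration `conc`."""
    return tp / (1.0 + (ga / conc) ** p)


def hill_activity_conc(level, tp, ga, p):
    """Concentration at which the Hill curve reaches `level`.

    Solving level = tp / (1 + (ga/x)^p) for x gives
    x = ga * (frac / (1 - frac))^(1/p), where frac = level / tp.
    Returns None when the curve never reaches `level`.
    """
    frac = level / tp
    if not 0.0 < frac < 1.0:
        return None
    return ga * (frac / (1.0 - frac)) ** (1.0 / p)


if __name__ == "__main__":
    tp, ga, p = 1.0, 10.0, 2.0           # hypothetical fitted parameters
    coff = 0.441                          # cutoff from the assay design summary
    print(hill_activity_conc(0.50 * tp, tp, ga, p))  # ac50, equals ga
    print(hill_activity_conc(0.10 * tp, tp, ga, p))  # ac10
    print(hill_activity_conc(coff, tp, ga, p))       # acc
```

The bmd follows the same inversion with the level set to the benchmark response. When the modeled top does not exceed the requested level, no concentration exists and the sketch returns None.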
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration-response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomial, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. the model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each
concentration-response series is determined by calculating a continuous hit call for the winning model, which is
the product of three proportional weights. The first weight reflects whether there is at least one median response
outside the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
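The model selection rule described above (lowest AIC, ties going to the model with fewer parameters) can be sketched as follows. This is an illustrative Python sketch rather than tcplfit2 code; the function name, the AIC values and the parameter counts in the example are hypothetical.

```python
def select_winning_model(fits):
    """Pick the winning model from {name: (aic, n_params)}.

    Models are ranked by AIC first; when AIC values are equal, the model
    with fewer parameters wins.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


if __name__ == "__main__":
    # Hypothetical AIC values and parameter counts for one series
    fits = {
        "hill": (10.0, 3),
        "poly1": (10.0, 1),
        "exp2": (12.0, 2),
    }
    print(select_winning_model(fits))
```

In this example the hill and poly1 fits tie on AIC, so the simpler poly1 model is selected.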
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency, comparing the AC50 with the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
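The binarization of hitc described above amounts to a simple threshold rule. The Python sketch below is illustrative only; the function name is hypothetical, and the default threshold of 0.9 can be changed to apply a different stringency.

```python
def classify_hitc(hitc, threshold=0.9):
    """Map a continuous hit call to a category using the thresholds above."""
    if hitc < 0:
        return "not applicable"
    return "active" if hitc >= threshold else "inactive"


if __name__ == "__main__":
    for value in (0.95, 0.90, 0.45, 0.0, -1.0):
        print(value, classify_hitc(value))
```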
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more information on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.765
Neutral control median absolute deviation, by plate (nmad): 0.076
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 9.88%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
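The control-well statistics listed above follow directly from the plate-level medians and median absolute deviations. The Python sketch below is illustrative, with hypothetical function names; the positive control values in the example are invented, because this assay reports NA for its positive and negative controls.

```python
import math


def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100.0


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference of a control vs. neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


if __name__ == "__main__":
    nmed, nmad = 0.765, 0.076      # neutral control values reported above
    pmed, pmad = 2.0, 0.1          # hypothetical positive control values
    print(round(cv_percent(nmad, nmed), 2))
    print(round(z_prime(pmed, pmad, nmed, nmad), 3))
    print(round(ssmd(pmed, pmad, nmed, nmad), 3))
```

Computing the CV from the rounded nmed and nmad shown above gives about 9.93%, slightly different from the reported 9.88%, which presumably was calculated from unrounded or per-plate values.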
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 188.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 106
ATG_Sp1_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human Sp1 Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_Sp1_CIS is one of 52 assay
components measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse transcription
polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay component
ATG_Sp1_CIS were analyzed into 1 assay endpoint. This assay endpoint, ATG_Sp1_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor level as they relate to the gene SP1. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the dna binding intended target family, where the subfamily is zinc finger.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene Sp1,
which is responsive to the endogenous human Sp1 transcription factor [GeneSymbol:SP1 | GeneID:6667 |
Uniprot_SwissProt_Accession:P08047].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of an RTU is determined by the presence of a TF binding site in the promoter.
Importantly, as all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.106
Response cutoff threshold used to determine hit calls: 0.529
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised when extrapolating these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems, but it can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log2-transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data are loaded into the database, tcpl uses its generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp - mean(resp))^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
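As an illustration of Levels 2 through 4 above, the following Python sketch applies the corrections and computes the baseline statistics for a handful of made-up wells. tcpl itself is an R package; Python is used here only for illustration, and the function names, the dictionary layout, and the example values are hypothetical. The sketch assumes that non-positive wells are excluded before the log2 transform, computes an unscaled median absolute deviation (R's mad() rescales by 1.4826 by default, so values produced by tcpl may differ by that factor), and uses the sample standard deviation (n - 1 denominator) for onesd.

```python
import math
import statistics


def level2_correct(wells):
    """rmneg/rmzero, then log2: flag non-positive cval wells, transform the rest."""
    corrected = []
    for well in wells:
        well = dict(well)  # copy so the caller's data is not mutated
        if well["cval"] <= 0:
            well["wllq"] = 0  # excluded from later levels
        else:
            well["cval"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected


def level3_none(wells):
    """Level 3 method 'none': resp is simply cval."""
    return [dict(well, resp=well["cval"]) for well in wells]


def level4_baseline(wells):
    """bmad and onesd from good-quality test wells (wllt == 't') at cndx 1 or 2."""
    resp = [w["resp"] for w in wells
            if w["wllq"] == 1 and w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(resp)
    bmad = statistics.median([abs(r - center) for r in resp])  # unscaled MAD
    onesd = statistics.stdev(resp)  # sqrt(sum((resp - mean)^2) / (n - 1))
    return bmad, onesd


# Hypothetical wells: fold-induction values before the log2 transform.
raw_wells = [
    {"cval": 2.0, "wllq": 1, "wllt": "t", "cndx": 1},
    {"cval": 0.5, "wllq": 1, "wllt": "t", "cndx": 2},
    {"cval": 1.0, "wllq": 1, "wllt": "t", "cndx": 1},
    {"cval": -1.0, "wllq": 1, "wllt": "t", "cndx": 2},  # removed by rmneg
    {"cval": 0.0, "wllq": 1, "wllt": "t", "cndx": 1},   # removed by rmzero
    {"cval": 4.0, "wllq": 1, "wllt": "t", "cndx": 3},   # not a low-concentration well
]
wells = level3_none(level2_correct(raw_wells))
bmad, onesd = level4_baseline(wells)
print(bmad, onesd)
```

Only the three retained low-concentration wells contribute to bmad and onesd; the well at cndx 3 is transformed but does not enter the baseline.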
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 357
Inactive hit count (0 <= hitc < 0.9): 4157
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning model:
hill model: 80
gain-loss (gnls) model: 307
power (pow) model: 265
linear-polynomial (poly1) model: 2081
quadratic-polynomial (poly2) model: 853
exponential-2 (exp2) model: 60
exponential-3 (exp3) model: 234
exponential-4 (exp4) model: 564
exponential-5 (exp5) model: 18
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
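The selection rule described above (lowest AIC wins, with ties going to the model with fewer parameters) can be sketched in a few lines of Python. This is an illustration only, not the tcplfit2 implementation; the function name, the "cnst" label for the constant model, the AIC values, and the parameter counts are hypothetical.

```python
def pick_winning_model(fits):
    """Return the name of the fit with the lowest AIC; ties go to fewer parameters."""
    best = min(fits, key=lambda fit: (fit["aic"], fit["n_params"]))
    return best["name"]


# Hypothetical fits for one concentration series.
example_fits = [
    {"name": "cnst", "aic": 12.0, "n_params": 1},
    {"name": "hill", "aic": -3.5, "n_params": 4},
    {"name": "poly1", "aic": -3.5, "n_params": 2},
    {"name": "exp2", "aic": -1.0, "n_params": 3},
]
winner = pick_winning_model(example_fits)
print(winner)
```

In this made-up series the hill and poly1 fits tie on AIC, so the simpler poly1 fit is selected.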
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
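The binarization used in this documentation can be expressed as a small helper. The sketch below is in Python for illustration (tcpl is an R package); the function name, the adjustable threshold argument, and the example hitc values are hypothetical.

```python
from collections import Counter


def classify_hitc(hitc, threshold=0.9):
    """Map a continuous hit call onto the active / inactive / NA bins used herein."""
    if hitc < 0:
        return "NA"
    return "active" if hitc >= threshold else "inactive"


# Hypothetical hit calls for five concentration series.
example_hitc = [0.97, 0.9, 0.42, 0.0, -1.0]
counts = Counter(classify_hitc(h) for h in example_hitc)
print(counts)
```

A user who requires a different stringency can pass another threshold, for example classify_hitc(0.92, threshold=0.95).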
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.571
Neutral control median absolute deviation, by plate: nmad 0.119
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 20.77%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
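The formulas in the table above can be written out as follows. This Python sketch is for illustration only: the function names are hypothetical, and the control-well values passed to z_prime and ssmd are made up, because no positive or negative control values are reported for this endpoint. The neutral control values (nmed 0.571, nmad 0.119) are taken from the table.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) versus neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control versus neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control values reported above.
cv = cv_percent(nmed=0.571, nmad=0.119)

# Hypothetical positive control values, used only to exercise the formulas.
zp = z_prime(cmed=2.0, cmad=0.1, nmed=0.5, nmad=0.1)
sd = ssmd(cmed=2.0, cmad=0.1, nmed=0.5, nmad=0.1)
print(round(cv, 2), round(zp, 2), round(sd, 2))
```

Applied to the rounded values in the table, the CV formula gives about 20.84%, slightly different from the reported 20.77%, possibly because the reported value was computed from unrounded or per-plate values.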
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 234.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 107
ATG_SREBP_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human SREBP Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_SREBP_CIS is one of 52
assay component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_SREBP_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_SREBP_CIS, was
analyzed with bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a
type of inducible reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the
reporter gene at the transcription factor-level as they relate to the gene SREBF1. Furthermore, this assay
endpoint can be referred to as a primary readout, because this assay has produced multiple assay endpoints
where this one serves a reporter gene function. To generalize the intended target to other relatable targets,
this assay endpoint is annotated to the DNA binding intended target family, where the subfamily is basic helix-
loop-helix leucine zipper.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion ToxCast: US EPA's Toxicity Forecaster Program
AOP: Adverse Outcome Pathway tcpl: ToxCast Data Analysis Pipeline R Package
CV: Coefficient of Variation SSMD: Strictly Standardized Mean Difference
DMSO: Dimethyl Sulfoxide
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene SREBP,
which is responsive to the endogenous human sterol regulatory element binding transcription factor 1
[GeneSymbol:SREBF1 | GeneID:6720 | Uniprot_SwissProt_Accession:P36956].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C) and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data was processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.073
Response cutoff threshold used to determine hit calls: 0.365
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
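The reported cutoff appears consistent with the Level 5 methods assigned to this endpoint in Section 3.2, where the larger of log2(1.2) and 5 times bmad is selected. The short Python check below uses the bmad value from the summary above; the variable names are hypothetical and the calculation is a sketch rather than the tcpl implementation.

```python
import math

bmad = 0.073  # baseline median absolute deviation reported above

# Candidate cutoffs from the two assigned Level 5 methods.
candidates = {
    "log2_1.2": math.log2(1.2),
    "bmad5": 5 * bmad,
}

# The higher candidate value is used as the endpoint cutoff (coff).
selected_method = max(candidates, key=candidates.get)
coff = candidates[selected_method]
print(selected_method, round(coff, 3))
```

Here log2(1.2) is roughly 0.263, so the bmad5 value of 0.365 is the larger of the two and matches the reported response cutoff.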
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of DNA binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
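Several of the Level 6 flags above reduce to simple comparisons on summary values for a series. The Python sketch below evaluates a subset of them (bmd.high, noise, border, low.nrep, low.nconc, ac50.lowconc) for one made-up series. It is an illustration, not the tcpl implementation: the function name, dictionary keys, and example values are hypothetical, and the border flag is read here as the band 0.8*coff <= |top| <= 1.2*coff, which is an assumption about the intended logic.

```python
def cautionary_flags(series):
    """Return the names of the flags (a subset of Level 6) raised for one series."""
    flags = []
    coff = series["coff"]
    if series["bmd"] > series["ac50"]:
        flags.append("bmd.high")
    if series["rmse"] > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(series["top"]) <= 1.2 * coff:
        flags.append("border")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    # logc_min is the log10 of the lowest tested concentration.
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags


# Hypothetical series; coff matches the cutoff reported for this endpoint.
example_series = {
    "hitc": 0.95, "top": 0.40, "coff": 0.365, "rmse": 0.10,
    "nrep": 1, "nconc": 7, "ac50": 0.5, "bmd": 0.2, "logc_min": 0.09,
}
raised = cautionary_flags(example_series)
print(raised)
```

Flags of this kind do not change the hit call; they mark series that may deserve closer review before interpretation.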
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 523
Inactive hit count (0 <= hitc < 0.9): 3991
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning model:
hill model: 100
gain-loss (gnls) model: 348
power (pow) model: 370
linear-polynomial (poly1) model: 1845
quadratic-polynomial (poly2) model: 761
exponential-2 (exp2) model: 51
exponential-3 (exp3) model: 9
exponential-4 (exp4) model: 700
exponential-5 (exp5) model: 278
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
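To make the relationship between these estimates concrete, the Python sketch below inverts a three-parameter Hill curve, f(x) = tp / (1 + (ga / x)^p), to find the concentration at a given response level. The Hill form, the parameter names (tp, ga, p), the BMR value, and the example parameters are assumptions made for illustration; the actual estimates depend on the winning model and its tcplfit2 parameterization.

```python
def hill_conc_at(y, tp, ga, p):
    """Concentration where tp / (1 + (ga / x)**p) equals y, or None if unreachable."""
    if not 0 < y / tp < 1:
        return None  # response level is outside the range of the curve
    return ga / (tp / y - 1) ** (1 / p)


# Hypothetical winning Hill fit and thresholds.
tp, ga, p = 1.0, 10.0, 1.0   # top, AC50 parameter, Hill slope
coff = 0.365                 # efficacy cutoff
bmr = 0.2                    # benchmark response (illustrative value)

pods = {
    "bmd": hill_conc_at(bmr, tp, ga, p),
    "ac50": hill_conc_at(0.5 * tp, tp, ga, p),
    "acc": hill_conc_at(coff, tp, ga, p),
    "ac10": hill_conc_at(0.10 * tp, tp, ga, p),
    "ac5": hill_conc_at(0.05 * tp, tp, ga, p),
}
print(pods)
```

For this form the ac50 equals the ga parameter, and a response level at or above the top of the curve has no corresponding concentration, so the helper returns None in that case.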
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is larger
than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the constant model,
i.e. the winning model is a better fit than a flat line.
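The selection rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched as follows. This is an illustration only, not the tcplfit2 code: the `Fit` record, the function names, and the AIC values and parameter counts in the usage note are hypothetical.

```python
from typing import NamedTuple


class Fit(NamedTuple):
    name: str      # model label, e.g. "hill" or "poly1"
    aic: float     # Akaike Information Criterion of the fit
    n_params: int  # number of fitted parameters


def pick_winning_model(fits):
    """Return the fit with the lowest AIC; equal AICs go to the simpler model."""
    return min(fits, key=lambda f: (f.aic, f.n_params))


def better_than_constant(winner, constant):
    """True when the winning model has a lower AIC than the constant model."""
    return winner.aic < constant.aic
```

For instance, given fits `Fit("cnst", 20.0, 1)`, `Fit("hill", 12.5, 4)` and `Fit("poly1", 12.5, 2)`, the sketch selects poly1 because it ties hill on AIC with fewer parameters. The continuous hit call itself is a product of three weights and is not reproduced here.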
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency, comparing the AC50 with the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
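The binarization described above can be written as a small helper. A minimal sketch, assuming the 0.90 threshold used in this document as the default; the function names are illustrative, and users applying a different stringency would pass another threshold.

```python
from collections import Counter


def classify_hitc(hitc, active_threshold=0.9):
    """Map a continuous hit call to 'active', 'inactive' or 'NA'."""
    if hitc < 0:
        return "NA"
    if hitc >= active_threshold:
        return "active"
    return "inactive"


def tally_hit_calls(hitc_values, active_threshold=0.9):
    """Count how many series fall into each category."""
    return Counter(classify_hitc(h, active_threshold) for h in hitc_values)
```

A tally of this kind over all series for an endpoint corresponds to the active, inactive and NA hit counts reported in the aggregate endpoint summary.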
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. For more on the ToxCast program and the tcpl R package, see Section 7: Supporting Information.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1.988
Neutral control median absolute deviation, by plate: nmad 0.216
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 10.89%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
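The control-well metrics listed above reduce to a few lines of arithmetic. The sketch below implements the formulas as stated, using medians and median absolute deviations as inputs; the function and argument names are illustrative, with cmed and cmad standing for either the positive (pmed, pmad) or negative (mmed, mmad) control values. Because positive and negative control values are NA for this endpoint, any inputs beyond the neutral control would have to come from another assay.

```python
from math import sqrt


def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference, control versus neutral control."""
    return (cmed - nmed) / sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Control signal relative to neutral control variability."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Control signal relative to the neutral control level."""
    return cmed / nmed
```

With hypothetical values cmed = 5.0, cmad = 0.3, nmed = 1.0 and nmad = 0.1, the Z prime factor is 0.7, the signal-to-noise ratio is 40 and the signal-to-background ratio is 5.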
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 278.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 108
ATG_STAT3_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human signal transducer and activator of transcription 3
(STAT3)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_STAT3_CIS is one of 52
assay component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse
transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay
component ATG_STAT3_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_STAT3_CIS, was
analyzed with bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a
type of inducible reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the
reporter gene at the transcription factor-level as they relate to the gene STAT3. Furthermore, this assay
endpoint can be referred to as a primary readout, because this assay has produced multiple assay endpoints
where this one serves a reporter gene function. To generalize the intended target to other relatable targets,
this assay endpoint is annotated to the dna binding intended target family, where the subfamily is stat protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene STAT,
which is responsive to the endogenous human signal transducer and activator of transcription 3 (acute-phase
response factor) [GeneSymbol:STAT3 | GeneID:6774 | Uniprot_SwissProt_Accession:P40763].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU) is
controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a TF
binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.088
Response cutoff threshold used to determine hit calls: 0.439
-------
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
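The removal of non-positive values and the log2 transformation described above can be sketched as follows. This is an illustration in Python rather than the tcpl R code; the dictionary layout and the function name are hypothetical, and it assumes that wells with zero or negative corrected values are excluded by setting their well quality to 0 before the remaining wells are transformed.

```python
from math import log2


def level2_correct(wells):
    """Flag non-positive wells and log2-transform the rest.

    Each well is a dict with 'cval' (corrected response value) and 'wllq'
    (well quality, 1 = usable, 0 = excluded). The input is not modified.
    """
    processed = []
    for well in wells:
        well = dict(well)  # work on a copy
        if well["cval"] <= 0:
            well["wllq"] = 0  # negative or zero value: exclude the well
        # Only usable wells receive a transformed response
        well["resp"] = log2(well["cval"]) if well["wllq"] == 1 else None
        processed.append(well)
    return processed
```

A fold-induction of 2.0 over the DMSO baseline becomes a response of 1.0 and a fold-induction of 0.5 becomes -1.0, so gains and losses of equal fold size are symmetric around zero.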
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrading their well quality (wllq); if cval < 0, wllq = 0.),
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2*coff and |top| >= 0.8*coff.), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
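A few of the level 4 to level 6 rules listed above can be illustrated with a short sketch. It is not the tcpl implementation: the function names are hypothetical, bmad is taken as an input rather than recomputed, and the border condition is read as both bounds holding together (|top| within 0.8 to 1.2 times the cutoff), which is an interpretation of the flag description.

```python
from math import log2
from statistics import stdev


def onesd(baseline_resp):
    """Sample standard deviation (n - 1 denominator) of baseline responses."""
    return stdev(baseline_resp)


def cutoff(bmad):
    """Level 5: take the higher of log2(1.2) and 5 * bmad."""
    return max(log2(1.2), 5 * bmad)


def level6_flags(top, rmse, coff, nrep, nconc):
    """Return the subset of four level 6 flags raised for one series."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags
```

With the bmad of 0.088 reported for this endpoint, 5 * bmad is about 0.44, which exceeds log2(1.2) (about 0.263) and is close to the 0.439 response cutoff listed in the assay design summary; the small gap is consistent with bmad being displayed to three decimals.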
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 84
Inactive hit count (0 <= hitc < 0.9): 4430
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
31
332
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
2430
179
quadratic-polynomial (poly2) model: 765
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
33
530
1
161
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is larger
than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the constant model,
i.e. the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency, comparing the AC50 with the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. For more on the ToxCast program and the tcpl R package, see Section 7: Supporting Information.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.612
Neutral control median absolute deviation, by plate: nmad 0.043
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 7.03%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 161.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 109
ATG_TA_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human TA Gene Activation (Basal Promoter)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_TA_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse transcription
polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay component
ATG_TA_CIS were analyzed into 1 assay endpoint. This assay endpoint, ATG_TA_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the background control
at the transcription factor level as they relate to the gene. Furthermore, this assay endpoint can be referred
to as a secondary readout, because this assay has produced multiple assay endpoints where this one serves a
background control function. To generalize the intended target to other relatable targets, this assay endpoint
is annotated to the background measurement intended target family, where the subfamily is internal marker.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene TA,
which is used as a basal promoter.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 300 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.152
Response cutoff threshold used to determine hit calls: 0.759
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
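The relationship between the reported bmad and the response cutoff can be sketched as follows, based on the Level 4 and Level 5 methods listed in Section 3.2 (bmad and onesd from test wells at the two lowest concentrations; cutoff as the higher of log2(1.2) and 5 times bmad). This is a minimal Python illustration, not the tcpl implementation: the function names are hypothetical, and the 1.4826 scale factor is the default of R's mad() function, assumed here to apply.

```python
import math
import statistics

def mad(values, constant=1.4826):
    """Median absolute deviation; 1.4826 is R's mad() default scale (an assumption here)."""
    values = list(values)
    center = statistics.median(values)
    return constant * statistics.median([abs(v - center) for v in values])

def baseline_stats(low_conc_resp):
    """Return (bmad, onesd) for responses of test wells with cndx equal to 1 or 2."""
    bmad = mad(low_conc_resp)
    onesd = statistics.stdev(low_conc_resp)  # sample standard deviation (n - 1)
    return bmad, onesd

def response_cutoff(bmad):
    """Higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5.0 * bmad)

# Reported bmad for this endpoint.
print(round(response_cutoff(0.152), 3))
```

For the reported bmad of 0.152, 5 times bmad is about 0.76 and exceeds log2(1.2) (about 0.263). This is consistent with the reported cutoff of 0.759 if the displayed bmad is a rounded value.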
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems, but it can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
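The removal of negative and zero values and the log transformation described above can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the tcpl code: wells are represented as hypothetical dictionaries with cval and wllq keys, and excluded wells keep their original cval while their well quality is set to 0.

```python
import math

def level2_correct(wells):
    """Return copies of the wells with cval <= 0 excluded (wllq = 0) and other cval values log2-transformed."""
    corrected = []
    for well in wells:
        well = dict(well)  # copy so the input is left unchanged
        if well["cval"] <= 0:
            # Negative or zero corrected values cannot be log-transformed; exclude the well.
            well["wllq"] = 0
        else:
            well["cval"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected

example = [
    {"cval": 4.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -1.0, "wllq": 1},
]
for well in level2_correct(example):
    print(well)
```

Only wells that still have wllq equal to 1 after this step would be carried forward to normalization and curve fitting.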
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
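Several of the Level 6 flags above reduce to simple threshold comparisons. The sketch below expresses five of them (noise, low.nrep, low.nconc, ac50.lowconc, and no.med.gt.3bmad) in Python. It is illustrative only: the function and argument names are hypothetical rather than tcpl identifiers, and min_conc is assumed to already be on the same scale as ac50 (i.e. 10^logc_min).

```python
def level6_flags(*, hitc, rmse, coff, ac50, min_conc, nconc, nrep,
                 nmed_gtbl_pos, nmed_gtbl_neg):
    """Return the names of the cautionary flags raised for one concentration series."""
    flags = []
    if rmse > coff:
        flags.append("noise")            # fit error exceeds the response cutoff
    if nrep < 2:
        flags.append("low.nrep")         # fewer than 2 replicates per concentration on average
    if nconc <= 4:
        flags.append("low.nconc")        # 4 or fewer concentrations tested
    if hitc >= 0.9 and ac50 < min_conc:
        flags.append("ac50.lowconc")     # active, but AC50 below the tested range
    if nmed_gtbl_pos == 0 and nmed_gtbl_neg == 0:
        flags.append("no.med.gt.3bmad")  # no median response outside +/- 3 * bmad
    return flags

# Hypothetical series using this endpoint's design values (7 concentrations, 1 replicate).
print(level6_flags(hitc=0.95, rmse=0.2, coff=0.759, ac50=0.5, min_conc=1.23,
                   nconc=7, nrep=1, nmed_gtbl_pos=1, nmed_gtbl_neg=0))
```

Because the nominal number of replicates for this endpoint is 1, a series tested as designed would be expected to carry the low.nrep flag under this reading of the rule.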
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Inactive hit count: 0
Active hit count: hitc>0.9
207
4307
0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 83
gain-loss (gnls) model: 379
power (pow) model: 231
linear-polynomial (poly1) model: 2155
quadratic-polynomial (poly2) model: 694
exponential-2 (exp2) model: 34
exponential-3 (exp3) model: 6
exponential-4 (exp4) model: 660
exponential-5 (exp5) model: 220
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
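For intuition, the point-of-departure estimates can be worked out in closed form when the winning model is a Hill curve. The Python sketch below assumes the three-parameter form resp = tp / (1 + (ga / conc)^p) and a gain-direction response (tp > 0); the parameter names tp, ga, and p and both function names are assumptions for illustration, and other winning models would require their own inversion.

```python
def hill(conc, tp, ga, p):
    """Assumed Hill model: response at a given concentration."""
    return tp / (1.0 + (ga / conc) ** p)

def hill_conc_at(response, tp, ga, p):
    """Concentration at which the Hill curve reaches `response` (None if never reached)."""
    ratio = tp / response
    if ratio <= 1.0:
        return None  # target response is at or above the modeled top
    # response = tp / (1 + (ga/x)^p)  =>  x = ga / (tp/response - 1)^(1/p)
    return ga / (ratio - 1.0) ** (1.0 / p)

# Hypothetical fitted parameters; 0.759 is the cutoff reported for this endpoint.
tp, ga, p, coff = 2.0, 10.0, 1.0, 0.759
ac50 = hill_conc_at(0.5 * tp, tp, ga, p)   # concentration at 50% of maximal response
ac10 = hill_conc_at(0.1 * tp, tp, ga, p)   # concentration at 10% of maximal response
acc = hill_conc_at(coff, tp, ga, p)        # concentration at the efficacy cutoff
print(ac50, ac10, acc)
```

Under this form the AC50 equals the ga parameter, and the bmd would be obtained the same way by passing the benchmark response as the target response.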
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
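The winning-model rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched in Python as follows. The AIC values and parameter counts in the example are hypothetical and chosen only to show the tie-break; the function names are not tcplfit2 identifiers.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def winning_model(fits):
    """Pick the model with the lowest AIC; on equal AIC the model with fewer parameters wins.

    `fits` maps a model name to a tuple (aic_value, n_params).
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

# Hypothetical fit summary for one concentration series.
fits = {
    "cnst": (12.0, 1),
    "hill": (10.0, 4),
    "poly1": (10.0, 2),
}
print(winning_model(fits))  # poly1: same AIC as hill, but simpler
```

Sorting on the pair (AIC, number of parameters) keeps the tie-break explicit and avoids depending on the order in which models were fit.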
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
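The binarization of the continuous hit call described above can be written as a small Python helper. The 0.90 threshold is the one used in this documentation; the function name and the returned labels are hypothetical, and a different threshold can be passed where more or less stringency is needed.

```python
def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to 'active', 'inactive', or 'not applicable'."""
    if hitc < 0:
        return "not applicable"
    if hitc >= threshold:
        return "active"
    return "inactive"

for value in (0.95, 0.90, 0.45, 0.0, -1.0):
    print(value, classify_hitc(value))
```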
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.286
Neutral control median absolute deviation, by plate: nmad 0.064
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 22.29%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 220.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 110
ATG_TAL_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human TAL Gene Activation (Basal Promoter)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_TAL_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_TAL_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_TAL_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain- or loss-of-signal activity can be used to understand the background control
at the transcription factor level. Furthermore, this assay endpoint can be referred
to as a secondary readout, because this assay has produced multiple assay endpoints and this one serves a
background control function. To generalize the intended target to other relatable targets, this assay endpoint
is annotated to the background measurement intended target family, where the subfamily is internal marker.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene TAL,
which is used as a basal promoter.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated into 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C, and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 300 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.137
Response cutoff threshold used to determine hit calls: 0.687
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-induction
over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log-transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
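As a rough illustration of the removal and log-transformation steps described above, the following Python sketch flags non-positive corrected values and log2-transforms the rest. It is not the tcpl implementation (tcpl is an R package); the function name and the dictionary keys are hypothetical and chosen only to mirror the cval, resp, and wllq terms used in this documentation.

```python
import math

def preprocess_wells(wells):
    """Flag non-positive corrected values and log2-transform the rest.

    Each record is a dict with hypothetical keys 'cval' (corrected
    response) and 'wllq' (well quality: 1 = usable, 0 = excluded).
    """
    processed = []
    for well in wells:
        record = dict(well)  # copy so the input list is left untouched
        if record["cval"] <= 0:
            # Negative or zero values cannot be log-transformed,
            # so the well quality is downgraded to 0.
            record["wllq"] = 0
            record["resp"] = None
        else:
            record["resp"] = math.log2(record["cval"])
        processed.append(record)
    return processed

# Illustrative values only; these are not ToxCast data.
wells = [
    {"cval": 1.0, "wllq": 1},   # no change relative to baseline
    {"cval": 2.0, "wllq": 1},   # two-fold induction
    {"cval": 0.0, "wllq": 1},   # zero value, excluded
    {"cval": -0.3, "wllq": 1},  # negative value, excluded
]
result = preprocess_wells(wells)
usable = [w["resp"] for w in result if w["wllq"] == 1]
print(usable)
```

On the log2 scale a response of 0 means no change from baseline and a response of 1 means a doubling, which is why fold-change cutoffs in this document are written as log2 values.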
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
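The Level 4 and Level 5 methods listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not tcpl's code: the function names are hypothetical, and the 1.4826 scale factor is assumed here because it is the default consistency constant of R's mad() function; pass mad_scale=1.0 for an unscaled median absolute deviation.

```python
import math
import statistics

MAD_SCALE = 1.4826  # assumption: default consistency constant of R's mad()

def baseline_stats(low_conc_resp, mad_scale=MAD_SCALE):
    """Return (bmad, onesd) for responses from test wells with cndx 1 or 2."""
    med = statistics.median(low_conc_resp)
    abs_dev = [abs(r - med) for r in low_conc_resp]
    bmad = mad_scale * statistics.median(abs_dev)
    # statistics.stdev uses the (sample size - 1) denominator, matching
    # onesd = sqrt(sum((resp - mean(resp))^2) / (sample size - 1)).
    onesd = statistics.stdev(low_conc_resp)
    return bmad, onesd

def select_cutoff(bmad):
    """Pick the higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)

# Illustrative low-concentration responses (not ToxCast data).
low_conc_resp = [0.10, -0.10, 0.05, -0.05, 0.0]
bmad, onesd = baseline_stats(low_conc_resp)
print(round(bmad, 4), round(onesd, 4), round(select_cutoff(bmad), 4))

# Using the bmad reported for this endpoint in the Assay Design Summary:
print(round(select_cutoff(0.137), 3))
```

With the reported bmad of 0.137, 5 * bmad is 0.685, which exceeds log2(1.2) (about 0.263) and is close to the reported cutoff of 0.687; the small difference would be consistent with the bmad having been rounded for display.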
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 137
Inactive hit count (0 <= hitc < 0.9): 4377
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 45
gain-loss (gnls) model: 309
power (pow) model: 206
linear-polynomial (poly1) model: 2289
quadratic-polynomial (poly2) model: 811
exponential-2 (exp2) model: 4
exponential-3 (exp3) model: 35
exponential-4 (exp4) model: 554
exponential-5 (exp5) model: 209
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
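To make the point-of-departure estimates concrete, the sketch below inverts a Hill curve to find the concentration at which a given response level is reached. It assumes the common three-parameter Hill form, response = top / (1 + (ac50 / conc)^slope); the function names, the parameter values, and the BMR value are hypothetical and serve only as an example, and the other model families would need their own inverse functions.

```python
def hill_response(conc, top, ac50, slope):
    """Hill model response at a given concentration (conc > 0)."""
    return top / (1.0 + (ac50 / conc) ** slope)

def hill_conc_at_response(y, top, ac50, slope):
    """Concentration at which the Hill curve reaches response y.

    Returns None when y is not strictly between 0 and top, because the
    curve never reaches that level.
    """
    if top == 0 or not (0 < y / top < 1):
        return None
    return ac50 / ((top / y) - 1.0) ** (1.0 / slope)

# Hypothetical fitted parameters on the log2 fold-induction scale.
top, ac50, slope = 2.0, 10.0, 1.5
cutoff = 0.687  # response cutoff reported for this endpoint
bmr = 0.3       # hypothetical benchmark response, for illustration only

pods = {
    "bmd": hill_conc_at_response(bmr, top, ac50, slope),
    "ac50": hill_conc_at_response(0.50 * top, top, ac50, slope),
    "acc": hill_conc_at_response(cutoff, top, ac50, slope),
    "ac10": hill_conc_at_response(0.10 * top, top, ac50, slope),
    "ac5": hill_conc_at_response(0.05 * top, top, ac50, slope),
}
for name, value in pods.items():
    print(name, round(value, 3))
```

For a monotonic gain curve the estimates are ordered by the response level they target, so ac5 falls below ac10, which falls below ac50; a curve whose top does not exceed the cutoff has no acc.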
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration-response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomial, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-response
series is determined by calculating a continuous hit call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response)
is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
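The selection rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched as follows. The AIC values and parameter counts are invented for illustration and are not taken from tcplfit2; the continuous hit call itself involves likelihood-based weights that are not reproduced here.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def pick_winner(fits):
    """Return the model name with the lowest AIC.

    `fits` maps a model name to a tuple (aic_value, n_params). Sorting on
    the tuple means that an exact AIC tie is broken by parameter count.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))

# Hypothetical fit summaries for one concentration-response series.
fits = {
    "cnst": (12.0, 1),
    "poly1": (8.5, 2),
    "hill": (8.5, 4),   # same AIC as poly1 but more parameters
    "exp4": (9.1, 3),
}
winner = pick_winner(fits)
beats_constant = fits[winner][0] < fits["cnst"][0]
print(winner, beats_constant)
```

The final comparison mirrors the idea behind the third hit-call weight: a winning model is only informative if it fits better than a flat line.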
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
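A minimal sketch of the binarization described above, together with one possible way to keep only unflagged active series, is shown here. The 0.9 threshold is the one used in this documentation; the record layout, the identifiers, and the choice to treat any flag as disqualifying are assumptions made for the example, and users may prefer a less strict filter.

```python
def classify_hitc(hitc, active_threshold=0.9):
    """Map a continuous hit call to active, inactive, or not applicable."""
    if hitc < 0:
        return "not applicable"
    if hitc >= active_threshold:
        return "active"
    return "inactive"

# Hypothetical series records; 'flags' holds level 6 flag names.
series = [
    {"id": "s1", "hitc": 0.98, "flags": []},
    {"id": "s2", "hitc": 0.95, "flags": ["noise"]},
    {"id": "s3", "hitc": 0.40, "flags": []},
    {"id": "s4", "hitc": -1.0, "flags": []},
]

calls = {s["id"]: classify_hitc(s["hitc"]) for s in series}
high_confidence = [
    s["id"] for s in series
    if calls[s["id"]] == "active" and not s["flags"]
]
print(calls)
print(high_confidence)
```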
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.213
Neutral control median absolute deviation, by plate: nmad 0.042
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 19.49%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
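The control-well formulas in the table above translate directly into code. The sketch below uses the neutral control values reported for this endpoint; because no positive or negative control is reported here (NA), the positive control median and MAD are hypothetical numbers included only to exercise the formulas.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return (nmad / nmed) * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference against the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    return cmed / nmed

nmed, nmad = 0.213, 0.042  # neutral control values reported above
pmed, pmad = 2.0, 0.1      # hypothetical positive control, illustration only

cv = cv_percent(nmed, nmad)
zp = z_prime(pmed, pmad, nmed, nmad)
sd = ssmd(pmed, pmad, nmed, nmad)
print(round(cv, 2), round(zp, 3), round(sd, 2))
```

Applying the CV formula to the two rounded summary values gives about 19.72%, slightly different from the 19.49% reported above; one plausible reason is that the reported figure is aggregated from plate-level values rather than computed from the rounded summaries.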
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 209.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography: Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian
M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of
multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb
24. PubMed PMID: 18297081., Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM,
Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of
environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's
ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID:
20143881., Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck
KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-
forecasting. The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-
research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 111
ATG_TCF_b_cat_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human TCF/b-cat Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_TCF_b_cat_CIS is one of 52
assay component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_TCF_b_cat_CIS was analyzed into 1 assay endpoint. This assay endpoint,
ATG_TCF_b_cat_CIS, was analyzed with bidirectional fitting relative to DMSO as the negative control and
baseline of activity. Using a type of inducible reporter, measures of mRNA for gain- or loss-of-signal activity
can be used to understand the reporter gene at the transcription factor level as they relate to the genes TCF7,
TCF7L2, LEF1, and TCF7L1. Furthermore, this assay endpoint can be referred to as a primary readout,
because this assay has produced multiple assay endpoints and this one serves a reporter gene function. To
generalize the intended target to other relatable targets, this assay endpoint is annotated to the DNA binding
intended target family, where the subfamily is HMG box protein.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene TCF/b-
cat, which is responsive to the endogenous human transcription factor 7 (T-cell specific, HMG-box) and
transcription factor 7-like 2 (T-cell specific, HMG-box) and lymphoid enhancer-binding factor 1 and transcription
factor 7-like 1 (T-cell specific, HMG-box) [GeneSymbol:TCF7 & TCF7L2 & LEF1 & TCF7L1 | GeneID:6932 & 6934
& 51176 & 83439 | Uniprot_SwissProt_Accession:P36402 & Q9NQB0 & Q9UJU2 & Q9HCS4].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.139
Response cutoff threshold used to determine hit calls: 0.695
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of dna binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
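The AIC-based model selection described above can be illustrated with a minimal Python sketch. It is not the tcpl implementation (tcpl is an R package); the function names, the model labels other than gnls, and the log-likelihood values are hypothetical and chosen only for illustration. It uses the standard definition AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def winning_model(fits):
    """Return the name of the candidate model with the lowest AIC.

    fits maps a model name to (maximized log-likelihood, number of parameters).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get)


# Hypothetical fits for one concentration series (values invented for illustration).
example_fits = {
    "cnst": (-12.0, 1),  # constant model: poor fit, few parameters
    "hill": (-4.5, 4),   # Hill model: good fit, moderate complexity
    "gnls": (-4.2, 6),   # gain-loss model: slightly better fit, more parameters
}

print(winning_model(example_fits))
```

In this invented example the gain-loss model has the highest likelihood, but its two extra parameters are penalized, so the Hill model has the lowest AIC and wins.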
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl provides generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp - mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); 0.8(coff) <= |top| <= 1.2(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
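Several of the Level 6 flags above reduce to simple threshold comparisons. The following Python sketch evaluates five of them for one fitted series. It is an illustration only, not tcpl code: the function name and argument names are hypothetical, the border rule is read as the band between 0.8 and 1.2 times the cutoff (an assumption about the intended meaning), and ac50 and min_conc are assumed to be on the same untransformed concentration scale.

```python
def level6_flags(top, coff, rmse, nconc, nrep, ac50, min_conc, hitc):
    """Return the names of a subset of Level 6 cautionary flags that apply.

    All arguments are per-series summary values; names are illustrative.
    """
    flags = []
    if rmse > coff:
        flags.append("noise")          # fit error exceeds the cutoff
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")         # modeled top sits near the cutoff (assumed band)
    if nrep < 2:
        flags.append("low.nrep")       # fewer than 2 replicates per concentration
    if nconc <= 4:
        flags.append("low.nconc")      # 4 or fewer concentrations tested
    if hitc >= 0.9 and ac50 < min_conc:
        flags.append("ac50.lowconc")   # active hit with AC50 below the tested range
    return flags


# Hypothetical series using this endpoint's cutoff (0.695), 7 concentrations, 1 replicate.
print(level6_flags(top=0.75, coff=0.695, rmse=0.2, nconc=7, nrep=1,
                   ac50=10.0, min_conc=1.23, hitc=0.95))
```

Because the nominal design for this endpoint uses a single replicate, the low.nrep condition (nrep < 2) would be met by any series run at that nominal design.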
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 492
Inactive hit count: 0
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.422
Neutral control median absolute deviation, by plate (nmad): 0.248
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 58.67%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
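The control-well metrics listed above can be written as small functions. The Python sketch below follows the formulas as given in this section; the function names are hypothetical, and the positive-control values in the example are invented because this endpoint reports no positive control (NA).

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control vs. the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """(control median - neutral median) / neutral MAD."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """control median / neutral median."""
    return cmed / nmed


# CV from the two neutral-control summary values reported for this endpoint.
print(round(cv_percent(0.422, 0.248), 2))

# Hypothetical positive control (pmed=2.0, pmad=0.1) vs. a hypothetical neutral control.
print(z_prime(2.0, 0.1, 0.4, 0.1), ssmd(2.0, 0.1, 0.4, 0.1))
```

Applying the CV formula directly to the reported nmed and nmad gives roughly 58.8%, close to but not identical to the reported 58.67%; the difference may reflect per-plate aggregation or rounding of the summary values.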
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 257.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 112
ATG_TGFb_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human transforming growth factor (TGF)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_TGFb_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_TGFb_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_TGFb_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene TGFB1. Furthermore, this assay endpoint can be
referred to as a primary readout, because this assay has produced multiple assay endpoints where this one
serves a reporter gene function. To generalize the intended target to other relatable targets, this assay
endpoint is annotated to the growth factor intended target family, where the subfamily is transforming growth
factor beta.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene TGF,
which is responsive to the endogenous human transforming growth factor, beta 1 [GeneSymbol:TGFB1 |
GeneID:7040 | Uniprot_SwissProt_Accession:P01137].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, RTU transcription is controlled by a cis-regulating
element (promoter). The specificity of an RTU is determined by the presence of a TF binding site in the promoter.
Importantly, because all members of a TF family recognize the same binding sequence, CIS-FACTORIAL evaluates
the activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.156
Response cutoff threshold used to determine hit calls: 0.781
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
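The reported response cutoff can be related to the reported bmad. Section 3.2 lists two candidate Level 5 cutoffs for this endpoint, log2(1.2) and 5 times bmad, with the higher value selected. The Python sketch below applies that rule; the function name is hypothetical and this is not tcpl code.

```python
import math


def response_cutoff(bmad):
    """Pick the larger of the two candidate cutoffs: log2(1.2) and 5 * bmad.

    Returns (name of the selected method, cutoff value).
    """
    candidates = {"log2_1.2": math.log2(1.2), "bmad5": 5 * bmad}
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


# bmad as reported in the assay design summary for this endpoint.
name, coff = response_cutoff(0.156)
print(name, round(coff, 3))
```

With bmad = 0.156, 5 times bmad is 0.780, which exceeds log2(1.2) (about 0.263), so the bmad5 threshold is selected. The small difference from the reported cutoff of 0.781 is consistent with the reported bmad having been rounded to three decimals.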
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of growth factor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
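The preprocessing mentioned above (removing negative and zero values, then log2-transforming) and the baseline statistics that Section 3.2 goes on to define for Level 4 (bmad and onesd from test wells at the two lowest concentration indices) can be sketched in Python as follows. This is an illustration under stated assumptions, not tcpl code: the function names and the tuple layout are hypothetical, filtering is applied before the log transform, and the scaling constant for the median absolute deviation is exposed as a parameter because the text does not state one (R's mad() defaults to 1.4826; the default here is 1.0, i.e. unscaled).

```python
import math
import statistics


def level2_correct(cvals):
    """Drop negative and zero corrected values, then log2-transform the rest."""
    return [math.log2(v) for v in cvals if v > 0]


def mad(values, constant=1.0):
    """Median absolute deviation about the median, times an optional constant."""
    med = statistics.median(values)
    return constant * statistics.median([abs(v - med) for v in values])


def baseline_stats(wells, constant=1.0):
    """Compute (bmad, onesd) from test wells at concentration index 1 or 2.

    wells is a list of (wllt, cndx, resp) tuples; this layout is hypothetical.
    """
    base = [resp for wllt, cndx, resp in wells if wllt == "t" and cndx in (1, 2)]
    bmad = mad(base, constant)
    onesd = statistics.stdev(base)  # sample standard deviation, divides by n - 1
    return bmad, onesd


# Invented example: four low-concentration test wells, one high-concentration
# test well, and one neutral control well (the last two are ignored).
wells = [("t", 1, 0.0), ("t", 1, 0.2), ("t", 2, -0.2), ("t", 2, 0.1),
         ("t", 5, 3.0), ("n", 1, 9.0)]
print(level2_correct([4.0, 0.0, -1.0, 1.0]))
print(baseline_stats(wells))
```

Restricting the baseline to the two lowest concentrations of test wells assumes that most chemicals are inactive at those concentrations, so their spread approximates assay noise.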
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl provides generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrading their well quality (wllq); if cval < 0, wllq
-------
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute value (bmad) as the median
absolute deviation of normalized response values (rep) for test compound wells (wilt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wilt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)A2)/sample size - 1). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wilt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
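The Level 4 baseline calculation described above can be illustrated with a short Python sketch. It is not the tcpl implementation: the record layout (dictionaries with `wllt`, `cndx`, and `resp` keys) is hypothetical, and the median absolute deviation is computed without any scaling constant, which is an assumption of this example.

```python
import statistics


def baseline_stats(wells):
    """Return (bmad, onesd) from test-compound wells at the two lowest concentrations.

    `wells` is a list of dicts with hypothetical keys 'wllt', 'cndx', and 'resp'.
    """
    # Keep only test compound wells (wllt = t) with concentration index 1 or 2.
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    if len(base) < 2:
        raise ValueError("need at least two baseline wells")
    med = statistics.median(base)
    # Raw median absolute deviation; no scaling constant is applied here
    # (an assumption of this sketch).
    bmad = statistics.median(abs(r - med) for r in base)
    # Sample standard deviation with an (n - 1) denominator.
    onesd = statistics.stdev(base)
    return bmad, onesd


wells = [
    {"wllt": "t", "cndx": 1, "resp": 0.0},
    {"wllt": "t", "cndx": 2, "resp": 0.2},
    {"wllt": "t", "cndx": 1, "resp": -0.1},
    {"wllt": "t", "cndx": 2, "resp": 0.1},
    {"wllt": "t", "cndx": 5, "resp": 3.0},  # ignored: higher concentration
    {"wllt": "n", "cndx": 1, "resp": 9.0},  # ignored: neutral control well
]
bmad, onesd = baseline_stats(wells)
```

Restricting the calculation to the two lowest concentrations of test wells treats those wells as an approximation of baseline noise, which is why wells at higher concentrations and control wells are excluded.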
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 213
Inactive hit count (0 <= hitc < 0.9): 4301
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
90
484
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
263
1828
quadratic-polynomial (poly2) model: 749
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
42
2
788
216
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
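The model selection rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched in Python as follows. This is an illustration only, not the tcplfit2 code; the AIC values and parameter counts in the example are hypothetical.

```python
def select_winning_model(fits):
    """Pick the fit with the lowest AIC; on equal AIC, prefer fewer parameters.

    `fits` maps a model name to an (aic, n_params) tuple (hypothetical inputs).
    """
    if not fits:
        raise ValueError("no model fits supplied")
    # Tuples compare element by element: AIC first, parameter count second.
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


fits = {
    "cnst": (12.4, 1),
    "hill": (8.1, 4),
    "poly1": (8.1, 2),  # same AIC as hill, but simpler
    "exp4": (9.7, 3),
}
winner = select_winning_model(fits)
print(winner)  # poly1
```

Sorting on the pair (AIC, parameter count) encodes both the primary criterion and the tie-break in a single comparison.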
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
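As a minimal sketch of the binarization used in this documentation, the following Python function maps a continuous hitc value to the three categories described above. The function name and the adjustable threshold argument are illustrative and are not part of tcpl.

```python
def classify_hitc(hitc, active_threshold=0.9):
    """Binarize a continuous hit call using the thresholds stated in this document."""
    if hitc < 0:
        return "not applicable"
    if hitc >= active_threshold:
        return "active"
    return "inactive"


labels = [classify_hitc(h) for h in (0.95, 0.9, 0.42, 0.0, -1)]
```

A user who needs a more or less stringent call can pass a different `active_threshold` rather than changing the stored hitc values.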
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.255
Neutral control median absolute deviation, by plate: nmad 0.058
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 22.68%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
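The robustness formulas listed above can be expressed as a few small Python functions. This is a sketch for illustration: the function names are hypothetical, `cmed` and `cmad` stand for the median and median absolute deviation of whichever control (positive or negative) is compared with the neutral control, and the example values are invented because no positive or negative control values are reported for this endpoint.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3 * (cmad + nmad) / |cmed - nmed|."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad**2 + nmad**2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Hypothetical control statistics, for illustration only.
pmed, pmad, nmed, nmad = 2.0, 0.1, 0.25, 0.05
zp = z_prime(pmed, pmad, nmed, nmad)
s = ssmd(pmed, pmad, nmed, nmad)
```

All of these metrics are undefined when the relevant control medians are missing, which is why they are reported as NA for endpoints without positive or negative control wells.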
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 216.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 113
ATG_VDRE_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human VDRE Gene Activation
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_VDRE_CIS is one of 52 assay
component(s) measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse transcription
polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay component
ATG_VDRE_CIS was analyzed into 1 assay endpoint. This assay endpoint, ATG_VDRE_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor-level as they relate to the gene VDR. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the nuclear receptor intended target family, where the subfamily is non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene
response element VDRE, which is responsive to the endogenous human vitamin D (1,25-dihydroxyvitamin D3)
receptor [GeneSymbol:VDR | GeneID:7421 | Uniprot_SwissProt_Accession:P11473].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 300 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.146
Response cutoff threshold used to determine hit calls: 0.73
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
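The bmad and response cutoff reported above are consistent with the Level 5 methods assigned to this endpoint in Section 3.2 (log2_1.2 and bmad5, with the higher value selected). The following Python sketch reproduces that arithmetic; the function name is hypothetical and the code is an illustration rather than the tcpl implementation.

```python
import math


def response_cutoff(bmad):
    """Return (method, cutoff), taking the larger of log2(1.2) and 5 * bmad."""
    candidates = {"log2_1.2": math.log2(1.2), "bmad5": 5 * bmad}
    method = max(candidates, key=candidates.get)
    return method, candidates[method]


method, coff = response_cutoff(0.146)
print(method, round(coff, 2))  # bmad5 0.73
```

Because log2(1.2) is roughly 0.26, the bmad5 value of 5 * 0.146 = 0.73 is the higher candidate and becomes the cutoff for this endpoint.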
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
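The removal of non-positive values and the log transformation mentioned above can be sketched in Python as follows. The record layout (dictionaries with `cval` and `wllq` keys) is hypothetical, and setting the corrected value to None for excluded wells is a choice of this sketch rather than documented tcpl behavior.

```python
import math


def apply_level2_corrections(wells):
    """Downgrade non-positive wells, then log2-transform the remaining values.

    `wells` is a list of dicts with hypothetical keys 'cval' and 'wllq'.
    """
    corrected = []
    for well in wells:
        well = dict(well)  # copy so the caller's data is left untouched
        if well["cval"] <= 0:
            # rmneg / rmzero: set well quality to 0. log2 is undefined here,
            # so this sketch stores None (an assumption of the example).
            well["wllq"] = 0
            well["cval"] = None
        else:
            well["cval"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected


example = apply_level2_corrections([
    {"cval": 4.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -2.5, "wllq": 1},
])
```

Wells with a well quality of 0 are then excluded from downstream fitting, so only positive fold-induction values contribute to the modeled response.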
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
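Several of the Level 6 flags listed above reduce to simple comparisons on per-series summary values. The Python sketch below shows four of them (noise, low.nrep, low.nconc, and ac50.lowconc) as written in the descriptions. It is illustrative only: the dictionary keys are hypothetical field names, and the remaining flags are omitted because they need the full response series.

```python
def cautionary_flags(series):
    """Return the names of the simple threshold flags that apply to one series.

    `series` is a dict with hypothetical keys: hitc, rmse, coff, nrep, nconc,
    ac50, and logc_min (log10 of the lowest tested concentration).
    """
    flags = []
    if series["rmse"] > series["coff"]:
        flags.append("noise")  # rmse > coff
    if series["nrep"] < 2:
        flags.append("low.nrep")  # average replicates per concentration < 2
    if series["nconc"] <= 4:
        flags.append("low.nconc")  # 4 concentrations or fewer tested
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")  # active, AC50 below lowest tested conc
    return flags


flags = cautionary_flags({
    "hitc": 0.95, "rmse": 0.2, "coff": 0.73,
    "nrep": 1, "nconc": 7, "ac50": 0.5, "logc_min": 0.0,
})
```

With a target of one replicate per concentration, as in the assay design for this endpoint, a check of this kind would raise low.nrep for most series, so that flag alone may be of limited use when filtering these data.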
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 996
Inactive hit count (0 <= hitc < 0.9): 3518
NA hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
239
388
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
397
1393
quadratic-polynomial (poly2) model: 767
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
47
408
20
803
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
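For a winning Hill model, the point-of-departure estimates listed above can be obtained by inverting the fitted curve. The following Python sketch assumes a three-parameter Hill form, resp = tp / (1 + (ga / x)^p), where tp is the top, ga is the AC50, and p is the Hill slope; this form, the parameter names, and the example values are assumptions of the sketch, and other winning models would need their own inversion.

```python
def hill_conc_at_fraction(frac, ga, p):
    """Concentration giving `frac` (between 0 and 1) of the top for a Hill curve."""
    if not 0 < frac < 1:
        raise ValueError("frac must be strictly between 0 and 1")
    # Solve frac = 1 / (1 + (ga / x)**p) for x.
    return ga * (frac / (1 - frac)) ** (1 / p)


def hill_conc_at_response(y, tp, ga, p):
    """Concentration at which the Hill curve reaches response y (e.g. the cutoff)."""
    return hill_conc_at_fraction(y / tp, ga, p)


# Hypothetical fitted parameters, for illustration only.
tp, ga, p = 2.0, 10.0, 1.0
ac50 = hill_conc_at_fraction(0.50, ga, p)
ac10 = hill_conc_at_fraction(0.10, ga, p)
ac5 = hill_conc_at_fraction(0.05, ga, p)
acc = hill_conc_at_response(0.73, tp, ga, p)  # concentration at a cutoff of 0.73
```

The same inversion gives the bmd when `y` is set to the benchmark response, provided the benchmark response lies below the fitted top.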
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. Second, the top (or maximal change in the predicted response) is larger than the cutoff.
The last weight reflects whether the AIC of the winning model is less than the constant model, i.e. the winning
model is better fit than a flat line.
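The model selection rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched in a few lines of Python. This is an illustrative sketch only, not the tcpl or tcplfit2 implementation: the function name, the dictionary keys, the model labels, and every numeric value below are assumptions made for the example.

```python
def select_winning_model(fits):
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to a dict with hypothetical keys
    "aic" (float) and "n_params" (int). When two models have exactly
    equal AIC values, the one with fewer parameters is returned.
    """
    # Comparing (aic, n_params) tuples applies the parameter-count
    # tie-break only when the AIC values are equal.
    return min(fits, key=lambda name: (fits[name]["aic"], fits[name]["n_params"]))


# Illustrative values only; these are not taken from invitrodb.
example_fits = {
    "cnst": {"aic": 12.4, "n_params": 1},
    "hill": {"aic": -3.1, "n_params": 4},
    "poly1": {"aic": -3.1, "n_params": 2},
    "exp4": {"aic": 0.7, "n_params": 3},
}
winner = select_winning_model(example_fits)
print(winner)
```

In this example the Hill and linear-polynomial fits tie on AIC, so the simpler linear-polynomial fit is selected.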
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior based on relative activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and are stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
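The binarization of the continuous hit call described above can be written as a small helper. This is a minimal sketch under the thresholds stated in this document (0.90 for active, below 0 for not applicable); the function name and the returned labels are assumptions, and users may substitute a different threshold.

```python
def classify_hitc(hitc, active_threshold=0.90):
    """Map a continuous hit call to a categorical label.

    hitc >= active_threshold      -> "active"
    0 <= hitc < active_threshold  -> "inactive"
    hitc < 0                      -> "not applicable"
    """
    if hitc < 0:
        return "not applicable"
    if hitc >= active_threshold:
        return "active"
    return "inactive"


# Illustrative hitc values only.
labels = [classify_hitc(h) for h in (0.97, 0.90, 0.45, 0.0, -1.0)]
print(labels)
```

Raising `active_threshold` gives a more stringent active call; lowering it admits more borderline curves.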
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.533
Neutral control median absolute deviation, by plate: nmad 0.122
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 22.81%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
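The control-well formulas listed above can be evaluated directly from the medians and median absolute deviations. The Python sketch below is illustrative only: the function name and the dictionary keys are assumptions, and the positive-control values in the example are hypothetical because no positive control is reported for this endpoint.

```python
import math


def control_metrics(nmed, nmad, cmed=None, cmad=None):
    """Compute robustness metrics from control-well summary statistics.

    nmed, nmad: neutral control median and median absolute deviation.
    cmed, cmad: positive (or negative) control median and MAD, if available.
    """
    metrics = {"cv_pct": nmad / nmed * 100}
    if cmed is None or cmad is None:
        # Without a comparison control, only the neutral-control CV is defined.
        return metrics
    diff = cmed - nmed
    metrics["z_prime"] = 1 - (3 * (cmad + nmad)) / abs(diff)
    metrics["ssmd"] = diff / math.sqrt(cmad**2 + nmad**2)
    metrics["signal_to_noise"] = diff / nmad
    metrics["signal_to_background"] = cmed / nmed
    return metrics


# Neutral control values reported above for this endpoint.
neutral_only = control_metrics(nmed=0.533, nmad=0.122)

# Hypothetical control values, used only to exercise the formulas.
with_control = control_metrics(nmed=0.5, nmad=0.1, cmed=2.0, cmad=0.2)
print(neutral_only, with_control)
```

Applied to the rounded values shown above, the CV comes out near 22.9%, slightly different from the reported 22.81%, which is presumably computed from unrounded medians.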
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 408.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 114
ATG_Xbp1_CIS
1. General Information
1.1 Assay Title: Attagene CIS-FACTORIAL HepG2 Assay for human X-box binding protein 1 (XBP1)
1.2 Assay Summary: ATG_CIS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_Xbp1_CIS is one of 52 assay
components measured or calculated from the ATG_CIS assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse transcription
polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay component
ATG_Xbp1_CIS were analyzed into 1 assay endpoint. This assay endpoint, ATG_Xbp1_CIS, was analyzed with
bidirectional fitting relative to DMSO as the negative control and baseline of activity. Using a type of inducible
reporter, measures of mRNA for gain- or loss-of-signal activity can be used to understand the reporter gene at
the transcription factor level as they relate to the gene XBP1. Furthermore, this assay endpoint can be referred
to as a primary readout, because this assay has produced multiple assay endpoints where this one serves a
reporter gene function. To generalize the intended target to other relatable targets, this assay endpoint is
annotated to the DNA binding intended target family, where the subfamily is basic leucine zipper.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene Xbp1,
which is responsive to the endogenous human X-box binding protein 1 [GeneSymbol:XBP1 | GeneID:7494 |
Uniprot_SwissProt_Accession:P17861].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Within the CIS-FACTORIAL version, transcription of each reporter transcription unit (RTU)
is controlled by a cis-regulating element (promoter). The specificity of an RTU is determined by the presence of a
TF binding site in the promoter. Importantly, as all members of a TF family recognize the same binding sequence,
CIS-FACTORIAL evaluates activities of TF families.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TRIzol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 300 nM
Target (nominal) number of replicates: 1
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.118
Response cutoff threshold used to determine hit calls: 0.591
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of DNA binding.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
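As a rough illustration of the pre-modeling corrections described above (excluding zero and negative values, then log2-transforming the rest), consider the following Python sketch. It is not tcpl code: the function name and the dictionary layout are hypothetical, and the choice to blank the value of an excluded well is an assumption made for clarity. The field names cval (corrected response value) and wllq (well quality) follow the terms used in this document.

```python
import math


def level2_correct(wells):
    """Apply zero/negative exclusion and a log2 transform to well values.

    `wells` is a list of dicts with keys "cval" and "wllq". Wells with
    cval <= 0 get wllq = 0 (excluded) and no transformed value;
    all other wells have cval replaced by log2(cval).
    """
    corrected = []
    for well in wells:
        if well["cval"] <= 0:
            # The log of a non-positive value is undefined, so exclude the well.
            corrected.append({"cval": None, "wllq": 0})
        else:
            corrected.append({"cval": math.log2(well["cval"]), "wllq": well["wllq"]})
    return corrected


# Illustrative fold-induction values only.
raw = [
    {"cval": 2.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -0.3, "wllq": 1},
    {"cval": 0.5, "wllq": 1},
]
result = level2_correct(raw)
print(result)
```

On the log2 scale a two-fold induction maps to 1, no change maps to 0, and a halving of signal maps to -1, which keeps gain and loss symmetric around the DMSO baseline.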
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well quality
(wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation (onesd) of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad
is calculated using test compound wells (wllt = t) for the endpoint.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit
call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median
responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the
series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top
and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and
coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal
response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3
times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
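The level 4 baseline estimate and the level 5 cutoff selection listed above can be sketched together in Python. This is an illustrative sketch, not the tcpl implementation: the function names and the dictionary layout are hypothetical, and the `scale` argument is an assumption added because some median absolute deviation implementations multiply by a consistency constant while the plain definition does not.

```python
import math
from statistics import median


def median_absolute_deviation(values, scale=1.0):
    """Median of absolute deviations from the median, times `scale`."""
    center = median(values)
    return scale * median([abs(v - center) for v in values])


def baseline_mad(wells, scale=1.0):
    """bmad from test wells (wllt == "t") at concentration index 1 or 2."""
    baseline = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    return median_absolute_deviation(baseline, scale)


def response_cutoff(bmad):
    """Take the larger of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


# Illustrative wells only; the last two are ignored by the baseline filter.
wells = [
    {"wllt": "t", "cndx": 1, "resp": 0.0},
    {"wllt": "t", "cndx": 1, "resp": 0.1},
    {"wllt": "t", "cndx": 2, "resp": -0.1},
    {"wllt": "t", "cndx": 2, "resp": 0.2},
    {"wllt": "t", "cndx": 3, "resp": 5.0},
    {"wllt": "n", "cndx": 1, "resp": 9.0},
]
example_bmad = baseline_mad(wells)
reported_cutoff = response_cutoff(0.118)
print(example_bmad, reported_cutoff)
```

With the rounded bmad of 0.118 reported in the assay design summary, 5 * bmad is 0.59, which exceeds log2(1.2) (about 0.263) and is consistent with the reported response cutoff of 0.591 once rounding of bmad is taken into account.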
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514 Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 492
Inactive hit count (0 <= hitc < 0.9): 4022
Not applicable hit count (hitc < 0): 0
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
104
408
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
381
1653
quadratic-polynomial (poly2) model: 111
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
82
282
762
13
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
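To make these potency estimates concrete, the sketch below inverts a Hill curve to find the concentration that produces a given response. It assumes the Hill form f(x) = tp / (1 + (ga / x)^p), where tp is the top, ga is the AC50, and p is the Hill slope; that parameterization, the function names, and all parameter values are assumptions for illustration and are not taken from this document. Only the cutoff of 0.591 is the value reported for this endpoint.

```python
def hill_response(conc, tp, ga, p):
    """Hill model response at concentration `conc` (assumed form)."""
    return tp / (1 + (ga / conc) ** p)


def hill_conc_at_response(y, tp, ga, p):
    """Concentration at which the Hill curve reaches response `y`.

    Returns None when `y` is not strictly between 0 and the top,
    because the curve never reaches such a response.
    """
    frac = y / tp
    if not 0 < frac < 1:
        return None
    return ga * (frac / (1 - frac)) ** (1 / p)


# Hypothetical winning-model parameters; coff is the reported cutoff.
tp, ga, p, coff = 1.5, 10.0, 2.0, 0.591
ac50 = hill_conc_at_response(0.50 * tp, tp, ga, p)
ac10 = hill_conc_at_response(0.10 * tp, tp, ga, p)
ac5 = hill_conc_at_response(0.05 * tp, tp, ga, p)
acc = hill_conc_at_response(coff, tp, ga, p)
print(ac5, ac10, ac50, acc)
```

Under this form the AC50 equals ga by construction, and a curve whose top does not exceed the cutoff has no acc, which is why the helper returns None in that case.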
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration-response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e., linear and quadratic), power, and four exponential variants. The
polynomial, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e., the model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to the data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each
concentration-response series is determined by calculating a continuous hit call for the winning model, which is
the product of three proportional weights. The first weight reflects whether there is at least one median response
outside the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is lower than
that of the constant model, i.e., whether the winning model is a better fit than a flat line.
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior based on relative activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and are stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
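A few of the level 6 flag rules listed in Section 3.2 are simple threshold checks, and the Python sketch below shows how four of them (noise, low.nrep, low.nconc, and ac50.lowconc) could be evaluated for one concentration series. It is illustrative only and not tcpl code: the function name and the dictionary keys are hypothetical, and the series values are invented apart from the cutoff (0.591), the number of concentrations (7), the number of replicates (1), and the minimum tested concentration (1.23) reported for this endpoint.

```python
import math


def cautionary_flags(series):
    """Return the names of the flags raised for one concentration series."""
    flags = []
    if series["rmse"] > series["coff"]:
        flags.append("noise")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    # logc_min is the log10 of the lowest concentration tested.
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags


# Hypothetical active series with an AC50 below the lowest tested concentration.
example_series = {
    "rmse": 0.2,
    "coff": 0.591,
    "nrep": 1,
    "nconc": 7,
    "hitc": 0.95,
    "ac50": 0.8,
    "logc_min": math.log10(1.23),
}
raised = cautionary_flags(example_series)
print(raised)
```

Because the target number of replicates for this assay is 1, a rule of this form would raise low.nrep for most series, so that flag is more informative when comparing across assays than within this one.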
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.715
Neutral control median absolute deviation, by plate: nmad 0.208
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 29.03%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 282.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 115
ATG_AR_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human androgen receptor (AR)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_AR_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_AR_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_AR_TRANS, was
analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene AR. Furthermore, this
assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-AR, also known as human androgen receptor
[GeneSymbol:AR | GeneID:367 | Uniprot_SwissProt_Accession:P10275].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters
the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: 6a-Fluorotestosterone
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.171
Response cutoff threshold used to determine hit calls: 0.857
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
ToxCast AR Pathway Model: Androgen receptor assays used in ToxCast AR Pathway model. See
10.1016/j.yrtph.2020.104764 and 10.1021/acs.chemrestox.6b00347
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
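The model selection rule described above can be sketched as follows, using the standard definition AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood. This is an illustration only and is not tcpl code: the model names, log-likelihoods, and parameter counts are hypothetical values chosen for the example.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def winning_model(fits):
    """Pick the model with the lowest AIC.

    fits maps a model name to (maximized log-likelihood, number of parameters).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits for one concentration series (values are made up).
fits = {
    "cnst": (-42.0, 1),  # constant (flat) model
    "hill": (-30.5, 4),
    "gnls": (-30.1, 6),  # gain-loss model
}
best, scores = winning_model(fits)
print(best, scores)
```

In this example the gain-loss model has a slightly higher likelihood, but its additional parameters are penalized, so the simpler model with the lower AIC wins.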
Prior to data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well
quality (wllq); if cval = 0, wllq = 0.)
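The Level 2 corrections can be illustrated with a minimal sketch. It is not the tcpl implementation: the function name and the well records are hypothetical, the order of operations (flag non-positive values first, then log-transform the rest) is an assumption based on the Data Analysis description in Section 3.2, and setting the flagged cval to None is a choice made here for clarity.

```python
import math


def apply_level2(wells):
    """Apply rmneg, rmzero, and log2 to a list of well records.

    Each well is a dict with 'cval' (corrected response) and 'wllq' (well quality).
    Returns new records; the input is not modified.
    """
    processed = []
    for well in wells:
        well = dict(well)
        if well["cval"] <= 0:
            # rmneg (cval < 0) and rmzero (cval = 0): downgrade well quality.
            well["wllq"] = 0
            well["cval"] = None  # assumption: excluded wells carry no log2 value
        else:
            well["cval"] = math.log2(well["cval"])
        processed.append(well)
    return processed


# Hypothetical fold-induction values.
wells = [
    {"cval": 4.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -0.5, "wllq": 1},
    {"cval": 1.0, "wllq": 1},
]
processed = apply_level2(wells)
print(processed)
```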
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute value (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/sample size - 1). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
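A minimal sketch of the baseline calculation described for bmad.aeid.lowconc.twells follows. It is not tcpl code. The well records are hypothetical, and the scale factor of 1.4826 is an assumption: it is the default constant of R's mad() function, and whether a given tcpl method applies it should be checked against the package source.

```python
import statistics

MAD_SCALE = 1.4826  # assumption: default constant of R's mad()


def baseline_stats(wells, scale=MAD_SCALE):
    """Return (bmad, onesd) from test wells at the two lowest concentration indices."""
    baseline = [
        w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)
    ]
    center = statistics.median(baseline)
    deviations = [abs(r - center) for r in baseline]
    bmad = scale * statistics.median(deviations)
    onesd = statistics.stdev(baseline)  # sample standard deviation (n - 1 denominator)
    return bmad, onesd


# Hypothetical normalized responses; only the first four wells qualify as baseline.
wells = [
    {"resp": 0.1, "wllt": "t", "cndx": 1},
    {"resp": -0.1, "wllt": "t", "cndx": 1},
    {"resp": 0.2, "wllt": "t", "cndx": 2},
    {"resp": 0.0, "wllt": "t", "cndx": 2},
    {"resp": 2.5, "wllt": "t", "cndx": 7},  # high concentration, ignored
    {"resp": 0.3, "wllt": "n", "cndx": 1},  # neutral control well, ignored
]
bmad, onesd = baseline_stats(wells)
print(bmad, onesd)
```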
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative
analysis direction. Typically used for endpoints where only positive responses are biologically relevant.)
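The cutoff selection can be worked through with the bmad reported in the Assay Design Summary. This is a sketch under the stated rule that the higher candidate is selected; the function name is hypothetical, and the ow_bidirectional_gain adjustment, which acts on hit calls rather than on the cutoff, is not modeled.

```python
import math


def response_cutoff(bmad):
    """Return the name and value of the largest candidate cutoff."""
    candidates = {
        "log2_1.2": math.log2(1.2),
        "bmad5": 5 * bmad,
    }
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


# bmad as reported (rounded) for this endpoint.
name, coff = response_cutoff(0.171)
print(name, round(coff, 3))
```

With the rounded bmad of 0.171, the bmad5 candidate (about 0.855) exceeds log2(1.2) (about 0.263). The result is close to the reported cutoff of 0.857; the small difference presumably reflects rounding of the reported bmad.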
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then
flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
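A few of the simpler cautionary flags can be sketched as threshold checks on a fitted series. This is an illustration, not the tcpl implementation: the function, the dictionary keys, and the example values are hypothetical. The border flag is read here as |top| falling within the band from 0.8 to 1.2 times the cutoff, which is an interpretation of the description above.

```python
import math


def caution_flags(series):
    """Return the names of a subset of Level 6 flags that apply to one series."""
    flags = []
    top, coff, hitc = series["top"], series["coff"], series["hitc"]
    if series["rmse"] > coff:
        flags.append("noise")
    # Assumed reading of 'border': |top| within 20 percent of the cutoff.
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags


# Hypothetical fitted series; concentrations in the same units as ac50.
series = {
    "top": 0.95,
    "coff": 0.857,
    "hitc": 0.95,
    "rmse": 0.3,
    "nrep": 1,
    "nconc": 7,
    "ac50": 0.5,
    "logc_min": math.log10(1.23),
}
flags = caution_flags(series)
print(flags)
```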
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
-------
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count: hitc>0.9
54
Inactive hit count: 0
-------
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1.159
Neutral control median absolute deviation, by plate: nmad 0.268
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 23.15%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
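The robustness formulas listed above can be collected into a short sketch. The neutral control values are those reported for this endpoint; the positive control median and median absolute deviation are hypothetical, since they are reported as NA here. The function names are illustrative and are not part of tcpl.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def control_metrics(cmed, cmad, nmed, nmad):
    """Compare a control (median cmed, MAD cmad) against the neutral control."""
    return {
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad**2 + nmad**2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }


nmed, nmad = 1.159, 0.268  # reported neutral control values
cv = cv_percent(nmed, nmad)

# Hypothetical positive control values (pmed, pmad), for illustration only.
metrics = control_metrics(cmed=3.0, cmad=0.3, nmed=nmed, nmad=nmad)
print(round(cv, 2), metrics)
```

Applied to the rounded neutral control values, the CV comes out at roughly 23 percent, in line with the reported 23.15%; the reported figure may have been computed from unrounded values.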
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 163.
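As a sketch of how such a summary could be computed, the following takes the RMSE of each series against its winning model's fitted values and then the median across series with an active hit call (hitc >= 0.9). The data structure and values are hypothetical; this is not the code used to produce the figure reported above.

```python
import math
import statistics


def rmse(observed, fitted):
    """Root mean squared error between observed and model-fitted responses."""
    squared = [(o - f) ** 2 for o, f in zip(observed, fitted)]
    return math.sqrt(sum(squared) / len(squared))


def median_rmse_of_active_hits(series_list, hitc_threshold=0.9):
    """Median RMSE across series whose hit call meets the activity threshold."""
    values = [
        rmse(s["resp"], s["fitted"])
        for s in series_list
        if s["hitc"] >= hitc_threshold
    ]
    return statistics.median(values)


# Hypothetical series: two active hits and one inactive series.
series_list = [
    {"hitc": 0.95, "resp": [0.0, 1.0, 2.0], "fitted": [0.0, 1.0, 2.0]},
    {"hitc": 0.99, "resp": [1.0, 2.0, 3.0], "fitted": [0.0, 2.0, 4.0]},
    {"hitc": 0.10, "resp": [0.0, 0.0, 5.0], "fitted": [0.0, 0.0, 0.0]},
]
median_rmse = median_rmse_of_active_hits(series_list)
print(median_rmse)
```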
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 116
ATG_CAR_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human CAR Gene Activation
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_CAR_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_CAR_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_CAR_TRANS, was
analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene NR1I3. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-CAR, also known as human nuclear receptor subfamily
1, group I, member 3 [GeneSymbol:NR1I3 | GeneID:9970 | Uniprot_SwissProt_Accession:Q14994].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters
the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.203
Response cutoff threshold used to determine hit calls: 1.013
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold
induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
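The lowest-AIC selection rule can be illustrated with a minimal Python sketch. This is not the tcpl or tcplfit2 implementation: the least-squares form of AIC used here (n * ln(RSS/n) + 2k, which assumes Gaussian errors), the function names, and the example residuals are all assumptions made for illustration.

```python
import math

def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit, assuming Gaussian errors (illustrative only)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)  # residual sum of squares, must be > 0
    return n * math.log(rss / n) + 2 * n_params

def winning_model(fits):
    """Pick the candidate with the lowest AIC.

    fits maps a model name to (residuals, number of fitted parameters).
    """
    scores = {name: aic_least_squares(res, k) for name, (res, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical residuals for one 7-point concentration series.
fits = {
    "cnst": ([0.1, 0.0, -0.1, 0.3, 0.9, 1.4, 1.6], 1),
    "hill": ([0.05, -0.05, 0.02, -0.03, 0.04, -0.02, 0.01], 3),
}
best, scores = winning_model(fits)
print(best)
```

The penalty term 2k means a model with more parameters wins only when it reduces the residual error enough to offset its added complexity.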
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
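As a rough illustration of the three Level 2 corrections, the Python sketch below excludes non-positive wells and log2-transforms the rest. It is not tcpl code. The dictionary layout is hypothetical, the exclusion is assumed to run before the log transform (consistent with Section 3.2), and storing None for excluded wells is a choice made only for this example.

```python
import math

def level2_correct(wells):
    """Apply rmneg, rmzero, then log2 to a list of well records.

    Each well is a dict with 'cval' (corrected response) and 'wllq' (well quality).
    """
    corrected = []
    for well in wells:
        if well["cval"] <= 0:
            # rmneg (cval < 0) and rmzero (cval = 0): mark the well as unusable
            corrected.append({**well, "cval": None, "wllq": 0})
        else:
            # log2: move the remaining responses to a base-2 log scale
            corrected.append({**well, "cval": math.log2(well["cval"])})
    return corrected

wells = [
    {"cval": 4.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -2.0, "wllq": 1},
]
for well in level2_correct(wells):
    print(well)
```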
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore required for tcplfit2
processing.)
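The baseline statistics described for Level 4 can be sketched in Python as follows. This is an illustration, not tcpl code. The record layout is hypothetical, and the 1.4826 scaling constant is an assumption: it is the default of R's mad() function, and this sketch assumes the pipeline relies on that default. Pass constant=1.0 for the unscaled median absolute deviation.

```python
import statistics

def baseline_stats(wells, constant=1.4826):
    """Return (bmad, onesd) from test wells at the two lowest concentration indices."""
    base = [w["resp"] for w in wells
            if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(base)
    # Median absolute deviation around the median, scaled as in R's mad() default
    bmad = constant * statistics.median([abs(r - center) for r in base])
    # Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))
    onesd = statistics.stdev(base)
    return bmad, onesd

wells = [
    {"resp": 0.0, "wllt": "t", "cndx": 1},
    {"resp": 0.2, "wllt": "t", "cndx": 1},
    {"resp": -0.2, "wllt": "t", "cndx": 2},
    {"resp": 0.4, "wllt": "t", "cndx": 2},
    {"resp": -0.4, "wllt": "t", "cndx": 1},
    {"resp": 3.0, "wllt": "t", "cndx": 5},  # ignored: not a low-concentration well
    {"resp": 0.1, "wllt": "n", "cndx": 1},  # ignored: not a test compound well
]
bmad, onesd = baseline_stats(wells)
print(round(bmad, 5), round(onesd, 5))
```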
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
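A short Python sketch of the Level 5 logic, under stated assumptions: the cutoff is taken as the larger of the two candidate thresholds, and a negative modeled top is used as a stand-in for "fit in the negative analysis direction". Function names are hypothetical and this is not tcpl code.

```python
import math

def response_cutoff(bmad):
    """Larger of the log2_1.2 and bmad5 candidate cutoffs."""
    return max(math.log2(1.2), 5 * bmad)

def overwrite_gain_only(hitc, top):
    """ow_bidirectional_gain: flip the sign of hitc for negative-direction fits.

    Using top < 0 to detect a negative-direction fit is an assumption of this sketch.
    """
    return -hitc if top < 0 else hitc

print(round(response_cutoff(0.203), 3))
print(overwrite_gain_only(0.95, top=-1.4))
```

With the rounded bmad of 0.203 reported in the Assay Design Summary, the sketch gives about 1.015; the documented cutoff of 1.013 presumably reflects the unrounded bmad.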
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
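Several of the Level 6 flags reduce to simple threshold checks. The Python sketch below applies five of them (noise, low.nrep, low.nconc, ac50.lowconc, and no.med.gt.3bmad) to one summarized series. It is an illustration only: the dictionary keys and example values are hypothetical, and the remaining flags are omitted.

```python
def caution_flags(series):
    """Return the names of the flags (a subset of Level 6) raised for one series."""
    flags = []
    if series["rmse"] > series["coff"]:
        flags.append("noise")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    # AC50 below the lowest tested concentration (logc_min is log10 of that concentration)
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    # Count per-concentration medians outside the +/- 3*bmad baseline band
    band = 3 * series["bmad"]
    n_pos = sum(m > band for m in series["medians"])
    n_neg = sum(m < -band for m in series["medians"])
    if n_pos == 0 and n_neg == 0:
        flags.append("no.med.gt.3bmad")
    return flags

# Hypothetical series summary; nrep = 1 matches this assay's nominal replicate count.
series = {
    "hitc": 0.95, "rmse": 0.2, "coff": 1.013, "nrep": 1, "nconc": 7,
    "ac50": 0.005, "logc_min": -2.0, "bmad": 0.203,
    "medians": [0.1, 0.2, 0.1, 0.3, 0.5, 1.4, 1.8],
}
print(caution_flags(series))
```

Because the nominal number of replicates for this assay is 1, a low.nrep flag would be expected on most series; flags are cautionary and do not change the hit call.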
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 19
Inactive hit count: 0
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 3.252
Neutral control median absolute deviation, by plate (nmad): 1.221
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 37.55%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
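The control-well formulas above translate directly into code. The Python sketch below is illustrative rather than the pipeline's implementation; function names are hypothetical. Only the neutral control values (nmed = 3.252, nmad = 1.221) come from this document, and the positive control values in the last lines are made up to exercise the formulas, since this endpoint reports NA for them.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, as a percentage."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) against neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference, control versus neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    return cmed / nmed

# Neutral control values reported for this endpoint.
print(round(cv_percent(3.252, 1.221), 2))

# Hypothetical positive control (pmed = 10, pmad = 1) against a hypothetical neutral control.
print(z_prime(10.0, 1.0, 2.0, 1.0), round(ssmd(10.0, 1.0, 2.0, 1.0), 3))
```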
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 214.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 117
ATG_ERa_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human estrogen receptor, alpha (ERa)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_ERa_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse
transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay
component ATG_ERa_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_ERa_TRANS, was
analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene ESR1. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-ERa, also known as human estrogen receptor 1
[GeneSymbol:ESR1 | GeneID:2099 | Uniprot_SwissProt_Accession:P03372].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of an NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: 17b-Estradiol
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.221
Response cutoff threshold used to determine hit calls: 1.107
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
ToxCast ER Pathway Model: Estrogen receptor assays used in ToxCast ER Pathway model
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold
induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.), 18: resp.shiftneg.3bmad (Shift all
the normalized response values (resp) less than -3 multiplied by the baseline median absolute deviation
(bmad) to 0; if resp < -3*bmad, resp = 0.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 or fewer concentrations were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
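The Level 2 to Level 4 methods listed above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the tcpl code: the function names and the dictionary layout of a well are hypothetical, negative and zero values are dropped before the log2 transform (following the summary in Section 3.2), and the `constant` argument mirrors the default scaling factor of R's `mad` function, which may or may not match how a given bmad was produced.

```python
import math
import statistics


def level2_correct(cvals):
    """Apply rmneg/rmzero, then log2; returns (value, wllq) pairs."""
    out = []
    for cval in cvals:
        if cval <= 0:
            out.append((None, 0))             # rmneg / rmzero: wllq = 0, well excluded
        else:
            out.append((math.log2(cval), 1))  # log2 transform of cval
    return out


def shift_neg_3bmad(resps, bmad):
    """resp.shiftneg.3bmad: set resp to 0 where resp < -3 * bmad."""
    return [0.0 if r < -3 * bmad else r for r in resps]


def mad(values, constant=1.4826):
    """Median absolute deviation; `constant` is an assumed scaling factor."""
    med = statistics.median(values)
    return constant * statistics.median([abs(v - med) for v in values])


def level4_baseline(wells, constant=1.4826):
    """bmad and onesd from test wells (well type "t") at cndx 1 or 2."""
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    return mad(base, constant), statistics.stdev(base)  # stdev divides by n - 1


# Hypothetical wells, for illustration only.
wells = [
    {"wllt": "t", "cndx": 1, "resp": 0.1},
    {"wllt": "t", "cndx": 1, "resp": -0.1},
    {"wllt": "t", "cndx": 2, "resp": 0.3},
    {"wllt": "t", "cndx": 2, "resp": -0.3},
    {"wllt": "t", "cndx": 3, "resp": 5.0},   # higher concentration: ignored
    {"wllt": "n", "cndx": 1, "resp": 0.9},   # neutral control: ignored
]
bmad, onesd = level4_baseline(wells, constant=1.0)
print(bmad, onesd)
```

Restricting the baseline to the two lowest concentrations of test wells assumes most chemicals are inactive there, so the spread of those wells approximates assay noise.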
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
-------
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 965
Inactive hit count: 0
-------
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 1.743
Neutral control median absolute deviation, by plate (nmad): 0.555
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 31.86%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
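The control-well formulas in this table can be checked with a short Python sketch. The function names are hypothetical, and the positive-control values in the last two calls are invented purely to exercise the formulas, since this endpoint reports NA for positive and negative controls.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    if None in (cmed, cmad, nmed, nmad):
        return None  # metric is NA when a control type is missing
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    if None in (cmed, cmad, nmed, nmad):
        return None
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


# Neutral control values reported in the table above.
print(round(cv_percent(1.743, 0.555), 2))

# No positive control is reported, so the metric stays NA (None).
print(z_prime(None, None, 1.743, 0.555))

# Hypothetical positive control values, for illustration only.
print(z_prime(10.0, 1.0, 2.0, 0.5), ssmd(10.0, 1.0, 2.0, 0.5))
```

Recomputing the CV from the rounded table values gives about 31.84%, slightly below the reported 31.86%; a difference of this size would be expected if the reported figure was derived from unrounded nmed and nmad values.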
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
-------
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 434.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 118
ATG_ERRa_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human estrogen-related receptor, alpha (ERRa)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_ERRa_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse
transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay
component ATG_ERRa_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_ERRa_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene ESRRA. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-ERRa, also known as human estrogen-related receptor
alpha [GeneSymbol:ESRRA | GeneID:2101 | Uniprot_SwissProt_Accession:P11474].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters
the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Target (nominal) number of replicates: 1
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.234
Response cutoff threshold used to determine hit calls: 1.172
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
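The reported cutoff can be related to the reported bmad with a small worked example. Section 3.2 lists two candidate Level 5 thresholds for this endpoint, log2(1.2) and 5*bmad, and states that the higher value is selected. The Python sketch below applies that rule to the rounded bmad from the summary above; the function name is hypothetical and this is not the tcpl code.

```python
import math


def level5_cutoff(bmad):
    """Return (method, value) for the larger of log2(1.2) and 5 * bmad."""
    candidates = {"log2_1.2": math.log2(1.2), "bmad5": 5 * bmad}
    method = max(candidates, key=candidates.get)
    return method, candidates[method]


method, coff = level5_cutoff(0.234)
print(method, round(coff, 3), round(math.log2(1.2), 3))
```

Here 5*bmad (about 1.17) exceeds log2(1.2) (about 0.263), so the bmad-based threshold applies. The small gap between 1.17 and the reported 1.172 is what would be expected if the reported cutoff was computed from an unrounded bmad.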
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain-of-signal) fitting direction using log2 fold-
induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log2-transformed. All statistical analyses were conducted in the R programming
language, using the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 or fewer concentrations were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
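Several of the Level 6 flags above are simple threshold rules on per-series summary values. The following Python sketch encodes five of them as written in the list; the `SeriesSummary` class, its field layout, and the example values are hypothetical, and this is an illustration rather than the tcpl implementation.

```python
from dataclasses import dataclass


@dataclass
class SeriesSummary:
    hitc: float          # hit call
    rmse: float          # root mean square error of the winning fit
    coff: float          # response cutoff
    nrep: float          # average replicates per concentration
    nconc: int           # number of concentrations tested
    ac50: float          # concentration at 50 percent maximal response
    logc_min: float      # log10 of the lowest tested concentration
    nmed_gtbl_pos: int   # medians above 3*bmad
    nmed_gtbl_neg: int   # medians below -3*bmad


def caution_flags(s):
    """Return the names of the threshold-style flags that apply to a series."""
    flags = []
    if s.rmse > s.coff:
        flags.append("noise")
    if s.nrep < 2:
        flags.append("low.nrep")
    if s.nconc <= 4:
        flags.append("low.nconc")
    if s.hitc >= 0.9 and s.ac50 < 10 ** s.logc_min:
        flags.append("ac50.lowconc")
    if s.nmed_gtbl_pos == 0 and s.nmed_gtbl_neg == 0:
        flags.append("no.med.gt.3bmad")
    return flags


# Hypothetical series: active, single replicate, AC50 below the tested range.
example = SeriesSummary(hitc=0.95, rmse=0.3, coff=1.172, nrep=1, nconc=7,
                        ac50=0.0005, logc_min=-2.91,
                        nmed_gtbl_pos=2, nmed_gtbl_neg=0)
print(caution_flags(example))
```

For this made-up series only low.nrep and ac50.lowconc apply. Flags are cautionary annotations on a fitted curve; they do not by themselves change the hit call.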
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
-------
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 6
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 1.424
Neutral control median absolute deviation, by plate (nmad): 0.407
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 28.59%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 135.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 119
ATG_ERRg_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human estrogen-related receptor, gamma (ERRg)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_ERRg_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_ERRg_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_ERRg_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene ESRRG. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-ERRg, also known as human estrogen-related receptor
gamma [GeneSymbol:ESRRG | GeneID:2104 | Uniprot_SwissProt_Accession:P62508].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 376 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.272
Response cutoff threshold used to determine hit calls: 1.358
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
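The model-selection rule described above can be illustrated with a short Python sketch. This is not the tcpl implementation (tcpl is an R package); the function names, the model labels, and the log-likelihood and parameter-count values below are hypothetical and serve only to show how the lowest-AIC model would be picked for one concentration series.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def winning_model(fits: dict) -> str:
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) tuple.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get)


# Hypothetical fit results for a single concentration series.
example_fits = {
    "cnst": (-12.4, 1),  # constant (no response) model
    "hill": (-6.1, 4),   # Hill model
    "gnls": (-5.9, 6),   # gain-loss model
}
best = winning_model(example_fits)
```

In this made-up example the gain-loss model has the highest likelihood, but its extra parameters are penalized, so the Hill model has the lowest AIC.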
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well
quality (wllq); if cval = 0, wllq = 0.)
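As a rough illustration of the three Level 2 corrections, the Python sketch below flags zero and negative wells and log2-transforms the rest. It is not the tcpl code; the function name `apply_level2` and the dictionary layout are hypothetical, and applying the zero/negative checks before the log transform is an assumption made so that the logarithm is always defined.

```python
import math


def apply_level2(cvals):
    """Apply rmneg, rmzero, and log2 to a list of corrected response values.

    Returns one dict per well with the transformed value and a well quality
    flag (wllq): 1 for usable wells, 0 for excluded wells.
    """
    wells = []
    for cval in cvals:
        if cval <= 0:
            # rmneg / rmzero: exclude the well by setting wllq to 0.
            wells.append({"cval": None, "wllq": 0})
        else:
            # log2: transform the corrected response to log-scale (base 2).
            wells.append({"cval": math.log2(cval), "wllq": 1})
    return wells


processed = apply_level2([4.0, 0.0, -1.5, 1.0])
```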
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
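The baseline calculation described for Level 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the tcpl implementation: the function name, the well dictionary keys, and the example values are hypothetical. R's `mad()` multiplies by a constant of 1.4826 by default; whether that scaling applies here is not stated in this document, so it is exposed as a parameter that defaults to no scaling.

```python
import statistics


def baseline_stats(wells, mad_constant=1.0):
    """Compute bmad and onesd from test wells at the two lowest concentrations.

    `wells` is a list of dicts with keys "wllt" (well type), "cndx"
    (concentration index), and "resp" (normalized response).
    """
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(base)
    # Median absolute deviation around the median of the baseline wells.
    bmad = mad_constant * statistics.median(abs(r - med) for r in base)
    # Sample standard deviation (divides by sample size - 1).
    onesd = statistics.stdev(base)
    return bmad, onesd


# Hypothetical wells: only the first four enter the baseline.
example_wells = [
    {"wllt": "t", "cndx": 1, "resp": 0.1},
    {"wllt": "t", "cndx": 1, "resp": -0.2},
    {"wllt": "t", "cndx": 2, "resp": 0.3},
    {"wllt": "t", "cndx": 2, "resp": 0.0},
    {"wllt": "t", "cndx": 5, "resp": 2.5},  # higher concentration, ignored
    {"wllt": "n", "cndx": 1, "resp": 0.9},  # neutral control well, ignored
]
bmad, onesd = baseline_stats(example_wells)
```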
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative
analysis direction. Typically used for endpoints where only positive responses are biologically relevant.)
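The cutoff rule for Level 5 (take the higher of the candidate thresholds) can be checked with a few lines of Python. This sketch is illustrative only; the function name is hypothetical and the ow_bidirectional_gain adjustment, which acts on the hit call rather than the cutoff, is not modeled.

```python
import math


def select_cutoff(bmad):
    """Return (method_name, cutoff) for the larger of log2(1.2) and 5*bmad."""
    candidates = {"log2_1.2": math.log2(1.2), "bmad5": 5 * bmad}
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


# bmad as reported (rounded) in the assay design summary for this endpoint.
method, coff = select_cutoff(0.272)
```

With the rounded bmad of 0.272, the bmad5 candidate (about 1.36) exceeds log2(1.2) (about 0.26). This is in line with the reported cutoff of 1.358; the small difference presumably reflects rounding of bmad.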
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
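Several of the simpler Level 6 flags reduce to one-line comparisons, which the Python sketch below illustrates. It covers only a subset of the flags (noise, border, low.nrep, low.nconc, ac50.lowconc), is not the tcpl implementation, and uses a hypothetical function name and argument list. The border check is written as a band from 0.8*coff to 1.2*coff, which is an interpretation of the listed inequality, and `min_conc` is assumed to be on the same linear scale as `ac50`.

```python
def level6_flags(hitc, top, coff, rmse, ac50, min_conc, nconc, nrep):
    """Return the names of the cautionary flags raised for one series."""
    flags = []
    if rmse > coff:
        flags.append("noise")          # fit error exceeds the cutoff
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")         # modeled top is close to the cutoff
    if nrep < 2:
        flags.append("low.nrep")       # fewer than 2 replicates on average
    if nconc <= 4:
        flags.append("low.nconc")      # 4 or fewer concentrations tested
    if hitc >= 0.9 and ac50 < min_conc:
        flags.append("ac50.lowconc")   # active, AC50 below the tested range
    return flags


# Hypothetical active series, using the cutoff reported for this endpoint.
example_flags = level6_flags(
    hitc=0.95, top=1.5, coff=1.358, rmse=0.4,
    ac50=0.8, min_conc=1.23, nconc=7, nrep=1,
)
```

Because the target number of replicates for this assay is 1, a check like low.nrep would be expected to fire for most series; flags are cautionary and do not change the hit call.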
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 11
Inactive hit count: 0
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using the
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 3.067
Neutral control median absolute deviation, by plate: nmad 1.462
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 47.66%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
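The control-well metrics listed in this section can be written out as a small Python sketch. The function names are hypothetical and the code only restates the formulas given in the text; it is not the code used to generate the reported values. Positive and negative control values are NA for this endpoint, so only the neutral control CV is evaluated here with reported numbers.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (c) compared to the neutral control (n)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control versus neutral."""
    return (cmed - nmed) / math.sqrt(cmad**2 + nmad**2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


# Neutral control values reported above for this endpoint (rounded).
neutral_cv = cv_percent(nmed=3.067, nmad=1.462)
```

With the rounded nmed and nmad shown above, the CV evaluates to roughly 47.7%, in line with the reported 47.66%.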
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
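Once reference chemical calls become available, sensitivity and specificity could be estimated along the lines of the Python sketch below. The function name and the example calls are hypothetical; no reference chemical results for this endpoint are given in this document.

```python
def accuracy_stats(calls):
    """Estimate sensitivity and specificity from reference chemical results.

    `calls` is a list of (reference_is_positive, assay_is_active) booleans.
    Returns (sensitivity, specificity); a value is None when its
    denominator is zero.
    """
    tp = sum(1 for ref, hit in calls if ref and hit)
    fn = sum(1 for ref, hit in calls if ref and not hit)
    tn = sum(1 for ref, hit in calls if not ref and not hit)
    fp = sum(1 for ref, hit in calls if not ref and hit)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical reference set: three positives and three negatives.
example_calls = [
    (True, True), (True, False), (True, True),
    (False, False), (False, False), (False, True),
]
sens, spec = accuracy_stats(example_calls)
```

The false negative rate is 1 minus sensitivity and the false positive rate is 1 minus specificity.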
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 168.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 120
ATG_FXR_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for FXR Gene Activation
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_FXR_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_FXR_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_FXR_TRANS, was
analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene NR1H4. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-FXR, also known as human nuclear receptor subfamily
1, group H, member 4 [GeneSymbol:NR1H4 | GeneID:9971 | Uniprot_SwissProt_Accession:Q96RI1].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures the activities of endogenous TFs, whereas the TRANS assay evaluates changes in the activities
of exogenous, chimeric NR-GAL4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: CDCA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.166
Response cutoff threshold used to determine hit calls: 0.828
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
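The winning-model rule described above can be sketched in a few lines of Python. This is only an illustration of "lowest AIC wins": the model names, log-likelihoods, and parameter counts below are invented, and the actual fitting is done by the tcpl R package, which is not reproduced here.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def winning_model(fits):
    """Return the name of the candidate fit with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits for one concentration series (all values are made up).
example_fits = {
    "cnst": (-12.0, 1),   # AIC = 26.0
    "hill": (-4.5, 4),    # AIC = 17.0
    "poly1": (-9.0, 2),   # AIC = 22.0
}
print(winning_model(example_fits))
```

In this made-up case the four-parameter model wins because its better likelihood outweighs the penalty for its extra parameters.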
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore required for tcplfit2
processing.)
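As a rough sketch of the Level 4 calculation, the Python below computes a baseline median absolute deviation and a sample standard deviation from test-compound wells at the two lowest concentration indices. The well records and the dictionary keys are hypothetical. The `scale` argument is an assumption added for illustration: R's `mad()` multiplies by 1.4826 by default, and this document does not state whether that constant is applied, so it is left as a parameter.

```python
import statistics


def bmad_and_onesd(wells, scale=1.0):
    """Baseline statistics from test wells ('t') with cndx 1 or 2.

    `wells` is an iterable of dicts with keys 'resp', 'wllt', 'cndx'
    (a hypothetical layout chosen for this sketch).
    """
    base = [w["resp"] for w in wells
            if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(base)
    # Median absolute deviation around the median, optionally scaled.
    bmad = scale * statistics.median([abs(r - center) for r in base])
    # Sample standard deviation (n - 1 in the denominator).
    onesd = statistics.stdev(base)
    return bmad, onesd


# Made-up wells: only the first five are baseline test wells.
example_wells = [
    {"resp": 0.1, "wllt": "t", "cndx": 1},
    {"resp": -0.1, "wllt": "t", "cndx": 1},
    {"resp": 0.3, "wllt": "t", "cndx": 2},
    {"resp": -0.3, "wllt": "t", "cndx": 2},
    {"resp": 0.0, "wllt": "t", "cndx": 2},
    {"resp": 2.0, "wllt": "t", "cndx": 5},   # higher concentration, ignored
    {"resp": 0.5, "wllt": "n", "cndx": 1},   # neutral control, ignored
]
bmad, onesd = bmad_and_onesd(example_wells)
print(bmad, onesd)
```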
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only positive responses are biologically relevant.)
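The cutoff selection can be checked against the values reported for this endpoint with a short Python sketch. The function names are hypothetical, and using the sign of the modeled top to identify fits in the negative direction is an assumption made for this illustration.

```python
import math


def response_cutoff(bmad):
    """Pick the larger of the two candidate cutoffs listed for Level 5."""
    candidates = {"log2_1.2": math.log2(1.2), "bmad5": 5 * bmad}
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


def gain_only_hitc(hitc, top):
    """Flip the sign of hitc for fits in the negative direction.

    Assumption: a negative modeled top marks a negative-direction fit.
    """
    return -hitc if top < 0 else hitc


# bmad as reported in the Assay Design Summary for this endpoint.
name, coff = response_cutoff(0.166)
print(name, round(coff, 3))
```

With the reported bmad of 0.166, five times bmad (about 0.83) exceeds log2(1.2) (about 0.263), so bmad5 sets the cutoff. The small difference from the documented 0.828 is presumably because the reported bmad is rounded.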
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest concentration tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest concentration tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
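A few of the simpler flags above (noise, low.nrep, low.nconc, ac50.lowconc) reduce to one-line comparisons. The Python below is a minimal sketch of those four rules only; the `series` dictionary layout and the example values are hypothetical, and the remaining flags are not implemented.

```python
def caution_flags(series, coff):
    """Return the names of the flags that apply to one fitted series.

    `series` is a dict with keys 'rmse', 'nrep', 'nconc', 'hitc', 'ac50'
    and 'logc_min' (a hypothetical layout for this sketch).
    """
    flags = []
    if series["rmse"] > coff:
        flags.append("noise")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    # ac50 is compared on the concentration scale, so logc_min is un-logged.
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags


# Made-up series: one replicate, seven concentrations, AC50 below the
# lowest tested concentration (10**0.0 = 1.0 in arbitrary units).
example_series = {"rmse": 0.3, "nrep": 1, "nconc": 7,
                  "hitc": 0.95, "ac50": 0.5, "logc_min": 0.0}
print(caution_flags(example_series, coff=0.828))
```

Because the nominal number of replicates for this assay is 1, a series run as designed would be expected to carry the low.nrep flag under this rule.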
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 98
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using the
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 1.224
Neutral control median absolute deviation, by plate (nmad): 0.293
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 23.93%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
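The performance formulas in this section can be written out as small Python functions. This is a sketch for illustration only: the function and argument names are hypothetical, and the positive-control inputs in the example are invented because this endpoint reports NA for those metrics. The neutral-control inputs are the values tabulated above.

```python
import math


def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) vs neutral wells."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control vs neutral wells."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control values reported for this endpoint.
print(round(cv_percent(0.293, 1.224), 1))

# Invented positive-control values, used only to exercise the formulas.
print(z_prime(cmed=5.0, cmad=0.3, nmed=1.0, nmad=0.2))
```

Recomputing the CV from the rounded nmed and nmad gives about 23.9%, consistent with the tabulated 23.93% given that the inputs shown are rounded.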
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 176.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 121
ATG_GAL4_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for GAL4 Gene Activation (Internal Marker)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_GAL4_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_GAL4_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_GAL4_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the background control at the transcription factor-level as they relate to the gene. Furthermore,
this assay endpoint can be referred to as a secondary readout, because this assay has produced multiple assay
endpoints where this one serves a background control function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the background measurement intended target family,
where the subfamily is baseline control.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-gal4, which is used as an internal marker.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures the activities of endogenous TFs, whereas the TRANS assay evaluates changes in the activities
of exogenous, chimeric NR-GAL4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.156
Response cutoff threshold used to determine hit calls: 0.778
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 or fewer concentrations were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
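As a rough illustration of the Level 4 baseline step listed above, the following Python sketch computes bmad and onesd from test compound wells at the two lowest concentration indices. It is not the tcpl implementation (tcpl is an R package). The record layout, the function name baseline_stats, and the 1.4826 scaling constant applied to the median absolute deviation (the default used by R's mad function) are assumptions made for illustration only.

```python
import statistics


def baseline_stats(wells, mad_constant=1.4826):
    """Return (bmad, onesd) from test wells (wllt == 't') with cndx 1 or 2.

    `wells` is a hypothetical list of dicts with keys 'resp', 'wllt', 'cndx'.
    `mad_constant` is an assumed scaling factor, not taken from the source.
    """
    resp = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    if len(resp) < 2:
        raise ValueError("need at least two baseline wells")
    med = statistics.median(resp)
    # Median absolute deviation around the median, scaled by the assumed constant.
    bmad = mad_constant * statistics.median([abs(r - med) for r in resp])
    # Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1)).
    onesd = statistics.stdev(resp)
    return bmad, onesd
```

Wells at higher concentration indices and non-test wells (for example, neutral controls) are ignored by this sketch, mirroring the method description.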
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4478
Number of chemicals tested: 3827
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 3
Inactive hit count (0 <= hitc < 0.9): 2167
NA hit count (hitc < 0): 2308
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 43
gain-loss (gnls) model: 318
power (pow) model: 2556
linear-polynomial (poly1) model: 159
quadratic-polynomial (poly2) model: 640
exponential-2 (exp2) model: 488
exponential-3 (exp3) model: 20
exponential-4 (exp4) model: 3
exponential-5 (exp5) model: 199
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
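To make the point-of-departure estimates above concrete, the sketch below inverts a Hill curve to find the concentration at a given response level. It assumes a gain-direction Hill model of the form f(x) = tp / (1 + (ga/x)^p), where tp is the top, ga the AC50, and p the Hill coefficient; the function names and the example parameter values are hypothetical, and other winning models would need their own inversion.

```python
def hill(conc, tp, ga, p):
    """Hill response at concentration `conc` (assumed model form)."""
    return tp / (1.0 + (ga / conc) ** p)


def hill_conc_at_response(y, tp, ga, p):
    """Concentration at which the Hill curve reaches response y, or None."""
    if not 0 < y / tp < 1:
        return None  # the curve never reaches this response level
    return ga / ((tp / y) - 1.0) ** (1.0 / p)


def hill_pods(tp, ga, p, cutoff, bmr):
    """Collect the five estimates named in the text for a Hill fit."""
    return {
        "bmd": hill_conc_at_response(bmr, tp, ga, p),
        "ac50": hill_conc_at_response(0.50 * tp, tp, ga, p),
        "acc": hill_conc_at_response(cutoff, tp, ga, p),
        "ac10": hill_conc_at_response(0.10 * tp, tp, ga, p),
        "ac5": hill_conc_at_response(0.05 * tp, tp, ga, p),
    }
```

For example, with tp = 2, ga = 10, and p = 1, the AC50 equals ga (10) and the AC10 is 10/9, while a cutoff or BMR above the top returns None because the fitted curve never crosses it.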
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
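The model selection rule described above (lowest AIC wins, with ties going to the model with fewer parameters) can be sketched as follows. This is an illustrative Python rendering, not code from tcpl or tcplfit2; the function name and the AIC values and parameter counts in the usage note are hypothetical.

```python
def pick_winning_model(fits):
    """Pick the winning model from {name: (aic, n_params)}.

    The lowest AIC wins; when AIC values are equal, the model with
    fewer parameters (the simpler model) wins.
    """
    if not fits:
        raise ValueError("no model fits supplied")
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))
```

For instance, given fits of {"cnst": (12.0, 1), "hill": (8.5, 4), "poly1": (8.5, 2)}, the hill and poly1 models tie on AIC and poly1 is returned because it has fewer parameters.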
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior based on relative activity,
efficacy, and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
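The binarization of hitc described above can be expressed as a small helper. The sketch below uses the thresholds stated in this documentation (hitc >= 0.90 active, 0 <= hitc < 0.90 inactive, hitc < 0 not applicable); the function names are hypothetical, and the threshold is left as a parameter because users may choose a different stringency.

```python
from collections import Counter


def classify_hitc(hitc, active_threshold=0.9):
    """Map a continuous hit call to 'active', 'inactive', or 'not applicable'."""
    if hitc < 0:
        return "not applicable"
    if hitc >= active_threshold:
        return "active"
    return "inactive"


def tally_hit_calls(hitcs, active_threshold=0.9):
    """Count how many series fall into each hit call category."""
    return Counter(classify_hitc(h, active_threshold) for h in hitcs)
```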
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1.713
Neutral control median absolute deviation, by plate: nmad 0.489
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 28.56%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
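The robustness formulas listed above can be written out as a minimal Python sketch. The function names are hypothetical, and the control-well inputs in the usage note other than nmed and nmad are made-up values, since positive and negative control metrics are reported as NA for this endpoint.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100.0


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """SSMD: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed
```

Here cmed and cmad stand for the median and median absolute deviation of whichever control is compared with the neutral control (pmed/pmad or mmed/mmad). Applying cv_percent to the rounded values shown above (nmed 1.713, nmad 0.489) gives roughly 28.5%, close to the reported 28.56%; the small gap may reflect rounding of the displayed inputs or per-plate aggregation.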
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
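As an illustration of the accuracy statistics mentioned above, the sketch below derives sensitivity, specificity, and balanced accuracy from counts of reference chemical outcomes. The function name and the counts used in the usage note are hypothetical; no reference chemical results are available for this endpoint.

```python
def accuracy_stats(true_pos, false_pos, true_neg, false_neg):
    """Return sensitivity, specificity, and balanced accuracy as a dict."""
    sensitivity = true_pos / (true_pos + false_neg)  # 1 - false negative rate
    specificity = true_neg / (true_neg + false_pos)  # 1 - false positive rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2.0,
    }
```

With 8 true positives, 2 false negatives, 18 true negatives, and 2 false positives, sensitivity is 0.8, specificity is 0.9, and balanced accuracy is 0.85.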
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 199.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 122
ATG_GR_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human glucocorticoid receptor (GR)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_GR_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse
transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay
component ATG_GR_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_GR_TRANS, was
analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene NR3C1. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-GR, also known as human nuclear receptor subfamily 3,
group C, member 1 (glucocorticoid receptor) [GeneSymbol:NR3C1 | GeneID:2908 |
Uniprot_SwissProt_Accession:P04150].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein
that regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C) and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data was processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 376 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.162
Response cutoff threshold used to determine hit calls: 0.811
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
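The removal of negative and zero values followed by log transformation, as described above, might look like the following Python sketch. It is only an illustration under stated assumptions: tcpl itself is an R package, and the record layout (dicts with cval and wllq keys) and the function name level2_correct are hypothetical.

```python
import math


def level2_correct(wells):
    """Flag non-positive cval wells (wllq = 0) and log2-transform the rest.

    Returns new dicts; the input records are left unchanged.
    """
    corrected = []
    for well in wells:
        well = dict(well)  # copy so the caller's data is not mutated
        if well["cval"] <= 0:
            # Negative or zero values cannot be log transformed; mark as poor quality.
            well["wllq"] = 0
        else:
            well["cval"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected
```

A fold-induction value of 4 becomes 2 on the log2 scale, while wells with values of 0 or below keep their raw cval but are excluded from downstream fitting through wllq = 0.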
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 or fewer concentrations were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 82
Inactive hit count: 0
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 1.552
Neutral control median absolute deviation, by plate (nmad): 0.367
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 23.64%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
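As an illustration only, the control-well metrics listed above can be written as small functions. This is a minimal sketch in Python rather than the tcpl implementation; the function names and the generic control arguments (cmed, cmad, standing in for either pmed/pmad or mmed/mmad) are assumptions made for this example.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100.0

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed

# Neutral control values reported for this endpoint.
print(cv_percent(1.552, 0.367))
```

With the rounded nmed and nmad shown above, the computed CV% agrees with the reported 23.64% to within rounding. The positive and negative control metrics are reported as NA for this endpoint, so those functions are shown only to make the formulas concrete.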
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 149.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 123
ATG_HNF4a_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human hepatocyte nuclear factor 4, alpha (HNF4a)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_HNF4a_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_HNF4a_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_HNF4a_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene HNF4A. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
ToxCast: US EPA's Toxicity Forecaster Program
AOP: Adverse Outcome Pathway
tcpl: ToxCast Data Analysis Pipeline R Package
CV: Coefficient of Variation
SSMD: Strictly Standardized Mean Difference
DMSO: Dimethyl Sulfoxide
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-HNF4a, also known as human hepatocyte nuclear factor
4, alpha [GeneSymbol:HNF4A | GeneID:3172 | Uniprot_SwissProt_Accession:P41235].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR protein
that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.283
Response cutoff threshold used to determine hit calls: 1.417
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
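The model selection rule described above (lowest AIC wins) can be sketched as follows. This is an illustrative Python sketch, not tcpl code: the function names are hypothetical, the model labels are used only as dictionary keys, and the AIC values are made up for the example. The aic helper uses the standard definition AIC = 2k - 2 ln(L).

```python
import math

def aic(log_likelihood, n_params):
    """Standard Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def winning_model(aic_by_model):
    """Return the name of the model with the lowest finite AIC, or None if no fit succeeded."""
    finite = {name: value for name, value in aic_by_model.items()
              if value is not None and math.isfinite(value)}
    if not finite:
        return None
    return min(finite, key=finite.get)

# Hypothetical AIC values for one concentration series.
fits = {"cnst": 41.2, "hill": 12.7, "gnls": 15.9}
print(winning_model(fits))  # hill
```

Fits that failed (represented here as None or a non-finite AIC) are skipped so they cannot be selected; how failed fits are actually handled in tcpl is not described in this document.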
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
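A minimal sketch of the three Level 2 corrections, written in Python for illustration and not taken from tcpl. It assumes each well is a dictionary with cval and wllq keys (a hypothetical layout), and, following the statement in Section 3.2 that negative and zero values are removed before the log transformation, it downgrades those wells first and only log-transforms the remainder.

```python
import math

def level2_correct(wells):
    """Apply rmneg, rmzero and log2 to a list of {'cval': float, 'wllq': int} wells."""
    corrected = []
    for well in wells:
        well = dict(well)  # work on a copy so the input is left unchanged
        if well["cval"] < 0 or well["cval"] == 0:
            # rmneg / rmzero: exclude the well by downgrading its quality.
            well["wllq"] = 0
            # Assumption for this sketch: excluded wells carry no transformed value.
            well["cval"] = None
        else:
            # log2: transform the corrected response value to base-2 log scale.
            well["cval"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected

example = [{"cval": 4.0, "wllq": 1}, {"cval": 0.0, "wllq": 1}, {"cval": -2.0, "wllq": 1}]
print(level2_correct(example))
```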
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
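The baseline calculation described for this method can be illustrated with a short Python sketch. It is not the tcpl implementation: the well layout (dictionaries with resp, wllt and cndx keys) is hypothetical, and because the text above does not say whether a scaling constant is applied to the median absolute deviation, the sketch exposes one as a parameter and defaults to the unscaled value.

```python
import statistics

def bmad_and_onesd(wells, scale=1.0):
    """Baseline MAD and one sample standard deviation from the two lowest
    concentration indices (cndx 1 or 2) of test compound wells (wllt == 't')."""
    baseline = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(baseline)
    # Median absolute deviation around the median; 'scale' is an assumption of this sketch.
    bmad = scale * statistics.median([abs(r - center) for r in baseline])
    # Sample standard deviation, i.e. division by (sample size - 1).
    onesd = statistics.stdev(baseline)
    return bmad, onesd

wells = [
    {"resp": 0.0, "wllt": "t", "cndx": 1},
    {"resp": 0.2, "wllt": "t", "cndx": 1},
    {"resp": -0.2, "wllt": "t", "cndx": 2},
    {"resp": 0.4, "wllt": "t", "cndx": 2},
    {"resp": 3.0, "wllt": "t", "cndx": 5},  # higher concentration, ignored
    {"resp": 9.0, "wllt": "n", "cndx": 1},  # neutral control well, ignored
]
print(bmad_and_onesd(wells))
```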
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
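As a worked illustration of the cutoff selection for this endpoint, the Python sketch below takes the larger of the two candidate cutoffs and applies the sign change described for ow_bidirectional_gain. The function names are hypothetical, and using top < 0 to identify a fit in the negative direction is an assumption borrowed from the directionality wording in the Level 6 flags.

```python
import math

def response_cutoff(bmad):
    """Select the higher of the candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)

def gain_only_hitc(hitc, top):
    """Multiply the hit call by -1 when the winning model was fit in the negative direction."""
    return -hitc if top < 0 else hitc

# bmad reported for this endpoint in the Assay Design Summary.
print(response_cutoff(0.283))
```

With the rounded bmad of 0.283, the 5 * bmad candidate (about 1.415) exceeds log2(1.2) (about 0.263), which is consistent with the reported response cutoff of 1.417; the small difference is presumably due to rounding of the reported bmad.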
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.
), 7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
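Several of the cautionary flags above reduce to simple comparisons on per-series summary values. The Python sketch below evaluates a subset of them (bmd.high, noise, low.nrep, low.nconc and ac50.lowconc) exactly as worded above. It is illustrative only: the dictionary keys are a hypothetical input layout, the example values are made up, and the remaining flags are omitted.

```python
def cautionary_flags(series):
    """Return the names of the flags (from a subset of Level 6 methods) that apply to one series."""
    active = series["hitc"] >= 0.9
    checks = [
        ("bmd.high", series["bmd"] > series["ac50"]),
        ("noise", series["rmse"] > series["coff"]),
        ("low.nrep", series["nrep"] < 2),
        ("low.nconc", series["nconc"] <= 4),
        ("ac50.lowconc", active and series["ac50"] < 10 ** series["logc_min"]),
    ]
    return [name for name, applies in checks if applies]

# Hypothetical series summary; logc_min is the log10 of the lowest tested concentration.
example = {"hitc": 0.95, "bmd": 0.3, "ac50": 0.5, "rmse": 2.0,
           "coff": 1.417, "nrep": 1, "nconc": 7, "logc_min": 0.0}
print(cautionary_flags(example))  # ['noise', 'low.nrep', 'ac50.lowconc']
```

Because the target number of replicates for this assay is 1, the low.nrep condition (nrep < 2) would be expected to apply to series tested at the nominal design.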
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 30
Inactive hit count: 0
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 2.873
Neutral control median absolute deviation, by plate (nmad): 1.353
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 47.08%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 180.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 124
ATG_Hpa5_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human Hpa5 Gene Activation (Basal Promoter)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_Hpa5_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_Hpa5_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_Hpa5_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the background control at the transcription factor-level as they relate to the gene . Furthermore,
this assay endpoint can be referred to as a secondary readout, because this assay has produced multiple assay
endpoints where this one serves a background control function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the background measurement intended target family,
where the subfamily is internal marker.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-Hpa5, which is used as a basal promoter.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 0.129 nM
Standard maximum concentration tested: 100 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.167
Response cutoff threshold used to determine hit calls: 0.834
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
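The reported cutoff can be checked against the reported bmad. Section 3.2 lists two level 5 cutoff candidates for this endpoint, log2(1.2) and 5 times bmad, with the higher value selected. The following Python sketch is not tcpl code; the function name is hypothetical, and it starts from the rounded bmad shown above, so agreement with the reported 0.834 is expected only to within rounding.

```python
import math


def response_cutoff(bmad: float) -> float:
    """Return the higher of the two cutoff candidates: log2(1.2) and 5 * bmad."""
    fold_change_cutoff = math.log2(1.2)  # level 5 method "log2_1.2"
    bmad_cutoff = 5.0 * bmad             # level 5 method "bmad5"
    return max(fold_change_cutoff, bmad_cutoff)


reported_bmad = 0.167  # value from the assay design summary (rounded)
cutoff = response_cutoff(reported_bmad)
print(f"log2(1.2) = {math.log2(1.2):.3f}, 5 * bmad = {5 * reported_bmad:.3f}, cutoff = {cutoff:.3f}")
```

For this endpoint the 5 times bmad candidate is the larger of the two, so it determines the cutoff.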
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain-of-signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log-transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
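As an illustration of the component-level corrections described above (removal of negative and zero values, then log2 transformation), the following is a minimal Python sketch. It is not the tcpl implementation: the function name and the dictionary layout are hypothetical, and the order of operations (flag first, then transform) is an assumption taken from the sentence above.

```python
import math


def apply_level2_corrections(wells):
    """Flag non-positive wells (wllq = 0) and log2-transform the remaining cval values.

    `wells` is a list of dicts with keys 'cval' (corrected response) and
    'wllq' (well quality, 1 = usable, 0 = excluded). A new list is returned.
    """
    corrected = []
    for well in wells:
        if well["cval"] <= 0:
            # rmneg / rmzero: the well is excluded by downgrading its quality.
            corrected.append({**well, "wllq": 0, "cval": None})
        else:
            # log2: fold-induction is moved to the log2 scale.
            corrected.append({**well, "cval": math.log2(well["cval"])})
    return corrected


# Hypothetical fold-induction values relative to DMSO.
example_wells = [
    {"cval": 2.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -1.0, "wllq": 1},
    {"cval": 0.5, "wllq": 1},
]
processed = apply_level2_corrections(example_wells)
for well in processed:
    print(well)
```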
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well
quality (wllq); if cval = 0, wllq = 0.)
-------
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis
direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was the winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
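To make the level 4 baseline calculation and a few of the simpler level 6 flags concrete, the following Python sketch may help. It is a hedged illustration rather than tcpl code: the function names, the dictionary layout, and the example values are hypothetical. Scaling the median absolute deviation by 1.4826 (the default constant of R's mad() function) is an assumption, and the border flag is read here as requiring both bounds to hold.

```python
import statistics

MAD_SCALE = 1.4826  # assumption: same consistency constant as R's default mad()


def baseline_stats(wells):
    """Return (bmad, onesd) from test wells (wllt == 't') with cndx 1 or 2."""
    low = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(low)
    bmad = MAD_SCALE * statistics.median([abs(r - center) for r in low])
    onesd = statistics.stdev(low)  # sample standard deviation (n - 1 denominator)
    return bmad, onesd


def caution_flags(top, coff, rmse, nconc, nrep):
    """Evaluate four of the listed flags for one concentration series."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags


# Hypothetical wells: only the four low-concentration test wells enter the baseline.
wells = [
    {"wllt": "t", "cndx": 1, "resp": 0.0},
    {"wllt": "t", "cndx": 1, "resp": 0.1},
    {"wllt": "t", "cndx": 2, "resp": -0.1},
    {"wllt": "t", "cndx": 2, "resp": 0.2},
    {"wllt": "t", "cndx": 7, "resp": 1.5},
    {"wllt": "n", "cndx": 1, "resp": 0.3},
]
bmad, onesd = baseline_stats(wells)
flags = caution_flags(top=0.9, coff=0.834, rmse=0.2, nconc=7, nrep=1)
print(bmad, onesd, flags)
```

With a single replicate per concentration, as in this assay design, the low.nrep flag would be expected on every series under this reading.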
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 320
Number of chemicals tested: 310
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 0
Inactive hit count (0 <= hitc < 0.9): 187
NA hit count (hitc < 0): 133
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning:
hill model: 1
gain-loss (gnls) model: 26
power (pow) model: 4
linear-polynomial (poly1) model: 188
quadratic-polynomial (poly2) model: 57
exponential-2 (exp2) model: 0
exponential-3 (exp3) model: 33
exponential-4 (exp4) model: 1
exponential-5 (exp5) model: 10
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
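As a worked illustration of these estimates, the sketch below inverts a Hill curve to find the concentration at a given response level. It assumes the Hill form f(x) = tp / (1 + (ga / x)^p) and uses hypothetical parameter values, a hypothetical BMR, and hypothetical function names; the 10% and 5% estimates are labelled ac10 and ac5 here. It is not the tcplfit2 implementation, and other winning models would need their own inversion.

```python
def hill(x, tp, ga, p):
    """Hill response at concentration x (assumed form: tp / (1 + (ga / x) ** p))."""
    return tp / (1.0 + (ga / x) ** p)


def hill_conc_at(y, tp, ga, p):
    """Concentration at which the Hill curve reaches response y, or None if unreachable."""
    if tp == 0 or not 0 < y / tp < 1:
        return None
    # From y = tp / (1 + (ga / x)^p):  x = ga / (tp / y - 1)^(1 / p)
    return ga / (tp / y - 1.0) ** (1.0 / p)


def pod_estimates(tp, ga, p, cutoff, bmr):
    """Collect the major point-of-departure estimates for one Hill fit."""
    return {
        "bmd": hill_conc_at(bmr, tp, ga, p),
        "ac50": hill_conc_at(0.50 * tp, tp, ga, p),
        "acc": hill_conc_at(cutoff, tp, ga, p),
        "ac10": hill_conc_at(0.10 * tp, tp, ga, p),
        "ac5": hill_conc_at(0.05 * tp, tp, ga, p),
    }


# Hypothetical fit: top = 2.0 (log2 fold-induction), AC50 = 10, Hill slope = 1.5.
pods = pod_estimates(tp=2.0, ga=10.0, p=1.5, cutoff=0.834, bmr=0.3)
for name, value in pods.items():
    print(name, round(value, 3))
```

Under this form ac50 equals the ga parameter, and acc is undefined (None) whenever the modeled top does not exceed the cutoff.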
3.3 Prediction Model: All statistical analyses were conducted using the R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted response)
is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. whether the winning model is a better fit than a flat line.
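The model selection rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched in a few lines of Python. The AIC values and parameter counts below are hypothetical, the function name is an illustration only, and the sketch does not attempt to reproduce the continuous hit-call weights.

```python
def pick_winning_model(fits):
    """Return the model name with the lowest AIC; ties go to fewer parameters.

    `fits` maps a model name to a tuple (aic, number_of_parameters).
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


# Hypothetical fits for one concentration series.
fits = {
    "cnst": (12.0, 1),
    "poly1": (10.0, 2),
    "hill": (10.0, 4),
    "exp3": (11.5, 3),
}
winner = pick_winning_model(fits)
beats_constant = fits[winner][0] < fits["cnst"][0]
print(winner, beats_constant)
```

Sorting on the pair (AIC, parameter count) applies the tie-break only when AIC values are exactly equal, which matches the wording of the rule.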
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior in terms of activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
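The binarization used in this documentation (hitc of at least 0.90 is active, hitc from 0 up to 0.90 is inactive, and hitc below 0 is not applicable) can be written as a small Python helper. The function name and the example hitc values are hypothetical, and the threshold is left as a parameter because other thresholds may be used.

```python
from collections import Counter


def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to 'active', 'inactive', or 'NA'."""
    if hitc < 0:
        return "NA"
    return "active" if hitc >= threshold else "inactive"


# Hypothetical continuous hit calls for five concentration series.
example_hitc = [0.95, 0.30, -1.0, 0.0, 0.90]
labels = [classify_hitc(h) for h in example_hitc]
summary = Counter(labels)
print(labels)
print(dict(summary))
```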
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
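All of the metrics above are reported as NA for this endpoint, so the following Python sketch uses hypothetical control values purely to show how the listed formulas would be evaluated. The function names are illustrative and are not part of tcpl; the same functions apply to negative controls by passing mmed and mmad in place of pmed and pmad.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100.0


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


# Hypothetical plate-level medians and median absolute deviations.
pmed, pmad = 10.0, 1.0   # positive control
nmed, nmad = 2.0, 0.5    # neutral control

metrics = {
    "cv_percent": cv_percent(nmed, nmad),
    "z_prime": z_prime(pmed, pmad, nmed, nmad),
    "ssmd": ssmd(pmed, pmad, nmed, nmad),
    "signal_to_noise": signal_to_noise(pmed, nmed, nmad),
    "signal_to_background": signal_to_background(pmed, nmed),
}
for name, value in metrics.items():
    print(name, round(value, 4))
```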
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 10.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 125
ATG_LXRa_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human LXRa Gene Activation
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_LXRa_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_LXRa_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_LXRa_TRANS, was
analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene NR1H3. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-LXRa, also known as human nuclear receptor subfamily
1, group H, member 3 [GeneSymbol:NR1H3 | GeneID:10062 | Uniprot_SwissProt_Accession:Q13133].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: T0901317
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 376 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.229
-------
Response cutoff threshold used to determine hit calls: 1.144
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems, but it can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
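The winning-model rule can be illustrated with a minimal Python sketch. This is not tcpl's R implementation: the function names, the candidate model labels, and the log-likelihood values below are hypothetical, and the AIC is computed with the standard definition AIC = 2k - 2 ln(L), where k is the number of fitted parameters.

```python
def aic(log_likelihood, n_params):
    """Standard Akaike Information Criterion: AIC = 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def pick_winning_model(fits):
    """Return (winner, scores) where winner is the model with the lowest AIC.

    `fits` maps a model label to a (log_likelihood, n_params) tuple.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    winner = min(scores, key=scores.get)
    return winner, scores


# Hypothetical fits for one concentration series (values are made up).
example_fits = {
    "cnst": (-12.0, 1),  # constant (no response) model
    "hill": (-4.5, 4),
    "gnls": (-4.4, 6),   # gain-loss model: slightly better fit, more parameters
}
winner, aic_by_model = pick_winning_model(example_fits)
```

In this made-up case the gain-loss fit has the highest likelihood, but its two extra parameters are penalized, so the Hill fit has the lowest AIC and wins.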
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
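As an illustration of the Level 2 corrections, the following Python sketch applies the three methods to a small list of wells. It is not tcpl code: the record layout and the function name are hypothetical, and the order of operations (exclusion first, then log2) is an assumption based on the statement in Section 3.2 that negative and zero values are removed before the log transformation.

```python
import math


def apply_level2(wells):
    """Apply rmneg, rmzero, then log2 to a list of well records.

    Each record is a dict with 'cval' (corrected value) and 'wllq'
    (1 = usable, 0 = excluded). Wells with cval <= 0 get wllq = 0 and are
    left untransformed; wells already at wllq = 0 are also left untouched.
    """
    out = []
    for well in wells:
        rec = dict(well)  # copy so the input list is not modified
        if rec["cval"] < 0 or rec["cval"] == 0:
            rec["wllq"] = 0  # rmneg / rmzero: downgrade well quality
        elif rec["wllq"] == 1:
            rec["cval"] = math.log2(rec["cval"])  # log2 transform
        out.append(rec)
    return out


# Hypothetical wells: one valid, one zero, one negative.
wells = [
    {"cval": 2.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -0.5, "wllq": 1},
]
processed = apply_level2(wells)
```

Only the first well survives with a transformed value (log2(2.0) = 1.0); the other two are kept in the table but marked with wllq = 0.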
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
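Several of the Level 6 flags are simple threshold comparisons, which the following Python sketch restates for five of them (bmd.high, noise, low.nrep, low.nconc and ac50.lowconc). It is an illustration only, not tcpl's implementation: the function name, the dictionary keys, and the example values are hypothetical, and the comparisons are taken directly from the flag descriptions above.

```python
def level6_flags(series):
    """Return the names of a few cautionary flags raised for one series.

    `series` is a dict with hypothetical keys: hitc, ac50, bmd, rmse, coff,
    nrep (average replicates per concentration), nconc (number of tested
    concentrations) and logc_min (log10 of the lowest tested concentration).
    """
    flags = []
    active = series["hitc"] >= 0.9
    if series["bmd"] > series["ac50"]:
        flags.append("bmd.high")      # BMD greater than AC50
    if series["rmse"] > series["coff"]:
        flags.append("noise")         # rmse > coff
    if series["nrep"] < 2:
        flags.append("low.nrep")      # fewer than 2 replicates on average
    if series["nconc"] <= 4:
        flags.append("low.nconc")     # 4 concentrations or fewer
    if active and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")  # AC50 below lowest tested concentration
    return flags


# Hypothetical active series tested once at each of 7 concentrations.
example = {
    "hitc": 0.95, "ac50": 0.5, "bmd": 0.8, "rmse": 0.3, "coff": 1.144,
    "nrep": 1, "nconc": 7, "logc_min": 0.0,
}
raised = level6_flags(example)
```

For this example the sketch raises bmd.high, low.nrep and ac50.lowconc. Because the nominal design of this assay uses a single replicate, low.nrep would be expected on most series.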
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
-------
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count: hitc>0.9
57
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using the
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1.497
Neutral control median absolute deviation, by plate: nmad 0.373
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 24.91%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
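The control-well formulas listed above can be written out as a short Python sketch. The function names are hypothetical, and the positive control values in the example (median 4.0, MAD 0.3) are invented for illustration, since this endpoint reports NA for positive and negative controls. Only the neutral control values (nmed 1.497, nmad 0.373) come from this document.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (c) relative to the neutral control (n)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference, control vs. neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control values reported for this endpoint.
nmed, nmad = 1.497, 0.373
cv = cv_percent(nmed, nmad)

# Hypothetical positive control values (not from this document).
pmed, pmad = 4.0, 0.3
zp = z_prime(pmed, pmad, nmed, nmad)
s = ssmd(pmed, pmad, nmed, nmad)
```

With the rounded neutral control values the CV evaluates to roughly 24.9%, in line with the reported 24.91%; the small difference in the last digit presumably reflects rounding of nmed and nmad in the table.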
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 210.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 126
ATG_LXRb_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human LXRb Gene Activation
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_LXRb_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_LXRb_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_LXRb_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene NR1H2. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-LXRb, also known as human nuclear receptor subfamily
1, group H, member 2 [GeneSymbol:NR1H2 | GeneID:7376 | Uniprot_SwissProt_Accession:P55055].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: T0901317
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 376 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.199
-------
Response cutoff threshold used to determine hit calls: 0.995
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems, but it can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
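Several of the simpler level 6 flags reduce to plain comparisons. The following Python sketch is illustrative only and is not tcpl code (tcpl is an R package). The function name, argument names, and example values are assumptions made for this illustration, and the border check is read here as a band in which both inequalities hold.

```python
def simple_flags(top, coff, rmse, nconc, nrep, hitc, ac50, logc_min):
    """Evaluate a few level 6 style flags for one concentration series.

    Hypothetical helper: argument names mirror the quantities named in the
    flag descriptions (top, coff, rmse, nconc, nrep, hitc, ac50, logc_min).
    """
    active = hitc >= 0.9
    return {
        # noise: root mean square error of the fit exceeds the cutoff
        "noise": rmse > coff,
        # border: |top| lies within 20 percent of the cutoff (assumed band reading)
        "border": 0.8 * coff <= abs(top) <= 1.2 * coff,
        # low.nrep: fewer than 2 replicates per concentration on average
        "low.nrep": nrep < 2,
        # low.nconc: 4 or fewer concentrations tested
        "low.nconc": nconc <= 4,
        # ac50.lowconc: active series whose AC50 is below the lowest tested concentration
        "ac50.lowconc": active and ac50 < 10 ** logc_min,
    }


# Made-up values for a single series, for illustration only.
flags = simple_flags(top=0.30, coff=0.263, rmse=0.10, nconc=7, nrep=1,
                     hitc=0.95, ac50=0.5, logc_min=0.0)
```

Flags are cautionary rather than exclusionary, so a sketch like this would normally be used to annotate a series, not to drop it.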
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
-------
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 46
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.988
Neutral control median absolute deviation, by plate: nmad 0.282
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 28.53%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates:
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells:
((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates:
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells:
((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells):
((mmed - nmed) / nmad) NA
Signal-to-background (median across all plates, using negative control wells):
(mmed / nmed) NA
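The control metrics above are simple functions of plate-level medians and median absolute deviations. As a minimal sketch in Python (not tcpl code; the function and argument names are assumptions chosen for this illustration, with cmed/cmad standing for either the positive or the negative control), they could be computed as:

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation (%) in neutral control wells."""
    return (nmad / nmed) * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control against neutral."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control values reported above for this endpoint.
cv = cv_percent(nmed=0.988, nmad=0.282)
```

With the rounded values shown, cv comes out near 28.5; the small difference from the reported 28.53% presumably reflects rounding of nmed and nmad. The positive and negative control metrics are NA here because no such control wells are reported for this endpoint.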
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 229.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res
Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data
can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast
Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 127
ATG_M_06_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for M06 Gene Activation (Internal Marker)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_M_06_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_M_06_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_M_06_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the background control at the transcription factor-level as they relate to the gene. Furthermore,
this assay endpoint can be referred to as a secondary readout, because this assay has produced multiple assay
endpoints where this one serves a background control function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the background measurement intended target family,
where the subfamily is internal marker.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-M_06, which is used as an internal marker.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The
trans-FACTORIAL comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a
chimeric GAL4-NR protein that regulates transcription of a reporter sequence. The presence of
agonists/antagonists of the NR alters the transactivation function of GAL4-NR and modulates reporter
transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Target (nominal) number of replicates: 1
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.035
Response cutoff threshold used to determine hit calls: 0.263
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
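The removal of negative and zero values followed by the log2 transformation can be sketched as below. This is an illustrative Python sketch, not tcpl code; the well representation (a dict with cval and wllq keys) and the function name are assumptions, and the order of operations follows the sentence above rather than the numeric method IDs.

```python
import math


def correct_wells(wells):
    """Apply rmneg/rmzero style exclusion, then a log2 transform.

    wells: list of dicts with 'cval' (corrected value) and 'wllq' (well quality).
    Returns new dicts; excluded wells get wllq = 0 and are not transformed.
    """
    corrected = []
    for well in wells:
        well = dict(well)  # copy so the input is left untouched
        if well["cval"] <= 0:
            # rmneg (cval < 0) and rmzero (cval = 0): downgrade well quality
            well["wllq"] = 0
            well["cval"] = None
        else:
            # log2: transform fold induction to log-scale (base 2)
            well["cval"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected


# Made-up fold-induction values, for illustration only.
raw = [{"cval": 1.2, "wllq": 1}, {"cval": 0.0, "wllq": 1},
       {"cval": -0.5, "wllq": 1}, {"cval": 2.0, "wllq": 1}]
result = correct_wells(raw)
```

A fold induction of 2.0 over DMSO maps to 1.0 on the log2 scale, and a fold induction of 1.0 (no change) maps to 0.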
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
-------
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
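The level 4 baseline statistics and the level 5 cutoff selection described above can be sketched in a few lines. This Python sketch is illustrative and is not tcpl code; the function names are assumptions. The scaling constant of 1.4826 is the default of R's mad() function, and the sketch assumes, without confirmation from this document, that the tcpl calculation relies on that default, so it is exposed as a parameter.

```python
import math
import statistics

R_MAD_CONSTANT = 1.4826  # default scaling constant of R's mad(); assumed to apply


def baseline_stats(resp_low, constant=R_MAD_CONSTANT):
    """Return (bmad, onesd) from responses of test wells at cndx 1 or 2."""
    med = statistics.median(resp_low)
    # Median absolute deviation around the median, optionally scaled
    bmad = constant * statistics.median([abs(r - med) for r in resp_low])
    # Sample standard deviation (divides by sample size - 1)
    onesd = statistics.stdev(resp_low)
    return bmad, onesd


def response_cutoff(bmad):
    """Select the highest of the assigned thresholds: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


# bmad reported for this endpoint in the Assay Design Summary.
coff = response_cutoff(0.035)
```

For this endpoint, 5 * 0.035 = 0.175 is smaller than log2(1.2), which is about 0.263, so the log2(1.2) threshold is selected. That agrees with the response cutoff of 0.263 listed in the Assay Design Summary.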
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4798
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 13
Inactive hit count (0 <= hitc < 0.9): 3218
NA hit count (hitc < 0): 1567
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
26
379
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
2756
148
quadratic-polynomial (poly2) model: 592
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
0
21
655
169
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
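To illustrate how such activity concentrations follow from a fitted curve, the sketch below inverts a Hill model in closed form. It is a Python illustration, not tcpl or tcplfit2 code. The parameterization f(x) = tp / (1 + (ga / x)^p) is an assumption about the Hill form, the parameter values are made up, and only the gain direction (tp > 0) is handled.

```python
def hill(conc, tp, ga, p):
    """Hill model response at concentration conc (assumed parameterization)."""
    return tp / (1 + (ga / conc) ** p)


def conc_at_response(r, tp, ga, p):
    """Concentration at which the Hill curve reaches response r, for 0 < r < tp.

    Solving r = tp / (1 + (ga / x)^p) for x gives x = ga * (r / (tp - r))^(1 / p).
    """
    if not 0 < r < tp:
        raise ValueError("response level must lie strictly between 0 and tp")
    return ga * (r / (tp - r)) ** (1 / p)


# Made-up fit: top of 1.0, half-maximal concentration of 10, Hill slope of 2.
tp, ga, p = 1.0, 10.0, 2.0
ac50 = conc_at_response(0.5 * tp, tp, ga, p)  # 50% of the maximal response
ac10 = conc_at_response(0.1 * tp, tp, ga, p)  # 10% of the maximal response
acc = conc_at_response(0.263, tp, ga, p)      # concentration at the efficacy cutoff
```

Under this parameterization ac50 equals ga by construction, and the lower response levels give lower concentrations, so ac10 < acc < ac50 whenever the cutoff lies between 10% and 50% of the top.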
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
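The selection rule described above (lowest AIC, with ties going to the model with fewer parameters) can be sketched as follows. This is an illustrative Python sketch, not the tcplfit2 implementation; the function name, the AIC values, and the parameter counts in the example are all assumptions, and any special handling of the constant model is left out.

```python
def winning_model(aic, n_params):
    """Pick the model with the lowest AIC; break ties by fewer parameters.

    aic:      dict mapping model name to its AIC value
    n_params: dict mapping model name to its number of fitted parameters
    """
    # Sorting key: AIC first, then parameter count as the tie-breaker
    return min(aic, key=lambda model: (aic[model], n_params[model]))


# Made-up values: poly1 and exp4 tie on AIC, so the simpler poly1 wins.
example_aic = {"hill": 12.4, "poly1": 10.1, "exp4": 10.1}
example_params = {"hill": 3, "poly1": 1, "exp4": 2}
winner = winning_model(example_aic, example_params)
```

Using a tuple as the sort key keeps the tie-break explicit, rather than relying on the order in which models happen to be listed.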
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
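The binarization of the continuous hit call used in this documentation can be written as a small helper. The Python sketch below is illustrative only; the function name and the example hitc values are assumptions, and the threshold is a parameter because different thresholds may be used.

```python
from collections import Counter


def classify_hitc(hitc, threshold=0.9):
    """Map a continuous hit call to 'active', 'inactive', or 'NA'."""
    if hitc < 0:
        return "NA"  # hitc less than 0 is not applicable
    return "active" if hitc >= threshold else "inactive"


# Made-up hit calls, for illustration only.
counts = Counter(classify_hitc(h) for h in [0.97, 0.42, -1.0, 0.9, 0.0])
```

Tallying the three categories this way corresponds to the active, inactive, and NA hit counts reported in the endpoint summary.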
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.991
Neutral control median absolute deviation, by plate: nmad 0.031
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 3.14%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates:
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells:
((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates:
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells:
((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
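The formulas in the table above can be evaluated directly once control medians and median absolute deviations are available. The following Python sketch is illustrative only; the function names are hypothetical, and the positive control values in the example are invented because this endpoint reports NA for its positive and negative controls.

```python
import math


def neutral_cv_percent(nmed, nmad):
    """Coefficient of variation (%) in neutral control wells: (nmad/nmed)*100."""
    return (nmad / nmed) * 100


def control_metrics(ctrl_med, ctrl_mad, nmed, nmad):
    """Compare a positive or negative control against the neutral control.

    ctrl_med / ctrl_mad stand for pmed / pmad or mmed / mmad in the table.
    """
    diff = ctrl_med - nmed
    return {
        "z_prime": 1 - (3 * (ctrl_mad + nmad)) / abs(diff),
        "ssmd": diff / math.sqrt(ctrl_mad**2 + nmad**2),
        "signal_to_noise": diff / nmad,
        "signal_to_background": ctrl_med / nmed,
    }


# Neutral control values as reported above (already rounded in the report).
cv = neutral_cv_percent(nmed=0.991, nmad=0.031)
print(round(cv, 2))

# Hypothetical positive control values, for illustration only.
example = control_metrics(ctrl_med=3.0, ctrl_mad=0.2, nmed=1.0, nmad=0.1)
print(example)
```

With the rounded inputs shown in the table, the CV evaluates to about 3.13%, close to the reported 3.14%; the small difference is presumably due to rounding of the published nmed and nmad.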
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 169.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. Check out tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 128
ATG_M_19_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for M19 Gene Activation (Internal Marker)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_M_19_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse
transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay
component ATG_M_19_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_M_19_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the background control at the transcription factor-level as they relate to the gene. Furthermore,
this assay endpoint can be referred to as a secondary readout, because this assay has produced multiple assay
endpoints where this one serves a background control function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the background measurement intended target family,
where the subfamily is internal marker.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-M_19, which is used as an internal marker.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters
the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in
activities of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear
receptors, the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto
96-well plates, and exposed to the evaluated compounds. At the end of the 24-hour incubation, total RNA was
isolated using TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer
and Mo-MLV reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the
resulting cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific
primers. The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM)
5'-labeled reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these
products were digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified
using Qiaquick PCR columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with
peak positions identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 376 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.07
Response cutoff threshold used to determine hit calls: 0.348
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
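The component-level corrections described above (removal of negative and zero values, followed by a log2 transformation) can be sketched as follows. This is an illustrative Python sketch, not the tcpl implementation; the function name level2_correct is hypothetical, and the order in which the steps are applied is an assumption.

```python
import math


def level2_correct(cvals):
    """Return (log2 value, wllq) pairs for a list of corrected values (cval).

    Assumption: wells with cval <= 0 are excluded by setting wllq = 0 and are
    given no transformed value; all other wells keep wllq = 1.
    """
    out = []
    for cval in cvals:
        if cval <= 0:
            out.append((None, 0))  # negative or zero value: downgrade well quality
        else:
            out.append((math.log2(cval), 1))  # log2 transform of the retained value
    return out


# Hypothetical fold-induction values, for illustration only.
wells = level2_correct([2.0, 1.0, 0.0, -0.5, 0.5])
print(wells)
```

A fold-induction of 1.0 maps to 0 on the log2 scale, so responses are centered on the DMSO baseline after the transformation.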
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well
quality (wllq); if cval = 0, wllq = 0.)
-------
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative
analysis direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
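The Level 4 baseline statistics and the Level 5 cutoff selection listed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the tcpl code: the function names are hypothetical, the example responses are invented, and the 1.4826 scale factor (the default of R's mad() function) is assumed rather than confirmed by this document.

```python
import math
import statistics


def mad(values, constant=1.4826):
    """Median absolute deviation; the scale constant is an assumption."""
    med = statistics.median(values)
    return constant * statistics.median([abs(v - med) for v in values])


def baseline_stats(resp, cndx, wllt):
    """bmad and onesd from test wells (wllt == "t") at concentration index 1 or 2."""
    low = [r for r, c, w in zip(resp, cndx, wllt) if w == "t" and c in (1, 2)]
    # statistics.stdev uses the (sample size - 1) denominator, as in onesd.
    return mad(low), statistics.stdev(low)


def level5_cutoff(bmad):
    """Select the highest of the candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


# Hypothetical normalized responses, for illustration only.
resp = [0.02, -0.05, 0.04, -0.01, 0.60, 0.90]
cndx = [1, 1, 2, 2, 6, 7]
wllt = ["t"] * 6
bmad, onesd = baseline_stats(resp, cndx, wllt)
print(bmad, onesd, level5_cutoff(bmad))

# Using the rounded bmad reported for this endpoint (0.07).
print(level5_cutoff(0.07))
```

With the rounded bmad of 0.07 reported for this endpoint, 5 * bmad is about 0.35, which exceeds log2(1.2) (about 0.263) and is consistent with the reported cutoff of 0.348.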
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4798
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 6
Inactive hit count (0 <= hitc < 0.9): 2568
NA hit count (hitc < 0): 2224
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
38
379
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
2769
146
quadratic-polynomial (poly2) model: 581
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
0
654
15
164
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
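As an illustration of how such point-of-departure estimates relate to a fitted curve, the Python sketch below inverts a Hill curve to find the concentration that produces a given response level. The parameterization f(x) = tp / (1 + (ga / x)^p), the function names, and the parameter values are assumptions made for this example; other winning models would need their own inversion.

```python
def hill_response(conc, tp, ga, p):
    """Assumed Hill form: top (tp), AC50 (ga), and Hill coefficient (p)."""
    return tp / (1 + (ga / conc) ** p)


def hill_conc_at(response, tp, ga, p):
    """Concentration at which the Hill curve reaches `response`."""
    frac = response / tp
    if not 0 < frac < 1:
        raise ValueError("response must lie strictly between 0 and tp")
    return ga / ((1 / frac - 1) ** (1 / p))


# Hypothetical fitted parameters and cutoff, for illustration only.
tp, ga, p = 1.5, 10.0, 2.0
cutoff = 0.348

ac50 = hill_conc_at(0.50 * tp, tp, ga, p)  # 50% of the maximal response
ac10 = hill_conc_at(0.10 * tp, tp, ga, p)  # 10% of the maximal response
ac5 = hill_conc_at(0.05 * tp, tp, ga, p)   # 5% of the maximal response
acc = hill_conc_at(cutoff, tp, ga, p)      # response equal to the efficacy cutoff
print(ac50, ac10, ac5, acc)
```

The same inversion applied at the benchmark response (BMR) would give the bmd; the BMR itself depends on onesd and is not computed in this sketch.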
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second reflects whether the top (or maximal change in the predicted response) is
larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than that of the
constant model, i.e. the winning model is a better fit than a flat line.
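The model selection rule described above (lowest AIC wins, with ties going to the model with fewer parameters) can be sketched as follows. This is an illustrative Python sketch only; the function name, the AIC values, and the parameter counts are hypothetical and are not taken from tcplfit2.

```python
def select_winning_model(fits):
    """Pick the model with the lowest AIC; break ties by fewer parameters.

    `fits` maps a model name to a dict with keys "aic" and "n_params".
    """
    return min(fits, key=lambda name: (fits[name]["aic"], fits[name]["n_params"]))


# Hypothetical fit results for one concentration series.
fits = {
    "cnst": {"aic": -40.2, "n_params": 1},
    "hill": {"aic": -55.7, "n_params": 4},
    "poly1": {"aic": -55.7, "n_params": 2},
    "exp4": {"aic": -51.3, "n_params": 3},
}
print(select_winning_model(fits))  # prints "poly1"
```

Here hill and poly1 have equal AIC values, so the simpler poly1 model is selected.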
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1.019
Neutral control median absolute deviation, by plate: nmad 0.061
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 5.97%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates:
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells:
((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates:
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells:
((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
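Once reference chemical calls become available, the accuracy statistics named above could be computed from a simple confusion matrix. The Python sketch below is illustrative only: the function name is hypothetical and the counts are invented, since no reference chemical data are reported for this endpoint.

```python
def accuracy_stats(tp, fp, tn, fn):
    """Accuracy statistics from true/false positive and negative counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": fn / (fn + tp),
        "false_positive_rate": fp / (fp + tn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical counts: 20 positive and 50 negative reference chemicals.
stats = accuracy_stats(tp=18, fp=5, tn=45, fn=2)
print(stats)
```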
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 164.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 129
ATG_M_32_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for M32 Gene Activation (Internal Marker)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_M_32_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_M_32_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_M_32_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the background control at the transcription factor-level as they relate to the gene. Furthermore,
this assay endpoint can be referred to as a secondary readout, because this assay has produced multiple assay
endpoints where this one serves a background control function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the background measurement intended target family,
where the subfamily is internal marker.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-M_32, which is used as an internal marker.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The
trans-FACTORIAL comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a
chimeric GAL4-NR protein that regulates transcription of a reporter sequence. The presence of
agonists/antagonists of the NR alters the transactivation function of GAL4-NR and modulates reporter
transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: NA
Neutral vehicle control: DMSO
Target (nominal) number of replicates: 1
Baseline median absolute deviation for the assay (bmad): 0.065
Response cutoff threshold used to determine hit calls: 0.327
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their
well quality (wllq); if cval = 0, wllq = 0.)
-------
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative
analysis direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
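The Level 2, Level 4, and Level 5 methods listed above can be illustrated with a short Python sketch. This is
not the tcpl implementation (tcpl is an R package): the function names and example well values are
hypothetical, the order of the Level 2 steps follows the description in Section 3.2, and the 1.4826 scaling
constant is an assumption that mirrors the default of R's mad function. Only the method definitions and the
reported bmad of 0.065 come from this document.

```python
import math
import statistics

MAD_SCALE = 1.4826  # assumption: default scaling constant of R's mad()


def level2(cvals):
    """rmneg/rmzero, then log2: returns (log2 value or None, wllq) per well."""
    out = []
    for cval in cvals:
        if cval <= 0:
            out.append((None, 0))  # well excluded, wllq downgraded to 0
        else:
            out.append((math.log2(cval), 1))
    return out


def bmad(resp_lowconc):
    """Median absolute deviation of resp for test wells at cndx 1 or 2."""
    med = statistics.median(resp_lowconc)
    deviations = [abs(r - med) for r in resp_lowconc]
    return MAD_SCALE * statistics.median(deviations)


def onesd(resp_lowconc):
    """Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))."""
    return statistics.stdev(resp_lowconc)


def cutoff(bmad_value):
    """Level 5: the higher of log2(1.2) and 5 * bmad is selected."""
    return max(math.log2(1.2), 5 * bmad_value)


wells = level2([1.0, 2.0, 0.0, -0.5, 4.0])  # hypothetical cval values
coff = cutoff(0.065)                         # bmad as reported, rounded
print(wells)
print(round(coff, 3))
```

With the rounded bmad of 0.065, the 5*bmad candidate (0.325) exceeds log2(1.2) (about 0.263), so bmad5 sets
the cutoff. The small difference from the reported cutoff of 0.327 would be expected if the reported value
was computed from an unrounded bmad.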
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4798
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 11
Inactive hit count (0 <= hitc < 0.9): 3013
NA hit count (hitc < 0): 1774
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
29
387
gain-loss (gnls) model:
power (pow) model:
linear-polynomial (poly1) model:
2713
159
quadratic-polynomial (poly2) model: 633
exponential-2 (exp2) model:
exponential-3 (exp3) model:
exponential-4 (exp4) model:
exponential-5 (exp5) model:
21
632
1
171
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response (ac10), and (5) the activity
concentration at 5% of the maximal response (ac5).
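As an illustration of how such POD estimates relate to a fitted curve, the sketch below inverts a Hill model of
the form resp = tp / (1 + (ga / conc)^p). The parameter names (tp, ga, p), the fitted values, and the helper
functions are assumptions made for this example; they are not taken from this document, and the winning model
for a given series may be any of the models described in Section 3.3.

```python
def hill(conc, tp, ga, p):
    """Hill model: response rises from 0 toward tp; ga is the AC50."""
    return tp / (1.0 + (ga / conc) ** p)


def activity_conc(fraction, ga, p):
    """Concentration at which the Hill curve reaches `fraction` of tp."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return ga * (fraction / (1.0 - fraction)) ** (1.0 / p)


def conc_at_response(level, tp, ga, p):
    """Concentration at which the curve equals `level` (e.g. coff or a BMR)."""
    return activity_conc(level / tp, ga, p)


tp, ga, p = 1.0, 10.0, 2.0  # hypothetical fitted parameters
coff = 0.327                # response cutoff reported for this endpoint

ac50 = activity_conc(0.50, ga, p)
ac10 = activity_conc(0.10, ga, p)
ac5 = activity_conc(0.05, ga, p)
acc = conc_at_response(coff, tp, ga, p)
print(ac50, ac10, ac5, acc)
```

Because the Hill curve is monotonic in this form, each response level maps to a single concentration, so
ac5 < ac10 < ac50 always holds, while acc depends on how the cutoff compares with the modeled top.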
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. the winning model is a better fit than a flat line.
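The selection rule described above (lowest AIC wins; on a tie, the model with fewer parameters wins) can be
sketched as follows. The AIC values and parameter counts are hypothetical, and the function is an illustration
of the rule rather than the tcplfit2 code.

```python
def select_winner(fits):
    """fits maps model name -> (aic, n_params).

    The lowest AIC wins; equal AIC values are broken by fewer parameters.
    """
    return min(fits, key=lambda name: (fits[name][0], fits[name][1]))


fits = {  # hypothetical AIC values and parameter counts
    "cnst": (-10.0, 1),
    "hill": (-42.5, 4),
    "poly1": (-42.5, 2),
    "exp2": (-30.1, 3),
}

winner = select_winner(fits)
# Input to the third weight: is the winner's AIC below the constant model's?
beats_constant = fits[winner][0] < fits["cnst"][0]
print(winner, beats_constant)
```

Here hill and poly1 tie on AIC, so the simpler poly1 model is selected.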
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
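The binarization described above can be written as a small helper. This is a minimal sketch with a
hypothetical function name and hypothetical hitc values; the 0.90 default is the threshold used in this
documentation, and users may pass a different one.

```python
from collections import Counter


def classify_hitc(hitc, threshold=0.90):
    """Map a continuous hit call to 'active', 'inactive', or 'NA'."""
    if hitc < 0:
        return "NA"  # hitc below 0 is not applicable
    return "active" if hitc >= threshold else "inactive"


hitc_values = [0.95, 0.90, 0.42, 0.0, -1.0]  # hypothetical series
counts = Counter(classify_hitc(h) for h in hitc_values)
print(dict(counts))
```

Applied to all series for an endpoint, the three groups partition the tested samples; for this endpoint the
reported counts (11 active, 3013 inactive, 1774 NA) sum to the 4798 samples tested.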
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 4.223
Neutral control median absolute deviation, by plate (nmad): 0.266
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 6.3%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
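The control-well metrics defined above are simple to compute once the plate-level medians and median absolute
deviations are available. The sketch below is illustrative only: the function names are hypothetical, and
because this endpoint has no positive or negative control wells (NA), only the CV uses values reported in this
document.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100.0


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference versus neutral control wells."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """(cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """cmed / nmed."""
    return cmed / nmed


cv = cv_percent(4.223, 0.266)  # nmed and nmad reported for this endpoint
print(round(cv, 1))
```

The same functions apply to positive ("p") or negative ("m") control wells by passing pmed/pmad or mmed/mmad
as cmed/cmad.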
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 171.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS.
Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 130
ATG_M_61_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for M61 Gene Activation (Internal Marker)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_M_61_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_M_61_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_M_61_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the background control at the transcription factor-level as they relate to the gene. Furthermore,
this assay endpoint can be referred to as a secondary readout, because this assay has produced multiple assay
endpoints where this one serves a background control function. To generalize the intended target to other
relatable targets, this assay endpoint is annotated to the background measurement intended target family,
where the subfamily is internal marker.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-M_61, which is used as an internal marker.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 376 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.035
Response cutoff threshold used to determine hit calls: 0.263
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of background measurement.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log2-transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
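As an illustration of the removal and transformation steps described in this section, the following Python sketch marks non-positive values as unusable and log2-transforms the rest. It is not tcpl code (tcpl is an R package); the function name `preprocess_wells` and the dict layout are hypothetical, while `cval` and `wllq` follow the field names used in this document.

```python
import math


def preprocess_wells(wells):
    """Drop non-positive values (rmneg, rmzero), then log2-transform.

    Each well is a dict with a corrected value 'cval' and a well quality
    'wllq' (1 = usable, 0 = excluded). The dict layout is hypothetical.
    """
    processed = []
    for well in wells:
        well = dict(well)  # copy so the caller's data is left unchanged
        if well["cval"] <= 0:
            # Negative or zero values cannot be log-transformed: exclude them.
            well["wllq"] = 0
            well["cval"] = None
        else:
            well["cval"] = math.log2(well["cval"])
        processed.append(well)
    return processed


example = preprocess_wells([
    {"cval": 2.0, "wllq": 1},   # two-fold induction
    {"cval": 0.0, "wllq": 1},   # removed by rmzero
    {"cval": -1.0, "wllq": 1},  # removed by rmneg
])
```

On this scale a value equal to the DMSO baseline maps to 0 and a two-fold induction maps to 1, which is why the cutoffs below are expressed in log2 units.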
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
-------
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute value (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
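The Level 4 baseline statistics and the Level 5 cutoff listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not tcpl's implementation: the tuple layout and function names are hypothetical, and the 1.4826 scale factor is the default of R's `mad()` function, assumed here to apply to bmad.

```python
import math
import statistics


def bmad_and_onesd(wells):
    """Baseline statistics from test wells at the two lowest concentrations.

    wells: iterable of (wllt, cndx, resp) tuples (hypothetical layout).
    """
    base = [resp for wllt, cndx, resp in wells if wllt == "t" and cndx in (1, 2)]
    center = statistics.median(base)
    deviations = [abs(r - center) for r in base]
    # Assumption: scaled MAD, using the default constant of R's mad().
    bmad = 1.4826 * statistics.median(deviations)
    onesd = statistics.stdev(base)  # sample standard deviation (n - 1)
    return bmad, onesd


def response_cutoff(bmad):
    """Level 5: the higher of log2(1.2) and 5 * bmad is selected."""
    return max(math.log2(1.2), 5 * bmad)


bmad, onesd = bmad_and_onesd([
    ("t", 1, 0.0), ("t", 1, 0.1), ("t", 2, -0.1),  # baseline wells
    ("t", 3, 5.0),                                 # higher concentration, ignored
    ("n", 1, 9.0),                                 # neutral control, ignored
])
coff = response_cutoff(0.035)  # bmad reported for this endpoint
```

With the bmad of 0.035 reported in the Assay Design Summary, 5 * bmad is 0.175, so log2(1.2) is the larger candidate and the sketch yields a cutoff of about 0.263, in line with the reported response cutoff threshold.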
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4798
Number of chemicals tested: 4060
-------
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 11
Inactive hit count (0 <= hitc < 0.9): 3221
NA hit count (hitc < 0): 1566
WINNING MODEL SELECTION
Number of sample-assay endpoints with winning hill model:
26
378
gain-loss (gnls) model:
power(pow) model:
linear-polynomial (polyl) model:
2758
148
quadratic-polynomial(poly2) model: 591
exponential-2 (exp2) model: 0
exponential-3 (exp3) model: 20
exponential-4 (exp4) model: 656
exponential-5 (exp5) model: 169
For each concentration series, several point-of-departure (POD) estimates are calculated for the winning model.
The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (bmd),
(2) the activity concentration at 50% of the maximal response (ac50), (3) the activity concentration at the
efficacy cutoff (acc), (4) the activity concentration at 10% of the maximal response, and (5) the concentration at
5% of the maximal response.
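To make the activity-concentration estimates concrete, the sketch below inverts a Hill curve to find the concentration at which a given response level is reached. It assumes the three-parameter Hill form f(x) = tp / (1 + (ga/x)^p) with a positive top; the function name and all parameter values are hypothetical, and other winning models would need their own inversion.

```python
def hill_conc_at_response(y, tp, ga, p):
    """Concentration x at which tp / (1 + (ga / x) ** p) equals y.

    Only defined for 0 < y < tp (gain direction, positive top).
    """
    if not 0 < y < tp:
        raise ValueError("response level must lie strictly between 0 and tp")
    return ga / (tp / y - 1.0) ** (1.0 / p)


# Hypothetical fitted parameters: top, AC50 (ga) and Hill slope.
tp, ga, p = 1.0, 10.0, 1.0
coff = 0.263  # response cutoff reported for this endpoint

ac50 = hill_conc_at_response(0.50 * tp, tp, ga, p)
ac10 = hill_conc_at_response(0.10 * tp, tp, ga, p)
ac5 = hill_conc_at_response(0.05 * tp, tp, ga, p)
acc = hill_conc_at_response(coff, tp, ga, p)
```

The benchmark dose (bmd) follows the same inversion, with the response level set to the benchmark response (BMR) instead of a fraction of the top.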
3.3 Prediction Model: All statistical analyses were conducted using R programming language, employing the tcpl
package to generate model parameters and confidence intervals. Each chemical concentration response series
is fit to ten predictive models, encoded by the dependency package tcplfit2. The models include the constant,
Hill, gain-loss, two polynomials (i.e. linear and quadratic), power, and four exponential variants. The
polynomials, power, and exponential models are all based on BMDExpress2. The winning model (modl) is
selected based on the lowest AIC value and is used to determine the activity (or hit call) for the concentration
series. If two models have equal AIC values, then the simpler model (i.e. model with fewer parameters) wins. In
invitrodb, levels 4 and 5 capture model fit information. mc4 captures summary values calculated for each
concentration series, whereas mc4_param stores the estimated model parameters for all models fit to data in
long format. mc5 captures the winning model selected and the activity hit call, whereas mc5_param stores the
estimated model parameters for the selected winning model in long format. Activity for each concentration-
response series is determined by calculating a continuous hit-call for the winning model, which is the product
of three proportional weights. The first weight reflects whether there is at least one median response outside
the efficacy cutoff band. The second weight reflects whether the top (or maximal change in the predicted
response) is larger than the cutoff. The last weight reflects whether the AIC of the winning model is less than
that of the constant model, i.e. whether the winning model is a better fit than a flat line.
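The selection rule described above (lowest AIC wins, ties go to the model with fewer parameters) can be sketched as follows. The AIC values and parameter counts are made up for illustration, and the function name is hypothetical; this is not the tcplfit2 implementation.

```python
def pick_winning_model(fits):
    """Return the model name with the lowest AIC.

    Ties on AIC are broken in favor of the model with fewer parameters.
    fits: list of dicts with keys 'model', 'aic' and 'n_params'.
    """
    best = min(fits, key=lambda fit: (fit["aic"], fit["n_params"]))
    return best["model"]


# Hypothetical fit results for one concentration series.
fits = [
    {"model": "cnst", "aic": -12.0, "n_params": 1},
    {"model": "hill", "aic": -20.5, "n_params": 4},
    {"model": "poly1", "aic": -20.5, "n_params": 2},
    {"model": "exp4", "aic": -18.1, "n_params": 3},
]
winner = pick_winning_model(fits)

# The last hit-call weight asks whether the winner improves on a flat line.
cnst_aic = next(f["aic"] for f in fits if f["model"] == "cnst")
winner_aic = next(f["aic"] for f in fits if f["model"] == winner)
beats_constant = winner_aic < cnst_aic
```

Sorting on the tuple (AIC, parameter count) keeps the tie-break explicit: `hill` and `poly1` share the lowest AIC here, and the simpler `poly1` is returned.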
The continuous hit call value (hitc), fit category (fitc), and cautionary flags (mc6) can be used to understand the
goodness-of-fit, enabling the user to decide the stringency with which to filter and interpret results. Hitc may
be further binarized into active or inactive, depending on the level of stringency required by the user; herein,
hitc greater than or equal to 0.90 is active, hitc between 0 and 0.90 is inactive, and hitc less than 0 is not
applicable, but different thresholds may be used. Fitc summarizes curve behavior relative to activity, efficacy,
and potency comparisons between the AC50 and the concentration range screened. Cautionary flags on fitting
were developed in previous versions of tcpl and have been stored at level 6. These flags are programmatically
generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve
or data. Users may review these filtered groupings to understand high-confidence curves.
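A user-side filter along these lines might look like the Python sketch below. It binarizes hitc with the 0.90 threshold used herein and evaluates four of the simpler level 6 flags from their stated conditions. Function names are hypothetical, only a subset of flags is shown, and the border condition is read as |top| lying between 0.8 and 1.2 times the cutoff, which is an interpretation.

```python
def classify_hitc(hitc, threshold=0.9):
    """Binarize a continuous hit call into active, inactive or NA."""
    if hitc < 0:
        return "NA"
    return "active" if hitc >= threshold else "inactive"


def simple_flags(rmse, top, coff, nrep, nconc):
    """Evaluate a few level 6 cautionary flags for one series."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags


# Hypothetical series: one replicate, seven concentrations, top near cutoff.
labels = [classify_hitc(h) for h in (0.95, 0.5, -1.0)]
flags = simple_flags(rmse=0.1, top=0.3, coff=0.263, nrep=1, nconc=7)
```

A stricter analysis could raise `threshold` or discard active series that carry particular flags; the appropriate stringency depends on the use case.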
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.991
Neutral control median absolute deviation, by plate: nmad 0.031
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 3.14%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
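The control-well metrics tabulated above can be computed as in the following Python sketch, which transcribes the stated formulas directly. Function names are hypothetical. The neutral-control inputs are the rounded values reported above; no positive or negative controls are reported for this endpoint, so the control values passed to the other functions are placeholders.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (c) against the neutral control (n)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference between control and neutral wells."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


cv = cv_percent(nmed=0.991, nmad=0.031)
# Placeholder control values, chosen only to exercise the formulas.
zp = z_prime(cmed=10.0, cmad=1.0, nmed=0.0, nmad=1.0)
sd = ssmd(cmed=10.0, cmad=3.0, nmed=0.0, nmad=4.0)
```

Using the rounded nmed and nmad gives a CV of about 3.13%; the reported 3.14% presumably reflects unrounded inputs.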
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 169.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity
data-rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian
M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of
multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb
24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM,
Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of
environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's
ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID:
20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck
KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data
can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 131
ATG_NURR1_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human NURR1 Gene Activation
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_NURR1_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_NURR1_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_NURR1_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene NR4A2. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-NURR1, also known as human nuclear receptor
subfamily 4, group A, member 2 [GeneSymbol:NR4A2 | GeneID:4929 | Uniprot_SwissProt_Accession:P43354].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: NA
Neutral vehicle control: DMSO
Target (nominal) number of replicates: 1
Baseline median absolute deviation for the assay (bmad): 0.174
Response cutoff threshold used to determine hit calls: 0.868
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-induction
over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
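As an illustration of the winning-model rule described above, the following is a minimal Python sketch, not the tcpl or tcplfit2 implementation (which is written in R). It assumes the standard definition AIC = 2k - 2 ln(L); the function names, log-likelihoods, and parameter counts are hypothetical values chosen only to show the selection step.

```python
def aic(log_likelihood, n_params):
    """Standard Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def pick_winning_model(fits):
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) tuple.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get)


# Hypothetical fit summaries for one concentration series (illustrative only).
example_fits = {
    "cnst": (-12.4, 1),  # constant (no response) model
    "hill": (-3.1, 4),
    "gnls": (-2.9, 6),   # gain-loss model
}
winner = pick_winning_model(example_fits)
print(winner)
```

In this made-up example the gain-loss model has the highest likelihood, but its extra parameters are penalized, so the Hill model is selected.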
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
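The Level 4 and Level 5 steps listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tcpl R code: the well records are hypothetical, and the 1.4826 scaling constant is assumed to match the default of R's mad() function. With the bmad reported for this endpoint (0.174), 5*bmad is about 0.87, which exceeds log2(1.2) and is consistent with the reported cutoff of 0.868.

```python
import math
import statistics


def baseline_stats(wells, scale=1.4826):
    """Return (bmad, onesd) from test-compound wells at concentration index 1 or 2."""
    # Keep test-compound wells (wllt == "t") at the two lowest concentration indices.
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(base)
    # Median absolute deviation; `scale` is an assumption (default constant of R's mad()).
    bmad = scale * statistics.median([abs(r - center) for r in base])
    # Sample standard deviation, i.e. with an (n - 1) denominator.
    onesd = statistics.stdev(base)
    return bmad, onesd


def cutoff(bmad):
    """Select the higher of the two candidate thresholds: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


# Hypothetical normalized responses (illustrative values only).
wells = [
    {"resp": 0.10, "wllt": "t", "cndx": 1},
    {"resp": -0.10, "wllt": "t", "cndx": 1},
    {"resp": 0.20, "wllt": "t", "cndx": 2},
    {"resp": 0.00, "wllt": "t", "cndx": 2},
    {"resp": 1.50, "wllt": "t", "cndx": 5},  # excluded: higher concentration index
    {"resp": 0.30, "wllt": "n", "cndx": 1},  # excluded: neutral control well
]
bmad, onesd = baseline_stats(wells)
coff = cutoff(bmad)
print(round(bmad, 5), round(onesd, 4), round(coff, 4))
```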
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 218
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 4.282
Neutral control median absolute deviation, by plate (nmad): 1.686
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 39.38%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed) / nmad: NA
Positive control signal-to-background, pmed / nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed) / nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed / nmed: NA
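The control-well metrics above can be written out as a short Python sketch. The function names are hypothetical, and the positive control values in the example are invented for illustration, since no positive control is reported for this endpoint. Applying the CV formula to the rounded nmed and nmad listed above gives about 39.37%; the small difference from the reported 39.38% presumably reflects rounding or per-plate aggregation of the inputs.

```python
import math


def cv_percent(med, mad):
    """Coefficient of variation in percent: (mad / med) * 100."""
    return mad / med * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control against the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control values reported for this endpoint.
cv = cv_percent(4.282, 1.686)

# Hypothetical positive control (pmed=10, pmad=1) against a hypothetical
# neutral control (nmed=4, nmad=1); illustrative values only.
zp = z_prime(10.0, 1.0, 4.0, 1.0)
sd = ssmd(10.0, 1.0, 4.0, 1.0)
sn = signal_to_noise(10.0, 4.0, 1.0)
sb = signal_to_background(10.0, 4.0)
print(round(cv, 2), zp, round(sd, 3), sn, sb)
```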
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 260.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian
M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of
multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb
24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM,
Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of
environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's
ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID:
20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck
KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 132
ATG_PPARa_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human peroxisome proliferator-activated receptor,
alpha (PPARalpha)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_PPARa_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_PPARa_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_PPARa_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene PPARA. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-PPARa, also known as human peroxisome
proliferator-activated receptor alpha [GeneSymbol:PPARA | GeneID:5465 | Uniprot_SwissProt_Accession:Q07869].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of an NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS the activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in the activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: GW0742
Neutral vehicle control: DMSO
Target (nominal) number of replicates: 1
Baseline median absolute deviation for the assay (bmad): 0.232
Response cutoff threshold used to determine hit calls: 1.162
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-induction
over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
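The removal of negative and zero values followed by log2 transformation can be sketched in Python as below. This is not the tcpl R code: the function name and well records are hypothetical, the order of operations follows the prose above, and clearing the value of an excluded well to None is an illustrative choice rather than documented tcpl behavior.

```python
import math


def apply_level2_corrections(wells):
    """Flag non-positive wells (wllq = 0) and log2-transform the remaining values."""
    corrected = []
    for well in wells:
        well = dict(well)  # copy so the input records are left untouched
        if well["cval"] <= 0:
            # Negative or zero value: downgrade well quality and drop the value.
            well["wllq"] = 0
            well["cval"] = None
        else:
            well["cval"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected


# Hypothetical corrected response values (illustrative only).
raw = [
    {"cval": 4.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -2.0, "wllq": 1},
    {"cval": 0.5, "wllq": 1},
]
result = apply_level2_corrections(raw)
for well in result:
    print(well)
```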
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested;if hitc >= 0.9 and ac50 < 10Alogc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
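As an illustration of how the modl.directionality.fail rule reads when written out, the following is a minimal Python sketch, not the tcpl implementation (tcpl itself is an R package). The function name and the example series are hypothetical; only the two count comparisons come from the flag description.

```python
from typing import Sequence


def directionality_flag(resp: Sequence[float], top: float, coff: float) -> bool:
    """Return True when the winning model direction looks questionable."""
    n_above = sum(1 for r in resp if r > coff)    # responses beyond +coff
    n_below = sum(1 for r in resp if r < -coff)   # responses beyond -1*coff
    if top < 0:
        # Loss was the winning direction: flag if count(resp < -1*coff)
        # is less than 2*count(resp > coff).
        return n_below < 2 * n_above
    if top > 0:
        # Gain was the winning direction: flag if count(resp > coff)
        # is less than 2*count(resp < -1*coff).
        return n_above < 2 * n_below
    return False  # top == 0 is not covered by the stated rule


# Hypothetical series with responses beyond the cutoff in both directions.
example_resp = [0.1, -1.5, -1.4, 1.3, 1.6, 0.2]
print(directionality_flag(example_resp, top=1.6, coff=1.2))
```

In this hypothetical series two responses exceed the cutoff in each direction, so a gain-direction winner would be flagged.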
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
-------
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 280
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 3.284
Neutral control median absolute deviation, by plate: nmad 1.296
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 39.45%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
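The control-well formulas listed above can be collected into a small Python sketch for readers who want to recompute them from plate-level summaries. This is an illustration under stated assumptions, not the code used to generate the table: the function names are hypothetical, and the example values are invented rather than taken from this assay. Returning None mirrors the NA entries reported when a control well type is absent.

```python
import math
from typing import Optional


def cv_percent(nmed: float, nmad: float) -> float:
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return (nmad / nmed) * 100


def z_prime(cmed: Optional[float], cmad: Optional[float],
            nmed: float, nmad: float) -> Optional[float]:
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    if cmed is None or cmad is None:
        return None  # control wells absent, shown as NA in the summary
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed: Optional[float], cmad: Optional[float],
         nmed: float, nmad: float) -> Optional[float]:
    """SSMD: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    if cmed is None or cmad is None:
        return None
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed: float, nmed: float, nmad: float) -> float:
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed: float, nmed: float) -> float:
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


# Invented values for a plate with positive ("p") and neutral ("n") controls.
pmed, pmad, nmed, nmad = 5.0, 0.3, 1.0, 0.1
print(round(z_prime(pmed, pmad, nmed, nmad), 3))
print(round(ssmd(pmed, pmad, nmed, nmad), 3))
```

The same functions apply to negative control wells by passing mmed and mmad in place of pmed and pmad.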
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 297.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res
Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 133
ATG_PPARd_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human peroxisome proliferator-activated receptor,
delta (PPARd)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_PPARd_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_PPARd_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_PPARd_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene PPARD. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-PPARd, also known as human peroxisome
proliferator-activated receptor delta [GeneSymbol:PPARD | GeneID:5467 | Uniprot_SwissProt_Accession:Q03181].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters
the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C) and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: GW7647
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.219
Response cutoff threshold used to determine hit calls: 1.095
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
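The winning-model rule described above (lowest AIC wins) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the AIC values are invented, and the model labels are placeholders (only gnls is named in this document). Treating a failed fit as None and skipping it is an assumption of this sketch.

```python
from typing import Dict, Optional


def winning_model(aic_by_model: Dict[str, Optional[float]]) -> Optional[str]:
    """Return the model with the lowest AIC, ignoring fits that did not converge."""
    fitted = {name: aic for name, aic in aic_by_model.items() if aic is not None}
    if not fitted:
        return None  # no model could be fit to this concentration series
    return min(fitted, key=fitted.get)


# Invented AIC values for one chemical concentration series.
example_aic = {"cnst": 12.4, "hill": -3.1, "gnls": -1.0, "poly1": None}
print(winning_model(example_aic))
```

Because AIC penalizes additional parameters, a more flexible model only wins when its improvement in fit outweighs that penalty.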
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
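A minimal Python sketch of the Level 2 and Level 3 methods listed above follows. It assumes, consistent with Section 3.2, that non-positive values are excluded before the log2 transform is applied; the record layout (dicts with cval and wllq keys) and the function name are hypothetical and are not the tcpl data structures.

```python
import math
from typing import Dict, List, Optional


def apply_level2_level3(wells: List[Dict]) -> List[Dict]:
    """Apply rmneg, rmzero, and log2, then set resp = cval (Level 3 'none')."""
    processed = []
    for well in wells:
        cval = well["cval"]
        wllq = well["wllq"]
        resp: Optional[float]
        if cval <= 0:
            # rmneg (cval < 0) and rmzero (cval = 0): downgrade well quality.
            wllq = 0
            resp = None  # excluded wells carry no response in this sketch
        else:
            # log2: transform the corrected value; Level 3 'none' keeps it as resp.
            resp = math.log2(cval)
        processed.append({"cval": cval, "wllq": wllq, "resp": resp})
    return processed


# Invented wells: one usable, one zero, one negative.
example_wells = [
    {"cval": 2.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -0.5, "wllq": 1},
]
for row in apply_level2_level3(example_wells):
    print(row)
```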
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
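The bmad and onesd calculations described above can be sketched in Python as follows. The well records and function name are hypothetical. One detail is an explicit assumption: this document does not state whether a consistency constant is applied to the median absolute deviation, so the sketch exposes it as a `scale` parameter defaulting to 1.0 (R's `stats::mad` uses 1.4826 by default).

```python
import statistics
from typing import Dict, List, Tuple


def baseline_stats(wells: List[Dict], scale: float = 1.0) -> Tuple[float, float]:
    """Return (bmad, onesd) from test wells at the two lowest concentration indices."""
    # Keep test compound wells (wllt = t) with concentration index 1 or 2.
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(base)
    # Median absolute deviation around the median; 'scale' is an assumption.
    bmad = scale * statistics.median([abs(r - center) for r in base])
    # Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1)).
    onesd = statistics.stdev(base)
    return bmad, onesd


# Invented wells; the cndx 3 well and the neutral ("n") well are ignored.
example_wells = [
    {"wllt": "t", "cndx": 1, "resp": 0.0},
    {"wllt": "t", "cndx": 1, "resp": 0.2},
    {"wllt": "t", "cndx": 2, "resp": -0.2},
    {"wllt": "t", "cndx": 2, "resp": 0.4},
    {"wllt": "t", "cndx": 3, "resp": 5.0},
    {"wllt": "n", "cndx": 1, "resp": 9.0},
]
bmad, onesd = baseline_stats(example_wells)
print(round(bmad, 3), round(onesd, 3))
```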
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
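As a worked check of the Level 5 cutoff selection, the Python sketch below takes the higher of the two candidate cutoffs, log2(1.2) and 5*bmad, using the bmad of 0.219 reported in the Assay Design Summary. The function name is hypothetical; the result agrees with the response cutoff of 1.095 reported there.

```python
import math
from typing import Tuple


def response_cutoff(bmad: float) -> Tuple[str, float]:
    """Return the name and value of the higher candidate cutoff."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # fixed fold-change cutoff
        "bmad5": 5 * bmad,           # 5 times the baseline MAD
    }
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


method, coff = response_cutoff(0.219)
print(method, round(coff, 3))  # bmad5 1.095
```

Here log2(1.2) is roughly 0.263, so the bmad-based cutoff is the one selected for this endpoint.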
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or fewer were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
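Several of the cautionary flags above reduce to single threshold comparisons on fitted parameters. The Python sketch below applies five of them (bmd.high, noise, low.nrep, low.nconc, ac50.lowconc) to one series summary. The dict layout and function name are hypothetical, the example values are invented apart from the cutoff of 1.095, and the remaining flags are intentionally omitted.

```python
from typing import Dict, List


def threshold_flags(series: Dict[str, float]) -> List[str]:
    """Return names of the simple threshold-based flags raised for one series."""
    flags = []
    active = series["hitc"] >= 0.9  # active hit call as defined in the flag text
    if series["bmd"] > series["ac50"]:
        flags.append("bmd.high")
    if series["rmse"] > series["coff"]:
        flags.append("noise")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    if active and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags


# Invented series summary; logc_min is the log10 of the lowest tested concentration.
example_series = {
    "hitc": 0.95, "ac50": 0.5, "bmd": 0.2, "rmse": 0.3,
    "coff": 1.095, "nrep": 1, "nconc": 7, "logc_min": 0.0,
}
print(threshold_flags(example_series))
```

Flags are cautionary annotations only; in this sketch they do not change the hit call itself.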
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
-------
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 53
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.79
Neutral control median absolute deviation, by plate: nmad 0.327
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 41.41%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
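Once reference chemical calls are available, the accuracy statistics named above follow directly from a confusion matrix. The Python sketch below is a generic illustration with invented counts; no reference chemical results for this assay are implied, and the function name is hypothetical.

```python
from typing import Dict


def accuracy_stats(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute sensitivity, specificity, and error rates from confusion counts."""
    sensitivity = tp / (tp + fn)   # reference positives called active
    specificity = tn / (tn + fp)   # reference negatives called inactive
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 1 - sensitivity,
        "false_positive_rate": 1 - specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Invented counts: 10 reference positives and 20 reference negatives.
print(accuracy_stats(tp=8, fp=2, tn=18, fn=2))
```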
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 189.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res
Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 134
ATG_PPARg_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human peroxisome proliferator-activated receptor,
gamma (PPARg)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_PPARg_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_PPARg_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_PPARg_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene PPARG. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-PPARg, also known as human peroxisome proliferator-
activated receptor gamma [GeneSymbol:PPARG | GeneID:5468 | Uniprot_SwissProt_Accession:P37231].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The
trans-FACTORIAL comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a
chimeric GAL4-NR protein that regulates transcription of a reporter sequence. The presence of
agonists/antagonists of the NR alters the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: Rosiglitazone
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.23
Response cutoff threshold used to determine hit calls: 1.149
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
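The bmad and response cutoff reported in the summary above appear to be linked through the Level 5 methods
listed in Section 3.2 (log2_1.2 and bmad5, with the higher value selected). The following minimal Python sketch
illustrates that selection. The function name response_cutoff is hypothetical, and the code is an illustration
rather than part of the tcpl R package.

```python
import math


def response_cutoff(bmad: float) -> float:
    """Return the larger of two candidate cutoffs: log2(1.2) and 5 * bmad.

    Hypothetical helper for illustration only; it mirrors the Level 5
    description (methods log2_1.2 and bmad5), not tcpl's implementation.
    """
    candidates = {
        "log2_1.2": math.log2(1.2),  # fixed cutoff, typical for fold-change data
        "bmad5": 5.0 * bmad,         # five times the baseline median absolute deviation
    }
    return max(candidates.values())


if __name__ == "__main__":
    # bmad as reported (rounded) for this endpoint
    print(round(response_cutoff(0.23), 3))
```

With the rounded bmad of 0.23, the sketch gives a cutoff of about 1.15, which agrees with the reported 1.149 if
the reported bmad was rounded for display.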
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit with a set of candidate models, and the model which produces the lowest Akaike
Information Criterion (AIC) value is considered the winning model.
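Model selection by AIC can be sketched as follows. This is a minimal Python illustration, assuming the standard
definition AIC = 2k - 2 ln(L), where k is the number of model parameters and L is the maximized likelihood. The
model names, parameter counts, and log-likelihood values below are made-up examples; the actual fitting is
performed by the tcpl R package.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def winning_model(fits: dict) -> str:
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to (log_likelihood, n_params).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get)


# Illustrative values only (not taken from ToxCast data).
example_fits = {
    "cnst": (-20.0, 1),  # constant (flat) model
    "hill": (-12.5, 4),  # Hill model
    "gnls": (-12.4, 6),  # gain-loss model
}

if __name__ == "__main__":
    print(winning_model(example_fits))
```

In this example the gain-loss model fits slightly better than the Hill model, but its two extra parameters
are penalized, so the Hill model has the lowest AIC.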
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl uses generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis
direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of half of
the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
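The Level 4 baseline statistics and a few of the Level 6 flags described above can be sketched in Python as
follows. This is an illustrative sketch under stated assumptions, not the tcpl implementation: the function
names are hypothetical, the 1.4826 scaling constant is the default of R's mad() and its use here is an
assumption, and the border flag is read as the interval 0.8*coff <= |top| <= 1.2*coff, which is one
interpretation of the description.

```python
import statistics


def baseline_stats(resp_low_conc, mad_constant=1.4826):
    """Return (bmad, onesd) for test-well responses at cndx 1 or 2.

    mad_constant=1.4826 is the default scaling of R's mad(); whether the
    pipeline applies it for this method is an assumption of this sketch.
    """
    values = list(resp_low_conc)
    med = statistics.median(values)
    # Median absolute deviation about the median, optionally scaled.
    bmad = mad_constant * statistics.median(abs(v - med) for v in values)
    # Sample standard deviation (divides by n - 1), matching the onesd formula.
    onesd = statistics.stdev(values)
    return bmad, onesd


def caution_flags(rmse, top, coff, nrep, nconc):
    """Evaluate a subset of the Level 6 flags for one concentration series."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    # Interpretation: |top| within 20 percent of the cutoff.
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags


if __name__ == "__main__":
    # Made-up low-concentration responses, for illustration only.
    print(baseline_stats([0.1, -0.2, 0.3, 0.0, -0.1, 0.2]))
    print(caution_flags(rmse=0.5, top=1.3, coff=1.149, nrep=1, nconc=7))
```

Because the target number of replicates for this assay is 1, a series tested under the nominal design would
be expected to carry the low.nrep flag.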
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 1084
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 2.558
Neutral control median absolute deviation, by plate (nmad): 1.131
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 44.22%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
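The performance formulas above can be written out as a short Python sketch. The function names are
hypothetical and chosen for illustration; cmed and cmad stand for the median and median absolute deviation of
whichever control (positive or negative) is compared against the neutral control.

```python
import math


def cv_percent(nmed: float, nmad: float) -> float:
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100.0


def z_prime(cmed: float, cmad: float, nmed: float, nmad: float) -> float:
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed: float, cmad: float, nmed: float, nmad: float) -> float:
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed: float, nmed: float, nmad: float) -> float:
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed: float, nmed: float) -> float:
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


if __name__ == "__main__":
    # Neutral control values as reported (rounded) for this endpoint.
    print(round(cv_percent(2.558, 1.131), 1))
```

Using the rounded nmed and nmad reported above gives a CV of about 44.2%, in line with the reported 44.22%;
the small difference is presumably due to rounding of the displayed inputs. The control-based metrics are NA
for this endpoint because no positive or negative control medians are reported.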
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 418.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 135
ATG_PXR_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human PXR Gene Activation
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_PXR_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_PXR_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_PXR_TRANS, was
analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene NR1I2. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-PXR, also known as human nuclear receptor subfamily
1, group I, member 2 [GeneSymbol:NR1I2 | GeneID:8856 | Uniprot_SwissProt_Accession:O75469].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The
trans-FACTORIAL comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a
chimeric GAL4-NR protein that regulates transcription of a reporter sequence. The presence of
agonists/antagonists of the NR alters the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: T0901317
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 376 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.258
Response cutoff threshold used to determine hit calls: 1.291
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain-of-signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted in the R programming
language, using the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
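The AIC-based selection described above can be sketched as follows. This is a minimal illustration, not the tcpl implementation: the model names, log-likelihoods, and parameter counts are hypothetical placeholders, and AIC is taken in its standard form, 2k - 2 ln(L).

```python
def aic(log_likelihood, n_params):
    """Standard Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def winning_model(fits):
    """Return the name of the candidate model with the lowest AIC.

    `fits` maps a model name to (maximized log-likelihood, parameter count).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get)


# Hypothetical fits for one concentration series (illustrative values only).
example_fits = {
    "constant": (-12.4, 1),
    "linear": (-9.8, 2),
    "hill": (-3.1, 4),
}
print(winning_model(example_fits))
```

Because AIC adds a penalty of 2 per parameter, a more flexible model wins only when its likelihood gain outweighs that penalty.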
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
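The Level 4 baseline and Level 5 cutoff steps listed above can be sketched as follows. This is an illustration under stated assumptions rather than tcpl code: the function names and the example responses are hypothetical, and the 1.4826 scaling constant mirrors the default of R's mad(), which tcpl is assumed (not confirmed here) to apply.

```python
import math
import statistics


def baseline_stats(resp_lowconc, mad_scale=1.4826):
    """Return (bmad, onesd) for low-concentration test-well responses.

    mad_scale=1.4826 mirrors the default of R's mad(); whether tcpl applies
    this constant is an assumption of this sketch.
    """
    med = statistics.median(resp_lowconc)
    abs_dev = [abs(r - med) for r in resp_lowconc]
    bmad = mad_scale * statistics.median(abs_dev)
    onesd = statistics.stdev(resp_lowconc)  # sample SD, divides by n - 1
    return bmad, onesd


def cutoff(bmad):
    """Pick the larger of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


# Hypothetical normalized responses (resp) from test wells with cndx 1 or 2.
resp = [0.1, -0.2, 0.0, 0.3, -0.1]
bmad, onesd = baseline_stats(resp)

# With the bmad reported for this endpoint (0.258), 5 * bmad exceeds log2(1.2).
print(round(cutoff(0.258), 3))
```

With the reported bmad of 0.258, 5 * bmad is about 1.29, close to the documented cutoff of 1.291; the small difference presumably reflects rounding of the reported bmad.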
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 1095
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 1.49
Neutral control median absolute deviation, by plate (nmad): 0.497
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 33.33%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
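The control-well metrics defined above can be computed as in the sketch below. The function names are illustrative, and because positive control values are reported as NA for this endpoint, the pmed and pmad inputs are hypothetical placeholders chosen only to exercise the formulas.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3 * (cmad + nmad) / |cmed - nmed|."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


# Neutral control values reported for this endpoint.
nmed, nmad = 1.49, 0.497
# Hypothetical positive control values (the document reports NA).
pmed, pmad = 6.0, 0.5

print(round(cv_percent(nmed, nmad), 2))
print(round(z_prime(pmed, pmad, nmed, nmad), 3))
print(round(ssmd(pmed, pmad, nmed, nmad), 3))
```

The rounded neutral control values give a CV of roughly 33.4%, slightly different from the reported 33.33%, which was presumably computed from unrounded or per-plate values.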
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 476.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline
(tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 136
ATG_RARa_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human retinoic acid receptor, alpha (RARa)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_RARa_TRANS is one of 30
assay components measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse
transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay
component ATG_RARa_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_RARa_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene RARA. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-RARa, also known as human retinoic acid receptor,
alpha [GeneSymbol:RARA | GeneID:5914 | Uniprot_SwissProt_Accession:P10276].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of Gal4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated into 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: Retinoic Acid
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 376 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.23
Response cutoff threshold used to determine hit calls: 1.149
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain-of-signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted in the R programming
language, using the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
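The removal of non-positive values and the log2 transform described above can be sketched as follows. This is not tcpl code: the function names and input values are hypothetical, and filtering before transforming is an assumption made here because log2 is undefined for zero and negative values.

```python
import math


def correct_wells(cvals):
    """Flag non-positive wells (wllq = 0) and log2-transform the rest.

    Returns one dict per well; flagged wells carry no transformed value.
    """
    wells = []
    for cval in cvals:
        if cval <= 0:
            wells.append({"cval": cval, "wllq": 0, "log2": None})
        else:
            wells.append({"cval": cval, "wllq": 1, "log2": math.log2(cval)})
    return wells


def log2_to_fold(value):
    """Convert a log2 fold-induction value back to a plain fold change."""
    return 2 ** value


# Hypothetical fold-induction values; 0.0 and -0.5 are excluded from analysis.
wells = correct_wells([2.0, 0.0, -0.5, 1.0, 4.0])
usable = [w["log2"] for w in wells if w["wllq"] == 1]
print(usable)

# The cutoff reported for this endpoint, expressed as fold-induction over DMSO.
print(round(log2_to_fold(1.149), 2))
```

On this scale, a response equal to the DMSO baseline is 0, and the endpoint's cutoff of 1.149 corresponds to a little over a 2.2-fold induction.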
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested;if hitc >= 0.9 and ac50 < 10Alogc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
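As an illustration of how a few of the flag rules above translate into logic, the following is a minimal Python sketch, not the tcpl implementation. The Series container, its field names, and the example values are hypothetical; only the comparison rules are taken from the flag descriptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Series:
    """Hypothetical container for one fitted concentration series."""
    resp: List[float]   # normalized responses (resp)
    coff: float         # response cutoff (coff)
    top: float          # modeled top of the winning model
    hitc: float         # continuous hit call
    rmse: float         # root mean square error of the winning fit
    ac50: float         # AC50, same units as conc_min
    conc_min: float     # lowest tested concentration (10^logc_min)
    nconc: int          # number of concentrations tested
    nrep: float         # average replicates per concentration


def level6_flags(s: Series) -> List[str]:
    """Return the names of the (subset of) Level 6 flags that apply."""
    flags = []
    n_gain = sum(r > s.coff for r in s.resp)    # count(resp > coff)
    n_loss = sum(r < -s.coff for r in s.resp)   # count(resp < -1*coff)
    if (s.top < 0 and n_loss < 2 * n_gain) or (s.top > 0 and n_gain < 2 * n_loss):
        flags.append("modl.directionality.fail")
    if s.rmse > s.coff:
        flags.append("noise")
    if s.nrep < 2:
        flags.append("low.nrep")
    if s.nconc <= 4:
        flags.append("low.nconc")
    if s.hitc >= 0.9 and s.ac50 < s.conc_min:
        flags.append("ac50.lowconc")
    return flags


# Made-up example series: an active curve with one replicate per concentration
# and an AC50 below the lowest tested concentration.
example = Series(resp=[0.1, 0.2, 1.5, 1.8, 2.0, 2.2, 2.4], coff=1.079, top=2.5,
                 hitc=0.95, rmse=0.3, ac50=0.5, conc_min=1.23, nconc=7, nrep=1)
flags = level6_flags(example)
print(flags)
```

Flags are cautionary only; in this sketch they annotate a series and do not change its hit call.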
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 58
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 8.189
Neutral control median absolute deviation, by plate (nmad): 5.048
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 61.65%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
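The control metrics above can be sketched in Python as follows. This is an illustrative sketch rather than the code used to generate the document: the function names are hypothetical, NA is represented as None, and the positive control values in the example are invented, since this endpoint reports NA for them.

```python
import math
from typing import Optional


def z_prime(cmed, cmad, nmed, nmad) -> Optional[float]:
    """Z Prime Factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed).

    cmed/cmad are the control (positive or negative) median and MAD;
    nmed/nmad are the neutral control median and MAD. Returns None (NA)
    when any input is missing or the two medians coincide.
    """
    if None in (cmed, cmad, nmed, nmad) or cmed == nmed:
        return None
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad) -> Optional[float]:
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    if None in (cmed, cmad, nmed, nmad):
        return None
    denom = math.sqrt(cmad ** 2 + nmad ** 2)
    return (cmed - nmed) / denom if denom else None


# Invented control values, for illustration only.
z = z_prime(100.0, 5.0, 10.0, 5.0)
s = ssmd(100.0, 5.0, 10.0, 5.0)
# With no positive control data (as reported for this endpoint), the result is NA.
z_na = z_prime(None, None, 8.189, 5.048)
print(z, s, z_na)
```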
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 256.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian
M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of
multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb
24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM,
Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of
environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's
ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID:
20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck
KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data
can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette
for comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 137
ATG_RARb_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human retinoic acid receptor, beta (RARb)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_RARb_TRANS is one of 30
assay components measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse
transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay
component ATG_RARb_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_RARb_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene RARB. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-RARb, also known as human retinoic acid receptor, beta
[GeneSymbol:RARB | GeneID:5915 | Uniprot_SwissProt_Accession:P10826].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters
the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Target (nominal) number of replicates: 1
Key positive control: Retinoic Acid
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.216
Response cutoff threshold used to determine hit calls: 1.079
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
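The model selection rule described above (lowest AIC wins) can be illustrated with a short Python sketch. The function name and the AIC values are hypothetical, and any additional tie-breaking or model-exclusion rules that tcpl may apply are not covered here.

```python
from typing import Dict


def winning_model(aic_by_model: Dict[str, float]) -> str:
    """Return the name of the model with the lowest AIC value."""
    if not aic_by_model:
        raise ValueError("no fitted models supplied")
    return min(aic_by_model, key=aic_by_model.get)


# Made-up AIC values for one concentration series.
aics = {"cnst": 12.4, "hill": 3.1, "gnls": 5.0, "poly1": 7.7}
winner = winning_model(aics)
print(winner)
```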
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2) / (sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is
therefore required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or fewer were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
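To make the Level 2, Level 4, and Level 5 methods listed above more concrete, the following Python sketch walks through them on made-up well values. It is an approximation under stated assumptions, not the tcpl code: the function names are hypothetical, and the scale argument is an assumption added because the text does not say whether bmad includes a scaling constant (R's mad function multiplies by 1.4826 by default).

```python
import math
from statistics import median, stdev
from typing import List, Optional, Tuple


def level2_log2(cvals: List[float]) -> List[Tuple[Optional[float], int]]:
    """log2, rmneg, rmzero: log2-transform cval; wells with cval <= 0 get wllq = 0."""
    return [(math.log2(c), 1) if c > 0 else (None, 0) for c in cvals]


def bmad(resp: List[float], scale: float = 1.0) -> float:
    """Median absolute deviation of baseline responses (low-concentration test wells)."""
    m = median(resp)
    return scale * median([abs(r - m) for r in resp])


def onesd(resp: List[float]) -> float:
    """Sample standard deviation: sqrt(sum((resp - mean(resp))^2) / (n - 1))."""
    return stdev(resp)


def cutoff(bmad_value: float) -> float:
    """Level 5: the higher of log2(1.2) and 5 * bmad is selected."""
    return max(math.log2(1.2), 5 * bmad_value)


# Made-up corrected values: one zero and one negative well are excluded.
wells = level2_log2([4.0, 0.0, -1.5, 0.5])

# Made-up normalized responses for test wells at cndx 1 or 2.
baseline = [-0.2, -0.1, 0.0, 0.1, 0.3]
b = bmad(baseline)
sd = onesd(baseline)

# Using the bmad reported for this endpoint (0.216) gives about 1.08,
# close to the reported cutoff of 1.079 (the reported bmad is rounded).
coff_reported = cutoff(0.216)
print(wells, b, sd, coff_reported)
```

For this endpoint the bmad5 method dominates, since log2(1.2) is roughly 0.263 and 5 * bmad is roughly 1.08.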
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 13
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 4.912
Neutral control median absolute deviation, by plate (nmad): 2.712
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 55.21%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
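As a worked check of the neutral control coefficient of variation reported above, the short Python sketch below applies (nmad/nmed)*100 to the rounded values listed for this endpoint. The function name is hypothetical.

```python
def cv_percent(nmed: float, nmad: float) -> float:
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100


# Values reported for ATG_RARb_TRANS: nmed = 4.912, nmad = 2.712.
cv = cv_percent(4.912, 2.712)
print(f"{cv:.2f}%")  # prints 55.21%
```

Because the reported nmed and nmad are themselves rounded, recomputing CV% for other endpoints may differ from the listed value in the last digit.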
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 214.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian
M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of
multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb
24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM,
Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of
environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's
ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID:
20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck
KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data
can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette
for comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 138
ATG_RARg_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human retinoic acid receptor, gamma (RARg)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_RARg_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_RARg_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_RARg_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene RARG. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-RARg, also known as human retinoic acid receptor,
gamma [GeneSymbol:RARG | GeneID:5916 | Uniprot_SwissProt_Accession:P13631].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters
the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS the activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in
activities of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear
receptors, the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto
96-well plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated
using TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: Retinoic Acid
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.232
Response cutoff threshold used to determine hit calls: 1.159
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
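As a rough illustration of the winning-model rule, the sketch below computes AIC = 2k - 2 ln(L) for a few candidate fits and selects the lowest value. It is written in Python for illustration only; tcpl performs this step in R. The log-likelihoods and parameter counts are made-up placeholders, not tcpl output, and the helper names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def pick_winning_model(fits):
    """Return (winner, scores) for a dict of model name -> (log-likelihood, n_params)."""
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    winner = min(scores, key=scores.get)  # lowest AIC wins
    return winner, scores


# Hypothetical fits for one concentration series (values are illustrative only).
example_fits = {
    "cnst": (-12.4, 1),  # constant model
    "hill": (-3.1, 4),   # Hill model
    "gnls": (-2.9, 6),   # gain-loss model
}

winner, scores = pick_winning_model(example_fits)
print(winner, scores)
```

In this made-up case the gain-loss model has the highest likelihood, but its two extra parameters are penalized, so the Hill model wins.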
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2;
onesd = sqrt(sum((resp - mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis
direction. Typically used for endpoints where only positive responses are biologically relevant.)
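The Level 2 through Level 5 methods listed above can be sketched in a few lines. This is a simplified Python illustration, not the tcpl implementation (which is in R). The well records, field layout, and function names are hypothetical. The 1.4826 scaling constant matches the default of R's mad() function; that tcpl applies it is an assumption here. Wells are filtered before the log2 transform so that no logarithm of a non-positive value is taken.

```python
import math
from statistics import median

MAD_SCALE = 1.4826  # assumption: same consistency constant as R's mad() default


def apply_level2(wells):
    """rmneg / rmzero, then log2: wells with cval <= 0 get wllq = 0 and no response."""
    processed = []
    for well in wells:
        well = dict(well)  # copy so the input is not mutated
        if well["cval"] <= 0:
            well["wllq"] = 0
            well["resp"] = None
        else:
            # Level 3 method "none": resp is simply the log2-transformed cval
            well["resp"] = math.log2(well["cval"])
        processed.append(well)
    return processed


def bmad_lowconc_twells(wells, scale=MAD_SCALE):
    """MAD of resp over usable test wells (wllt == 't') at cndx 1 or 2."""
    resp = [w["resp"] for w in wells
            if w["wllq"] == 1 and w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = median(resp)
    return scale * median([abs(r - center) for r in resp])


def cutoff(bmad):
    """Level 5: take the larger of log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


# Hypothetical wells: cval is the corrected response (fold induction).
wells = [
    {"cval": 1.0, "wllt": "t", "cndx": 1, "wllq": 1},
    {"cval": 2.0, "wllt": "t", "cndx": 1, "wllq": 1},
    {"cval": 4.0, "wllt": "t", "cndx": 2, "wllq": 1},
    {"cval": 0.0, "wllt": "t", "cndx": 2, "wllq": 1},   # removed by rmzero
    {"cval": -3.0, "wllt": "t", "cndx": 1, "wllq": 1},  # removed by rmneg
    {"cval": 64.0, "wllt": "t", "cndx": 7, "wllq": 1},  # not a low-concentration well
]

processed = apply_level2(wells)
bmad = bmad_lowconc_twells(processed)
coff = cutoff(bmad)
print(bmad, coff)
```

With the bmad of 0.232 reported for this endpoint, `cutoff(0.232)` gives about 1.16, consistent with the reported cutoff of 1.159; the small difference presumably reflects rounding of the reported bmad.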
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); 0.8(coff) <= |top| <= 1.2(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
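Several of the Level 6 flags reduce to simple comparisons on the fitted parameters. The sketch below, in Python for illustration only, covers five of them. The function name, argument names, and input values are hypothetical. The border flag is implemented as the two-sided band 0.8*coff <= |top| <= 1.2*coff, which is one reading of the description and should be checked against the tcpl source before reuse.

```python
import math


def caution_flags(top, coff, rmse, nrep, nconc, hitc, ac50, logc_min):
    """Return the names of the simple Level 6 flags raised for one series."""
    flags = []
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")        # modeled top is close to the cutoff
    if rmse > coff:
        flags.append("noise")         # fit error exceeds the cutoff
    if nrep < 2:
        flags.append("low.nrep")      # fewer than 2 replicates per concentration
    if nconc <= 4:
        flags.append("low.nconc")     # 4 or fewer concentrations tested
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")  # active, with AC50 below the tested range
    return flags


# Hypothetical series: single replicate, 7 concentrations, top just above the cutoff.
flags = caution_flags(
    top=1.3, coff=1.159, rmse=0.4, nrep=1, nconc=7,
    hitc=0.95, ac50=0.5, logc_min=math.log10(1.23),
)
print(flags)
```

Because the assay design uses a nominal single replicate per concentration, a low.nrep flag would be expected on most series under this reading.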
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 44
Inactive hit count: 0
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 5.914
Neutral control median absolute deviation, by plate (nmad): 2.745
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 46.42%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
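The performance metrics above are simple functions of the control-well medians and median absolute deviations. The Python sketch below restates them and reproduces the reported CV from the neutral control values. The function names are hypothetical, and because this endpoint reports NA for the positive control, the positive control inputs are placeholders for illustration only.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, as a percentage."""
    return (nmad / nmed) * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference against the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Reported neutral control values for this endpoint.
reported_cv = cv_percent(nmed=5.914, nmad=2.745)
print(round(reported_cv, 2))

# Placeholder positive control (hypothetical values, not from the assay).
hypo = {"cmed": 20.0, "cmad": 2.0, "nmed": 5.0, "nmad": 1.0}
print(z_prime(**hypo), ssmd(**hypo))
```

The same functions apply to negative control wells by passing mmed and mmad in place of the positive control values.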
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 244.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 139
ATG_RORb_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human RAR-related orphan receptor B (RORb)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_RORb_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_RORb_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_RORb_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene RORB. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-RORb, also known as human RAR-related orphan
receptor B [GeneSymbol:RORB | GeneID:6096 | Uniprot_SwissProt_Accession:Q92753].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters
the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS the activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in
activities of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear
receptors, the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto
96-well plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated
using TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 376 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.315
Response cutoff threshold used to determine hit calls: 1.574
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised when extrapolating these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems, but it can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
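The winning-model rule can be illustrated with a minimal Python sketch. This is not tcpl code (tcpl is an R package): the model names, log-likelihood values, and parameter counts below are hypothetical, and AIC is computed with its standard definition, AIC = 2k - 2 ln(L).

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def winning_model(fits):
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fit results for one concentration series (illustrative only).
example_fits = {
    "cnst": (-12.0, 1),
    "hill": (-4.5, 4),
    "poly1": (-9.0, 2),
}
best = winning_model(example_fits)
```

In this made-up example the model labeled "hill" has the lowest AIC even though it has the most parameters, because its better fit outweighs the penalty term.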
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or fewer were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
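The Level 2 through Level 5 methods listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tcpl implementation (tcpl is an R package): the well records are made-up dictionaries, wells are flagged before the log2 transform, only wells with wllq = 1 enter the baseline, and the optional mad_constant parameter is hypothetical (R's mad() applies a scaling constant of 1.4826 by default; the sketch leaves it at 1.0). With the reported bmad of 0.315, 5 * bmad is 1.575, which agrees with the reported cutoff of 1.574 to within rounding of bmad.

```python
import math
import statistics


def level2_corrections(wells):
    """Apply rmneg and rmzero (set wllq = 0), then log2-transform the remaining cval."""
    corrected = []
    for well in wells:
        well = dict(well)  # copy so the input records are not modified
        if well["cval"] <= 0:
            well["wllq"] = 0  # rmneg / rmzero: exclude the well
        else:
            well["cval"] = math.log2(well["cval"])  # log2
        corrected.append(well)
    return corrected


def baseline_stats(wells, mad_constant=1.0):
    """Return (bmad, onesd) from usable test wells (wllt == 't') at cndx 1 or 2.

    Level 3 method 'none' means resp is simply cval.
    """
    resp = [w["cval"] for w in wells
            if w["wllq"] == 1 and w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(resp)
    bmad = mad_constant * statistics.median([abs(r - center) for r in resp])
    onesd = statistics.stdev(resp)  # sample standard deviation (n - 1 denominator)
    return bmad, onesd


def cutoff(bmad):
    """Level 5: the higher of log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


# Made-up wells for illustration only.
wells = [
    {"cval": 1.0, "wllq": 1, "wllt": "t", "cndx": 1},
    {"cval": 2.0, "wllq": 1, "wllt": "t", "cndx": 2},
    {"cval": 4.0, "wllq": 1, "wllt": "t", "cndx": 1},
    {"cval": 0.0, "wllq": 1, "wllt": "t", "cndx": 2},
    {"cval": -0.5, "wllq": 1, "wllt": "t", "cndx": 1},
    {"cval": 8.0, "wllq": 1, "wllt": "t", "cndx": 7},
]
corrected = level2_corrections(wells)
bmad, onesd = baseline_stats(corrected)
coff = cutoff(bmad)
```

Because bmad5 scales with the noise in the two lowest concentrations, it usually dominates the fixed log2(1.2) threshold; the fixed threshold only applies when the baseline is very quiet.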
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 6
Inactive hit count: 0
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 was processed using the
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 4.272
Neutral control median absolute deviation, by plate (nmad): 2.418
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 56.6%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
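The control-well formulas listed above translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical, the neutral-control inputs are the nmed and nmad values reported for this endpoint, and the positive-control numbers are made up because no positive control is reported (NA).

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


# Reported neutral control values for this endpoint.
neutral_cv = cv_percent(nmed=4.272, nmad=2.418)

# Hypothetical control values, used only to exercise the formulas.
example_z = z_prime(cmed=10.0, cmad=1.0, nmed=4.0, nmad=1.0)
example_ssmd = ssmd(cmed=10.0, cmad=1.0, nmed=4.0, nmad=1.0)
```

Here cmed and cmad stand for the median and median absolute deviation of whichever control is being compared with the neutral control (pmed/pmad or mmed/mmad).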
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 179.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 140
ATG_RORg_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human RAR-related orphan receptor G (RORg)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_RORg_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_RORg_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_RORg_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene RORC. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-RORg, also known as human RAR-related orphan
receptor C [GeneSymbol:RORC | GeneID:6097 | Uniprot_SwissProt_Accession:P51449].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of Gal4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Key positive control: NA
Target (nominal) number of replicates: 1
Standard maximum concentration tested: 376 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.235
Response cutoff threshold used to determine hit calls: 1.174
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised when extrapolating these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems, but it can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modi.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -l*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -l*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest cone tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest tested concentration tested.
), 7: singlept.hit.mid (Flag single-point hit that's not at the highest cone tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This is indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
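The directionality rule in flag 5 can be illustrated with a short Python sketch. This is only a reading of the description above, not the tcpl implementation; the function name `directionality_flag` and the example responses are hypothetical.

```python
def directionality_flag(resp, coff, top):
    """Return True when a series would carry the directionality flag.

    resp: normalized responses for one concentration series
    coff: response cutoff for the endpoint
    top:  modeled top parameter of the winning model
    """
    n_above = sum(1 for r in resp if r > coff)        # count(resp > coff)
    n_below = sum(1 for r in resp if r < -1 * coff)   # count(resp < -1*coff)
    if top < 0:   # loss was the winning direction
        return n_below < 2 * n_above
    if top > 0:   # gain was the winning direction
        return n_above < 2 * n_below
    return False  # top == 0: no direction to question (assumption)


# Hypothetical series: two responses above coff and two below -coff
mixed = [0.1, -1.2, -1.5, 1.1, 1.3, 0.2]
flag_mixed = directionality_flag(mixed, coff=1.0, top=1.3)

# Hypothetical series: responses only in the gain direction
clean = [0.1, 0.2, 1.1, 1.3, 1.4]
flag_clean = directionality_flag(clean, coff=1.0, top=1.4)
```

In the first series the gain fit is questionable because responses exceed the cutoff in both directions about equally; in the second series nothing crosses the negative cutoff, so no flag is raised.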
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
-------
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 21
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 2.014
Neutral control median absolute deviation, by plate (nmad): 0.677
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 33.6%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise, ((pmed - nmed) / nmad): NA
Positive control signal-to-background, (pmed / nmed): NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed - nmed) / nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed / nmed)
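The control-well metrics listed above are simple functions of medians and median absolute deviations. The Python sketch below restates those formulas so they can be checked by hand; the function names are hypothetical, and the positive-control values in the example are invented for illustration because this endpoint reports NA for them. How plate-level values are aggregated across plates is not shown.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference between control types."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control values reported above for this endpoint
cv = cv_percent(nmed=2.014, nmad=0.677)

# Hypothetical positive control values (pmed, pmad), for illustration only
zp = z_prime(cmed=10.0, cmad=1.0, nmed=2.0, nmad=0.5)
sn = signal_to_noise(cmed=10.0, nmed=2.0, nmad=0.5)
sb = signal_to_background(cmed=10.0, nmed=2.0)
```

The same functions apply to negative control wells by passing mmed and mmad in place of pmed and pmad.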
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 160.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res
Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis
Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation
describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 141
ATG_RXRa_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human retinoid X receptor, alpha (RXRa)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_RXRa_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_RXRa_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_RXRa_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene RXRA. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitrodb.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-RXRa, also known as human retinoid X receptor, alpha
[GeneSymbol:RXRA | GeneID:6256 | Uniprot_SwissProt_Accession:P19793].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The
trans-FACTORIAL comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a
chimeric GAL4-NR protein that regulates transcription of a reporter sequence. The presence of
agonists/antagonists of the NR alters the transactivation function of GAL4-NR and modulates reporter
transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA was isolated using
TRIzol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C) and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Target (nominal) number of replicates: 1
Key positive control: 9-cis-Retinoic acid
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.185
-------
Response cutoff threshold used to determine hit calls: 0.923
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
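As a minimal illustration of the winning-model rule, the Python sketch below computes AIC = 2k - 2 ln(L) for a few candidate fits and selects the lowest value. The log-likelihoods and parameter counts are invented for illustration; they are not output from tcpl or tcplfit2, and the model labels are used only as dictionary keys.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits for one concentration series:
# model label -> (maximized log-likelihood, number of fitted parameters)
fits = {
    "cnst": (-42.0, 1),
    "hill": (-30.5, 4),
    "gnls": (-30.1, 6),
}

aic_by_model = {name: aic(ll, k) for name, (ll, k) in fits.items()}

# The winning model is the candidate with the lowest AIC
winning_model = min(aic_by_model, key=aic_by_model.get)
```

Here the six-parameter fit has a slightly higher likelihood than the four-parameter fit, but the penalty on additional parameters leaves the simpler model with the lower AIC.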
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
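A rough Python sketch of the three Level 2 corrections follows. It assumes, per Section 3.2, that negative and zero values are excluded before the log2 transform is applied; the order in which tcpl actually applies these methods is not stated here. The function name `apply_level2` and the well records are hypothetical.

```python
import math


def apply_level2(wells):
    """Return copies of the wells with rmneg, rmzero, and log2 applied."""
    processed = []
    for well in wells:
        well = dict(well)  # copy so the input records are left unchanged
        if well["cval"] <= 0:
            # rmneg (cval < 0) and rmzero (cval = 0): downgrade well quality
            well["wllq"] = 0
            well["cval"] = None  # no log2 value exists for these wells
        else:
            # log2: transform the corrected value to log-scale (base 2)
            well["cval"] = math.log2(well["cval"])
        processed.append(well)
    return processed


# Hypothetical corrected response values for three wells
wells = [
    {"cval": 4.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -0.5, "wllq": 1},
]
level2 = apply_level2(wells)
```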
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
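The Level 4 calculation can be sketched in Python as below. The `scale` argument is an assumption: R's `mad()` multiplies the raw median absolute deviation by 1.4826 by default, and whether that constant applies to the bmad reported here is not stated in this document, so pass `scale=1.0` for the unscaled value. The function name and well records are hypothetical.

```python
import statistics


def bmad_lowconc_twells(wells, scale=1.4826):
    """Return (bmad, onesd) from test wells at concentration index 1 or 2."""
    resp = [
        w["resp"]
        for w in wells
        if w["wllt"] == "t" and w["cndx"] in (1, 2)
    ]
    center = statistics.median(resp)
    # Median absolute deviation of the baseline responses
    bmad = scale * statistics.median([abs(r - center) for r in resp])
    # Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1))
    onesd = statistics.stdev(resp)
    return bmad, onesd


# Hypothetical wells; the last two are excluded (neutral well, cndx 3)
wells = [
    {"wllt": "t", "cndx": 1, "resp": 1.0},
    {"wllt": "t", "cndx": 1, "resp": 2.0},
    {"wllt": "t", "cndx": 2, "resp": 3.0},
    {"wllt": "t", "cndx": 2, "resp": 4.0},
    {"wllt": "n", "cndx": 1, "resp": 50.0},
    {"wllt": "t", "cndx": 3, "resp": 60.0},
]
bmad_raw, onesd = bmad_lowconc_twells(wells, scale=1.0)
```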
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
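The cutoff selection can be checked with a few lines of Python. Using the rounded bmad of 0.185 reported in the Assay Design Summary gives 0.925, close to the reported cutoff of 0.923; the small difference presumably reflects rounding of bmad in this document. The function names are hypothetical, and treating "fit in the negative analysis direction" as top < 0 is an assumption of this sketch.

```python
import math


def response_cutoff(bmad):
    """Select the higher of the two candidate cutoffs: log2(1.2) and 5*bmad."""
    return max(math.log2(1.2), 5 * bmad)


def ow_bidirectional_gain(hitc, top):
    """Multiply hitc by -1 when the winning fit is in the negative direction.

    Assumption: a negative-direction fit is identified by top < 0.
    """
    return -hitc if top < 0 else hitc


coff = response_cutoff(0.185)          # bmad5 candidate is the larger one
coff_small_bmad = response_cutoff(0.01)  # log2_1.2 candidate is the larger one

hitc_gain = ow_bidirectional_gain(0.95, top=1.4)
hitc_loss = ow_bidirectional_gain(0.95, top=-1.4)
```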
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
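Several of the threshold-based flags above reduce to one-line comparisons on the fitted parameters. The Python sketch below covers bmd.high, noise, border, low.nrep, low.nconc, and ac50.lowconc only. It is an illustration, not the tcpl implementation: the border flag is read here as |top| falling within 0.8 to 1.2 times the cutoff, and the dictionary keys and example values are hypothetical apart from the cutoff, replicate count, and concentration count taken from the Assay Design Summary.

```python
def caution_flags(series):
    """Return the names of the threshold-based flags raised for one series."""
    s = series
    active = s["hitc"] >= 0.9
    flags = []
    if s["bmd"] > s["ac50"]:
        flags.append("bmd.high")
    if s["rmse"] > s["coff"]:
        flags.append("noise")
    # Assumption: borderline means |top| within 20 percent of the cutoff
    if 0.8 * s["coff"] <= abs(s["top"]) <= 1.2 * s["coff"]:
        flags.append("border")
    if s["nrep"] < 2:
        flags.append("low.nrep")
    if s["nconc"] <= 4:
        flags.append("low.nconc")
    # logc_min is the log10 of the lowest tested concentration
    if active and s["ac50"] < 10 ** s["logc_min"]:
        flags.append("ac50.lowconc")
    return flags


# Hypothetical active series using this endpoint's cutoff and design values
example = {
    "hitc": 0.95, "top": 1.0, "coff": 0.923, "rmse": 0.3,
    "nrep": 1, "nconc": 7, "ac50": 5.0, "bmd": 2.0, "logc_min": 0.0,
}
raised = caution_flags(example)
```

With a nominal design of one replicate per concentration, as listed for this assay, the low.nrep flag would be expected on most series.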
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
-------
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc >= 0.9): 111
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 1.088
Neutral control median absolute deviation, by plate (nmad): 0.354
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 32.57%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise, ((pmed - nmed) / nmad): NA
Positive control signal-to-background, (pmed / nmed): NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed - nmed) / nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed / nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
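For orientation, the accuracy statistics named above are computed from counts of correct and incorrect calls against reference chemicals. The Python sketch below uses invented counts, since reference chemical curation for this endpoint is described as ongoing; the function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of reference positives called active."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of reference negatives called inactive."""
    return true_neg / (true_neg + false_pos)


def balanced_accuracy(true_pos, false_neg, true_neg, false_pos):
    """Mean of sensitivity and specificity."""
    return (sensitivity(true_pos, false_neg)
            + specificity(true_neg, false_pos)) / 2


# Hypothetical reference set: 20 positives and 50 negatives
sens = sensitivity(true_pos=18, false_neg=2)
spec = specificity(true_neg=45, false_pos=5)
bal_acc = balanced_accuracy(18, 2, 45, 5)
```

The false negative rate is 1 - sensitivity and the false positive rate is 1 - specificity.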
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 198.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res
Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis
Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation
describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 142
ATG_RXRb_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human retinoid X receptor, beta (RXRb)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_RXRb_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_RXRb_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_RXRb_TRANS, was
analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene RXRB. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-RXRb, also known as human retinoid X receptor, beta
[GeneSymbol:RXRB | GeneID:6257 | Uniprot_SwissProt_Accession:P28702].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: 9-cis-Retinoic acid
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.201
Response cutoff threshold used to determine hit calls: 1.003
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
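The response cutoff reported above can be related to the bmad through the Level 5 methods listed in Section 3.2
(log2_1.2 and bmad5, with the higher value selected). The following is a minimal Python sketch of that rule,
assuming only those two candidate thresholds apply; the function name is hypothetical and is not part of tcpl.

```python
import math

def response_cutoff(bmad: float) -> float:
    """Return the larger of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # fold-change based threshold
        "bmad5": 5.0 * bmad,         # noise based threshold
    }
    return max(candidates.values())

# bmad as reported (rounded) in the assay design summary for this endpoint
print(round(response_cutoff(0.201), 3))
```

With the rounded bmad of 0.201 this gives about 1.005, close to the reported cutoff of 1.003; the small difference
presumably reflects rounding of the reported bmad.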
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain-of-signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
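As an illustration of the winning-model rule, the sketch below selects the candidate with the lowest AIC using the
standard definition AIC = 2k - 2 ln(L). It is a simplified Python example, not the tcpl or tcplfit2 implementation;
the function names, log-likelihoods, and parameter counts are hypothetical values chosen only to show the comparison.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def winning_model(fits: dict) -> str:
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(fits, key=lambda name: aic(*fits[name]))

# Hypothetical fit results for one concentration series (illustrative numbers only)
example_fits = {
    "cnst": (-12.4, 1),  # constant model
    "hill": (-3.1, 4),   # Hill model
    "gnls": (-2.9, 6),   # gain-loss model
}
print(winning_model(example_fits))
```

In this made-up example the gain-loss model has the highest likelihood, but its extra parameters are penalized, so
the Hill model has the lowest AIC and is selected.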
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality (wllq);
if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well quality
(wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad
is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis
direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then
flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3
times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
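The Level 2 through Level 4 methods and a few of the simpler Level 6 flags listed above can be sketched as follows.
This is a simplified Python illustration, not the tcpl implementation: the function names and the dictionary layout
of a well are hypothetical, non-positive wells are dropped before the log2 transform (as described at the start of
Section 3.2), and the `scale` argument is an assumption. R's mad() multiplies by 1.4826 by default, and this document
does not state whether that constant is applied, so the sketch defaults to the unscaled median absolute deviation.

```python
import math
import statistics

def level2_correct(wells):
    """Apply rmneg and rmzero (wllq = 0 when cval <= 0), then log2-transform good wells."""
    corrected = []
    for well in wells:
        well = dict(well)  # work on a copy so the caller's data is untouched
        if well["cval"] <= 0:
            well["wllq"] = 0
        if well["wllq"] == 1:
            # Level 3 method "none": resp is simply the log2-transformed cval
            well["resp"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected

def level4_baseline(wells, scale=1.0):
    """Return (bmad, onesd) from good test wells (wllt == "t") at cndx 1 or 2."""
    baseline = [w["resp"] for w in wells
                if w["wllq"] == 1 and w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(baseline)
    bmad = scale * statistics.median([abs(r - center) for r in baseline])
    onesd = statistics.stdev(baseline)  # sqrt(sum((resp - mean)^2) / (n - 1))
    return bmad, onesd

def simple_flags(rmse, coff, nrep, nconc):
    """Evaluate three of the Level 6 flags that depend only on summary values."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags

# Hypothetical wells (illustrative values only)
EXAMPLE_WELLS = [
    {"cval": 1.0, "wllq": 1, "wllt": "t", "cndx": 1},
    {"cval": 2.0, "wllq": 1, "wllt": "t", "cndx": 1},
    {"cval": 4.0, "wllq": 1, "wllt": "t", "cndx": 2},
    {"cval": 0.0, "wllq": 1, "wllt": "t", "cndx": 2},   # removed by rmzero
    {"cval": -1.0, "wllq": 1, "wllt": "t", "cndx": 1},  # removed by rmneg
    {"cval": 64.0, "wllq": 1, "wllt": "t", "cndx": 7},  # not a baseline concentration
]
bmad, onesd = level4_baseline(level2_correct(EXAMPLE_WELLS))
print(bmad, onesd)
print(simple_flags(rmse=0.5, coff=1.0, nrep=1, nconc=7))
```

Because the assay design summary lists a nominal replicate count of 1, the low.nrep condition (nrep < 2) would be
expected to apply to series run at that nominal design.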
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
-------
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count: hitc>0.9
625
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 1.952
Neutral control median absolute deviation, by plate (nmad): 0.692
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 35.44%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed) / nmad: NA
Positive control signal-to-background, pmed / nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed) / nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed / nmed: NA
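The control-based metrics above follow directly from the listed formulas. The Python sketch below restates them as
small functions; the function and argument names are hypothetical (cmed and cmad stand for the median and median
absolute deviation of either the positive or the negative control wells), and how tcpl aggregates per-plate values
into the reported figures is not shown here.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100.0

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed

# Neutral control values reported above for this endpoint
print(round(cv_percent(1.952, 0.692), 2))
```

Using the rounded nmed and nmad gives a CV of about 35.45%, in line with the reported 35.44%; the small difference
presumably comes from rounding or from how per-plate values are aggregated. The positive and negative control
metrics are reported as NA for this endpoint, so no control values are substituted here.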
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
-------
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 394.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 143
ATG_THRa1_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human thyroid receptor alpha (THRa1)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_THRa1_TRANS is one of 30
assay component(s) measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by Reverse
transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. Data from the assay
component ATG_THRa1_TRANS was analyzed into 1 assay endpoint. This assay endpoint, ATG_THRa1_TRANS,
was analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor-level as they relate to the gene THRA. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-THRa, also known as human thyroid hormone receptor,
alpha [GeneSymbol:THRA | GeneID:7067 | Uniprot_SwissProt_Accession:P10827].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
-------
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: T3
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.221
-------
Response cutoff threshold used to determine hit calls: 1.104
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
Thyroid Bioactivity: Assays related to the thyroid adverse outcome pathway network
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
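The winning-model rule above can be illustrated with a minimal sketch. It is written in Python for illustration only (the actual processing is done by the tcpl R package); the model labels, log-likelihoods, and parameter counts below are made-up values, and the helper names `aic` and `winning_model` are hypothetical.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def winning_model(fits: dict) -> str:
    """Return the name of the candidate model with the lowest AIC.

    `fits` maps a model name to (maximized log-likelihood, number of parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fit results for one concentration series (values are made up).
example_fits = {
    "cnst": (-12.0, 1),   # AIC = 26
    "poly1": (-9.0, 2),   # AIC = 22
    "hill": (-4.5, 4),    # AIC = 17
}

print(winning_model(example_fits))  # hill
```

A lower AIC rewards a better likelihood while penalizing extra parameters, so a more complex model wins only when its fit improves enough to offset the penalty.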
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
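As an illustration of the kind of manual transformation described above, the following sketch sets wllq to 0 for every well in one plate row. It is a Python illustration only (Level 0 pre-processing itself is done in R); the record layout, the field names `apid`, `rowi`, `coli`, and `rval`, and the helper `mark_row_failed` are assumptions made for this example.

```python
def mark_row_failed(wells, apid, rowi):
    """Return a copy of `wells` with wllq set to 0 for one row of one plate."""
    updated = []
    for well in wells:
        well = dict(well)  # copy so the input records are left untouched
        if well["apid"] == apid and well["rowi"] == rowi:
            well["wllq"] = 0  # e.g. the row was missing an assay reagent
        updated.append(well)
    return updated


# Hypothetical well records (values are made up).
wells = [
    {"apid": "plate_01", "rowi": 1, "coli": 1, "rval": 1.02, "wllq": 1},
    {"apid": "plate_01", "rowi": 2, "coli": 1, "rval": 0.00, "wllq": 1},
    {"apid": "plate_02", "rowi": 2, "coli": 1, "rval": 0.97, "wllq": 1},
]

cleaned = mark_row_failed(wells, apid="plate_01", rowi=2)
```

Returning a modified copy rather than editing in place keeps the original records available, which fits the requirement that manual transformations be documented and justified.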
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization methods include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
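The baseline calculation described for this method can be sketched as follows. This is a Python illustration, not the tcpl implementation: the record layout and the helper names are hypothetical, and the scaling constant 1.4826 (the default used by R's mad() function) is an assumption about how the median absolute deviation is scaled.

```python
import statistics

MAD_SCALE = 1.4826  # assumption: default consistency constant of R's mad()


def mad(values, scale=MAD_SCALE):
    """Scaled median absolute deviation around the median."""
    center = statistics.median(values)
    return scale * statistics.median(abs(v - center) for v in values)


def baseline_stats(wells):
    """Return (bmad, onesd) from test wells at the two lowest concentrations."""
    resp = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    bmad = mad(resp)
    onesd = statistics.stdev(resp)  # sample SD, denominator (sample size - 1)
    return bmad, onesd


# Hypothetical normalized responses (values are made up).
wells = [
    {"wllt": "t", "cndx": 1, "resp": 0.1},
    {"wllt": "t", "cndx": 2, "resp": -0.2},
    {"wllt": "t", "cndx": 1, "resp": 0.0},
    {"wllt": "t", "cndx": 2, "resp": 0.3},
    {"wllt": "t", "cndx": 1, "resp": 0.2},
    {"wllt": "t", "cndx": 6, "resp": 2.5},  # excluded: not a low concentration
    {"wllt": "n", "cndx": 1, "resp": 0.9},  # excluded: not a test compound well
]

bmad, onesd = baseline_stats(wells)
```

Restricting the estimate to the two lowest concentration indices assumes that most chemicals are inactive there, so the spread of those wells approximates assay noise.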
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
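The cutoff selection described above amounts to taking the larger of the two candidate thresholds. The sketch below is a Python illustration with a hypothetical helper name, `response_cutoff`. With the bmad of 0.221 reported for this endpoint, 5*bmad is about 1.105, which is close to the reported cutoff of 1.104; the small difference presumably reflects rounding of the reported bmad.

```python
import math


def response_cutoff(bmad: float) -> float:
    """Pick the higher of the log2_1.2 and bmad5 candidate cutoffs."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # fixed fold-change threshold
        "bmad5": 5 * bmad,           # noise-based threshold
    }
    return max(candidates.values())


print(round(response_cutoff(0.221), 3))  # 1.105
```

For a very quiet endpoint (for example, bmad = 0.01), the fixed log2(1.2) threshold would be the larger value and would be selected instead.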
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
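Several of the flags above are simple threshold comparisons on fitted parameters. The sketch below illustrates five of them (bmd.high, noise, low.nrep, low.nconc, ac50.lowconc) in Python. The `Series` container and `level6_flags` helper are hypothetical, the example values are made up, and logc_min is assumed to be the log10 of the lowest tested concentration; this is not the tcpl implementation.

```python
from dataclasses import dataclass


@dataclass
class Series:
    """Fitted summary for one concentration series (hypothetical container)."""
    hitc: float      # continuous hit call
    ac50: float      # concentration at 50 percent maximal response
    bmd: float       # modeled benchmark dose
    rmse: float      # root mean square error of the winning model
    coff: float      # response cutoff
    nrep: float      # average replicates per concentration
    nconc: int       # number of concentrations tested
    logc_min: float  # assumed: log10 of the lowest tested concentration


def level6_flags(s: Series) -> list:
    """Return the names of the threshold-based flags that apply to `s`."""
    flags = []
    if s.bmd > s.ac50:
        flags.append("bmd.high")
    if s.rmse > s.coff:
        flags.append("noise")
    if s.nrep < 2:
        flags.append("low.nrep")
    if s.nconc <= 4:
        flags.append("low.nconc")
    if s.hitc >= 0.9 and s.ac50 < 10 ** s.logc_min:
        flags.append("ac50.lowconc")
    return flags


# Made-up example: active series, single replicate, AC50 below the tested range.
example = Series(hitc=0.95, ac50=0.5, bmd=0.2, rmse=0.3, coff=1.104,
                 nrep=1, nconc=7, logc_min=0.0)
print(level6_flags(example))  # ['low.nrep', 'ac50.lowconc']
```

Because this assay design targets a single replicate per concentration, a check like low.nrep would be expected to apply to most series screened in this format.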
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 87
Inactive hit count: 0
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information for more on the ToxCast program and the tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.936
Neutral control median absolute deviation, by plate (nmad): 0.281
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 30%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed) / nmad: NA
Positive control signal-to-background, pmed / nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed) / nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed / nmed: NA
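The performance metrics listed above can be written as short functions. The sketch below is a Python illustration with hypothetical function names; the neutral control values are those reported for this endpoint, while the positive control values in the second example are made up because the reported positive control metrics are NA.

```python
import math


def cv_percent(nmed: float, nmad: float) -> float:
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100


def z_prime(cmed: float, cmad: float, nmed: float, nmad: float) -> float:
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed: float, cmad: float, nmed: float, nmad: float) -> float:
    """Strictly standardized mean difference against the neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed: float, nmed: float, nmad: float) -> float:
    return (cmed - nmed) / nmad


def signal_to_background(cmed: float, nmed: float) -> float:
    return cmed / nmed


# Reported neutral control values for this endpoint.
print(round(cv_percent(nmed=0.936, nmad=0.281)))  # 30

# Made-up positive control values, for illustration only.
zp = z_prime(cmed=10.0, cmad=0.5, nmed=1.0, nmad=0.5)
```

The same functions apply to the negative control metrics by passing mmed and mmad in place of pmed and pmad.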
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 213.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can
be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast
Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 144
ATG_VDR_TRANS
1. General Information
1.1 Assay Title: Attagene TRANS-FACTORIAL HepG2 Assay for human vitamin D receptor (VDR)
1.2 Assay Summary: ATG_TRANS is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell line,
with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_VDR_TRANS is one of 30
assay components measured or calculated from the ATG_TRANS assay. It is designed to make measurements
of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse
transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Data from the assay
component ATG_VDR_TRANS were analyzed into 1 assay endpoint. This assay endpoint, ATG_VDR_TRANS, was
analyzed in the positive analysis fitting direction relative to DMSO as the negative control and baseline of
activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity can be used to
understand the reporter gene at the transcription factor level as they relate to the gene VDR. Furthermore,
this assay endpoint can be referred to as a primary readout, because this assay has produced multiple assay
endpoints where this one serves a reporter gene function. To generalize the intended target to other relatable
targets, this assay endpoint is annotated to the nuclear receptor intended target family, where the subfamily is
non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factor GAL4-VDR, also known as human vitamin D (1,25-
dihydroxyvitamin D3) receptor [GeneSymbol:VDR | GeneID:7421 | Uniprot_SwissProt_Accession:P11473].
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of Gal4-NR and modulates reporter transcription.
2.3 Experimental System: The adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in
activities of exogenous, chimeric NR-Gal4 proteins. Because the HepG2 cell line does not express some nuclear
receptors, the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto
96-well plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is
isolated using TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer
and Mo-MLV reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the
resulting cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific
primers. The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM)
5'-labeled reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these
products are digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using
Qiaquick PCR columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak
positions identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Target (nominal) number of replicates: 1
Standard minimum concentration tested: 1.23 nM
Standard maximum concentration tested: 376 nM
Key positive control: 1α,25-Dihydroxyvitamin D3
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.17
Response cutoff threshold used to determine hit calls: 0.852
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
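The three corrections above can be sketched together. This Python illustration assumes, following the statement in Section 3.2 that negative and zero values are removed before log transformation, that wells failing rmneg or rmzero are downgraded first and only the remaining wells are log2-transformed; the record layout and the helper name `apply_level2` are hypothetical, and this is not the tcpl implementation.

```python
import math


def apply_level2(wells):
    """Downgrade wells with cval <= 0 and log2-transform the remaining wells."""
    corrected = []
    for well in wells:
        well = dict(well)  # copy so the input records are left untouched
        if well["cval"] < 0:
            well["wllq"] = 0  # rmneg: negative corrected response
        elif well["cval"] == 0:
            well["wllq"] = 0  # rmzero: zero corrected response
        else:
            well["cval"] = math.log2(well["cval"])  # log2 transform
        corrected.append(well)
    return corrected


# Made-up fold-induction values.
wells = [
    {"cval": 2.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -0.4, "wllq": 1},
    {"cval": 1.0, "wllq": 1},
]

corrected = apply_level2(wells)
```

On the log2 scale, a fold-induction of 1 (no change from DMSO) maps to 0 and a two-fold induction maps to 1, which is why the Level 5 cutoff can be expressed as log2(1.2).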
Level 3: Endpoint-specific normalization methods include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
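The hit call adjustment in method 28 can be sketched in a few lines. This Python illustration assumes that a fit in the negative analysis direction is identified by a negative modeled top parameter (top < 0), as in the Level 6 description of loss directionality; the helper name `overwrite_bidirectional_gain` is hypothetical.

```python
def overwrite_bidirectional_gain(hitc: float, top: float) -> float:
    """Negate the hit call when the winning model was fit in the negative direction."""
    return -hitc if top < 0 else hitc


# Made-up examples: the same hit call with gain and loss winning directions.
gain_hitc = overwrite_bidirectional_gain(hitc=0.95, top=1.4)
loss_hitc = overwrite_bidirectional_gain(hitc=0.95, top=-1.4)
```

After this adjustment, a strong response in the loss direction has a negative hit call and no longer passes an active threshold such as hitc >= 0.9, consistent with only positive responses being considered biologically relevant for this endpoint.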
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if more responses (resp) would have exceeded the cutoff (coff) had the winning model direction been opposite. If loss was the winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that is only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that is not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response, in excess of half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
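The counting rule behind the directionality flag can be easier to follow as code. The following is a minimal Python sketch, not the tcpl implementation: the function name and the example values are hypothetical, and it assumes the responses are already normalized and that a top of exactly 0 is never flagged.

```python
def directionality_fail(resp, coff, top):
    """Return True if the series would be flagged for questionable directionality.

    resp: normalized response values for one concentration series
    coff: response cutoff (positive number)
    top:  modeled top of the winning model (sign gives the winning direction)
    """
    above = sum(1 for r in resp if r > coff)    # responses beyond the cutoff, gain side
    below = sum(1 for r in resp if r < -coff)   # responses beyond the cutoff, loss side
    if top < 0:   # loss was the winning direction
        return below < 2 * above
    if top > 0:   # gain was the winning direction
        return above < 2 * below
    return False  # assumption: top == 0 is not flagged


# Hypothetical series: the loss model won, yet two points exceed the cutoff on the gain side.
loss_series = [30.0, 35.0, -40.0, -45.0, -50.0, 2.0]
flag_loss = directionality_fail(loss_series, coff=20.0, top=-48.0)

# Hypothetical series: a clean gain response with nothing below -coff.
gain_series = [1.0, 2.0, 30.0, 45.0, 60.0]
flag_gain = directionality_fail(gain_series, coff=20.0, top=55.0)
```

In the first series, three responses fall below -coff and two rise above coff, so 3 < 2*2 and the series is flagged. The second series is not flagged.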
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 4514
Number of chemicals tested: 4060
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 30
Inactive hit count: 0
-------
3.4 Software: The ToxCast Data Analysis Pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores
ToxCast data to populate its linked MySQL database, invitrodb. Data for invitrodb v4.2 were processed using
tcpl v3.2. See Section 7: Supporting Information on the ToxCast program and tcpl R package.
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.966
Neutral control median absolute deviation, by plate: nmad 0.259
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 26.78%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
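The control-well metrics listed above can be written out as small functions. This is an illustrative Python sketch of the stated formulas only; the function names are hypothetical, and the positive-control numbers in the usage lines are invented for demonstration because this endpoint reports NA for those controls. The same functions apply to negative controls by passing mmed and mmad in place of pmed and pmad.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


# Neutral control values reported for this endpoint.
cv = cv_percent(nmed=0.966, nmad=0.259)

# Hypothetical positive control values, for illustration only.
zp = z_prime(cmed=5.0, cmad=0.5, nmed=1.0, nmad=0.25)
sn = signal_to_noise(cmed=5.0, nmed=1.0, nmad=0.25)
sb = signal_to_background(cmed=5.0, nmed=1.0)
```

Computing the CV directly from the rounded nmed and nmad gives roughly 26.8%, close to but not identical to the reported 26.78%. The small difference is presumably due to rounding or to how the by-plate values were aggregated, which this sketch does not attempt to reproduce.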
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 133.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1204
ATG_XTT_Cytotoxicity
1. General Information
1.1 Assay Title: Cytotoxicity Assessment in the Attagene TRANS-FACTORIAL HepG2 Assay
1.2 Assay Summary: ATG_XTT_Cytotoxicity is a cell-based, single-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_XTT_Cytotoxicity is
one of one assay component(s) measured or calculated from the ATG_XTT_Cytotoxicity assay. It is designed to
make measurements of cell number, a form of viability reporter, as detected with fluorescence intensity signals
by XTT cytotoxicity assay technology. Data from the assay component ATG_XTT_Cytotoxicity was analyzed into
1 assay endpoint. This assay endpoint, ATG_XTT_Cytotoxicity, was analyzed in the positive analysis fitting
direction relative to DMSO as the negative control and baseline of activity. Using a type of viability reporter,
loss-of-signal activity can be used to understand changes in the viability. Furthermore, this assay endpoint can
be referred to as a primary readout, because the performed assay has only produced 1 assay endpoint. To
generalize the intended target to other relatable targets, this assay endpoint is annotated to the cell cycle
intended target family, where the subfamily is cytotoxicity.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals produced from an enzymatic reaction involving the key
substrate [XTT reagent] are correlated to the viability of the mitochondria in the system.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR protein
that regulates transcription of a reporter sequence. The presence of agonists/antagonists of an NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
-------
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 7
Standard minimum concentration tested: 2.94 nM
Key positive control: NA
Target (nominal) number of replicates: 2
Standard maximum concentration tested: 376 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 15.145
Response cutoff threshold used to determine hit calls: 75.725
Detection technology used: XTT cytotoxicity assay (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of cell cycle.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
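As a small illustration of the selection rule described above (the model with the lowest AIC wins), the following Python sketch picks a winner from a set of fitted models. It is not tcpl code: the function name is hypothetical, the AIC values are invented, and the model labels other than gnls are placeholders.

```python
def winning_model(aic_by_model):
    """Return the name of the model with the lowest AIC value."""
    return min(aic_by_model, key=aic_by_model.get)


# Hypothetical AIC values for one concentration series (illustrative only).
aic = {"cnst": 212.4, "hill": 187.9, "gnls": 190.3, "poly1": 205.1}
best = winning_model(aic)
```

With these invented values, the model labeled hill has the lowest AIC and would be taken as the winning model for the series.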
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
17: sub100 (Center data around zero by subtracting the corrected response value (cval) from 100; 100 -
cval. Typically used if data was pre-normalized around 100 with responses decreasing to 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.), 6: resp.multneg1 (Multiply the
normalized response value (resp) by -1; -1*resp.)
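To make the Level 2 and Level 3 steps concrete, the sketch below chains the three listed methods for a single well. It is a Python illustration with hypothetical function names, and it assumes the methods are applied in the order listed (sub100, then none, then resp.multneg1); the input value is invented.

```python
def sub100(cval):
    """Level 2 'sub100': center around zero by subtracting cval from 100."""
    return 100.0 - cval


def resp_none(cval):
    """Level 3 'none': use the corrected value as the normalized response."""
    return cval


def resp_multneg1(resp):
    """Level 3 'resp.multneg1': multiply the normalized response by -1."""
    return -1.0 * resp


def normalize(percent_of_control):
    """Apply the three methods in the listed order (assumed ordering)."""
    cval = sub100(percent_of_control)
    resp = resp_none(cval)
    return resp_multneg1(resp)


# Hypothetical well reading 25 percent of control.
resp_25 = normalize(25.0)
# A well at 100 percent of control sits at the zero baseline.
resp_100 = normalize(100.0)
```

Under this assumed ordering, a well at 25 percent of control becomes 75 after sub100 and -75 after the sign flip.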
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and therefore required for tcplfit2
processing.)
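The baseline calculation described above can be sketched in a few lines. This Python example is illustrative rather than the tcpl implementation: the function name, the tuple layout of the well records, and the data are hypothetical. The scale argument is also an assumption. R's mad() multiplies by 1.4826 by default, and this document does not state whether that constant is applied, so the sketch leaves the MAD unscaled unless told otherwise.

```python
import statistics


def baseline_stats(wells, scale=1.0):
    """Compute (bmad, onesd) from test wells at the two lowest concentration indices.

    wells: iterable of (wllt, cndx, resp) tuples
    scale: optional MAD scaling constant (assumption; 1.0 means unscaled)
    """
    base = [resp for wllt, cndx, resp in wells if wllt == "t" and cndx in (1, 2)]
    med = statistics.median(base)
    bmad = scale * statistics.median([abs(r - med) for r in base])
    onesd = statistics.stdev(base)  # sample standard deviation, divides by n - 1
    return bmad, onesd


# Hypothetical well records: only the first four rows qualify as baseline wells.
wells = [
    ("t", 1, 1.0),
    ("t", 1, -2.0),
    ("t", 2, 3.0),
    ("t", 2, 0.0),
    ("t", 3, 40.0),   # higher concentration index, excluded
    ("n", 1, 100.0),  # neutral control well, excluded
]
bmad, onesd = baseline_stats(wells)
```

For these invented values the baseline responses are 1, -2, 3 and 0, giving an unscaled bmad of 1.5 and a sample standard deviation of about 2.08.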
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
2: pc20 (Add a cutoff value of 20. Typically for percent of control data.), 5: bmad5 (Add a cutoff value of
5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 27: ow_bidirectional_loss (Multiply winning model hitcall
(hitc) by -1 for models fit in the positive analysis direction. Typically used for endpoints where only
negative responses are biologically relevant.)
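The cutoff selection can be checked against the values in the assay design summary for this endpoint (bmad of 15.145 and a response cutoff of 75.725). The Python sketch below takes the larger of the two candidate cutoffs, pc20 and bmad5. The function name and its parameters are hypothetical and only illustrate the arithmetic.

```python
import math


def response_cutoff(bmad, fixed_cutoffs=(20.0,), bmad_multipliers=(5.0,)):
    """Return the highest candidate cutoff among fixed values and bmad multiples."""
    candidates = list(fixed_cutoffs) + [m * bmad for m in bmad_multipliers]
    return max(candidates)


# bmad reported for this endpoint in the assay design summary.
coff = response_cutoff(15.145)
```

Here 5 * 15.145 = 75.725 exceeds the fixed value of 20, so the bmad-based cutoff is selected, which matches the reported response cutoff threshold. For an endpoint with a small bmad (for example 2.0), the fixed cutoff of 20 would be selected instead.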
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if more responses (resp) would have exceeded the cutoff (coff) had the winning model direction been opposite. If loss was the winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that is only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that is not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response, in excess of half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
19: viability.gnls (Flag series with an active hit call (hitc >= 0.9) if the endpoint is denoted as a cell viability assay and the winning model is gain-loss (gnls); if hitc >= 0.9, modl=="gnls" and cell_viability_assay == 1, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
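Several of the flags above reduce to simple threshold comparisons, which the following Python sketch illustrates for noise, border, low.nrep, low.nconc and ac50.lowconc. It is not the tcpl implementation, and the function name and example inputs are hypothetical. One interpretive assumption is made for the border flag: the sketch treats the two bounds as a band (0.8*coff <= |top| <= 1.2*coff), whereas the description joins them with "or".

```python
def threshold_flags(top, coff, rmse, nrep, nconc, hitc, ac50, logc_min):
    """Return the names of the simple threshold flags raised for one series."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    # Assumption: borderline means |top| lies within 20 percent of the cutoff.
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")
    return flags


# Hypothetical active series with a top near the cutoff and a very low AC50.
active_flags = threshold_flags(
    top=85.0, coff=75.725, rmse=10.0, nrep=2, nconc=7,
    hitc=0.95, ac50=0.001, logc_min=-2.0,
)

# Hypothetical inactive, noisy series with sparse replication.
inactive_flags = threshold_flags(
    top=200.0, coff=75.725, rmse=239.0, nrep=1.5, nconc=4,
    hitc=0.2, ac50=1.0, logc_min=-2.0,
)
```

The first series is flagged as border and ac50.lowconc, and the second as noise, low.nrep and low.nconc. Flags are cautionary annotations and do not change the hit call itself.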
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 3834
Number of chemicals tested: 3402
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 160
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed NA
Neutral control median absolute deviation, by plate: nmad NA
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 NA%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 239.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1348
ATG_NUR77_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for NUR77 orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_NUR77_TRANS2 is one
of 24 assay components measured from the ATG_TRANS2 assay. It is designed to make measurements of mRNA
induction, a form of inducible reporter, as detected with fluorescence intensity signals by reverse transcription
polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. The assay endpoint
ATG_NUR77_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as the negative
control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-of-signal activity
can be used to understand the reporter gene at the transcription factor-level as they relate to the gene NR4A1.
Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has produced
multiple assay endpoints where this one serves a reporter gene function. To generalize the intended target to
other relatable targets, this assay endpoint is annotated to the nuclear receptor intended target family, where
the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in
activities of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear
receptors, the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto
96-well plates, and exposed to the compound under evaluation. At the end of the 24-hour incubation, total RNA is
isolated using TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer
and Mo-MLV reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the
resulting cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific
primers. The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM)
5'-labeled reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these
products are digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using
QIAquick PCR columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak
positions identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Target (nominal) number of replicates: 2
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.172
Response cutoff threshold used to determine hit calls: 0.859
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
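The design values above are consistent with an evenly spaced (geometric) dilution series. The following Python sketch is only an illustration under that assumption; the source does not state the dilution scheme, and the function name `geometric_series` is hypothetical. It back-calculates the dilution factor implied by the printed minimum, maximum, and number of concentrations.

```python
def geometric_series(c_max, c_min, n_conc):
    """Return (dilution_factor, concentrations) for n_conc points evenly
    spaced on a log scale between c_max and c_min (highest first).

    Assumption: a constant dilution factor between adjacent concentrations.
    """
    if n_conc < 2 or c_min <= 0 or c_max <= c_min:
        raise ValueError("need n_conc >= 2 and 0 < c_min < c_max")
    factor = (c_max / c_min) ** (1.0 / (n_conc - 1))
    concs = [c_max / factor**i for i in range(n_conc)]
    return factor, concs


# Values as printed in the Assay Design Summary (units as printed there).
factor, concs = geometric_series(c_max=10.0, c_min=0.0412, n_conc=6)
print(f"implied dilution factor: {factor:.3f}")
print([round(c, 4) for c in concs])
```

With the printed values, the implied factor comes out very close to 3, which would correspond to a 3-fold serial dilution across six concentrations if the spacing assumption holds.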
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted in the R programming
language, using the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
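The AIC-based selection described above can be illustrated with a short sketch. This is Python, not tcpl code, and it only shows the selection rule (AIC = 2k - 2 ln L, lowest value wins). The model labels, log-likelihoods, and parameter counts below are made-up example inputs, and the helper names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def pick_winning_model(fits):
    """fits maps a model label to (log_likelihood, n_params).

    Returns the label with the lowest AIC and the AIC of every model.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    winner = min(scores, key=scores.get)
    return winner, scores


# Hypothetical fit results for one concentration series (illustrative only).
example_fits = {
    "cnst": (-12.4, 1),  # constant (no-response) model
    "hill": (-3.1, 4),   # sigmoidal gain model
    "gnls": (-2.9, 6),   # gain-loss model
}
winner, scores = pick_winning_model(example_fits)
print(winner, scores)
```

In this made-up case the gain-loss fit has a slightly higher likelihood, but its two extra parameters are penalized, so the simpler sigmoidal model has the lower AIC.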
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute value (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis
direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag
when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
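As a rough illustration of the Level 4 and Level 5 methods listed above, the sketch below computes bmad and onesd from test wells at the two lowest concentration indices and then takes the larger of the two candidate cutoffs. It is written in Python and is not tcpl code. The function names, the dictionary layout of `wells`, and the 1.4826 scaling constant (the default of R's `mad()`) are assumptions made for the example.

```python
import math
import statistics


def mad(values, scale=1.4826):
    """Median absolute deviation; the scale constant is an assumption
    (it is the default used by R's mad())."""
    med = statistics.median(values)
    return scale * statistics.median([abs(v - med) for v in values])


def baseline_stats(wells):
    """Level 4 sketch: bmad and onesd from test wells (wllt == 't')
    with concentration index (cndx) 1 or 2."""
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    return mad(base), statistics.stdev(base)  # stdev uses (n - 1)


def cutoff(bmad):
    """Level 5 sketch: the higher of log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


# Hypothetical wells; only the first four qualify as baseline wells.
wells = [
    {"resp": 0.1, "wllt": "t", "cndx": 1},
    {"resp": -0.1, "wllt": "t", "cndx": 1},
    {"resp": 0.2, "wllt": "t", "cndx": 2},
    {"resp": 0.0, "wllt": "t", "cndx": 2},
    {"resp": 1.5, "wllt": "t", "cndx": 6},
    {"resp": 0.0, "wllt": "n", "cndx": 1},
]
bmad, onesd = baseline_stats(wells)
print(bmad, onesd, cutoff(bmad))
print(cutoff(0.172))  # bmad as printed in the Assay Design Summary
```

With the bmad of 0.172 reported in the Assay Design Summary, 5*bmad is 0.86, which exceeds log2(1.2) (about 0.263) and agrees with the reported cutoff of 0.859 to within rounding of the displayed bmad.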
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 0
Inactive hit count: 0
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 1.013
Neutral control median absolute deviation, by plate (nmad): 0.076
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 7.46%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
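The control-well metrics defined above can be written out as a short sketch. This is Python, not tcpl code, and the function names are hypothetical. The positive-control values in the example are invented, because this endpoint reports NA for its positive and negative controls.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (c) against the neutral control (n)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference, control vs. neutral control."""
    return (cmed - nmed) / math.sqrt(cmad**2 + nmad**2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


# Neutral control values as printed for this endpoint.
nmed, nmad = 1.013, 0.076
cv = cv_percent(nmed, nmad)
print(f"CV% from printed summary values: {cv:.2f}")

# Hypothetical positive control, for illustration only.
pmed, pmad = 3.0, 0.2
zp = z_prime(pmed, pmad, nmed, nmad)
print(zp, ssmd(pmed, pmad, nmed, nmad),
      signal_to_noise(pmed, nmed, nmad), signal_to_background(pmed, nmed))
```

Plugging in the printed nmed and nmad gives a CV of about 7.5%, slightly different from the reported 7.46%. The gap presumably reflects rounding of the displayed values or how the by-plate statistics are aggregated, which the source does not specify.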
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 0.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity
data-rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including
but not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be
found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data
Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive
documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1349
ATG_GCNF_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for GCNF orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_GCNF_TRANS2 is one
of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by reverse transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. The
assay endpoint ATG_GCNF_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as
the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-
of-signal activity can be used to understand the reporter gene at the transcription factor-level as they relate to
the gene NR6A1. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay
has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in
activities of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear
receptors, the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto
96-well plates, and exposed to the compound under evaluation. At the end of the 24-hour incubation, total RNA is
isolated using TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer
and Mo-MLV reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the
resulting cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific
primers. The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM)
5'-labeled reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these
products are digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using
QIAquick PCR columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak
positions identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Target (nominal) number of replicates: 2
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.278
Response cutoff threshold used to determine hit calls: 1.392
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
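The selection rule described above can be sketched as follows. This is a minimal Python illustration only, not the tcpl implementation (tcpl is an R package); the function name and the AIC values are hypothetical placeholders chosen for the example.

```python
import math

def winning_model(aic_by_model):
    """Return the name of the model with the lowest finite AIC, or None.

    `aic_by_model` maps a model name to its AIC. Models whose fit failed
    (AIC missing or not finite) are ignored. Hypothetical helper.
    """
    candidates = {
        name: aic
        for name, aic in aic_by_model.items()
        if aic is not None and math.isfinite(aic)
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

# Made-up AIC values for one concentration series (illustrative only).
example_aic = {"cnst": 41.7, "hill": 35.2, "gnls": 37.9, "poly1": 39.4}
print(winning_model(example_aic))  # hill
```

Lower AIC indicates a better trade-off between goodness of fit and model complexity, so the comparison is only meaningful among models fit to the same concentration series.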
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp - mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
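The Level 2, Level 4, and Level 5 methods listed above can be sketched as follows. This is a minimal Python illustration, not the tcpl code (which is written in R). The function names are hypothetical. Applying the exclusions before the log2 transform, and scaling the median absolute deviation by 1.4826 (the default constant of R's mad() function), are assumptions made for this sketch.

```python
import math
import statistics

def level2_correct(cvals):
    """Return (log2 value or None, wllq) per well.

    Wells with cval < 0 (rmneg) or cval == 0 (rmzero) get wllq = 0 and no
    value; remaining wells are log2-transformed. Ordering is an assumption.
    """
    out = []
    for cval in cvals:
        if cval <= 0:
            out.append((None, 0))
        else:
            out.append((math.log2(cval), 1))
    return out

def bmad(resp, constant=1.4826):
    """Scaled median absolute deviation of the responses about their median.

    `resp` stands for test-compound wells (wllt = t) at cndx 1 or 2.
    The scaling constant is an assumption (default of R's mad()).
    """
    med = statistics.median(resp)
    return constant * statistics.median(abs(r - med) for r in resp)

def onesd(resp):
    """Sample standard deviation with an (n - 1) denominator."""
    return statistics.stdev(resp)

def cutoff(bmad_value):
    """Level 5: the higher of log2(1.2) and 5 * bmad is selected."""
    return max(math.log2(1.2), 5 * bmad_value)

# With the rounded bmad reported for this endpoint (0.278):
print(round(cutoff(0.278), 2))  # 1.39
```

The result agrees with the reported cutoff of 1.392 to within rounding of the published bmad, which suggests the bmad5 threshold, rather than log2(1.2), sets the cutoff for this endpoint.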
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.44
Neutral control median absolute deviation, by plate: nmad 0.073
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 16.53%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
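The control-well metrics defined above can be written out as small functions. This is a minimal Python sketch with hypothetical function names. It uses the rounded neutral-control values reported for this endpoint. Because no positive or negative control is reported (NA), no control-dependent value is computed here.

```python
import math

def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) versus neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference versus neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    """Control signal relative to neutral-control variability."""
    return (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    """Control median relative to neutral-control median."""
    return cmed / nmed

# Rounded neutral-control values reported above: nmed = 0.44, nmad = 0.073.
print(round(cv_percent(0.073, 0.44), 1))  # 16.6
```

The small difference from the reported 16.53% is presumably due to rounding of the published nmed and nmad, or to the statistic being summarized by plate before aggregation; the source does not say which.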
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 1.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1350
ATG_COUP_TF2_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for COUP-TFII orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_COUP_TF2_TRANS2 is
one of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by Reverse transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. The
assay endpoint ATG_COUP_TF2_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO
as the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-
of-signal activity can be used to understand the reporter gene at the transcription factor-level as they relate to
the gene NR2F2. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay
has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters
the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 C, 20 s at 68 C and 10 min at 72 C) and these products were
digested with 5U of HpaI (New England Biolabs) for 2h at 37 C. The fragments were purified using Qiaquick PCR
columns (Qiagen), analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems) with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data was processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Standard minimum concentration tested: 0.0412 nM
Key positive control: NA
Target (nominal) number of replicates: 2
Standard maximum concentration tested: 10 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.236
Response cutoff threshold used to determine hit calls: 1.181
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
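The reported minimum (0.0412) and maximum (10) across six nominal concentrations are numerically consistent with a 3-fold serial dilution. The source does not state the dilution factor, so the factor of 3 below is an inference, and the helper name is hypothetical. A minimal Python sketch of that arithmetic:

```python
def dilution_series(top, n_conc, factor):
    """Return n_conc concentrations, lowest first, ending at `top`.

    Each step down divides the previous concentration by `factor`.
    The factor is an assumption, not a value stated in the source.
    """
    return [top / factor ** i for i in range(n_conc - 1, -1, -1)]

# Units follow the summary table above.
series = dilution_series(top=10, n_conc=6, factor=3)
print([round(c, 4) for c in series])
# [0.0412, 0.1235, 0.3704, 1.1111, 3.3333, 10.0]
```

The lowest value, 10 / 3^5 = 0.0412 after rounding, matches the standard minimum concentration reported in the summary.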
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp - mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
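Several of the Level 6 flags above are simple threshold comparisons on fitted-series summaries. The following minimal Python sketch evaluates six of them for one series. It is an illustration only, not the tcpl implementation: the function name, the dictionary keys, and the example values are hypothetical, and flags that need the full response vector (for example modl.directionality.fail) are omitted.

```python
import math

def caution_flags(s):
    """Return the names of selected Level 6 flags raised for one series.

    `s` is a dict with assumed keys: hitc, rmse, coff, nrep, nconc, ac50,
    bmd, logc_min, nmed_gtbl_pos, nmed_gtbl_neg. ac50 and bmd may be None.
    """
    flags = []
    if s["bmd"] is not None and s["ac50"] is not None and s["bmd"] > s["ac50"]:
        flags.append("bmd.high")
    if s["rmse"] > s["coff"]:
        flags.append("noise")
    if s["nrep"] < 2:
        flags.append("low.nrep")
    if s["nconc"] <= 4:
        flags.append("low.nconc")
    if (s["hitc"] >= 0.9 and s["ac50"] is not None
            and s["ac50"] < 10 ** s["logc_min"]):
        flags.append("ac50.lowconc")
    if s["nmed_gtbl_pos"] == 0 and s["nmed_gtbl_neg"] == 0:
        flags.append("no.med.gt.3bmad")
    return flags

# Made-up series: active, with an AC50 below the lowest tested concentration.
example = {
    "hitc": 0.95, "rmse": 0.4, "coff": 1.181, "nrep": 2, "nconc": 6,
    "ac50": 0.02, "bmd": 0.01, "logc_min": math.log10(0.0412),
    "nmed_gtbl_pos": 2, "nmed_gtbl_neg": 0,
}
print(caution_flags(example))  # ['ac50.lowconc']
```

Flags are cautionary annotations on a fitted series; in this sketch they do not alter the hit call itself.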
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.581
Neutral control median absolute deviation, by plate: nmad 0.064
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 11.1%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
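The control-well formulas listed in Section 4.1 can be written out directly. The following is a minimal Python sketch of those formulas, not the tcpl implementation; the function names are hypothetical, and the positive control values are invented for illustration because this report lists NA for them.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad / nmed) * 100."""
    return nmad / nmed * 100.0


def z_prime(cmed, cmad, nmed, nmad):
    """Z-prime factor for a control (c) versus the neutral control (n)."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference, using MADs as the spread terms."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """(control median - neutral median) / neutral MAD."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """control median / neutral median."""
    return cmed / nmed


# Neutral control values as displayed in this section.
nmed, nmad = 0.581, 0.064
cv = cv_percent(nmed, nmad)
print(f"CV% = {cv:.2f}")

# Hypothetical positive control values (the report gives NA for this endpoint).
pmed, pmad = 2.0, 0.1
zp = z_prime(pmed, pmad, nmed, nmad)
sd = ssmd(pmed, pmad, nmed, nmad)
print(f"Z' = {zp:.3f}, SSMD = {sd:.2f}")
```

Recomputing CV% from the displayed, rounded nmed and nmad gives a value close to, but not identical with, the reported figure; the difference presumably reflects rounding of the displayed inputs.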
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 1.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis
Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation
describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1351
ATG_PNR_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for PNR orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_PNR_TRANS2 is one of
24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by Reverse transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. The
assay endpoint ATG_PNR_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as the
negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-of-
signal activity can be used to understand the reporter gene at the transcription factor-level as they relate to the
gene NR2E3. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has
produced multiple assay endpoints where this one serves a reporter gene function. To generalize the intended
target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended target family,
where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The
trans-FACTORIAL comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a
chimeric GAL4-NR protein that regulates transcription of a reporter sequence. The presence of
agonists/antagonists of the NR alters the transactivation function of GAL4-NR and modulates reporter
transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Target (nominal) number of replicates: 2
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.265
Response cutoff threshold used to determine hit calls: 1.326
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
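The design values above (6 concentrations, a minimum of 0.0412 and a maximum of 10, in the units reported) are numerically consistent with a threefold serial dilution. That dilution factor is an inference from the arithmetic, not something the summary states, and the helper below is a hypothetical illustration only.

```python
def dilution_series(top_conc, n_conc, fold):
    """Return n_conc concentrations, starting at top_conc and dividing by fold each step."""
    return [top_conc / fold ** i for i in range(n_conc)]


# Assumed threefold dilution from the reported maximum concentration.
series = dilution_series(top_conc=10.0, n_conc=6, fold=3.0)
lowest = round(series[-1], 4)
print([round(c, 4) for c in series])
```

Under this assumption the sixth concentration rounds to the reported standard minimum.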
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain-of-signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted in the R programming
language, using the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
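The winning-model rule described above amounts to taking the minimum AIC across the fitted models. The sketch below illustrates that selection step only; it does not fit curves, the function name is hypothetical, and the AIC values are invented for illustration.

```python
def pick_winning_model(aic_by_model):
    """Return the name of the model with the lowest AIC (ties go to the first listed)."""
    if not aic_by_model:
        raise ValueError("no fitted models supplied")
    return min(aic_by_model, key=aic_by_model.get)


# Hypothetical AIC values for one concentration series; not taken from invitroDB.
aic = {"cnst": 41.7, "hill": 35.2, "gnls": 36.9, "poly1": 38.4}
winner = pick_winning_model(aic)
print(winner)
```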
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
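The three Level 2 corrections can be sketched as one pass over the wells. This is an illustration in Python rather than the tcpl code: the record layout and function name are hypothetical, and two choices are assumptions, namely applying the exclusions before the log2 transform and setting the excluded wells' cval to None.

```python
import math


def apply_level2(wells):
    """Apply rmneg, rmzero and log2 to a list of {'cval', 'wllq'} records."""
    processed = []
    for well in wells:
        well = dict(well)  # copy so the input records are left unchanged
        if well["cval"] < 0:        # rmneg: negative corrected value
            well["wllq"] = 0
            well["cval"] = None     # assumption: no transformed value is kept
        elif well["cval"] == 0:     # rmzero: corrected value equal to zero
            well["wllq"] = 0
            well["cval"] = None
        else:                       # log2: transform the remaining wells
            well["cval"] = math.log2(well["cval"])
        processed.append(well)
    return processed


result = apply_level2([
    {"cval": 4.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -1.0, "wllq": 1},
])
print(result)
```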
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
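The Level 4 baseline calculation can be sketched as follows. This is a hedged illustration, not the tcpl source: the record layout and function name are hypothetical, and the scaling constant of 1.4826 is an assumption borrowed from the default of R's mad() function; set it to 1.0 for an unscaled median absolute deviation.

```python
import statistics


def bmad_and_onesd(wells, mad_constant=1.4826):
    """Compute bmad and onesd from test wells ('t') at concentration index 1 or 2."""
    resp = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    if len(resp) < 2:
        raise ValueError("need at least two baseline wells")
    med = statistics.median(resp)
    # Median absolute deviation around the median, optionally scaled.
    bmad = mad_constant * statistics.median([abs(r - med) for r in resp])
    # Sample standard deviation (denominator n - 1), matching the onesd formula.
    onesd = statistics.stdev(resp)
    return bmad, onesd


# Hypothetical wells: only the first four qualify as baseline wells.
wells = [
    {"wllt": "t", "cndx": 1, "resp": 0.0},
    {"wllt": "t", "cndx": 1, "resp": 0.1},
    {"wllt": "t", "cndx": 2, "resp": -0.1},
    {"wllt": "t", "cndx": 2, "resp": 0.2},
    {"wllt": "t", "cndx": 6, "resp": 3.0},   # high concentration, excluded
    {"wllt": "n", "cndx": 1, "resp": 0.5},   # neutral control, excluded
]
bmad, onesd = bmad_and_onesd(wells)
print(round(bmad, 4), round(onesd, 4))
```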
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
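The cutoff selection can be checked against the values reported for this endpoint. The sketch below takes the larger of the two candidate cutoffs and applies the sign flip described for ow_bidirectional_gain; the function names are hypothetical, and using top < 0 to identify a negative-direction fit is an assumption.

```python
import math


def response_cutoff(bmad):
    """Return the higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5.0 * bmad)


def gain_only_hitc(hitc, top):
    """Multiply hitc by -1 when the winning model fit is in the negative direction."""
    return -hitc if top < 0 else hitc


# bmad as displayed in the assay design summary for this endpoint.
coff = response_cutoff(0.265)
print(round(coff, 3))
```

With the displayed bmad of 0.265, the bmad5 candidate (about 1.325) exceeds log2(1.2) (about 0.263) and so sets the cutoff. The reported cutoff of 1.326 differs in the last digit, presumably because the displayed bmad is rounded.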
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse)
for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
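Several of the Level 6 flags reduce to simple comparisons on fitted parameters. The sketch below covers five of them (noise, border, low.nrep, low.nconc and ac50.lowconc) and is not the tcpl implementation; the function and argument names are hypothetical, and the border flag is read here as 0.8*coff <= |top| <= 1.2*coff, which is an interpretation of the description.

```python
import math


def cautionary_flags(hitc, top, coff, rmse, ac50, logc_min, nrep, nconc):
    """Return the names of the simple cautionary flags raised for one series."""
    flags = []
    if rmse > coff:                                # noise: poor fit relative to cutoff
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:       # border: top within 20% of cutoff
        flags.append("border")
    if nrep < 2:                                   # low.nrep: fewer than 2 replicates on average
        flags.append("low.nrep")
    if nconc <= 4:                                 # low.nconc: 4 or fewer concentrations
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:      # ac50.lowconc: AC50 below tested range
        flags.append("ac50.lowconc")
    return flags


# Hypothetical active series; coff and the minimum concentration follow this endpoint's summary.
flags = cautionary_flags(
    hitc=0.95, top=1.4, coff=1.326, rmse=0.3,
    ac50=0.01, logc_min=math.log10(0.0412), nrep=2, nconc=6,
)
print(flags)
```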
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24 Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.552
Neutral control median absolute deviation, by plate: nmad 0.07
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 12.76%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 0.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis
Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for comprehensive documentation
describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1352
ATG_LRH1_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for LRH1 orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_LRH1_TRANS2 is one
of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by Reverse transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. The
assay endpoint ATG_LRH1_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as
the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-
of-signal activity can be used to understand the reporter gene at the transcription factor-level as they relate to
the gene NR5A2. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay
has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Target (nominal) number of replicates: 2
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.332
Response cutoff threshold used to determine hit calls: 1.659
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
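The reported minimum and maximum concentrations are numerically consistent with a six-point, three-fold serial dilution (10 / 3^5 ≈ 0.0412). The dilution factor is not stated in this document, so the sketch below treats it as an assumption and only checks that it reproduces the reported minimum; the function name is hypothetical.

```python
# Hypothetical reconstruction of the nominal concentration series.
# ASSUMPTION: a constant 3-fold dilution. The document reports only the
# minimum (0.0412), the maximum (10) and the number of concentrations (6).

def dilution_series(c_max, n_conc, factor):
    """Return n_conc concentrations, lowest first, ending at c_max."""
    return [c_max / factor ** i for i in range(n_conc)][::-1]

series = dilution_series(c_max=10.0, n_conc=6, factor=3.0)
print([round(c, 4) for c in series])
```

If the actual plate layout used a different spacing, only the `factor` argument would need to change.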
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain-of-signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log-transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
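The Level 4 calculation described above can be sketched in a few lines. This is an illustrative Python sketch, not the tcpl implementation: the function name is hypothetical, and the scaling constant of 1.4826 is an assumption that mirrors the default of R's mad(); pass 1.0 for the unscaled median absolute deviation.

```python
import statistics

def bmad_and_onesd(resp, cndx, wllt, mad_constant=1.4826):
    """Baseline MAD and one standard deviation from test-compound wells
    (wllt == "t") at the two lowest concentration indices (cndx 1 or 2).

    ASSUMPTION: mad_constant=1.4826 mirrors the default scaling of R's mad().
    """
    base = [r for r, c, w in zip(resp, cndx, wllt) if w == "t" and c in (1, 2)]
    med = statistics.median(base)
    bmad = mad_constant * statistics.median([abs(r - med) for r in base])
    onesd = statistics.stdev(base)  # sample standard deviation, divisor n - 1
    return bmad, onesd

# Hypothetical normalized responses; the last well is a neutral control
# ("n") and is therefore excluded from the baseline.
resp = [0.1, -0.2, 0.3, 0.0, 1.5, 2.0, 9.0]
cndx = [1, 1, 2, 2, 5, 6, 1]
wllt = ["t", "t", "t", "t", "t", "t", "n"]

bmad, onesd = bmad_and_onesd(resp, cndx, wllt)
print(f"bmad={bmad:.4f}, onesd={onesd:.4f}")
```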
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
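As a worked check of the Level 5 rule that the higher candidate cutoff is selected, the sketch below compares log2(1.2) with 5 × bmad using the bmad reported for this endpoint (0.332). The function name is hypothetical, and ow_bidirectional_gain is not modeled because it adjusts hit calls rather than the cutoff.

```python
import math

def response_cutoff(bmad):
    """Return the name and value of the larger of the two candidate cutoffs."""
    candidates = {"log2_1.2": math.log2(1.2), "bmad5": 5 * bmad}
    name = max(candidates, key=candidates.get)
    return name, candidates[name]

name, coff = response_cutoff(0.332)
print(name, round(coff, 3))
```

With the rounded bmad, the bmad5 candidate (1.66) exceeds log2(1.2) (about 0.263) and agrees with the reported cutoff of 1.659 to within the rounding of bmad.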
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
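Several of the Level 6 flags reduce to simple comparisons on fitted parameters. The sketch below illustrates five of them (noise, border, low.nrep, low.nconc, ac50.lowconc); it is not the tcpl code, the function name and example inputs are hypothetical, and the border flag is interpreted here as a two-sided band, 0.8*coff <= |top| <= 1.2*coff, which is an assumption about the intended rule.

```python
import math

def caution_flags(top, coff, rmse, nrep, nconc, hitc, ac50, logc_min):
    """Return the names of the simple threshold-based flags that apply."""
    flags = []
    if rmse > coff:                               # 10: noise
        flags.append("noise")
    if 0.8 * coff <= abs(top) <= 1.2 * coff:      # 11: border (assumed band)
        flags.append("border")
    if nrep < 2:                                  # 13: low.nrep
        flags.append("low.nrep")
    if nconc <= 4:                                # 14: low.nconc
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:     # 18: ac50.lowconc
        flags.append("ac50.lowconc")
    return flags

# Hypothetical fit summary; coff and the minimum concentration are the
# values reported for this endpoint, everything else is made up.
flags = caution_flags(top=1.8, coff=1.659, rmse=0.4, nrep=2, nconc=6,
                      hitc=0.95, ac50=0.02, logc_min=math.log10(0.0412))
print(flags)
```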
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 1.692
Neutral control median absolute deviation, by plate (nmad): 0.517
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 30.59%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
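The control-well metrics in this table follow directly from the listed formulas. The sketch below implements three of them as small Python functions (names are hypothetical). Only nmed and nmad are reported for this endpoint, so the positive control values used to exercise the Z prime and SSMD formulas are invented for illustration.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) versus neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference versus neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

nmed, nmad = 1.692, 0.517          # reported for this endpoint
cv = cv_percent(nmed, nmad)
print(f"CV% = {cv:.2f}")

pmed, pmad = 5.0, 0.5              # HYPOTHETICAL positive control values
zp = z_prime(pmed, pmad, nmed, nmad)
sd = ssmd(pmed, pmad, nmed, nmad)
print(f"Z' = {zp:.3f}, SSMD = {sd:.2f}")
```

The CV computed from the rounded nmed and nmad (about 30.56%) differs slightly from the reported 30.59%, presumably because the reported value was calculated from unrounded inputs.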
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 1.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1353
ATG_Rev_ERB_A_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for Rev-ERB-alpha orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_Rev_ERB_A_TRANS2
is one of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by Reverse transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. The
assay endpoint ATG_Rev_ERB_A_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO
as the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-
of-signal activity can be used to understand the reporter gene at the transcription factor-level as they relate to
the gene NR1D1. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay
has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Target (nominal) number of replicates: 2
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.183
Response cutoff threshold used to determine hit calls: 0.914
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain-of-signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log-transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
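The winning-model rule can be illustrated with the standard AIC definition, AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L the maximized likelihood. The sketch below is not the tcpl or tcplfit2 code; the log-likelihoods and parameter counts are hypothetical values chosen only to show the selection step.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# HYPOTHETICAL fits for one concentration series:
# model name -> (maximized log-likelihood, number of parameters)
fits = {
    "cnst": (-20.0, 1),
    "hill": (-12.5, 4),
    "gnls": (-12.1, 6),
}

scores = {model: aic(ll, k) for model, (ll, k) in fits.items()}
winner = min(scores, key=scores.get)
print(scores, "->", winner)
```

In this example the gain-loss fit has the highest likelihood, but its two extra parameters are penalized, so the four-parameter model has the lowest AIC and is selected.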
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data are loaded into the database, tcpl uses generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters are defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative
analysis direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction were opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was the winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of half of the
maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 or fewer concentrations were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then
flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
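The Level 2 corrections and the Level 4 baseline statistics listed above can be sketched in Python as follows. This is a minimal illustration, not tcpl code: the function names and input values are hypothetical, the sketch assumes non-positive wells are excluded before the log2 transform (as the Data Analysis paragraph describes), and the `constant` argument is an assumption added because R's `mad()` applies a scaling constant of 1.4826 by default while this document does not state whether that scaling is used.

```python
import math
import statistics


def level2_corrections(cvals):
    """Apply rmneg, rmzero, and log2 to corrected response values.

    Wells with cval <= 0 are given wllq = 0 and excluded (value None);
    all other wells are log2-transformed and keep wllq = 1.
    Returns a list of (log2_cval_or_None, wllq) tuples.
    """
    out = []
    for cval in cvals:
        if cval <= 0:
            out.append((None, 0))
        else:
            out.append((math.log2(cval), 1))
    return out


def baseline_stats(resp_low, constant=1.0):
    """Return (bmad, onesd) for test-well responses at cndx 1 or 2.

    bmad  = constant * median(|resp - median(resp)|)
    onesd = sqrt(sum((resp - mean(resp))^2) / (n - 1))
    """
    med = statistics.median(resp_low)
    bmad = constant * statistics.median([abs(r - med) for r in resp_low])
    onesd = statistics.stdev(resp_low)  # sample standard deviation (n - 1)
    return bmad, onesd


# Hypothetical inputs, for illustration only.
wells = level2_corrections([2.0, 0.0, -1.0, 4.0])
low_conc_resp = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2]
bmad, onesd = baseline_stats(low_conc_resp)
print(wells, bmad, onesd)
```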
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.829
Neutral control median absolute deviation, by plate (nmad): 0.062
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 7.43%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
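The control-based metrics above are reported as NA when an endpoint has no positive or negative control wells. For readers who want to see how the formulas behave, here is a small Python sketch of the four control-comparison metrics as they are written in this section. The function names and the control values in the example are hypothetical and are not taken from any ToxCast endpoint.

```python
import math


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - 3*(cmad + nmad) / |cmed - nmed|."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """(control median - neutral median) / neutral MAD."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """control median / neutral median."""
    return cmed / nmed


# Hypothetical control summaries (illustrative numbers only).
pmed, pmad, nmed, nmad = 10.0, 1.0, 2.0, 1.0
metrics = {
    "z_prime": z_prime(pmed, pmad, nmed, nmad),
    "ssmd": ssmd(pmed, pmad, nmed, nmad),
    "s2n": signal_to_noise(pmed, nmed, nmad),
    "s2b": signal_to_background(pmed, nmed),
}
print(metrics)
```

The same functions apply to negative control wells by passing mmed and mmad in place of pmed and pmad.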
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
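As a sketch of the accuracy statistics mentioned above, the following Python snippet computes sensitivity, specificity, and the false negative and false positive rates from a confusion table of hit calls against reference chemical classifications. The counts are hypothetical; no reference chemical results are available for this endpoint.

```python
def accuracy_stats(tp, fp, tn, fn):
    """Summary statistics from counts of true/false positives and negatives."""
    sensitivity = tp / (tp + fn)  # fraction of reference positives called active
    specificity = tn / (tn + fp)  # fraction of reference negatives called inactive
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (tn + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical counts: 10 reference positives and 20 reference negatives.
stats = accuracy_stats(tp=8, fp=2, tn=18, fn=2)
print(stats)
```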
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 0.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res
Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1354
ATG_HNF4g_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for HNF4g orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken 24 hours after chemical dosing in a 24-well plate. ATG_HNF4g_TRANS2 is one
of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to measure
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by reverse transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. The
assay endpoint ATG_HNF4g_TRANS2 was analyzed in the positive fitting direction relative to DMSO as
the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for
gain-of-signal activity can be used to understand the reporter gene at the transcription factor level as they
relate to the gene HNF4G. Furthermore, this assay endpoint can be referred to as a primary readout, because
this assay has produced multiple assay endpoints where this one serves a reporter gene function. To generalize
the intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor
intended target family, where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The
trans-FACTORIAL comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a
chimeric GAL4-NR protein that regulates transcription of a reporter sequence. The presence of
agonists/antagonists of the NR alters the transactivation function of GAL4-NR and modulates reporter
transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in
activities of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear
receptors, the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto
96-well plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is
isolated using TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer
and Mo-MLV reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the
resulting cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific
primers. The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM)
5'-labeled reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these
products are digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using
Qiaquick PCR columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak
positions identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Target (nominal) number of replicates: 2
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.369
Response cutoff threshold used to determine hit calls: 1.844
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
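The reported cutoff can be related to the reported bmad with a short calculation. The Python sketch below assumes the cutoff is the larger of the two threshold methods assigned to this endpoint in Section 3.2, log2(1.2) and 5 times bmad; the variable names are illustrative. Using the rounded bmad of 0.369 gives 1.845, which agrees with the reported 1.844 to within rounding (the reported value was presumably computed from an unrounded bmad).

```python
import math

bmad = 0.369  # baseline median absolute deviation reported above (rounded)

# Candidate cutoffs from the two Level 5 threshold methods for this endpoint.
candidates = {
    "log2_1.2": math.log2(1.2),
    "bmad5": 5 * bmad,
}

# The higher candidate value is selected as the cutoff.
method, coff = max(candidates.items(), key=lambda kv: kv[1])
print(method, coff)
```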
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain-of-signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log-transformed. All statistical analyses were conducted in the R programming
language, using the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to data processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before they can be loaded into the database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data are loaded into the database, tcpl uses generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters are defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative
analysis direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction were opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was the winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was the winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of half of the
maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to
cutoff (coff); |top| <= 1.2(coff) and |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 or fewer concentrations were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9
and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then
flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by
3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where
nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
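Several of the Level 6 flags above are simple threshold tests on fitted parameters. The Python sketch below shows how five of them (noise, border, low.nrep, low.nconc, and ac50.lowconc) could be evaluated for one series. It is an illustration rather than tcpl code: the function name, the dictionary layout, and all fitted values are hypothetical, with only the cutoff (1.844) and the minimum concentration (0.0412) taken from this endpoint's assay design summary. The border test is read here as requiring both bounds to hold at once, which is an interpretation of the flag description.

```python
import math


def level6_flags(series):
    """Return the names of the threshold-based flags that apply to a series."""
    top, coff = series["top"], series["coff"]
    active = series["hitc"] >= 0.9
    checks = {
        "noise": series["rmse"] > coff,
        "border": 0.8 * coff <= abs(top) <= 1.2 * coff,
        "low.nrep": series["nrep"] < 2,
        "low.nconc": series["nconc"] <= 4,
        "ac50.lowconc": active and series["ac50"] < 10 ** series["logc_min"],
    }
    return [name for name, hit in checks.items() if hit]


# Hypothetical fitted series; coff and the minimum concentration come from
# the assay design summary, everything else is made up for illustration.
example = {
    "top": 2.0, "coff": 1.844, "rmse": 0.5, "hitc": 0.95,
    "nrep": 2, "nconc": 6, "ac50": 0.01, "logc_min": math.log10(0.0412),
}
flags = level6_flags(example)
print(flags)
```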
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the
reliability of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 1.554
Neutral control median absolute deviation, by plate (nmad): 0.454
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 29.18%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
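The neutral control CV% can be recomputed from the reported nmed and nmad as a quick check. The Python sketch below uses the rounded values printed above and gives roughly 29.2%, close to the reported 29.18%. The small difference may reflect rounding of the displayed medians or how values are aggregated across plates; this document does not say which. The function name is illustrative.

```python
def cv_percent(nmad, nmed):
    """Coefficient of variation in percent: (nmad / nmed) * 100."""
    return nmad / nmed * 100


# Rounded neutral control summaries reported for this endpoint.
cv = cv_percent(nmad=0.454, nmed=1.554)
print(f"{cv:.2f}%")
```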
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 2.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1355
ATG_ERRb_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for human estrogen-related receptor, beta (ERRb)
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_ERRb_TRANS2 is one
of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by Reverse transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. The
assay endpoint ATG_ERRb_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as
the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-
of-signal activity can be used to understand the reporter gene at the transcription factor-level as they relate to
the gene ESRRB. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay
has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters
the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen), analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems) with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Standard minimum concentration tested: 0.0412 nM
Key positive control: NA
Target (nominal) number of replicates: 2
Standard maximum concentration tested: 10 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.457
Response cutoff threshold used to determine hit calls: 2.284
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
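The reported cutoff can be related to the reported bmad with a short calculation. The sketch below assumes, as the Level 5 methods listed in Section 3.2 state, that the cutoff is the larger of log2(1.2) and 5 times bmad; the function name `response_cutoff` is hypothetical and is not part of tcpl. With the rounded bmad of 0.457 the result is about 2.285, which agrees with the reported 2.284 to within what rounding of bmad would explain (an assumption, since the unrounded bmad is not given here).

```python
import math


def response_cutoff(bmad: float) -> float:
    """Return the larger of the two candidate cutoffs listed for this endpoint."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # fold-change based candidate
        "bmad5": 5.0 * bmad,         # baseline-noise based candidate
    }
    return max(candidates.values())


bmad = 0.457  # rounded value from the assay design summary
cutoff = response_cutoff(bmad)
print(f"log2(1.2) = {math.log2(1.2):.3f}, 5*bmad = {5 * bmad:.3f}, cutoff = {cutoff:.3f}")
```

For this endpoint the bmad-based candidate dominates, so the fold-change candidate of log2(1.2) does not affect the selected threshold.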
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
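As an illustration of the two ideas in this paragraph, the following minimal Python sketch drops non-positive values before a log2 transform and then picks the model with the lowest AIC. It is not tcpl code: the function names, the example values, and the log-likelihoods are hypothetical, and tcpl itself marks excluded wells with wllq = 0 rather than deleting them.

```python
import math


def level2_log2(cvals):
    """Drop zero and negative values, then log2-transform what remains."""
    kept = [v for v in cvals if v > 0]  # log2 is undefined for v <= 0
    return [math.log2(v) for v in kept]


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def winning_model(fits):
    """fits maps a model name to (log_likelihood, n_params); lowest AIC wins."""
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical corrected values for one series (not real assay data).
raw = [2.0, 0.0, -0.5, 4.0, 1.0]
resp = level2_log2(raw)

# Hypothetical fit results for three candidate curve models.
fits = {"cnst": (-12.0, 1), "hill": (-8.5, 4), "exp2": (-9.0, 3)}
winner = winning_model(fits)
print(resp, winner)
```

In this made-up example the three-parameter model wins even though the four-parameter model has a higher likelihood, because AIC penalizes each additional parameter.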
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2) / (sample size - 1)). Onesd is used to establish the BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where the higher value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis
direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If
hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then
flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
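The Level 4 baseline statistics and a few of the simpler Level 6 flags can be sketched in Python as follows. This is an illustrative reading of the method descriptions above, not tcpl's implementation: the function names, the well records, and the example inputs are hypothetical. The MAD scaling constant is exposed as a parameter because this document does not state which constant tcpl applies (R's `mad` defaults to 1.4826).

```python
import math
import statistics


def mad(values, constant=1.0):
    """Median absolute deviation, optionally scaled by a constant."""
    med = statistics.median(values)
    return constant * statistics.median([abs(v - med) for v in values])


def baseline_stats(wells, constant=1.0):
    """Return (bmad, onesd) from test wells (wllt == 't') at cndx 1 or 2."""
    low = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    # statistics.stdev divides by (sample size - 1), matching the onesd formula.
    return mad(low, constant), statistics.stdev(low)


def level6_flags(rmse, coff, nrep, nconc, hitc, ac50, logc_min):
    """Evaluate four of the listed flags for one concentration series."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 < 10 ** logc_min:
        flags.append("ac50.lowconc")
    return flags


# Hypothetical wells; only the first four enter the baseline calculation.
wells = [
    {"resp": 0.1, "wllt": "t", "cndx": 1},
    {"resp": -0.2, "wllt": "t", "cndx": 1},
    {"resp": 0.3, "wllt": "t", "cndx": 2},
    {"resp": 0.0, "wllt": "t", "cndx": 2},
    {"resp": 3.0, "wllt": "t", "cndx": 5},  # excluded: higher concentration
    {"resp": 0.9, "wllt": "n", "cndx": 1},  # excluded: neutral control well
]
bmad, onesd = baseline_stats(wells)

# Hypothetical series summary using this endpoint's cutoff and lowest concentration.
flags = level6_flags(rmse=0.3, coff=2.284, nrep=2, nconc=6,
                     hitc=0.95, ac50=0.01, logc_min=math.log10(0.0412))
print(bmad, onesd, flags)
```

Restricting the baseline to the two lowest concentration indices relies on the assumption that most chemicals are inactive at those concentrations, so the spread of those wells approximates assay noise.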
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count: hitc>0.9
0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1.716
Neutral control median absolute deviation, by plate: nmad 0.478
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 27.87%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
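The control-well formulas above translate directly into code. The sketch below is a minimal Python rendering for a single plate, with hypothetical function names; the document reports medians across plates, which this sketch does not attempt. Metrics return None when the control is not available, mirroring the NA entries for this endpoint. The positive control values in the example are invented for illustration only.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """1 - (3 * (cmad + nmad)) / abs(cmed - nmed); None if control is NA."""
    if cmed is None or cmad is None:
        return None
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """(cmed - nmed) / sqrt(cmad^2 + nmad^2); None if control is NA."""
    if cmed is None or cmad is None:
        return None
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """(cmed - nmed) / nmad; None if control is NA."""
    return None if cmed is None else (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """cmed / nmed; None if control is NA."""
    return None if cmed is None else cmed / nmed


# Neutral control values reported for this endpoint.
cv = cv_percent(nmed=1.716, nmad=0.478)

# This endpoint has no positive control, so the metrics are not available.
z_na = z_prime(None, None, nmed=1.716, nmad=0.478)

# Hypothetical positive control, for illustration only.
z_demo = z_prime(cmed=10.0, cmad=1.0, nmed=2.0, nmad=1.0)
ssmd_demo = ssmd(cmed=10.0, cmad=1.0, nmed=2.0, nmad=1.0)
print(round(cv, 2), z_na, z_demo, round(ssmd_demo, 3))
```

With the rounded nmed and nmad the CV comes out near 27.86%, slightly below the reported 27.87%; the difference is presumably due to rounding of the inputs shown in this document.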
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 0.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1356
ATG_MR_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for MR orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_MR_TRANS2 is one of
24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by Reverse transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. The
assay endpoint ATG_MR_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as the
negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-of-
signal activity can be used to understand the reporter gene at the transcription factor-level as they relate to the
gene NR3C2. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has
produced multiple assay endpoints where this one serves a reporter gene function. To generalize the intended
target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended target family,
where the subfamily is steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters
the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Target (nominal) number of replicates: 2
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.164
Response cutoff threshold used to determine hit calls: 0.82
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
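As a rough consistency check, the cutoff reported above can be reproduced from the reported bmad. This minimal sketch assumes the endpoint cutoff is the larger of the two candidate thresholds listed under the Level 5 methods in Section 3.2, log2(1.2) and 5 times bmad. The function name select_cutoff is illustrative and is not part of tcpl.

```python
import math

def select_cutoff(bmad: float) -> float:
    """Return the larger of two candidate cutoffs: log2(1.2) and 5 * bmad.

    Assumption: only these two threshold methods are assigned to the endpoint.
    """
    candidates = [math.log2(1.2), 5 * bmad]
    return max(candidates)

# bmad as reported in the assay design summary for this endpoint
cutoff = select_cutoff(0.164)
print(round(cutoff, 3))
```

With the reported bmad of 0.164, the 5 times bmad term dominates and the result agrees with the reported cutoff of 0.82. For an endpoint with a very small bmad, the log2(1.2) floor would apply instead.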
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain-of-signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log2-transformed. All statistical analyses were conducted in the R programming
language, using the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
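The pre-fitting transformation described above can be sketched as follows. This is an illustrative Python simplification, not the tcpl implementation (tcpl is an R package and downgrades well quality rather than dropping values). The function name and the example fold-induction values are hypothetical.

```python
import math

def log2_transform(cvals):
    """Drop non-positive values, then log2-transform the remainder.

    Non-positive values have no real logarithm, which is why they are
    excluded before the transformation.
    """
    kept = [v for v in cvals if v > 0]
    return [math.log2(v) for v in kept]

# Hypothetical fold-induction values relative to DMSO controls
fold_induction = [1.0, 2.0, 0.0, -0.5, 4.0]
print(log2_transform(fold_induction))
```

On the log2 scale, a value of 0 corresponds to no change from the DMSO baseline, and each unit corresponds to a doubling of signal.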
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
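The Level 4 baseline calculation listed above (bmad and onesd from test compound wells at the two lowest concentration indices) can be sketched as follows. This is an illustrative Python version, not the tcpl code. The tuple layout, the function name, and the example wells are hypothetical. The scale argument is also an assumption: R's mad() applies a constant of 1.4826 by default, so it is exposed here as a parameter rather than fixed.

```python
import math
import statistics

def baseline_stats(wells, scale=1.4826):
    """Compute (bmad, onesd) from (wllt, cndx, resp) tuples.

    Only test compound wells (wllt == "t") at concentration index 1 or 2
    contribute to the baseline.
    """
    base = [resp for wllt, cndx, resp in wells if wllt == "t" and cndx in (1, 2)]
    med = statistics.median(base)
    # Median absolute deviation around the median, optionally scaled
    bmad = scale * statistics.median([abs(r - med) for r in base])
    # Sample standard deviation, using the (n - 1) denominator
    onesd = statistics.stdev(base)
    return bmad, onesd

# Hypothetical wells: four baseline wells, one high-concentration well, one neutral control
wells = [("t", 1, 0.1), ("t", 1, -0.1), ("t", 2, 0.2), ("t", 2, 0.0),
         ("t", 6, 3.0), ("n", 1, 5.0)]
bmad, onesd = baseline_stats(wells)
print(round(bmad, 4), round(onesd, 4))
```

The high-concentration well and the neutral control well are excluded by the filter, so a strongly active chemical does not inflate the baseline noise estimate.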
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 2
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.851
Neutral control median absolute deviation, by plate (nmad): 0.098
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 11.51%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
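The control-well formulas above can be written out as a minimal Python sketch. The function names are illustrative only. Because this endpoint reports NA for positive and negative controls, the control values passed to z_prime and ssmd below are hypothetical and serve only to show the arithmetic.

```python
import math

def cv_percent(nmed, nmad):
    """Coefficient of variation (%) in neutral control wells."""
    return nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference, control versus neutral control."""
    return (cmed - nmed) / math.sqrt(cmad**2 + nmad**2)

# Neutral control values reported for this endpoint
print(round(cv_percent(0.851, 0.098), 1))

# Hypothetical positive control values, for illustration only
print(z_prime(10.0, 1.0, 1.0, 0.5))
print(round(ssmd(10.0, 1.0, 1.0, 0.5), 2))
```

The CV computed from the rounded nmed and nmad may differ from the reported 11.51% in the last digit, presumably because the reported value was calculated from unrounded inputs.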
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 0.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity
data-rich chemicals, chemicals spanning major use categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1357
ATG_COUP_TF1_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for COUP-TFI orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_COUP_TF1_TRANS2 is
one of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by reverse transcription-polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. The
assay endpoint ATG_COUP_TF1_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO
as the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for
gain-of-signal activity can be used to understand the reporter gene at the transcription factor level as they relate to
the gene NR2F1. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay
has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Target (nominal) number of replicates: 2
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.325
Response cutoff threshold used to determine hit calls: 1.623
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
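The concentration range in the summary above is consistent with an evenly spaced dilution series. The sketch below assumes geometric spacing between the reported minimum and maximum, which the source does not state explicitly, and infers the implied dilution factor. Function names are illustrative, and concentrations are in the units reported above.

```python
def dilution_factor(c_min, c_max, n_conc):
    """Fold change between adjacent concentrations, assuming geometric spacing."""
    return (c_max / c_min) ** (1 / (n_conc - 1))

def dilution_series(c_max, factor, n_conc):
    """Concentrations from highest to lowest for a fixed dilution factor."""
    return [c_max / factor**i for i in range(n_conc)]

# Values from the assay design summary: 6 concentrations, 0.0412 to 10
factor = dilution_factor(0.0412, 10, 6)
print(round(factor, 2))
print([round(c, 4) for c in dilution_series(10, 3, 6)])
```

Under this assumption the implied step is very close to 3-fold, and a 3-fold series starting at 10 reaches 10/243, which rounds to the reported minimum of 0.0412.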
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain-of-signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log2-transformed. All statistical analyses were conducted in the R programming
language, using the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
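The winning-model rule described above can be illustrated with a short Python sketch. It uses the standard AIC definition, 2k - 2 ln(L), and picks the candidate with the lowest value. The function names are illustrative, the AIC values are hypothetical, and this is not the tcpl or tcplfit2 implementation, which may apply additional rules.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def select_winning_model(aic_by_model):
    """Return the name of the model with the lowest AIC."""
    return min(aic_by_model, key=aic_by_model.get)

# Hypothetical AIC values for one concentration series
fits = {"cnst": 12.4, "hill": 3.1, "gnls": 5.0, "poly1": 7.8}
print(select_winning_model(fits))
```

Because AIC penalizes each additional parameter, a more flexible model wins only when it improves the likelihood enough to offset its extra parameters.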
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
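The Level 4 baseline calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not tcpl's implementation: the well records, the function name, and the example values are hypothetical, and the scaling constant is exposed as a parameter because this document does not state whether one is applied (R's stats::mad uses 1.4826 by default; that tcpl relies on that default is an assumption here).

```python
import statistics

def bmad_and_onesd(wells, mad_constant=1.4826):
    """Return (bmad, onesd) from test-compound wells at the two lowest
    concentration indices. `wells` is a list of dicts with the keys
    'resp', 'wllt' and 'cndx' (hypothetical record layout)."""
    baseline = [w["resp"] for w in wells
                if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    if len(baseline) < 2:
        raise ValueError("at least two baseline wells are required")
    center = statistics.median(baseline)
    # Median absolute deviation, optionally scaled (assumed constant).
    bmad = mad_constant * statistics.median([abs(r - center) for r in baseline])
    # Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1)).
    onesd = statistics.stdev(baseline)
    return bmad, onesd

# Hypothetical wells: only wllt == "t" with cndx 1 or 2 enter the baseline.
example_wells = [
    {"resp": 0.1, "wllt": "t", "cndx": 1},
    {"resp": -0.1, "wllt": "t", "cndx": 1},
    {"resp": 0.3, "wllt": "t", "cndx": 2},
    {"resp": -0.3, "wllt": "t", "cndx": 2},
    {"resp": 0.0, "wllt": "t", "cndx": 2},
    {"resp": 2.5, "wllt": "t", "cndx": 6},   # excluded: high concentration
    {"resp": 0.05, "wllt": "n", "cndx": 1},  # excluded: neutral control
]
bmad_unscaled, onesd = bmad_and_onesd(example_wells, mad_constant=1.0)
```

Setting mad_constant to 1.0 gives the unscaled median absolute deviation, which makes it easy to compare against either convention.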
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.385
Neutral control median absolute deviation, by plate: nmad 0.035
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 9.05%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
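The control-well formulas listed in Section 4.1 can be written out as a short Python sketch. The function names are illustrative only, and None stands in for the NA entries reported when a control type is absent. The positive control values in the usage lines are hypothetical; only nmed and nmad are taken from the table above.

```python
import math

def _missing(*values):
    """True if any input is unavailable (reported as NA in the tables)."""
    return any(v is None for v in values)

def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return None if _missing(nmed, nmad) else nmad / nmed * 100

def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    if _missing(cmed, cmad, nmed, nmad):
        return None
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)

def ssmd(cmed, cmad, nmed, nmad):
    """SSMD: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    if _missing(cmed, cmad, nmed, nmad):
        return None
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)

def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return None if _missing(cmed, nmed, nmad) else (cmed - nmed) / nmad

def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return None if _missing(cmed, nmed) else cmed / nmed

# Neutral control values reported above; no positive control is available.
nmed, nmad = 0.385, 0.035
cv = cv_percent(nmed, nmad)
z_na = z_prime(None, None, nmed, nmad)

# Hypothetical positive control, for illustration only.
z_demo = z_prime(2.0, 0.1, nmed, nmad)
```

Recomputing the CV from the rounded nmed and nmad gives a value close to, but not exactly, the reported 9.05%; a plausible reason is that the reported figures are rounded or aggregated by plate before summarizing.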
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 1.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res
Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1358
ATG_NOR1_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for NOR1 orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_NOR1_TRANS2 is one
of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by Reverse transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. The
assay endpoint ATG_NOR1_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as
the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-
of-signal activity can be used to understand the reporter gene at the transcription factor-level as they relate to
the gene NR4A3. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay
has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The
trans-FACTORIAL comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a
chimeric GAL4-NR protein that regulates transcription of a reporter sequence. The presence of
agonists/antagonists of the NR alters the transactivation function of GAL4-NR and modulates reporter
transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Target (nominal) number of replicates: 2
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.177
Response cutoff threshold used to determine hit calls: 0.887
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
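The reported response cutoff can be related to the reported bmad with a short calculation. The sketch below assumes the cutoff is the larger of the two Level 5 candidates listed in Section 3.2 for this endpoint, log2(1.2) and 5 * bmad; the function name and dictionary keys are illustrative, not part of tcpl.

```python
import math

def response_cutoff(bmad, bmad_multiplier=5.0, fold_change=1.2):
    """Return (method, cutoff), where the cutoff is the larger of the
    candidate thresholds log2(fold_change) and bmad_multiplier * bmad."""
    candidates = {
        "log2_1.2": math.log2(fold_change),
        "bmad5": bmad_multiplier * bmad,
    }
    method = max(candidates, key=candidates.get)
    return method, candidates[method]

# bmad as reported in the assay design summary (rounded to 3 decimals).
method, coff = response_cutoff(0.177)
```

With the rounded bmad of 0.177, the bmad5 candidate (about 0.885) exceeds log2(1.2) (about 0.263) and so sets the cutoff. The small gap to the reported 0.887 is consistent with the cutoff having been computed from an unrounded bmad.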
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
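The two steps in this paragraph, preparing log2 values and picking the winning model by AIC, can be illustrated with a minimal Python sketch. It is not the tcpl implementation: the function names are hypothetical, excluded wells are simply dropped here (tcpl instead sets wllq to 0), and the log-likelihoods and parameter counts in the example are invented for illustration.

```python
import math

def log2_fold_induction(cvals):
    """Drop zero and negative corrected values, then log2-transform the rest."""
    return [math.log2(v) for v in cvals if v > 0]

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def winning_model(fits):
    """fits maps a model name to (log_likelihood, n_params).
    Returns the name with the lowest AIC and the full AIC table."""
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores

# Fold-induction values; the zero and the negative value are excluded.
resp = log2_fold_induction([2.0, 0.0, -1.0, 0.5, 1.0])

# Hypothetical fit results for one concentration series.
fits = {
    "cnst": (-12.0, 1),
    "hill": (-5.0, 4),
    "gnls": (-4.8, 6),
}
winner, scores = winning_model(fits)
```

In this example the gain-loss model has the highest likelihood, but its two extra parameters are penalized, so the Hill model has the lowest AIC and wins.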
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) and |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
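Several of the simpler Level 6 flags above reduce to threshold comparisons, sketched below in Python. This is an illustration under stated assumptions, not tcpl code: the function and argument names are hypothetical, the border rule is read as the band 0.8*coff <= |top| <= 1.2*coff, and min_conc stands for 10^logc_min. In the usage lines, coff, nconc, nrep and min_conc come from the assay design summary, while top, rmse, hitc and ac50 are invented.

```python
def caution_flags(top, coff, rmse, nconc, nrep, hitc, ac50, min_conc):
    """Return the names of the flags triggered for one concentration series.
    Only a subset of the Level 6 flags is covered in this sketch."""
    flags = []
    # border: modeled top lies within 20 percent of the cutoff (assumed band).
    if 0.8 * coff <= abs(top) <= 1.2 * coff:
        flags.append("border")
    # noise: root mean square error of the fit exceeds the cutoff.
    if rmse > coff:
        flags.append("noise")
    # low.nrep: fewer than 2 replicates per concentration on average.
    if nrep < 2:
        flags.append("low.nrep")
    # low.nconc: 4 or fewer concentrations tested.
    if nconc <= 4:
        flags.append("low.nconc")
    # ac50.lowconc: active hit with AC50 below the lowest tested concentration.
    if hitc >= 0.9 and ac50 < min_conc:
        flags.append("ac50.lowconc")
    return flags

# Hypothetical series evaluated against this endpoint's reported design values.
flags = caution_flags(top=0.95, coff=0.887, rmse=0.3, nconc=6, nrep=2,
                      hitc=0.95, ac50=0.02, min_conc=0.0412)
```

For this invented series the top sits just above the cutoff and the AC50 falls below the tested range, so the border and ac50.lowconc flags are raised while the others are not.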
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 2.082
Neutral control median absolute deviation, by plate: nmad 0.207
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 9.93%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 1.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res
Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1359
ATG_TR4_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for TR4 orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_TR4_TRANS2 is one of
24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by reverse transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. The
assay endpoint ATG_TR4_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as the
negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for
gain-of-signal activity can be used to understand the reporter gene at the transcription factor level as they relate to the
gene NR2C2. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has
produced multiple assay endpoints where this one serves a reporter gene function. To generalize the intended
target to other related targets, this assay endpoint is annotated to the nuclear receptor intended target family,
where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: The adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS, activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TRIzol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Target (nominal) number of replicates: 2
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.2
Response cutoff threshold used to determine hit calls: 1.001
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
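The bmad and cutoff values in the summary can be related with a short calculation. The sketch below assumes that the cutoff is the larger of log2(1.2) and 5 times bmad, which is how the Level 5 methods listed in Section 3.2 (log2_1.2 and bmad5) read; the function name response_cutoff is hypothetical and is not part of tcpl.

```python
import math


def response_cutoff(bmad: float) -> float:
    """Return the larger of the two candidate cutoffs (assumed selection rule).

    Candidates follow the Level 5 method descriptions:
      log2_1.2 -> log2(1.2)
      bmad5    -> 5 * bmad
    """
    return max(math.log2(1.2), 5.0 * bmad)


# bmad as printed (rounded) in the assay design summary for this endpoint
cutoff = response_cutoff(0.2)
print(f"cutoff from rounded bmad: {cutoff:.3f}")
```

With the rounded bmad of 0.2 the bmad5 candidate dominates, and the result is close to the reported cutoff of 1.001; the small difference is presumably because the summary prints bmad to limited precision.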
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised when extrapolating these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-induction
over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produced the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
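The AIC-based choice of a winning model can be illustrated with a minimal sketch. It uses the standard definition AIC = 2k - 2 ln(L); the log-likelihoods and parameter counts below are made-up illustrative numbers, not results from this endpoint, and the helper names are hypothetical rather than tcpl functions.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def winning_model(fits: dict) -> str:
    """Return the name of the fitted model with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) pair.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get)


# Hypothetical fits for one concentration series (illustrative values only)
fits = {
    "cnst": (-12.0, 1),   # constant model
    "hill": (-8.5, 4),    # Hill model
    "poly1": (-11.0, 2),  # linear model
}
print(winning_model(fits))
```

A model with more parameters wins only if its improvement in likelihood outweighs the penalty of 2 per added parameter.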
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data are loaded into the database, tcpl uses the generalized processing functions it provides to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
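As an illustration of what such generalized processing steps do, the sketch below mimics the component-level corrections (rmneg, rmzero, log2) and the low-concentration baseline statistics (bmad, onesd) assigned to this endpoint. It is not tcpl code: the well dictionaries and function names are hypothetical, and the scaling constant of 1.4826 mirrors the default of R's mad() function, which is an assumption about how the pipeline computes bmad.

```python
import math
import statistics


def level2_corrections(wells):
    """Flag non-positive cval wells (wllq = 0) and log2-transform the rest."""
    out = []
    for well in wells:
        w = dict(well)  # copy so the input is left untouched
        if w["cval"] <= 0:           # rmneg (cval < 0) and rmzero (cval = 0)
            w["wllq"] = 0
        elif w["wllq"] == 1:
            w["cval"] = math.log2(w["cval"])
        out.append(w)
    return out


def baseline_stats(wells, constant=1.4826):
    """bmad and onesd from test wells (wllt = 't') at cndx 1 or 2.

    Level 3 normalization is 'none' for this endpoint, so resp = cval.
    """
    resp = [w["cval"] for w in wells
            if w["wllq"] == 1 and w["wllt"] == "t" and w["cndx"] in (1, 2)]
    med = statistics.median(resp)
    bmad = constant * statistics.median([abs(r - med) for r in resp])
    onesd = statistics.stdev(resp)  # sample SD, divides by n - 1
    return bmad, onesd


# Hypothetical wells (illustrative values only)
wells = [
    {"cval": 1.0, "wllq": 1, "wllt": "t", "cndx": 1},
    {"cval": 2.0, "wllq": 1, "wllt": "t", "cndx": 1},
    {"cval": 4.0, "wllq": 1, "wllt": "t", "cndx": 2},
    {"cval": 0.0, "wllq": 1, "wllt": "t", "cndx": 2},   # removed by rmzero
    {"cval": -0.5, "wllq": 1, "wllt": "t", "cndx": 1},  # removed by rmneg
    {"cval": 8.0, "wllq": 1, "wllt": "t", "cndx": 6},   # not a baseline well
]
processed = level2_corrections(wells)
bmad, onesd = baseline_stats(processed, constant=1.0)
```

The exclusions are applied before the log transform here because log2 is undefined for zero and negative values, consistent with the statement in Section 3.2 that such values are removed before analysis.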
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp - mean(resp))^2) / (sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response in excess of more than half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
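Several of the Level 6 flags reduce to simple comparisons on per-series summary values. The sketch below encodes five of them (bmd.high, noise, low.nrep, low.nconc and ac50.lowconc) as read from the descriptions above. The function name, argument names and example values are hypothetical, and the code is an illustration of the stated rules rather than the tcpl implementation.

```python
def caution_flags(hitc, rmse, coff, ac50, bmd, conc_tested, nrep):
    """Return the names of the flags triggered for one concentration series.

    conc_tested: tested concentrations on the linear scale
    nrep: average number of replicates per concentration
    ac50, bmd: modeled values on the linear scale, or None if unavailable
    """
    flags = []
    if bmd is not None and ac50 is not None and bmd > ac50:
        flags.append("bmd.high")
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if len(set(conc_tested)) <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 is not None and ac50 < min(conc_tested):
        flags.append("ac50.lowconc")
    return flags


# Hypothetical series summary (illustrative values only)
example = caution_flags(
    hitc=0.95, rmse=0.4, coff=1.001, ac50=0.02, bmd=0.03,
    conc_tested=[0.0412, 0.123, 0.37, 1.11, 3.33, 10.0], nrep=2,
)
print(example)
```

Flags are cautionary annotations; in this sketch they are collected alongside the hit call and do not change it.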
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc>0.9): 0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.748
Neutral control median absolute deviation, by plate (nmad): 0.119
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 15.95%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
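The control-well metrics listed above follow directly from the plate-level medians and median absolute deviations. The sketch below writes each formula as a small function. The neutral control inputs are the rounded values reported for this endpoint; the positive control inputs are hypothetical, since no positive control is reported (NA), and the function names are illustrative only.

```python
import math


def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100.0


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (positive or negative) vs. neutral control."""
    return 1.0 - (3.0 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference vs. neutral control."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    return cmed / nmed


nmed, nmad = 0.748, 0.119   # rounded values reported above
pmed, pmad = 3.0, 0.2       # hypothetical positive control, illustration only

print(f"CV%: {cv_percent(nmad, nmed):.2f}")
print(f"Z': {z_prime(pmed, pmad, nmed, nmad):.3f}")
print(f"SSMD: {ssmd(pmed, pmad, nmed, nmad):.2f}")
```

Computed from the rounded inputs, the CV comes out slightly below the reported 15.95%, presumably because the reported figure was derived from unrounded values.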
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 1.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1360
ATG_DAX1_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for DAX1 orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_DAX1_TRANS2 is one
of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by reverse transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. The
assay endpoint ATG_DAX1_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as
the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for
gain-of-signal activity can be used to understand the reporter gene at the transcription factor level as they relate to
the gene NR0B1. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay
has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other related targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: The adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS, activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA was isolated using
TRIzol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Target (nominal) number of replicates: 2
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.194
Response cutoff threshold used to determine hit calls: 0.971
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted in the R programming
language, using the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
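As an illustration of the selection rule described above, the following Python sketch picks the candidate fit with the lowest AIC, using the standard definition AIC = 2k - 2 ln(L). The log-likelihoods and parameter counts are invented for the example, and tcpl performs this step in R, so this is a sketch of the rule rather than tcpl's implementation.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def winning_model(fits):
    """Return (name of the lowest-AIC fit, dict of AIC scores).

    `fits` maps a model label to (maximized log-likelihood, parameter count).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fits for one concentration series (illustrative values only).
example_fits = {
    "cnst": (-12.4, 1),  # constant model
    "hill": (-8.2, 4),   # Hill model
    "gnls": (-8.0, 6),   # gain-loss model
}
best, scores = winning_model(example_fits)
```

In this made-up case the gain-loss fit has the highest likelihood, but its extra parameters are penalized, so a simpler model can still win.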
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or fewer were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
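The Level 2 through Level 5 methods listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not tcpl's R code: the well records and their values are hypothetical, and the 1.4826 scaling constant in `mad` mirrors the default of R's `mad` function, which is assumed (not confirmed by this document) to be what underlies bmad.

```python
import math
from statistics import median


def correct_wells(wells):
    """Levels 2-3: drop wells with cval <= 0 (rmneg, rmzero), log2-transform
    the rest, and use the result unchanged as resp (normalization 'none')."""
    usable = []
    for well in wells:
        if well["cval"] <= 0:  # such wells receive wllq = 0 and are excluded
            continue
        usable.append({**well, "resp": math.log2(well["cval"])})
    return usable


def mad(values, constant=1.4826):
    """Median absolute deviation; the scaling constant is an assumption."""
    center = median(values)
    return constant * median([abs(v - center) for v in values])


def baseline_bmad(wells):
    """Level 4: bmad from test wells (wllt == 't') at cndx 1 or 2."""
    low = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    return mad(low)


def cutoff(bmad):
    """Level 5: the higher of log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


# Hypothetical wells: fold-induction values (cval) with well annotations.
wells = [
    {"cval": 1.00, "cndx": 1, "wllt": "t"},
    {"cval": 1.10, "cndx": 1, "wllt": "t"},
    {"cval": 0.90, "cndx": 2, "wllt": "t"},
    {"cval": 1.05, "cndx": 2, "wllt": "t"},
    {"cval": -0.20, "cndx": 1, "wllt": "t"},  # removed by rmneg
    {"cval": 0.00, "cndx": 2, "wllt": "t"},   # removed by rmzero
]
usable = correct_wells(wells)
bmad = baseline_bmad(usable)
coff = cutoff(bmad)
```

Applied to the bmad of 0.194 reported for this endpoint, `cutoff` returns 5 × 0.194 = 0.970, which matches the reported cutoff of 0.971 to within the rounding of bmad.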
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.777
Neutral control median absolute deviation, by plate (nmad): 0.076
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 9.73%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates, 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells, (pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates, 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells, (mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
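The control metrics above follow directly from the control medians and median absolute deviations. The Python sketch below restates the formulas; the function names and the positive control values are hypothetical, since this endpoint reports NA for positive and negative controls.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor: 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)."""
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """SSMD: (cmed - nmed) / sqrt(cmad^2 + nmad^2)."""
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


def signal_to_noise(cmed, nmed, nmad):
    """Signal-to-noise: (cmed - nmed) / nmad."""
    return (cmed - nmed) / nmad


def signal_to_background(cmed, nmed):
    """Signal-to-background: cmed / nmed."""
    return cmed / nmed


# Neutral control summary values reported for this endpoint.
nmed, nmad = 0.777, 0.076
cv = cv_percent(nmed, nmad)

# Hypothetical positive control, for illustration only (reported as NA here).
pmed, pmad = 2.0, 0.1
zp = z_prime(pmed, pmad, nmed, nmad)
```

Using the rounded summary values, the CV comes out near 9.78%, slightly different from the reported 9.73%. The difference may reflect rounding or the by-plate calculation; the source does not say which.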
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 0.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity
data-rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian
M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of
multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24.
PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM,
Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of
environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's
ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID:
20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck
KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. See the tcpl vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1361
ATG_Rev_ERB_B_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for Rev-Erb beta orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_Rev_ERB_B_TRANS2
is one of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by reverse transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. The
assay endpoint ATG_Rev_ERB_B_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO
as the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for
gain-of-signal activity can be used to understand the reporter gene at the transcription factor level as they relate to
the gene NR1D2. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay
has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: The adherent HepG2 cell line is used. HepG2 is an immortal cell line that was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the compound under evaluation. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Target (nominal) number of replicates: 2
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.206
Response cutoff threshold used to determine hit calls: 1.029
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
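The design summary gives six nominal concentrations between 0.0412 nM and 10 nM but does not state how they are spaced. A constant three-fold dilution is one spacing consistent with those endpoints, since 10/3^5 is approximately 0.0412. The Python sketch below generates such a series; the three-fold step is an inference from the reported minimum and maximum, not a value taken from the source.

```python
def dilution_series(top, n_points, fold):
    """Concentrations from highest to lowest, each `fold` times lower than the last."""
    return [top / fold ** i for i in range(n_points)]


# Assumed three-fold spacing; top concentration and point count from the summary.
series = dilution_series(top=10.0, n_points=6, fold=3)
lowest = series[-1]
```

Rounded to four decimal places, the lowest concentration in this assumed series equals the reported standard minimum of 0.0412.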
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as they pertain to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted in the R programming
language, using the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrade their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean(resp))^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modi.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -l*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -l*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest cone tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest tested concentration tested.
), 7: singlept.hit.mid (Flag single-point hit that's not at the highest cone tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This is indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
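As an illustration only, several of the Level 6 flag rules above can be written as simple predicates. The following Python sketch is not the tcpl implementation: the function name `level6_flags` and the dictionary layout are hypothetical, the flag name modl.directionality.fail is taken from the tcpl method list, and the example values are invented. Only the symbols hitc, top, coff, rmse, ac50, bmd, nrep, nconc, logc_min and resp come from the flag descriptions.

```python
def level6_flags(series):
    """Return the names of a subset of Level 6 cautionary flags raised for one series.

    `series` is a hypothetical dict keyed by the symbols used in the flag text.
    """
    flags = []
    coff, top = series["coff"], series["top"]
    n_above = sum(1 for r in series["resp"] if r > coff)
    n_below = sum(1 for r in series["resp"] if r < -1 * coff)

    # 5: directionality is questionable if the opposite direction
    #    would have captured comparatively many responses
    if top < 0 and n_below < 2 * n_above:
        flags.append("modl.directionality.fail")
    if top > 0 and n_above < 2 * n_below:
        flags.append("modl.directionality.fail")
    # 9: bmd.high, modeled BMD greater than AC50
    if series["bmd"] > series["ac50"]:
        flags.append("bmd.high")
    # 10: noise, rmse > coff
    if series["rmse"] > coff:
        flags.append("noise")
    # 13: low.nrep, fewer than 2 replicates per concentration on average
    if series["nrep"] < 2:
        flags.append("low.nrep")
    # 14: low.nconc, 4 concentrations or fewer
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    # 18: ac50.lowconc, active hit with AC50 below the lowest tested concentration
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags


# Invented example: an active series whose AC50 lies below the tested range.
example = {"hitc": 0.95, "top": 2.0, "coff": 1.0, "rmse": 0.4, "ac50": 0.02,
           "bmd": 0.01, "nrep": 2, "nconc": 6, "logc_min": -1.0,
           "resp": [0.1, 0.2, 1.5, 1.9, 2.0, -0.1]}
print(level6_flags(example))  # ['ac50.lowconc']
```

Flags are cautionary annotations: in this sketch they are collected alongside the hit call and never change it.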
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.642
Neutral control median absolute deviation, by plate: nmad 0.067
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 10.39%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
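The robustness formulas above can be sketched in a few lines of Python. This is an illustrative sketch, not the code used to produce the reported values: the function names `neutral_cv` and `control_metrics` are hypothetical, and the positive control values in the usage example are invented because this endpoint reports NA for them. Recomputing the CV from the rounded values shown above (nmed 0.642, nmad 0.067) gives about 10.44%, slightly different from the reported 10.39%, presumably because the report was computed from unrounded values.

```python
import math


def neutral_cv(nmed, nmad):
    """Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100."""
    return nmad / nmed * 100


def control_metrics(cmed, cmad, nmed, nmad):
    """Compare a control well type (positive "p" or negative "m") with neutral wells.

    Returns None for every metric when the control is not available (reported as NA).
    Assumes cmed != nmed and nonzero nmed and nmad; no guard is included for those cases.
    """
    names = ("z_prime", "ssmd", "signal_to_noise", "signal_to_background")
    if cmed is None or cmad is None:
        return dict.fromkeys(names)
    diff = cmed - nmed
    return {
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(diff),
        "ssmd": diff / math.sqrt(cmad ** 2 + nmad ** 2),
        "signal_to_noise": diff / nmad,
        "signal_to_background": cmed / nmed,
    }


# Neutral control values reported for this endpoint (rounded).
print(round(neutral_cv(0.642, 0.067), 2))

# No positive control is reported here, so every metric is None (NA).
print(control_metrics(None, None, 0.642, 0.067))

# Invented positive control values, for illustration only.
print(control_metrics(5.0, 0.3, 1.0, 0.1))
```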
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 0.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1362
ATG_RORa_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for ROR alpha orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_RORa_TRANS2 is one
of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by Reverse transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. The
assay endpoint ATG_RORa_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as
the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-
of-signal activity can be used to understand the reporter gene at the transcription factor-level as they relate to
the gene RORA. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay
has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL
comprises a library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR
protein that regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters
the transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Target (nominal) number of replicates: 2
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.265
Response cutoff threshold used to determine hit calls: 1.324
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4. for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model which produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
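The model selection step described above can be illustrated with a short Python sketch. It is not the tcpl or tcplfit2 code: the function names `aic` and `winning_model` are hypothetical, the log-likelihood values are invented, and the model names (cnst, hill, gnls) are used only as example labels. The sketch assumes the standard definition AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def winning_model(fits):
    """Pick the model with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) pair.
    Returns the winning model name and the AIC of every candidate.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Invented fits for one concentration series.
fits = {"cnst": (-12.0, 1), "hill": (-4.5, 4), "gnls": (-4.2, 6)}
winner, scores = winning_model(fits)
print(winner, scores)
```

In this invented example the gain-loss model has the highest likelihood, but its two extra parameters are penalized, so the Hill model has the lowest AIC and wins.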
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization include:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
| top | <= 1.2(coff) or | top | >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
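The Level 2, Level 4 and Level 5 methods listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tcpl implementation: all function names are hypothetical, wells are excluded before the log2 transform (consistent with the Data Analysis text, though the actual execution order is an assumption), the `scale` constant mirrors the default of R's mad() function and may not apply here, and the sign of top is assumed to indicate the fit direction. Level 3 uses the method "none", so resp equals cval and needs no code. With the reported bmad of 0.265, 5*bmad is 1.325, which exceeds log2(1.2) (about 0.263) and agrees with the reported cutoff of 1.324 up to rounding of the reported bmad.

```python
import math
import statistics


def level2(cvals):
    """rmneg/rmzero set wllq = 0 for cval <= 0; remaining wells are log2-transformed.

    Returns (value, wllq) pairs; excluded wells carry the value None.
    """
    return [(math.log2(c), 1) if c > 0 else (None, 0) for c in cvals]


def level4(resp_low, scale=1.4826):
    """bmad and onesd from test compound wells (wllt = t) with cndx equal to 1 or 2.

    `scale` is an assumption borrowed from the default of R's mad().
    """
    med = statistics.median(resp_low)
    bmad = scale * statistics.median([abs(r - med) for r in resp_low])
    onesd = statistics.stdev(resp_low)  # sample SD, divides by (sample size - 1)
    return bmad, onesd


def level5_cutoff(bmad):
    """The higher of the candidate cutoffs log2(1.2) and 5*bmad is selected."""
    return max(math.log2(1.2), 5 * bmad)


def ow_bidirectional_gain(hitc, top):
    """Multiply hitc by -1 for fits in the negative direction (assumed: top < 0)."""
    return -hitc if top < 0 else hitc


print(level2([2.0, 0.0, -1.0, 0.5]))
print(level4([0.1, -0.2, 0.3, 0.0, -0.1], scale=1.0))
print(level5_cutoff(0.265))
print(ow_bidirectional_gain(0.95, -2.0))
```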
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count (hitc > 0.9): 0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e., the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 1.362
Neutral control median absolute deviation, by plate: nmad 0.199
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 14.58%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: (1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))) NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: ((pmed - nmed) / sqrt(pmad^2 + nmad^2)) NA
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: (1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))) NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: ((mmed - nmed) / sqrt(mmad^2 + nmad^2)) NA
Signal-to-noise (median across all plates, using negative control wells): ((mmed-nmed)/nmad) NA
Signal-to-background (median across all plates, using negative control wells): (mmed/nmed) NA
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 1.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See the tcpl vignette for comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1363
ATG_PR_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for human progesterone receptor (PR)
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken 24 hours after chemical dosing in a 24-well plate. ATG_PR_TRANS2 is one of
24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to measure
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by reverse transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. The
assay endpoint ATG_PR_TRANS2 was analyzed in the positive fitting direction relative to DMSO as the
negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for
gain-of-signal activity can be used to understand the reporter gene at the transcription factor level as it
relates to the gene PGR. Furthermore, this assay endpoint can be referred to as a primary readout, because this
assay has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS, activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 C, 20 s at 68 C and 10 min at 72 C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Standard minimum concentration tested: 0.0412 nM
Key positive control: NA
Target (nominal) number of replicates: 2
Standard maximum concentration tested: 10 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.145
Response cutoff threshold used to determine hit calls: 0.727
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
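The design values above are numerically consistent with a 3-fold serial dilution across 6 concentrations, since 10 / 3^5 is approximately 0.0412. The dilution factor is not stated in this document, so the factor of 3 in the sketch below is an assumption inferred from the reported minimum and maximum, and the function name is hypothetical.

```python
def dilution_series(top_conc, n_conc, factor):
    """Return a serial dilution series ordered from lowest to highest concentration."""
    return [top_conc / factor ** i for i in reversed(range(n_conc))]


# Assumed 3-fold dilution; top concentration and count taken from the summary above.
series = dilution_series(top_conc=10.0, n_conc=6, factor=3.0)
print([round(c, 4) for c in series])
```

The lowest value of the series rounds to the reported standard minimum concentration, which is the only check this sketch supports; it does not confirm the plate layout actually used.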
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain-of-signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log2-transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
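As a minimal sketch of the two steps described above (excluding non-positive values before the log2 transform, then choosing the model with the lowest AIC), the following Python illustrates the logic only. It is not tcpl code: the function names are hypothetical, and the AIC values in the example are invented for illustration.

```python
import math


def level2_transform(cvals):
    """Return (log2 value, wllq) per well; wells with cval <= 0 get wllq = 0."""
    out = []
    for cval in cvals:
        if cval <= 0:
            out.append((None, 0))  # excluded well: no log2 value is defined
        else:
            out.append((math.log2(cval), 1))
    return out


def winning_model(aic_by_model):
    """Return the name of the model with the lowest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


print(level2_transform([2.0, 0.0, -0.5, 1.0]))
# Hypothetical AIC values for three candidate models.
print(winning_model({"cnst": 12.3, "hill": 8.1, "gnls": 9.4}))
```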
Before processing, all data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts, with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
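A documented manual transformation of the kind described above might look like the following sketch. The actual Level 0 scripts are written in R and are vendor/dataset-specific, so the record fields (`spid`, `row`, `col`, `wllq`), the function name, and the log format here are all hypothetical.

```python
def flag_plate_row(wells, row, reason, change_log):
    """Set wllq = 0 for all wells in one plate row and log the justification."""
    n_changed = 0
    for well in wells:
        if well["row"] == row and well["wllq"] != 0:
            well["wllq"] = 0
            n_changed += 1
    # Record what was changed and why, so the transformation stays traceable.
    change_log.append({"change": "wllq set to 0", "row": row,
                       "n_wells": n_changed, "reason": reason})
    return wells


wells = [
    {"spid": "S1", "row": "A", "col": 1, "wllq": 1},
    {"spid": "S2", "row": "B", "col": 1, "wllq": 1},
    {"spid": "S3", "row": "B", "col": 2, "wllq": 1},
]
change_log = []
flag_plate_row(wells, "B", "assay reagent missing from row B", change_log)
print([w["wllq"] for w in wells])
print(change_log)
```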
Once data is loaded into the database, tcpl uses generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) and downgrade their well quality
(wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrade their well
quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplFit2
processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default,
bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative
analysis direction. Typically used for endpoints where only positive responses are biologically relevant.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest
concentration tested.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple
median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for
the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative
to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the
minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values
(e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >=
0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when
top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent
maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min,
then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined
by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0,
where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
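The Level 4 and Level 5 arithmetic above, together with three of the simpler Level 6 flags, can be sketched as follows. This is an illustration in Python, not the tcpl implementation: the function names are hypothetical, and `mad_constant` is an assumption. R's `mad()` multiplies by 1.4826 by default, and whether that scaling applies here is not stated in this document, so the sketch defaults to the unscaled value.

```python
import math
import statistics


def baseline_stats(resp_low_conc, mad_constant=1.0):
    """Return (bmad, onesd) for responses at the two lowest concentration indices."""
    med = statistics.median(resp_low_conc)
    bmad = mad_constant * statistics.median(abs(r - med) for r in resp_low_conc)
    onesd = statistics.stdev(resp_low_conc)  # sample SD, divides by n - 1
    return bmad, onesd


def cutoff(bmad):
    """Select the highest candidate cutoff: log2(1.2) or 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


def simple_flags(rmse, coff, nrep, nconc):
    """Evaluate the noise, low.nrep, and low.nconc conditions as written above."""
    flags = []
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if nconc <= 4:
        flags.append("low.nconc")
    return flags


bmad, onesd = baseline_stats([-0.2, -0.1, 0.0, 0.1, 0.2])  # made-up responses
print(bmad, onesd)
print(cutoff(0.145))  # bmad as reported in the assay design summary
print(simple_flags(rmse=1.0, coff=0.727, nrep=2, nconc=6))  # hypothetical series
```

With the reported bmad of 0.145, the 5 * bmad candidate (about 0.725) exceeds log2(1.2) (about 0.263), which agrees with the reported cutoff of 0.727 up to rounding of the displayed bmad.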
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24 Number of chemicals tested: 24
ACTIVITY HIT CALLS
-------
Active hit count (hitc>0.9): 4
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate (nmed): 0.917
Neutral control median absolute deviation, by plate (nmad): 0.123
Coefficient of variation (CV%) in neutral control wells, (nmad/nmed)*100: 13.41%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate (pmed): NA
Positive control well median absolute deviation, by plate (pmad): NA
Z Prime Factor for median positive and neutral control across all plates,
1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)): NA
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells,
(pmed - nmed) / sqrt(pmad^2 + nmad^2): NA
Positive control signal-to-noise, (pmed - nmed)/nmad: NA
Positive control signal-to-background, pmed/nmed: NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate (mmed): NA
Negative control well median absolute deviation value, by plate (mmad): NA
Z Prime Factor for median negative and neutral control across all plates,
1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)): NA
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells,
(mmed - nmed) / sqrt(mmad^2 + nmad^2): NA
Signal-to-noise (median across all plates, using negative control wells), (mmed - nmed)/nmad: NA
Signal-to-background (median across all plates, using negative control wells), mmed/nmed: NA
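The formulas in the table above can be written out as a short sketch. The function names are hypothetical, and returning `None` when a control is unavailable is an assumed stand-in for the NA entries; no positive or negative control values are reported for this endpoint, so the numeric control inputs in the last line are invented purely to show the arithmetic.

```python
import math


def cv_percent(nmad, nmed):
    """Coefficient of variation in neutral control wells, in percent."""
    return nmad / nmed * 100


def z_prime(cmed, cmad, nmed, nmad):
    """Z prime factor for a control (cmed, cmad) against the neutral control."""
    if cmed is None or cmad is None:
        return None  # control not available (NA)
    return 1 - (3 * (cmad + nmad)) / abs(cmed - nmed)


def ssmd(cmed, cmad, nmed, nmad):
    """Strictly standardized mean difference for a control against neutral."""
    if cmed is None or cmad is None:
        return None  # control not available (NA)
    return (cmed - nmed) / math.sqrt(cmad ** 2 + nmad ** 2)


print(round(cv_percent(nmad=0.123, nmed=0.917), 2))  # reported neutral control values
print(z_prime(None, None, nmed=0.917, nmad=0.123))   # no positive control reported
print(z_prime(cmed=10.0, cmad=1.0, nmed=0.0, nmad=1.0))  # hypothetical inputs
```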
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 1.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-rich
chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L,
Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription
factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A,
Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key
transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol.
2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA,
Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv.
2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. Check out tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1364
ATG_RXRg_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for human retinoid X receptor gamma (RXRg)
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken 24 hours after chemical dosing in a 24-well plate. ATG_RXRg_TRANS2 is one
of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to measure
mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by reverse transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. The
assay endpoint ATG_RXRg_TRANS2 was analyzed in the positive fitting direction relative to DMSO as
the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for
gain-of-signal activity can be used to understand the reporter gene at the transcription factor level as it
relates to the gene RXRG. Furthermore, this assay endpoint can be referred to as a primary readout, because this
assay has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is non-steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST µ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS, activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TriZol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 C, 20 s at 68 C and 10 min at 72 C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 C. The fragments are purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Standard minimum concentration tested: 0.0412 nM
Key positive control: NA
Target (nominal) number of replicates: 2
Standard maximum concentration tested: 10 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.203
Response cutoff threshold used to determine hit calls: 1.015
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
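As a minimal sketch of the kind of manual transformation described above, the following Python snippet sets wllq to 0 for every well in a plate row presumed to be missing a reagent, and keeps the justification with the change. The record layout (dictionaries with rowi, coli, wllq and rval keys), the wllq_note field, and the helper name flag_row are hypothetical; the actual Level 0 pre-processing is done in R with vendor/dataset-specific scripts.

```python
def flag_row(wells, bad_row, reason):
    """Return copies of the well records with wllq set to 0 for one plate row.

    Hypothetical helper: each well is a dict with 'rowi', 'coli', 'wllq', 'rval'.
    The justification is stored next to the change so it stays documented.
    """
    flagged = []
    for well in wells:
        well = dict(well)  # copy so the input records are left untouched
        if well["rowi"] == bad_row:
            well["wllq"] = 0
            well["wllq_note"] = reason
        flagged.append(well)
    return flagged


# Illustrative (made-up) wells from a single plate
wells = [
    {"rowi": 1, "coli": 1, "wllq": 1, "rval": 812.0},
    {"rowi": 1, "coli": 2, "wllq": 1, "rval": 790.5},
    {"rowi": 2, "coli": 1, "wllq": 1, "rval": 3.1},
    {"rowi": 2, "coli": 2, "wllq": 1, "rval": 2.7},
]
cleaned = flag_row(wells, bad_row=2, reason="row 2 missing assay reagent")
usable = [w for w in cleaned if w["wllq"] == 1]
print(len(usable), "of", len(cleaned), "wells remain usable")
```

Wells with wllq = 0 stay in the data set but are excluded from downstream processing, which keeps the original values available for review.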
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2*coff and |top| >= 0.8*coff.), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
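To illustrate how a few of the Level 6 conditions listed above translate into checks on a fitted series, here is a minimal Python sketch. It covers only bmd.high, noise, low.nrep, low.nconc and ac50.lowconc, and it is not the tcpl implementation. The function name caution_flags, its argument layout, and all example values are hypothetical; concentrations are assumed to be on a linear scale, so the condition ac50 < 10^logc_min is written as ac50 < min(conc).

```python
def caution_flags(hitc, coff, rmse, ac50, bmd, conc, nrep):
    """Return the names of a few Level 6 flags raised for one concentration series.

    conc: tested concentrations (same unit as ac50 and bmd)
    nrep: average number of replicates per concentration
    ac50 / bmd may be None when no curve parameters are available.
    """
    flags = []
    if bmd is not None and ac50 is not None and bmd > ac50:
        flags.append("bmd.high")
    if rmse > coff:
        flags.append("noise")
    if nrep < 2:
        flags.append("low.nrep")
    if len(set(conc)) <= 4:
        flags.append("low.nconc")
    if hitc >= 0.9 and ac50 is not None and ac50 < min(conc):
        flags.append("ac50.lowconc")
    return flags


# Made-up active series whose AC50 falls below the lowest tested concentration
active = caution_flags(hitc=0.95, coff=1.015, rmse=0.4, ac50=0.02, bmd=0.01,
                       conc=[0.0412, 0.123, 0.37, 1.11, 3.33, 10], nrep=2)
# Made-up inactive series with few concentrations, one replicate and a noisy fit
inactive = caution_flags(hitc=0.2, coff=1.015, rmse=1.3, ac50=None, bmd=None,
                         conc=[0.1, 1, 10], nrep=1)
print(active)
print(inactive)
```

Flags are cautionary annotations; in this sketch they do not change the hit call itself.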
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24 Number of chemicals tested: 24
ACTIVITY HIT CALLS
-------
Active hit count: hitc>0.9
0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.872
Neutral control median absolute deviation, by plate: nmad 0.082
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 9.35%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 0.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at:
https://www.epa.gov/chemical-research/toxicity-forecasting. The most recent version of downloadable data can be found at:
https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data. The ToxCast Data Analysis Pipeline (tcpl) R package is
available on CRAN or GitHub. See the tcpl vignette for comprehensive documentation describing ToxCast
data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1365
ATG_SF_1_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for SF-1 orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_SF_1_TRANS2 is one
of 24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by reverse transcription-polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. The
assay endpoint ATG_SF_1_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as
the negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-
of-signal activity can be used to understand the reporter gene at the transcription factor-level as they relate to
the gene NR5A1. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay
has produced multiple assay endpoints where this one serves a reporter gene function. To generalize the
intended target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended
target family, where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of Gal4-NR and modulates reporter transcription.
2.3 Experimental System: The adherent HepG2 cell line was used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST ul), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that in CIS, activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the produced
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Standard minimum concentration tested: 0.0412 nM
Key positive control: NA
Target (nominal) number of replicates: 2
Standard maximum concentration tested: 10 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.36
Response cutoff threshold used to determine hit calls: 1.802
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescent intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data was analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
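The winning-model rule can be sketched in a few lines of Python. This is an illustration of lowest-AIC selection only, using the standard definition AIC = 2k - 2 ln(L); the model names, log-likelihoods and parameter counts below are made up, and the helper names aic and winning_model are hypothetical rather than part of tcpl or tcplfit2.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def winning_model(fits):
    """Return (name of the model with the lowest AIC, dict of all AIC values).

    fits maps a model name to (maximized log-likelihood, number of fitted parameters).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Made-up fits for one concentration series
fits = {
    "cnst": (-20.0, 1),  # constant (flat) model
    "hill": (-12.5, 4),
    "gnls": (-12.4, 6),  # gain-loss model: slightly better fit, more parameters
}
winner, scores = winning_model(fits)
print(winner, scores)
```

In this made-up case the gain-loss model has the highest likelihood, but its extra parameters are penalized, so the Hill model has the lowest AIC.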
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore
required for tcplfit2 processing.)
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
Level 6: Cautionary flagging includes:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response in excess of more than
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2*coff and |top| >= 0.8*coff.), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 concentrations or less were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
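The Level 2, 4 and 5 methods listed above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the tcpl code: wells with cval <= 0 are dropped before the log2 transform, the median absolute deviation is computed without any consistency constant (whether one is applied is not stated here), and the function names level2, mad, baseline and cutoff as well as the input values are hypothetical.

```python
import math
import statistics


def level2(cvals):
    """Apply rmneg and rmzero (drop cval <= 0), then the log2 transform."""
    return [math.log2(c) for c in cvals if c > 0]


def mad(values):
    """Unscaled median absolute deviation around the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def baseline(resp_low):
    """Return (bmad, onesd) for test-well responses at cndx 1 or 2.

    onesd is the sample standard deviation, i.e. it divides by (n - 1).
    """
    return mad(resp_low), statistics.stdev(resp_low)


def cutoff(bmad):
    """Select the higher of the two candidate cutoffs: log2(1.2) and 5 * bmad."""
    return max(math.log2(1.2), 5 * bmad)


# Made-up fold-induction values from the two lowest concentrations
resp_low = level2([1.1, 0.9, 0.0, 1.3, -0.2, 0.8])
bmad, onesd = baseline(resp_low)
print(len(resp_low), round(bmad, 3), round(onesd, 3), round(cutoff(bmad), 3))
```

With the bmad of 0.36 reported for this endpoint, cutoff(0.36) gives about 1.8, in line with the reported cutoff of 1.802; the small difference presumably reflects rounding of the reported bmad.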
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24 Number of chemicals tested: 24
ACTIVITY HIT CALLS
-------
Active hit count: hitc>0.9
0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 3.499
Neutral control median absolute deviation, by plate: nmad 1.05
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 30.02%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
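The control-well metrics listed in this section can be computed from plate medians and median absolute deviations as in the sketch below. It is an illustration under stated assumptions: the function names and the example numbers are hypothetical, and a missing control (reported as NA in this document) is represented by None.

```python
import math


def cv_percent(nmed, nmad):
    """Coefficient of variation (CV%) in neutral control wells."""
    return nmad / nmed * 100


def control_metrics(cmed, cmad, nmed, nmad):
    """Compare a control (pmed/pmad or mmed/mmad) with the neutral control.

    Returns None when the control is unavailable (NA in the document).
    """
    if cmed is None or cmad is None:
        return None
    return {
        "z_prime": 1 - (3 * (cmad + nmad)) / abs(cmed - nmed),
        "ssmd": (cmed - nmed) / math.sqrt(cmad**2 + nmad**2),
        "signal_to_noise": (cmed - nmed) / nmad,
        "signal_to_background": cmed / nmed,
    }


# Hypothetical positive-control values used only to exercise the formulas.
example = control_metrics(cmed=10.0, cmad=1.0, nmed=2.0, nmad=0.5)
missing = control_metrics(None, None, nmed=2.0, nmad=0.5)
```

For an endpoint without positive or negative control wells, such as this one, every derived metric is unavailable, which is why the sketch returns None rather than attempting a division.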
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 0.
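As an illustration of the summary statistic above, the sketch below computes an RMSE per series and then the median across series. The function names and the observed/predicted values are hypothetical, and dividing by the number of observations is an assumption about how the RMSE is defined.

```python
import math
from statistics import median


def rmse(observed, predicted):
    """Root mean squared error between observed and model-predicted responses."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def median_rmse(series_list):
    """Median RMSE across (observed, predicted) pairs, one pair per series."""
    return median(rmse(obs, pred) for obs, pred in series_list)


# Hypothetical active series: (observed responses, winning-model predictions).
example_series = [
    ([0.0, 1.0, 2.0], [0.0, 1.0, 4.0]),
    ([1.0, 1.0], [1.0, 1.0]),
    ([0.0, 0.0], [3.0, 4.0]),
]
example_median_rmse = median_rmse(example_series)
```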
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1366
ATG_SHP_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for SHP orphan gene
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_SHP_TRANS2 is one of
24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by Reverse transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. The
assay endpoint ATG_SHP_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as the
negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-of-
signal activity can be used to understand the reporter gene at the transcription factor-level as they relate to the
gene NR0B2. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has
produced multiple assay endpoints where this one serves a reporter gene function. To generalize the intended
target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended target family,
where the subfamily is orphan.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple nuclear receptors (NRs). The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimeric GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of the NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line which was derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS systems
is that the CIS assay measures activities of endogenous TFs, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the test compound. At the end of the 24-hour incubation, total RNA was isolated using
TriZol reagent (Invitrogen). The isolated RNA was then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNAse I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA was amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products were fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products were
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments were purified using Qiaquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data were processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Standard minimum concentration tested: 0.0412 nM
Standard maximum concentration tested: 10 nM
Target (nominal) number of replicates: 2
Key positive control: NA
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.194
Response cutoff threshold used to determine hit calls: 0.969
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
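The design summary gives six nominal concentrations between a minimum of 0.0412 and a maximum of 10 (in the units reported above). The spacing between concentrations is not stated; the sketch below assumes a constant 3-fold dilution, which is an inference from these two endpoints and not a documented protocol detail. The function name is hypothetical.

```python
def dilution_series(max_conc, n_conc, fold):
    """Concentrations from highest to lowest with a constant dilution factor."""
    return [max_conc / fold**i for i in range(n_conc)]


# Assumed 3-fold spacing; max_conc and n_conc are taken from the design summary.
concs = dilution_series(max_conc=10.0, n_conc=6, fold=3.0)
lowest = round(min(concs), 4)
```

Under this assumption the lowest concentration is 10 / 3^5, which rounds to the reported minimum of 0.0412.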
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2 fold-
induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis and raw values are log transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
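The winning-model rule can be illustrated with a short sketch. It uses the standard definition AIC = 2k - 2 ln(L); the model names, parameter counts, and log-likelihood values are hypothetical placeholders, and the function names are not part of tcpl.

```python
def aic(n_params, log_likelihood):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def pick_winning_model(fits):
    """Return (winner, aic_by_model) where the winner has the lowest AIC.

    fits maps a model name to (number of parameters, maximized log-likelihood).
    """
    aic_by_model = {name: aic(k, ll) for name, (k, ll) in fits.items()}
    winner = min(aic_by_model, key=aic_by_model.get)
    return winner, aic_by_model


# Hypothetical fits for one concentration series.
example_fits = {"cnst": (1, -12.0), "hill": (4, -5.0), "gnls": (6, -4.5)}
winner, aic_table = pick_winning_model(example_fits)
```

In this example the six-parameter model has the best likelihood, but the penalty on parameter count makes the four-parameter model the winner.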
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data are loaded into the database, tcpl uses generalized processing functions to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).), 3: rmneg (Exclude wells
with negative corrected response values (cval) and downgrading their well quality (wllq); if cval < 0, wllq
= 0.), 4: rmzero (Exclude wells with corrected response values (cval) equal to zero and downgrading their
well quality (wllq); if cval = 0, wllq = 0.)
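A minimal sketch of the three Level 2 corrections is given below. It is not the tcpl code: the function name and the list-of-dictionaries layout are hypothetical, and both the order of operations and the use of None for wells without a defined log2 value are assumptions made for the illustration.

```python
import math


def apply_level2(wells):
    """Apply rmneg, rmzero, and log2 to copies of the input wells.

    Each well is a dict with 'cval' (corrected value) and 'wllq' (well quality).
    """
    corrected = []
    for well in wells:
        well = dict(well)  # leave the caller's data untouched
        if well["cval"] <= 0:
            # rmneg (cval < 0) and rmzero (cval = 0): downgrade well quality.
            well["wllq"] = 0
            well["cval"] = None  # log2 is undefined here (assumed handling)
        else:
            well["cval"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected


# Hypothetical wells.
example_wells = [
    {"cval": 4.0, "wllq": 1},
    {"cval": 0.0, "wllq": 1},
    {"cval": -2.0, "wllq": 1},
]
level2_result = apply_level2(example_wells)
```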
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); cval = resp. No
additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplFit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median
absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with
concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response
for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp -
mean resp)^2)/(sample size - 1)). Onesd is used to establish BMR and is therefore required for tcplfit2
processing.)
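The baseline calculation can be sketched as follows. The function name and the well layout are hypothetical. R's `mad` function applies a consistency constant of 1.4826 by default; whether that default is in effect for this method is an assumption here, so the constant is exposed as a parameter.

```python
from statistics import median, stdev


def baseline_stats(wells, scale=1.4826):
    """Return (bmad, onesd) from test wells at the two lowest concentrations.

    wells: dicts with 'resp', 'wllt' (well type), and 'cndx' (concentration index).
    scale: MAD consistency constant (assumed; 1.0 gives the raw MAD).
    """
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = median(base)
    bmad = scale * median(abs(r - center) for r in base)
    onesd = stdev(base)  # sample standard deviation, denominator n - 1
    return bmad, onesd


# Hypothetical wells; the last two are excluded (high cndx, neutral control).
example_wells = [
    {"resp": 0.1, "wllt": "t", "cndx": 1},
    {"resp": -0.1, "wllt": "t", "cndx": 1},
    {"resp": 0.2, "wllt": "t", "cndx": 2},
    {"resp": 0.0, "wllt": "t", "cndx": 2},
    {"resp": 5.0, "wllt": "t", "cndx": 6},
    {"resp": 0.3, "wllt": "n", "cndx": 1},
]
example_bmad, example_onesd = baseline_stats(example_wells)
```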
Level 5: Possible cutoff thresholds, where higher value for endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.), 5: bmad5 (Add a cutoff value
of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test
compound wells (wllt = t) for the endpoint.), 28: ow_bidirectional_gain (Multiply winning model hitcall
(hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only
positive responses are biologically relevant.)
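Because the higher of the candidate thresholds is selected, the cutoff for this endpoint can be checked against the bmad reported in the assay design summary (0.194). The sketch below is illustrative only; the function name is hypothetical.

```python
import math


def response_cutoff(bmad):
    """Return (method, value) for the larger of log2(1.2) and 5*bmad."""
    candidates = {"log2_1.2": math.log2(1.2), "bmad5": 5 * bmad}
    method = max(candidates, key=candidates.get)
    return method, candidates[method]


# bmad as reported (rounded to three decimals) in the assay design summary.
cutoff_method, cutoff_value = response_cutoff(0.194)
```

Here log2(1.2) is about 0.263 and 5 * 0.194 is 0.97, so bmad5 sets the cutoff. The result is close to the reported threshold of 0.969; the small difference is presumably due to rounding of the reported bmad.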
Level 6: Cautionary flagging include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model
direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning
directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning
directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).), 6: singlept.hit.high (Flag
single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with
the median response observed above baseline occurring only at the highest concentration tested.),
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active
hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one
concentration and not the highest concentration tested.), 8: multipoint.neg (Flag multi-point miss, where
series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.), 9:
bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50
percent maximal response). This indicates high variability in baseline response, in excess of
half of the maximal response.), 10: noise (Flag series as noisy if the quality of fit as calculated by the root
mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.), 11: border (Flag
series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff);
|top| <= 1.2(coff) or |top| >= 0.8(coff).), 13: low.nrep (Flag series if the average number of replicates
per concentration is less than 2; nrep < 2.), 14: low.nconc (Flag series if 4 or fewer concentrations were
tested; nconc <= 4.), 15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain
AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested
concentration.), 17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and
efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical
assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5,
then flag when top < log2(1.5) or max_med < log2(1.5).), 18: ac50.lowconc (Flag series with an active hit
call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest
concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.), 20: no.med.gt.3bmad (Flag series
where no median response values are greater than baseline as defined by 3 times the baseline median
absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg
is the number of medians greater than 3*bmad/less than -3*bmad.)
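Several of the simpler Level 6 rules reduce to single comparisons on fitted parameters. The sketch below covers five of them (noise, low.nrep, low.nconc, bmd.high, ac50.lowconc) using the thresholds quoted above. It is not the tcpl implementation: the function name, the dictionary keys, and the example values are hypothetical.

```python
import math


def level6_flags(series):
    """Return the names of the flags raised for one fitted series."""
    flags = []
    if series["rmse"] > series["coff"]:
        flags.append("noise")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    if series["bmd"] > series["ac50"]:
        flags.append("bmd.high")
    # logc_min is the log10 of the lowest tested concentration.
    if series["hitc"] >= 0.9 and series["ac50"] < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags


# Hypothetical fitted series; coff and the minimum concentration echo this endpoint.
example_series = {
    "rmse": 1.2, "coff": 0.969, "nrep": 2, "nconc": 6,
    "bmd": 0.5, "ac50": 0.02, "hitc": 0.95,
    "logc_min": math.log10(0.0412),
}
example_flags = level6_flags(example_series)
```

Flags are cautionary annotations on a series; in this sketch they are only collected and do not alter the hit call.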
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24 Number of chemicals tested: 24
ACTIVITY HIT CALLS
-------
Active hit count: hitc>0.9
0
Inactive hit count: 0
-------
4. Test Method Performance
4.1 Robustness: The following assay performance metrics summarize the robustness of the method, i.e. the reliability
of the experimental results and the prediction capability of the model used.
NEUTRAL CONTROL (well type = "n")
Neutral control well median response value, by plate: nmed 0.665
Neutral control median absolute deviation, by plate: nmad 0.048
Coefficient of variation (CV%) in neutral control wells: (nmad/nmed)*100 7.25%
POSITIVE CONTROL (well type = "p")
Positive control well median response value, by plate: pmed NA
Positive control well median absolute deviation, by plate: pmad NA
Z Prime Factor for median positive and neutral control across all plates: NA
(1 - ((3 * (pmad + nmad)) / abs(pmed - nmed)))
Strictly standardized mean difference (SSMD) for positive compared to neutral control wells: NA
((pmed - nmed) / sqrt(pmad^2 + nmad^2))
Positive control signal-to-noise: ((pmed-nmed)/nmad) NA
Positive control signal-to-background: (pmed/nmed) NA
NEGATIVE CONTROL (well type = "m")
Negative control well median, by plate: mmed NA
Negative control well median absolute deviation value, by plate: mmad NA
Z Prime Factor for median negative and neutral control across all plates: NA
(1 - ((3 * (mmad + nmad)) / abs(mmed - nmed)))
Strictly standardized mean difference (SSMD) for negative compared to neutral control wells: NA
((mmed - nmed) / sqrt(mmad^2 + nmad^2))
Signal-to-noise (median across all plates, using negative control wells): NA
((mmed-nmed)/nmad)
Signal-to-background (median across all plates, using negative control wells): NA
(mmed/nmed)
4.2 Reference Chemical Information: Reference chemical curation is ongoing, and this section will be updated as
more information becomes available.
4.3 Performance Measures and Predictive Capacity: The performance and predictivity for a given assay may be
evaluated with a variety of performance statistics but is dependent upon available data. Predictive capacity (i.e.
false negative, false positive rates) will be assessed when reference chemical information is available. Ideally,
assays will have sufficient data on reference chemicals (i.e. positive and negative controls) to enable estimation
of accuracy statistics, such as sensitivity and specificity.
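Once reference chemicals are curated, sensitivity and specificity follow from a comparison of the reference classification with the assay hit call. The sketch below is a generic illustration; the function name and the example calls are hypothetical, and no reference chemical data for this endpoint are implied.

```python
def sensitivity_specificity(calls):
    """Return (sensitivity, specificity) from (reference_positive, assay_active) pairs.

    A rate is None when its denominator is zero.
    """
    tp = sum(1 for ref, act in calls if ref and act)
    fn = sum(1 for ref, act in calls if ref and not act)
    tn = sum(1 for ref, act in calls if not ref and not act)
    fp = sum(1 for ref, act in calls if not ref and act)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical calls: four reference positives and four reference negatives.
example_calls = [
    (True, True), (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, False), (False, True),
]
example_sens, example_spec = sensitivity_specificity(example_calls)
```

The false negative rate is 1 - sensitivity and the false positive rate is 1 - specificity, which are the predictive-capacity quantities named above.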
ToxCast targets may align to a range of event types in the Adverse Outcome Pathway (AOP) framework; however,
each assay technology may have specific limitations, which may require user discretion for more complex
interpretations of the data.
The median root mean squared error (RMSE) across all winning models for active hits was calculated as: 1.
4.4 Chemical Library Scope and Limitations: The ToxCast Chemical Library was designed to capture a large spectrum
of structurally and physicochemically diverse compounds. This chemical inventory incorporates toxicity data-
rich chemicals, chemicals spanning major use-categories, and chemicals with exposure potential, including but
not limited to pesticides, antimicrobials, fragrances, green chemistry alternatives, food additives, toxicity
reference compounds and failed pharmaceuticals. In addition to environmental or exposure concerns, chemical
selection criteria also consider practical constraints, such as commercial availability, dimethyl sulfoxide (DMSO)
solubility and stability, and suitability for testing in automated or semi-automated systems (e.g., low volatility
and moderate LogP values). Under these constraints, there were three major, interrelated drivers for chemical
selection: availability of animal toxicity data or mechanistic knowledge, exposure potential, and EPA regulatory
interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate
subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the
chemical landscape to which humans and ecosystems are potentially exposed and for which toxicity data are
mostly lacking. Analytical QC calls per sample and substance should be considered to understand the
applicability domain of the chemicals for screening.
5. Potential Regulatory Applications
5.1 Context of Use: Examples of end use scenarios could include, but are not limited to:
• Support Category Formation and Read-Across: The outcomes from the assay could be used to
substantiate a hypothesis for grouping substances together for the purposes of read-across,
• Priority Setting: The assay might help prioritize substances within an inventory for more detailed
evaluation,
• Screening Level Assessment of a Biomarker or Mechanistic Activity or Response: The screening level
assessment may be sufficient to identify a hazard and provide a gauge of potency; or
• Integrated approaches to testing and assessment (IATA): The assay may form one component of an
IATA.
6. Bibliography:
Romanov S, Medvedev A, Gambarian M, Poltoratskaya N, Moeser M, Medvedeva L, Gambarian M, Diatchenko L, Makarov S. Homogeneous reporter system enables quantitative functional assessment of multiple transcription factors. Nat Methods. 2008 Mar;5(3):253-60. doi: 10.1038/nmeth.1186. Epub 2008 Feb 24. PubMed PMID: 18297081.
Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA. Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol. 2010 Mar 15;23(3):578-90. doi: 10.1021/tx900325g. PubMed PMID: 20143881.
Medvedev A, Moeser M, Medvedeva L, Martsen E, Granick A, Raines L, Zeng M, Makarov S Jr, Houck KA, Makarov SS. Evaluating biological activity of compounds by transcription factor activity profiling. Sci Adv. 2018 Sep 26;4(9):eaar4666. doi: 10.1126/sciadv.aar4666. PMID: 30263952; PMCID: PMC6157966.
7. Supporting Information:
More information on the ToxCast program can be found at: https://www.epa.gov/chemical-research/toxicity-forecasting.
The most recent version of downloadable data can be found at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.
The ToxCast Data Analysis Pipeline (tcpl) R package is available on CRAN or GitHub. See tcpl's vignette for
comprehensive documentation describing ToxCast data processing, retrieval, and interpretation.
-------
Assay Endpoint ID: 1367
ATG_ERb_TRANS2
1. General Information
1.1 Assay Title: Attagene TRANS2-FACTORIAL HepG2 Assay for human estrogen receptor, beta (ERb)
1.2 Assay Summary: ATG_TRANS2 is a cell-based, multiplexed-readout assay that uses HepG2, a human liver cell
line, with measurements taken at 24 hours after chemical dosing in a 24-well plate. ATG_ERb_TRANS2 is one of
24 assay components measured or calculated from the ATG_TRANS2 assay. It is designed to make
measurements of mRNA induction, a form of inducible reporter, as detected with fluorescence intensity signals
by Reverse transcription polymerase chain reaction (RT-PCR) and Capillary electrophoresis technology. The
assay endpoint ATG_ERb_TRANS2 was analyzed in the positive analysis fitting direction relative to DMSO as the
negative control and baseline of activity. Using a type of inducible reporter, measures of mRNA for gain-of-
signal activity can be used to understand the reporter gene at the transcription factor-level as they relate to the
gene ESR2. Furthermore, this assay endpoint can be referred to as a primary readout, because this assay has
produced multiple assay endpoints where this one serves a reporter gene function. To generalize the intended
target to other relatable targets, this assay endpoint is annotated to the nuclear receptor intended target family,
where the subfamily is steroidal.
1.3 Date of Document Creation: September 05 2024
1.4 Authors and Contact Information:
US Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure (CCTE)
109 T.W. Alexander Drive (Mail Code D143-02)
Research Triangle Park, NC 27711
1.5 Assay Source: Attagene Inc. is a Contract Research Organization (CRO) offering a unique screening service using
its proprietary multiplexed pathway profiling platform, the FACTORIAL.
1.6 Date of Assay Development: For date of assay development, see Section 6: Bibliography.
1.7 References: For complete list of references, see Section 6: Bibliography.
1.8 Proprietary Elements: FACTORIAL is a novel pathway profiling technology trademarked and patented by
Attagene, Inc.
1.9 Assay Throughput: 24-well plate. Transfected HepG2 cells are aliquoted into 24-well microtiter plates and
incubated with test compounds for 24 hours prior to PCR detection of total RNA transcription using capillary
electrophoresis.
1.10 Status: The assay is fully developed, and data are publicly available in ToxCast's invitroDB.
1.11 Abbreviations:
AIC: Akaike Information Criterion
AOP: Adverse Outcome Pathway
CV: Coefficient of Variation
DMSO: Dimethyl Sulfoxide
SSMD: Strictly Standardized Mean Difference
tcpl: ToxCast Data Analysis Pipeline R Package
ToxCast: US EPA's Toxicity Forecaster Program
2. Test Method Description
2.1 Purpose: Changes to fluorescence intensity signals are indicative of inducible changes in transcription factor
activity. This is quantified by the level of mRNA reporter sequence unique to the transfected trans-acting
reporter gene and exogenous transcription factors.
Cellular adaptive response to environmental triggers is often mediated by an intracellular network of regulatory
pathways that modulate gene expression. The signaling pathways interact with DNA by using transcription
factors (TFs), or proteins that bind specific sequences on target genes. Assessing transcription factor activity can
help characterize the functional status and impact of chemical exposures for expression of genes of interest.
2.2 Scientific Principles: Trans-FACTORIAL is an embodiment of the FACTORIAL platform that is designed for
assessing agonist/antagonist properties of compounds across multiple NRs. The trans-FACTORIAL comprises a
library of one-hybrid reporter constructs (trans-RTUs). A trans-RTU expresses a chimera GAL4-NR protein that
regulates transcription of a reporter sequence. The presence of agonists/antagonists of NR alters the
transactivation function of GAL4-NR and modulates reporter transcription.
2.3 Experimental System: An adherent HepG2 cell line is used. HepG2 is an immortal cell line derived in 1975
from the liver tissue of a 15-year-old Caucasian male from Argentina with a well-differentiated hepatocellular
carcinoma. These cells are epithelial in morphology, have a modal chromosome number of 55, and are not
tumorigenic in nude mice. This cell line has been cloned and transfected with a library of multiple reporter
transcription units.
2.4 Metabolic Competence: The HepG2 cells used in this assay are variant HG19, a cell line selected for enhanced
xenobiotic metabolism. These cells express 2 to 13 times more cytochrome P450 activity than parental HepG2.
The parental HepG2 cell line has been shown by others to retain the potential for Phase I and Phase II metabolic
responses to xenobiotics, e.g., expression of CYP1A1/2, 2A6, 2B6, 2C8/9, 2C19, 2D6/3A, 2E1, and 3A4/5 with
CYP1A2, CYP2C9, CYP2D6, CYP2E1 and CYP3A activities reported at levels similar to human hepatocytes
although variable depending on source and culture conditions; some enzymes (e.g., CYP2W1) have even been
observed at higher rates than in primary hepatocytes. Phase II enzyme activities identified in HepG2 cells include
SULTs (1A1, 1A2, 1E1 and 2A1), GSTs (mGST-1, GST μ1), NAT1, EPHX1 and UGTs (1A1, 1A6 and 2B7). In addition,
HepG2 cells can potentially express xenobiotic regulation activities via functionally active p53 protein (Boehme
et al. 2010) and Nrf2, a transcription factor which regulates genes containing antioxidant response element
(ARE) sequences in their promoters; HepG2 cells also possess the capacity to express a number of ATP-binding
cassette (ABC) xenobiotic export pumps (e.g., ABCC1, C2, C3 and G2 membrane-bound proteins also regulated
in part by Nrf2 TF DNA-binding).
2.5 Exposure Regime: Human liver HepG2 cells are transiently transfected with multiple reporter transcription units
(MRTUs) in 6-well plates using FuGene 6 reagent (Roche; 3 ml FuGene/1 mg DNA). The MRTU constructs are
regulated by a cis-regulating element (promoter). Each RTU expresses a GAL4-UAS that regulates the
transcription of a nearby target reporter gene sequence. A major difference between the CIS and TRANS system
is that in CIS activities of endogenous TFs are measured, whereas the TRANS assay evaluates changes in activities
of exogenous, chimeric NR-Gal4 proteins. Since the HepG2 cell line does not express some nuclear receptors,
the CIS assay cannot be used to evaluate these targets. The transfected cells are pooled, plated onto 96-well
plates, and exposed to the evaluated compound. At the end of the 24-hour incubation, total RNA is isolated using
TRIzol reagent (Invitrogen). The isolated RNA is then reverse-transcribed using oligo(dT) primer and Mo-MLV
reverse transcriptase (Invitrogen) with DNase I (Ambion) treatment for 30 min. One-tenth of the resulting
cDNA is amplified by PCR using Taq DNA polymerase (Invitrogen) and two reporter sequence-specific primers.
The PCR products are fluorescently labeled by primer extension with a 6-carboxyfluorescein (6-FAM) 5'-labeled
reporter sequence-specific primer (2 min at 95 °C, 20 s at 68 °C and 10 min at 72 °C), and these products are
digested with 5 U of HpaI (New England Biolabs) for 2 h at 37 °C. The fragments are purified using QIAquick PCR
columns (Qiagen) and analyzed on an ABI 3130xL genetic analyzer (Applied Biosystems), with peak positions
identified by using a set of X-rhodamine (ROX)-labeled MapMarker1000 molecular weight standards
(BioVentures). The raw capillary electrophoresis data are processed using Attagraph software (Attagene).
ASSAY DESIGN SUMMARY
Nominal number of tested concentrations: 6
Standard minimum concentration tested: 0.0412 nM
Key positive control: NA
Target (nominal) number of replicates: 2
Standard maximum concentration tested: 10 nM
Neutral vehicle control: DMSO
Baseline median absolute deviation for the assay (bmad): 0.213
Response cutoff threshold used to determine hit calls: 1.063
Detection technology used: RT-PCR and Capillary electrophoresis (Fluorescence)
-------
2.6 Response: Increased transcription activity is measured by increased fluorescence intensity, specifically the
increased production of mRNA transcripts in response to active transcription following transcription
factor (TF) interaction with promoter sequences, as measured by reverse transcription-polymerase chain
reaction (RT-PCR) and capillary electrophoretic detection of fluorescently labeled mRNA.
2.7 Quality and Acceptance Criteria: Each assay may utilize different acceptance criteria and quality assurance
methods as it pertains to the individual assay platform and implementation. Pre-processing transformations
may indicate issues in plates or wells by setting well quality (wllq) values to 0. Analytical QC calls per sample and
substance can also be considered to understand the applicability domain of the chemicals for screening.
2.8 Technical Limitations: ToxCast data can provide initial (screening) information about the capacity for a chemical
to elicit a biological response; caution is advised with extrapolation of these results to organism-level responses.
The potential for a chemical to elicit adverse health outcomes in living systems is a function of multiple factors,
and this assay is not intended to provide predictive details regarding long-term or indirect adverse effects in
complex biological systems but can aid in the prioritization of compound selection for more resource-intensive
toxicity studies. See Section 4.4 for more information on the chemical applicability of the assay.
2.9 Related Assays: For related assays, consult the following assay lists or intended target families. This assay is
present in the following assay lists:
NA
Additionally, this assay was annotated to the intended target family of nuclear receptor.
3. Data Interpretation
The ToxCast Data Analysis Pipeline (tcpl) R package includes processing functionality for two screening
paradigms: (1) single-concentration ("SC") and (2) multiple-concentration ("MC") screening. SC screening
consists of testing chemicals at one concentration, often for the purpose of identifying potentially active
chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a
concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. MC data is
the focus of this documentation, with SC data processing metrics to be incorporated in the future.
3.1 Responses captured in prediction model: See Section 2.6 for additional information on responses measured.
3.2 Data Analysis: Readout data were analyzed in the positive (gain of signal) fitting direction using log2
fold-induction over DMSO controls, which provide a baseline signal. Negative and zero values are removed before
analysis, and raw values are log-transformed. All statistical analyses were conducted using the R programming
language, employing the tcpl package to generate model parameters and confidence intervals. Each chemical
concentration series was fit, and the model that produces the lowest Akaike Information Criterion (AIC) value
is considered the winning model.
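The AIC-based selection described above can be illustrated with a short Python sketch. It is not the tcpl or tcplfit2 implementation (those are R packages); the function names, log-likelihoods, and parameter counts are hypothetical, and only the rule itself (lowest AIC wins, with AIC = 2k - 2 ln L) is taken from the standard definition.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def pick_winning_model(fits):
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) tuple.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get)


# Hypothetical fit results for one concentration series (illustrative values only).
example_fits = {
    "cnst": (-12.0, 1),  # constant (no-response) model
    "hill": (-4.5, 4),   # Hill model
    "gnls": (-4.4, 6),   # gain-loss model
}
winner = pick_winning_model(example_fits)
```

In this example the gain-loss model has a slightly better likelihood than the Hill model, but its two extra parameters are penalized, so the Hill model has the lower AIC.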
Prior to the data processing, all the data must go through pre-processing to transform the heterogeneous data
into a uniform format before it can be loaded into a database. Level 0 pre-processing is done in R by
vendor/dataset-specific scripts with all manual transformations to the data documented with justification.
Common examples of manual transformations include fixing a sample ID typo or changing the well quality
value (wllq) to 0 after identifying problems such as a plate row/column missing an assay reagent.
Once data is loaded into the database, tcpl utilizes generalized processing functions provided to process,
normalize, model, qualify, and visualize the data. To promote reproducibility, all method assignments must
occur through the database and should come from the available list of methods for each processing level.
Assigned multiple concentration processing methods include:
Level 2: Component-specific corrections include:
2: log2 (Transform the corrected response value (cval) to log-scale (base 2).)
3: rmneg (Exclude wells with negative corrected response values (cval) by downgrading their well quality (wllq); if cval < 0, wllq = 0.)
4: rmzero (Exclude wells with corrected response values (cval) equal to zero by downgrading their well quality (wllq); if cval = 0, wllq = 0.)
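As a rough illustration of the Level 2 corrections, the following Python sketch flags non-positive wells and log2-transforms the rest. It is not tcpl code: the dictionary layout and function name are hypothetical, and the ordering (exclude first, then transform) is an assumption based on the description in Section 3.2.

```python
import math


def level2_correct(wells):
    """Apply rmneg, rmzero, and log2 to a list of well records.

    Each well is a dict with 'cval' (corrected response) and 'wllq'
    (well quality; 1 = usable, 0 = excluded). New dicts are returned.
    """
    corrected = []
    for well in wells:
        well = dict(well)  # copy so the input is not modified
        if well["cval"] <= 0:
            # rmneg (cval < 0) and rmzero (cval = 0): downgrade well quality.
            # The raw cval is left untouched; the well is excluded via wllq.
            well["wllq"] = 0
        else:
            # log2: transform the corrected response to log-scale (base 2).
            well["cval"] = math.log2(well["cval"])
        corrected.append(well)
    return corrected


# Hypothetical wells: fold-induction values of 2, 0, -0.5, and 1.
raw = [{"cval": v, "wllq": 1} for v in (2.0, 0.0, -0.5, 1.0)]
out = level2_correct(raw)
usable = [w for w in out if w["wllq"] == 1]
```

Only wells with wllq = 1 would be carried forward, so downstream steps see log2 values of 1.0 and 0.0 for the two positive wells.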
Level 3: Endpoint-specific normalization includes:
1: none (Set the corrected response value (cval) as the normalized response value (resp); resp = cval. No additional mc3 methods needed for endpoint-specific normalization.)
Level 4: Baseline and required tcplfit2 parameters defined by:
1: bmad.aeid.lowconc.twells (Calculate the baseline median absolute deviation (bmad) as the median absolute deviation of normalized response values (resp) for test compound wells (wllt = t) with concentration index (cndx) equal to 1 or 2. Calculate one standard deviation of the normalized response for test compound wells (wllt = t) with a concentration index (cndx) of 1 or 2; onesd = sqrt(sum((resp - mean resp)^2)/(sample size - 1)). Onesd is used to establish the benchmark response (BMR) and is therefore required for tcplfit2 processing.)
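The two baseline statistics can be sketched in Python as follows. This is an illustration, not the tcpl implementation: the record layout and function name are hypothetical, and the scaling constant 1.4826 is an assumption that mirrors the default of R's mad() function.

```python
import statistics

MAD_SCALE = 1.4826  # assumption: default constant of R's stats::mad()


def baseline_stats(wells, scale=MAD_SCALE):
    """Return (bmad, onesd) from test wells at the two lowest concentrations.

    Each well is a dict with 'resp' (normalized response), 'wllt' (well type,
    't' = test compound) and 'cndx' (concentration index, 1 = lowest).
    """
    base = [w["resp"] for w in wells if w["wllt"] == "t" and w["cndx"] in (1, 2)]
    center = statistics.median(base)
    # Median absolute deviation around the median, scaled.
    bmad = scale * statistics.median([abs(r - center) for r in base])
    # Sample standard deviation: sqrt(sum((resp - mean)^2) / (n - 1)).
    onesd = statistics.stdev(base)
    return bmad, onesd


# Hypothetical wells; the cndx = 3 well and the vehicle ('n') well are ignored.
wells = [
    {"resp": 0.0, "wllt": "t", "cndx": 1},
    {"resp": 0.1, "wllt": "t", "cndx": 1},
    {"resp": -0.1, "wllt": "t", "cndx": 2},
    {"resp": 0.2, "wllt": "t", "cndx": 2},
    {"resp": 2.0, "wllt": "t", "cndx": 3},
    {"resp": 0.0, "wllt": "n", "cndx": 1},
]
bmad, onesd = baseline_stats(wells)
```

Restricting the calculation to the two lowest concentration indices assumes most test chemicals are inactive there, so the spread of those wells approximates assay noise.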
Level 5: Possible cutoff thresholds, where the highest value for the endpoint is selected, include:
3: log2_1.2 (Add a cutoff value of log2(1.2). Typically for fold change data.)
5: bmad5 (Add a cutoff value of 5 multiplied by the baseline median absolute deviation (bmad). By default, bmad is calculated using test compound wells (wllt = t) for the endpoint.)
28: ow_bidirectional_gain (Multiply winning model hitcall (hitc) by -1 for models fit in the negative analysis direction. Typically used for endpoints where only positive responses are biologically relevant.)
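The cutoff selection for this endpoint can be reproduced approximately with the short Python sketch below. The function name is hypothetical; the two candidate cutoffs and the bmad value of 0.213 come from this document.

```python
import math


def level5_cutoff(bmad):
    """Return (method_name, cutoff) for the larger of the two candidate cutoffs."""
    candidates = {
        "log2_1.2": math.log2(1.2),  # fixed fold-change cutoff
        "bmad5": 5 * bmad,           # 5 x baseline median absolute deviation
    }
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


# bmad as reported in the Assay Design Summary (rounded to three decimals).
method, coff = level5_cutoff(0.213)
```

With the reported bmad, the bmad5 candidate (about 1.065) exceeds log2(1.2) (about 0.263) and is selected. The small difference from the reported cutoff of 1.063 is presumably because the bmad shown in the summary is rounded.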
Level 6: Cautionary flags include:
5: modl.directionality.fail (Flag series if model directionality is questionable, i.e. if the winning model direction was opposite, more responses (resp) would have exceeded the cutoff (coff). If loss was winning directionality (top < 0), flag if count(resp < -1*coff) < 2*count(resp > coff). If gain was winning directionality (top > 0), flag if count(resp > coff) < 2*count(resp < -1*coff).)
6: singlept.hit.high (Flag single-point hit that's only at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at the highest tested concentration.)
7: singlept.hit.mid (Flag single-point hit that's not at the highest conc tested, where series is an active hit call (hitc >= 0.9) with the median response observed above baseline occurring only at one concentration and not the highest concentration tested.)
8: multipoint.neg (Flag multi-point miss, where series is an inactive hit call (hitc < 0.9) with multiple median responses observed above baseline.)
9: bmd.high (Flag series if modeled benchmark dose (BMD) is greater than AC50 (concentration at 50 percent maximal response). This indicates high variability in baseline response, in excess of half of the maximal response.)
10: noise (Flag series as noisy if the quality of fit as calculated by the root mean square error (rmse) for the series is greater than the cutoff (coff); rmse > coff.)
11: border (Flag series if borderline activity is suspected based on modeled top parameter (top) relative to cutoff (coff); |top| <= 1.2(coff) or |top| >= 0.8(coff).)
13: low.nrep (Flag series if the average number of replicates per concentration is less than 2; nrep < 2.)
14: low.nconc (Flag series if 4 concentrations or less were tested; nconc <= 4.)
15: gnls.lowconc (Flag series where winning model is gain-loss (gnls) and the gain AC50 is less than the minimum tested concentration, and the loss AC50 is less than the mean tested concentration.)
17: efficacy.50 (Flag low efficacy hits if series has an active hit call (hitc >= 0.9) and efficacy values (e.g. top and maximum median response) less than 50 percent; intended for biochemical assays. If hitc >= 0.9 and coff >= 5, then flag when top < 50 or max_med < 50. If hitc >= 0.9 and coff < 5, then flag when top < log2(1.5) or max_med < log2(1.5).)
18: ac50.lowconc (Flag series with an active hit call (hitc >= 0.9) if AC50 (concentration at 50 percent maximal response) is less than the lowest concentration tested; if hitc >= 0.9 and ac50 < 10^logc_min, then flag.)
20: no.med.gt.3bmad (Flag series where no median response values are greater than baseline as defined by 3 times the baseline median absolute deviation (bmad); nmed_gtbl_pos and nmed_gtbl_neg both = 0, where nmed_gtbl_pos/_neg is the number of medians greater than 3*bmad/less than -3*bmad.)
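Several of the simpler flags reduce to direct comparisons and can be sketched in Python. The sketch covers only bmd.high, noise, low.nrep, low.nconc, and ac50.lowconc; the dictionary keys and function name are hypothetical, and this is not the tcpl implementation.

```python
import math


def level6_flags(series):
    """Return the names of the flags raised for one concentration series.

    `series` uses hypothetical keys: hitc, coff, rmse, nrep, nconc, ac50, bmd,
    and logc_min (log10 of the lowest tested concentration). ac50 and bmd may
    be None when no curve was fit.
    """
    flags = []
    active = series["hitc"] >= 0.9
    ac50, bmd = series["ac50"], series["bmd"]
    if ac50 is not None and bmd is not None and bmd > ac50:
        flags.append("bmd.high")
    if series["rmse"] > series["coff"]:
        flags.append("noise")
    if series["nrep"] < 2:
        flags.append("low.nrep")
    if series["nconc"] <= 4:
        flags.append("low.nconc")
    if active and ac50 is not None and ac50 < 10 ** series["logc_min"]:
        flags.append("ac50.lowconc")
    return flags


# Hypothetical active series whose AC50 falls below the lowest tested concentration.
active_series = {"hitc": 0.95, "coff": 1.063, "rmse": 0.4, "nrep": 2, "nconc": 6,
                 "ac50": 0.02, "bmd": 0.01, "logc_min": math.log10(0.0412)}
# Hypothetical inactive, noisy series with sparse data.
sparse_series = {"hitc": 0.1, "coff": 1.063, "rmse": 1.5, "nrep": 1, "nconc": 4,
                 "ac50": None, "bmd": None, "logc_min": math.log10(0.0412)}

active_flags = level6_flags(active_series)
sparse_flags = level6_flags(sparse_series)
```

Flags are cautionary annotations and do not change the hit call; a flagged active series remains active but warrants closer inspection of the fitted curve.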
The following is an aggregate endpoint summary of the number of samples and chemicals tested, as well as
active or inactive hit calls (hitc) and predicted winning models for all samples tested in this endpoint.
SAMPLE AND CHEMICAL COVERAGE
Number of samples tested: 24
Number of chemicals tested: 24
ACTIVITY HIT CALLS
Active hit count: hitc>0.9
5
Inactive hit count: 0 |