Proceedings of the Computational Toxicology Centers Science To Achieve Results (STAR) Progress Review Workshop, October 1, 2009



United States
Environmental Protection
Agency

Proceedings of the
Computational Toxicology Centers
Science To Achieve Results (STAR)
Progress Review Workshop

OCTOBER 1, 2009

U.S. EPA, MAIN CAMPUS, BUILDING C
109 TW ALEXANDER DRIVE
RESEARCH TRIANGLE PARK, NC

Office of Research and Development

National Center for Environmental Research

-------
U.S. EPA Computational Toxicology Centers STAR Progress Review Workshop

Table of Contents

Agenda
Abstracts

Carolina Center for Computational Toxicology

Ivan Rusyn

Environmental Bioinformatics and Computational Toxicology Center
William J. Welsh

Carolina Environmental Bioinformatics Center

Fred A. Wright

The Texas-Indiana Virtual STAR Center: Data-Generating In Vitro and In Silico Models of Developmental
Toxicity in Embryonic Stem Cells and Zebrafish

Maria Bondesson Bolin

Chemical Substance In Vitro/In Silico Screening System To Predict Human and Ecotoxicological Effects
(ChemScreen)

Bart van der Burg

Presentations

Summary

Post-Meeting Participants List

The Office of Research and Development's National Center for Environmental Research


-------
U.S. Environmental Protection Agency (EPA), Office of Research and Development (ORD),
National Center for Environmental Research (NCER)

Computational Toxicology Centers STAR Progress Review Workshop

U.S. Environmental Protection Agency
Main Campus, Building C, Auditorium C111A/B
109 TW Alexander Drive
Research Triangle Park, NC 27711

Thursday, October 1, 2009
Agenda

8:00 a.m. - 8:30 a.m. Registration

8:30 a.m. - 9:00 a.m. Welcome, Introduction, and Review of Meeting Goals

Robert Kavlock, EPA, ORD, and Deborah Segal, EPA, ORD, NCER

9:00 a.m. - 10:00 a.m. Carolina Center for Computational Toxicology

Ivan Rusyn, University of North Carolina

10:00 a.m. - 10:15 a.m. Collaborative Work With EPA

Ann Richard, EPA, National Center for Computational Toxicology (NCCT)

10:15 a.m. - 10:30 a.m. Break

10:30 a.m. - 11:30 a.m. New Jersey Environmental Bioinformatics and Computational Toxicology Center

William Welsh, University of Medicine and Dentistry of New Jersey

11:30 a.m. - 11:45 a.m. Collaborative Work With EPA

Susan Euling, EPA, National Center for Environmental Assessment (NCEA)

11:45 a.m. - 12:30 p.m. Lunch (On Your Own)

12:30 p.m. - 1:30 p.m. Carolina Environmental Bioinformatics Research Center

Fred Wright, University of North Carolina

1:30 p.m. - 1:45 p.m. Collaborative Work With EPA

Richard Judson, EPA, NCCT

1:45 p.m. - 2:45 p.m. The Texas-Indiana Virtual STAR Center: Data-Generating In Vitro and In Silico
Models of Developmental Toxicity in Embryonic Stem Cells and Zebrafish

Maria Bondesson Bolin, University of Houston

2:45 p.m. - 3:00 p.m. Collaborative Work With EPA

Thomas Knudsen, EPA, NCCT

3:00 p.m. - 3:30 p.m. A Proposal from the European Commission's Complementary Research Program

Bart van der Burg, BioDetection Systems B. V.

3:30 p.m. - 4:15 p.m. Discussion on Research Needs

Chair: Maggie Breville, EPA, ORD

4:15 p.m. Adjournment

-------
Carolina Center for Computational Toxicology

EPA Grant Number: R833825

Investigators:

1.	Ivan Rusyn	E-mail: iir@unc.edu

2.	Timothy Elston	E-mail: telston@amath.unc.edu

3.	Shawn Gomez	E-mail: smgomez@unc.edu

4.	Mayetri Gupta	E-mail: gupta@bios.unc.edu

5.	Andrew Nobel	E-mail: nobel@stat.unc.edu

6.	Wei Sun	E-mail: wsun@bios.unc.edu

7.	Alex Tropsha	E-mail: alex_tropsha@email.unc.edu

8.	Simon Wang	E-mail: wangx@email.unc.edu

9.	Fred A. Wright	E-mail: fwright@bios.unc.edu

Current Investigators:

1.	Ivan Rusyn	E-mail: iir@unc.edu

2.	Timothy Elston	E-mail: telston@amath.unc.edu

3.	Shawn Gomez	E-mail: smgomez@unc.edu

4.	Alex Tropsha	E-mail: alex_tropsha@email.unc.edu

5.	Fred A. Wright	E-mail: fwright@bios.unc.edu

6.	Karin Yeatts	E-mail: karin_yeatts@unc.edu

Institution:

1. University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, 27599
EPA Project Officer:

Project Period: April 1, 2008 through March 31, 2012
Project Amount: $3,400,000

RFA:

Research Category:

Description:

Objective:

The objective of this proposal is to create the Carolina Center for Computational Toxicology. We present
a clear plan for an effective, broad and interdisciplinary effort to devise novel tools, methods and
knowledge that will utilize publicly available data to assist the regulatory agencies and the greater
environmental health sciences community in protecting the environment and human health.

Approach:

The Center will apply knowledge and expertise of the individual investigators and teams to develop
complex predictive modeling solutions that span from mechanistic- to discovery-based efforts. The
Center will be divided into three Research Projects and an Administrative Core Unit. To balance the
research needs detailed in Funding Opportunity EPA-G2007-STAR-D1 and to maximize interactions
within the Center and between the Center and the larger environmental health community, the
following sub-disciplines were recognized as critical to the Center: 1) Biomedical modeling of
chemical-perturbed networks (Project 1, PIs Gomez and Elston), 2) Toxico-genetic modeling (Project 2,
PIs Wright and Rusyn), and 3) Chem-informatics (Project 3, PI Tropsha). Overall, we chose a bottom-up
approach to predictive computational modeling of adverse effects of toxic agents. Our emphasis spans
from fine-scale predictive simulations of protein-protein/-chemical interactions in nuclear receptor
networks (Project 1), to mapping chemical-perturbed networks and devising modeling tools that can
predict the pathobiology of test compounds from a limited set of biological data (Project 1), to building
tools that will enable toxicologists to understand the role of genetic diversity between individuals in
responses to toxicants (Project 2), to unbiased, discovery-driven prediction of adverse chronic in vivo
outcomes based on statistical modeling of chemical structures, high-throughput screening data, and the
genetic makeup of the organism (Project 3). The Administrative Core Unit provides administrative and
programming staff in support of the entire Center, is responsible for ensuring that Center objectives and
goals are being met, and provides oversight for each of the Projects. A detailed Quality Management
Plan ensures that research and data management will be conducted with integrity and will adhere to
appropriate data interchange standards. The plans for Public Outreach will ensure that the activities of
the Center are translated into usable information and materials for the public and policy makers.
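Project 3 combines chemical structures, high-throughput screening results, and genetic information in statistical models. The abstract does not specify the modeling method; as a minimal sketch, assuming a k-nearest-neighbor (kNN) regression QSAR on a hybrid descriptor vector built by concatenating chemical and biological descriptors, prediction could look like this. The helper names and descriptor values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative kNN QSAR sketch; helper names and toy data are hypothetical.


def combine_descriptors(chemical: Sequence[float],
                        biological: Sequence[float]) -> List[float]:
    """Concatenate chemical and biological descriptors into one hybrid vector."""
    return list(chemical) + list(biological)


def knn_predict(training: List[Tuple[Sequence[float], float]],
                query: Sequence[float], k: int = 3) -> float:
    """Predict activity as the mean activity of the k training compounds
    closest to the query (Euclidean distance in descriptor space)."""
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training compounds")
    ranked = sorted(training, key=lambda item: math.dist(item[0], query))
    return sum(activity for _, activity in ranked[:k]) / k


training_set = [
    (combine_descriptors([0.0], [0.0]), 1.0),
    (combine_descriptors([0.0], [1.0]), 2.0),
    (combine_descriptors([5.0], [5.0]), 10.0),
]
print(knn_predict(training_set, combine_descriptors([0.0], [0.4]), k=2))  # 1.5
```

In practice descriptors would be scaled before computing distances, and a published QSAR workflow would add descriptor selection and external validation; both are omitted here for brevity.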

Expected Results:

The Center will advance the field of computational toxicology through the development of new methods
and tools, as well as through collaborative efforts. In each Project, new computer-based models will be
developed and published that represent the state-of-the-art. The tools produced within each project will
be widely disseminated, with emphasis on their usability by the risk assessment
community and investigative toxicologists alike. The synthesis of data from a variety of sources will
move the field of computational toxicology from a hypothesis-driven science toward a predictive
science.

-------
Environmental Bioinformatics and Computational Toxicology Center

EPA Grant Number: R832721

Investigators:

1.	William J. Welsh	E-mail: welshwj@umdnj.edu

2.	Panos G. Georgopoulos E-mail: panosg@fidelio.rutgers.edu

Current Investigators:

1.	William J. Welsh	E-mail: welshwj@umdnj.edu

2.	Ioannis Androulakis	E-mail: yannis@rci.rutgers.edu

3.	Christodoulos Floudas	E-mail: floudas@titan.princeton.edu

4.	Panos G. Georgopoulos	E-mail: panosg@fidelio.rutgers.edu

5.	Marianthi Ierapetritou	E-mail: marianth@sol.rutgers.edu

6.	Herschel Rabitz	E-mail: hrabitz@princeton.edu

7.	Weida Tong	E-mail: weida.tong@fda.hhs.gov

Institution:

1.	University of Medicine and Dentistry of New Jersey, Newark, New Jersey, 07101

2.	Princeton University, Princeton, New Jersey, 08544

3.	Rutgers University, New Brunswick, New Jersey, 08901

Current Institution:

1.	Princeton University, Princeton, New Jersey, 08544

2.	Rutgers University, New Brunswick, New Jersey, 08901

3.	U.S. Food and Drug Administration, Silver Spring, Maryland, 20993

4.	University of Medicine and Dentistry of New Jersey, Newark, New Jersey, 07101

EPA Project Officer:

Project Period: October 1, 2005 through September 30, 2010

Project Amount: $5,422,135

RFA:

Research Category:

Description:

Objective:

The Research Center will bring together a team of computational scientists, with diverse
backgrounds in bioinformatics, cheminformatics and enviroinformatics, from UMDNJ, Rutgers,
and Princeton Universities, and the USFDA's Center for Toxicoinformatics. This team will
address, in a systematic and integrative manner, multiple elements of the toxicant Source-to-Outcome
sequence (Investigational Area 1, as identified in the RFA) and will develop cheminformatics tools for
toxicant characterization (Investigational Area 2, Predictive Models for Hazard Identification). The
computational tools developed through this effort will be extensively evaluated and refined through
collaborative applications involving Center scientists as well as colleagues from the three
universities and USEPA; particular emphasis will be on methods that enhance current
quantitative risk assessment practices and reduce uncertainties.

Approach:

The proposed Center will address a wide range of issues in Investigational Areas 1 and 2 and,
furthermore, will pursue complementary applications in risk assessment (Investigational Area
3). This will be achieved, within the requested resources, by building upon a variety of methods
and software systems recently developed at UMDNJ, Rutgers, Princeton (with funding from
USEPA, USDOE, NIH and NSF), and USFDA. Research activities over the proposed 5-year effort
will be organized in five projects; each project will develop a set of "stand-alone" components
addressing specific problems of computational toxicology. Furthermore, Research Project 1 will
provide an integrative framework for Investigational Area 1 while Project 4 will address the
core issues of Area 2. Extensive interaction as well as public outreach and training activities will
constitute essential elements of the Center and will be tightly interwoven with the research
activities.

Expected Results:

Research Project 1 (Development and Application of a Dose-Response Information Analysis
[DORIAN] System) will provide an integrative framework for the outcomes of the other
projects. This framework will include the following components: a web-accessible
Environmental Bioinformatics Knowledge Base (EBKB) that will provide a user-oriented
interface to an extensive set of information and modeling resources; the ebTrack integrated
analysis system that will include linkages to multiple (public and commercial) computational
and database systems; Bayesian computational tools for characterizing and reducing
uncertainties in mechanistic modeling of toxicity pathways; diagnostic computational tools for
sensitivity and stability analysis of mechanistic models and statistical methods for data analysis;
and enhanced tools for quantitative risk assessment (QRA) applications (e.g., for cross-species
extrapolation, chemical mixtures, and dose-response).

Research Project 2 (Hepatocyte Metabolism Model for Xenobiotics) will develop tools for
identifying maximally informative sets of toxicologically relevant genes; tools for analysis of
toxicologically relevant regulatory networks; an expanded version of the Rutgers hepatocyte
metabolism model that will incorporate transformations of xenobiotics; and tools for the
analysis of transcriptional regulation that will allow assessment of changes in hepatocyte phenotypic
phase space.

Research Project 3 (Tools for Optimal Identification of Biological Networks) will develop
efficient identification tools for inferring biological network structure from available laboratory
data; optimization tools for extracting quantitative information of biological system parameters
(rate constants, diffusion coefficients, binding affinities, etc.); global sensitivity analysis tools for
identifying the most effective molecular targets or pathways of biological networks and for guiding
the design of laboratory experiments; and optimal feedback control tools for inferring networks
with feedback loops.
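Global sensitivity analysis ranks model parameters by how much of the output variance each one explains. The abstract does not say which method Project 3 uses; as a minimal sketch of one standard variance-based approach, the first-order Sobol index estimated with a Saltelli-type Monte Carlo estimator, assuming independent inputs uniformly distributed on [0, 1]. The toy model below is hypothetical.

```python
import random
from typing import Callable, List, Sequence


def first_order_sobol(model: Callable[[Sequence[float]], float], k: int,
                      n: int = 50000, seed: int = 0) -> List[float]:
    """Estimate first-order Sobol indices S_i for a model of k independent
    U(0, 1) inputs, using V_i ~ mean(f(B) * (f(A_B^i) - f(A)))."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(k)] for _ in range(n)]
    b = [[rng.random() for _ in range(k)] for _ in range(n)]
    f_a = [model(x) for x in a]
    f_b = [model(x) for x in b]
    # Total output variance, estimated from both independent sample sets.
    pooled = f_a + f_b
    mean = sum(pooled) / len(pooled)
    var = sum((y - mean) ** 2 for y in pooled) / len(pooled)
    indices = []
    for i in range(k):
        # A_B^i: rows of A with column i replaced by the matching column of B.
        f_abi = [model(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        v_i = sum(fb * (fab - fa) for fa, fb, fab in zip(f_a, f_b, f_abi)) / n
        indices.append(v_i / var)
    return indices


# Hypothetical additive test model: analytically S_1 = 0.2 and S_2 = 0.8.
print(first_order_sobol(lambda x: x[0] + 2 * x[1], k=2))
```

For the additive example the estimates should fall close to the analytical values; for real biological network models each evaluation is a simulation run, which is why more sample-efficient designs are often preferred.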

Research Project 4 (Cheminformatics Tools for Toxicant Characterization) will develop an
integrative hierarchical decision-forest framework for toxicant characterization that
encompasses several novel technologies, including the Shape Signatures tool that rapidly
matches organic and organometallic chemicals with each other or, alternatively, against target
receptor sites/subsites; the Polynomial Neural Network (PNN) that automatically generates
physically-intuitive linear or non-linear QSAR models; and virtual high-throughput screening
(vHTS) methods that predict ligand binding affinity and provide mechanistic information
(toxicity pathways).

Research Project 5 (Optimization Tools for In Silico Proteomics) will customize computational
methods for protein structure prediction and de novo protein design, with specific focus on the
important families of Glutathione Transferases (GST) (cytosolic, mitochondrial and microsomal
GST); develop and implement computational methods for elucidating the topology of signal
transduction networks and addressing uncertainties in experimental data and models; and
develop de novo computational proteomics methods for peptide and protein identification via
tandem mass spectrometry.
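De novo peptide identification reads a sequence from the mass differences between consecutive fragment ions in a tandem mass spectrum. The Center's optimization-based methods are not described here; as a minimal sketch, assuming singly charged b- and y-ions, standard monoisotopic residue masses, and an idealized, noise-free spectrum, the code computes fragment ladders and reads a sequence back from a complete b-ion ladder. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ladders(peptide: str) -> Tuple[List[float], List[float]]:
    """Singly charged b-ion (N-terminal) and y-ion (C-terminal) m/z ladders."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions, running = [], [], 0.0
    for m in masses[:-1]:
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def read_b_ladder(b_ions: List[float], tol: float = 0.01) -> str:
    """Read residues from consecutive b-ion differences ('?' if no match).
    Leucine and isoleucine have identical masses, so 'L' stands for either."""
    sequence, previous = [], PROTON
    for peak in sorted(b_ions):
        diff = peak - previous
        match = next((aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tol), "?")
        sequence.append(match)
        previous = peak
    return "".join(sequence)


print(read_b_ladder(fragment_ladders("GASP")[0]))  # GAS
```

Real spectra contain missing peaks, noise, and multiply charged ions, which is where optimization-based de novo methods are needed; this sketch covers only the ideal case.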

-------
Carolina Environmental Bioinformatics Center

EPA Grant Number: R832720

Investigators:

1. Fred A. Wright	E-mail: fwright@bios.unc.edu

2. Kenneth J. Galluppi	E-mail: galluppi@unc.edu

3. Lawrence Kupper	E-mail: kupper@bios.unc.edu

4. Stephen J. Marron	E-mail: marron@email.unc.edu

5. Jan F. Prins	E-mail: prins@cs.unc.edu

6. Ivan Rusyn	E-mail: iir@unc.edu

7. David Stotts	E-mail: stotts@cs.unc.edu

8. David Threadgill	E-mail: dwt@med.unc.edu

9. Alex Tropsha	E-mail: alex_tropsha@email.unc.edu

Current Investigators:

1. Fred A. Wright	E-mail: fwright@bios.unc.edu

2. Rosann Farber	E-mail: rosann.farber@pathology.unc.edu

3. Leonard McMillan	E-mail: mcmillan@cs.unc.edu

4. Ivan Rusyn	E-mail: iir@unc.edu

5. Alex Tropsha	E-mail: alex_tropsha@email.unc.edu

Institution:

1. University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, 27599

EPA Project Officer:

Project Period: October 1, 2005 through September 30, 2010
Project Amount: $4,494,117

RFA:

Research Category:

Description:

Objective:

The Carolina Environmental Bioinformatics Research Center brings together multiple investigators and
disciplines, combining expertise in biostatistics, computational biology, chem-informatics and computer
science to advance the field of Computational Toxicology.

The objective of this proposal is to create an Environmental Bioinformatics Research Center with broad-
ranging capability to enhance and advance the field of Computational Toxicology. The Center will
develop novel analytic and computational methods, create efficient user-friendly tools to disseminate
the methods to the wider community, and will apply the computational methods to data from molecular
toxicology and other studies.

Approach:

The effort will be divided into three Research Projects and an Administrative Unit. Each Research Project is
further divided into Functional Areas consisting of Analysis, Methods Development, and Tools
Development. Project 1 (Biostatistics in Computational Biology) will provide biostatistical support to the
Center, performing analysis and developing new methods in collaboration with EPA personnel and the
computational toxicology community. Project 2 (Chem-informatics) will coordinate the compilation and
mining of data from relevant external databases and perform analysis and methods development for
investigating Quantitative Structure-Activity Relationships with burgeoning high-throughput
chem-informatics data. In addition, Project 2 will develop computational tools to perform these tasks.
Project 3 (Computational Infrastructure for Systems Toxicology) will create a framework for merging data
from various -omic technologies in a systems biology approach. The investigation of rodent liver toxicity is
used as a driving biological problem, inspiring new methods and architectures for data storage. Finally,
Project 3 will provide programming support for the further development of tools arising from Projects 1
and 2. The Administrative Unit provides staff and support to the Center, is responsible for ensuring
that Center objectives and goals are being met, and provides oversight for each of the Functional Areas.
A detailed Quality Management Plan ensures that research and data management will be conducted
with integrity and will adhere to appropriate data interchange standards. The plans for Public Outreach
and Translation Activity will ensure that the activities of the Center are translated into usable
information and materials for the public and policy makers.

Expected Results:

The Center is expected to advance the field of computational toxicology through the development of
new methods and tools, as well as through direct collaborative efforts with EPA and other
environmental scientists. In each Project, we expect that new methods will be developed and published
that represent the state-of-the-art. The tools developed within each project will be widely disseminated,
and will be useful both to trained bioinformatics scientists and bench scientists. The synthesis of data
from a variety of sources will move the field of computational toxicology from a hypothesis-driven
science toward a predictive science. Each Project is goal-oriented, with criteria for success that will be
reviewed by the Scientific Advisory Committee.

-------
The Texas-Indiana Virtual STAR Center: Data-Generating in vitro and in silico Models of
Developmental Toxicity in Embryonic Stem Cells and Zebrafish

EPA Grant Number: 83428901

Investigators:

1.	Prof. Jan-Åke Gustafsson (Contact PI)	E-mail: jgustafsson@uh.edu

2.	Prof. Richard H. Finnell	E-mail: rfinnell@ibt.tamhsc.edu

3.	Prof. James A. Glazier	E-mail: glazier@indiana.edu

Institutions:

1.	University of Houston, Department of Biology and Biochemistry, Houston, Texas, 77204

2.	The Texas A&M Institute for Genomic Medicine, Texas A&M University/Texas A&M
Health Science Center, Houston, Texas, 77030

3.	Indiana University, Department of Physics, Bloomington, Indiana, 47405-7003

EPA Project Officer:

Project Period: November 1, 2009 through October 31, 2012

Project Amount: $3,190,993

RFA:

Research Category:

Description:

Objectives/Hypothesis:

As chemical production increases worldwide, there is increasing evidence of hazardous effects
of chemicals on human health at today's exposure levels, which further implies
that current chemical regulation is insufficient. A restructuring of the risk
assessment procedure will therefore be required to protect future generations. Given the very
large number of man-made chemicals and the likely complexity of their various and
synergistic modes of action, emerging technologies will be required for this
restructuring. The main objective of the proposed multidisciplinary Texas-Indiana Virtual
STAR (TIVS) Center is to contribute to more reliable chemical risk assessment through
the development of high-throughput in vitro and in silico screening models of
developmental toxicity. Specifically, the TIVS Center aims to generate in vitro models based on
murine embryonic stem cells and zebrafish for developmental toxicity. The data
produced from these models will be further exploited to produce predictive in silico
models of developmental toxicity for processes that are also relevant to human
embryonic development.

Approach:

The project is divided into three Investigational Areas: zebrafish models, murine
embryonic stem cell models, and in silico simulations. The approaches are to:

1. Generate developmental models suitable for high throughput screening.
Zebrafish developmental models (transgenic GFP/EGFP/RFP models of crucial
steps in development) and embryonic stem cell (ESC) differentiation models
(transgenic beta-geo models of crucial steps in differentiation) will be generated.
Important morphology features and signaling pathways during development will
be documented. The impact of environmental pollutants on development and
differentiation will be assessed in the models. Finally, the models will be refined
for high throughput screening and automation.

2. Generate a computational model that faithfully recreates the major
morphological features of normal wild-type zebrafish development (i.e.,
segmentation into somites and proper patterning of the vascular and neural systems)
and the differentiation of mouse embryonic stem cells into the three primary germ
layers (endoderm, mesoderm and ectoderm). The data for the simulations will be
produced from the high-information-content zebrafish and ESC models developed in the project.
Once a working model of normal development has been generated, we will carry
out a directed series of parameter sweeps to try to create developmental
defects in silico. We will compare the results of computationally created defects
with experimentally generated defects in zebrafish and embryonic stem cells.
The best matches between the two datasets will suggest hypotheses about possible
mechanisms by which defects occur.

3. Perform proof-of-concept experiments of the in vitro and in silico test platforms
with a blind test of chemicals.

Techniques will include molecular biology methods applied to the zebrafish and ESC models, such as
cloning, imaging, in vitro differentiation and in vitro exposure studies, as well as in silico
mathematical simulations.
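The parameter-sweep strategy in Approach 2 (simulate, perturb a parameter, and keep the settings whose simulated defect best matches the observed one) can be illustrated with a toy model. As a minimal sketch, assuming the clock-and-wavefront idea that each somite is roughly as long as the distance the wavefront travels during one segmentation-clock period, the code sweeps the clock period and returns the values that best reproduce an observed somite count. This rule is an assumption used for illustration, not the TIVS Center's computational model, and all parameter values are hypothetical.

```python
from typing import Dict, List, Sequence

# Toy clock-and-wavefront sweep; the rule and all numbers are hypothetical.


def somite_count(axis_length_um: float, wavefront_speed_um_per_min: float,
                 clock_period_min: float) -> int:
    """Each somite is as long as the distance the wavefront moves in one
    clock period; count how many whole somites fit along the axis."""
    somite_length_um = wavefront_speed_um_per_min * clock_period_min
    return int(axis_length_um // somite_length_um)


def sweep_clock_period(periods_min: Sequence[float], observed_count: int,
                       axis_length_um: float = 1000.0,
                       wavefront_speed_um_per_min: float = 2.0) -> List[float]:
    """Return the clock periods whose simulated somite count best matches
    an experimentally observed (e.g., chemically perturbed) count."""
    simulated: Dict[float, int] = {
        p: somite_count(axis_length_um, wavefront_speed_um_per_min, p)
        for p in periods_min
    }
    best_error = min(abs(c - observed_count) for c in simulated.values())
    return [p for p, c in simulated.items() if abs(c - observed_count) == best_error]


print(sweep_clock_period([25.0, 30.0, 40.0], observed_count=16))  # [30.0]
```

A match like this only suggests a hypothesis (here, a slowed segmentation clock); as the Approach notes, it would need to be checked against experimental data.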

Expected Results (Outputs/Outcomes):

In collaboration with other initiatives in the field of chemical safety, our
results and models will contribute to large screening efforts to prioritize chemicals for
further risk assessment. Specifically, we will contribute:

• 9 transgenic fish lines validated for toxicity screening

• 16 embryonic stem cell models validated for toxicity screening

• High information content models on development and differentiation to produce
data for in silico simulations, within the project and elsewhere

• Computational models for developmental toxicology of normal development and
of mechanisms by which chemical perturbations cause experimentally-observed
developmental defects

• Information on the developmental toxicity of 39 compounds

All the data produced in this project will be released to public databases. The developed
models will be automated for high throughput screening.

Supplemental Keywords:

Risk assessment, effects, dose-response, teratogen, organism, cellular, infants,
chemicals, toxics, aquatic ecosystem protection, pollution prevention, green chemistry,
public policy, environmental chemistry, biology, physics, genetics, mathematics,
modeling, measurement methods

-------
Chemical Substance In Vitro/In Silico Screening System To Predict Human and
Ecotoxicological Effects (ChemScreen)

Bart van der Burg, BioDetection Systems, Amsterdam, The Netherlands (Coordinator)

The current system of risk assessment of chemicals is complex, very resource-intensive,
and extremely time-consuming. Because of this, there is a great need to modernize this
process. However, this is not feasible without alternative, integrated testing strategies in
which chemical characteristics are used to greater advantage, and where costly and
time-consuming animal tests are replaced to a large extent by faster, cheaper, and
ethically less controversial methods. This is particularly needed for reproductive toxicity
testing of chemicals. Reproductive toxicity testing is important for assessing both human and
environmental toxicity, and it uses the most animals in toxicity testing. Unfortunately, there
are very few alternative methods. The EU project ChemScreen is a partnership between
nine European institutes and companies from five different countries. It aims to generate
alternative methods and place the tests in a more general, innovative animal-free testing
strategy. For this, we will generate a simple, rapid screening system, which aims at
widespread implementation within the tight time schedule of the REACH program. It will
be a flexible tool that can be adapted and used for applications beyond the scope of
REACH and in the post-REACH period. It will use in silico methods for prescreening
chemicals for all relevant toxic effects. Chemicals found positive will then undergo
further in silico and in vitro tests, most of which are available already. To fill the gap in
suitable alternative methods for reproductive toxicity testing, we will use a novel
high-throughput approach combining in silico/in vitro methods. In this approach, we will
combine knowledge of critical processes affected by reproductive toxicants with
knowledge of the mechanistic basis of such effects. Straightforward data interpretation
and decision trees will be developed in which all information on the potential toxicity of a
chemical is considered. In this way, we will provide a cost-effective means to generate a
basic set of data on the toxicological properties of chemicals and a decision tool to assess whether
further testing of chemicals is required.
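The tiered logic in the ChemScreen abstract (an in silico prescreen, with follow-up in silico and in vitro tests only for chemicals flagged positive, ending in a decision on further testing) can be sketched as follows. This is a minimal illustration under stated assumptions: the chemical representation, the check functions, and the final decision rule are all hypothetical and are not the project's actual decision trees.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a chemical is a dict of precomputed scores, and each
# check returns True when the chemical is flagged as potentially toxic.
Chemical = Dict[str, float]
Check = Callable[[Chemical], bool]


@dataclass
class ScreenResult:
    flagged_by_prescreen: bool
    positive_follow_ups: List[str] = field(default_factory=list)

    @property
    def further_testing_required(self) -> bool:
        # Assumed decision rule: request further testing only when at least
        # one follow-up test confirms the prescreen flag.
        return self.flagged_by_prescreen and bool(self.positive_follow_ups)


def tiered_screen(chemical: Chemical, prescreen: Check,
                  follow_ups: List[Tuple[str, Check]]) -> ScreenResult:
    """Tier 1: in silico prescreen. Tier 2 (only if flagged): follow-up tests."""
    if not prescreen(chemical):
        return ScreenResult(flagged_by_prescreen=False)
    positives = [name for name, check in follow_ups if check(chemical)]
    return ScreenResult(flagged_by_prescreen=True, positive_follow_ups=positives)


chem = {"structural_alert": 1.0, "er_activity": 0.8, "cytotoxicity": 0.1}
result = tiered_screen(
    chem,
    prescreen=lambda c: c["structural_alert"] > 0.5,
    follow_ups=[("er_assay", lambda c: c["er_activity"] > 0.5),
                ("cytotoxicity_assay", lambda c: c["cytotoxicity"] > 0.5)],
)
print(result.positive_follow_ups, result.further_testing_required)  # ['er_assay'] True
```

Keeping the follow-up tests as a named list makes it easy to add or swap assays without changing the screening logic, which matches the abstract's goal of a flexible tool.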

-------
[Slide: report covers including "Science and Judgment" and "Science and Decisions: Advancing Risk Assessment"; years shown: 1983, 1996, 2007, 2008]

Computational Toxicology:

From Data to Analyses to Applications

September 21-22, 2009 | Washington, DC

LECTURE ROOM | NAS BUILDING | 2101 CONSTITUTION AVENUE, NW (NOT 500 FIFTH STREET)

-------
Computational Toxicology:

a sub-discipline of toxicology that aims to use mathematical,
statistical, modeling and computer science tools to better understand
the mechanisms through which a given chemical induces harm and,
ultimately, to be able to predict adverse effects of toxicants on human
health and/or the environment

[Diagram: TOXICOLOGY linked to biotechnology, biostatistics, modeling, computer science, engineering, computational chemistry, and toxicogenomics]

EMERGING SCIENCE FOR
ENVIRONMENTAL HEALTH
DECISIONS

-------
Computational Toxicology:

•	Relies on high-throughput and high-content
screening assays to provide an
unparalleled level of detail for chemical
and molecular interactions, cellular
pathways and tissue-level processes

•	Provides a novel framework for the in
silico modeling and simulation to
validate and predict key aspects of both
the physiology and toxicant-induced
pathology

•	Enables fundamental understanding of
the complex relationships across
biological systems and supports a
scientifically sound process of projecting
human health risks posed by chemicals


-------
Computational Toxicology: Stakeholders

Industry


-------
Carolina Center for Computational Toxicology

Organizational Structure

External Advisory Board

Linda Griffith, PhD
Edward LeCluyse, PhD
Howard McLeod, PharmD

Kevin Morgan, PhD
Christopher Portier, PhD
Vitali Proutskiy, MD, PhD
David Threadgill, PhD
Maurice Whelan, PhD

Center Director

Ivan Rusyn, MD, PhD

Administrative Core

1.	administration

2.	outreach/translation

3.	quality management

Scientific Steering Committee
Shawn Gomez, PhD
Tim Elston, PhD
Fred Wright, PhD
Alex Tropsha, PhD

Research Projects

1. biomedical modeling
of chemical-perturbed
networks

2. toxico-genetic
modeling

3. chem-informatics

Protein-protein/-chemical interactions, reaction rates and predictive simulations (Project 1)

Chemical-perturbed network topology and biomedical modeling (Project 1)

Toxico-genetic modeling, network inference and pathway assessment (Project 2)

Statistical modeling and discovery based on chemical, biological and genetic descriptors (Project 3)

-------
Carolina Center for Computational Toxicology

Administrative Core
Administration Function:

•	Project and budget management

•	Communications

•	Reporting to EPA and UNC

•	Organization of the annual EAB meetings

Integration Function

•	Promoting interactions within the Center

•	Promoting interactions with EPA/NCCT and other partners

•	Facilitating scientific interactions between Projects

Public Outreach/Translation Function

•	Created Center website: http://comptox.unc.edu

•	Implementing bioinformatics and chemo-informatics tools into GUI-enabled software

•	Conducting joint research meetings with EPA/NCCT

•	Presenting at the state, national and international scientific meetings

Quality Management Function

•	Center-wide quality management plan developed and approved by the EPA

•	Quality assurance project plans developed and annual audits performed for Year 1

•	Remedial actions will be completed by November 1, 2009

-------
In Step With the US EPA Guidance: Commitment to Transparency

comptox.unc.edu

[Screenshot: Center website home page, The University of North Carolina at Chapel Hill; Center Overview]

-------
Carolina Center for Computational Toxicology

Project 1

Predictive modeling of chemical-perturbed regulatory networks in systems toxicology

[Figure: chemical-perturbed regulatory network; legend: small molecule, protein, functional class, byproducts, expression, promoter binding, regulation, direct regulation, binding, molecular synthesis, molecular transport, chemical reaction, protein modification]

Project 2

Toxico-genetic modeling: Population-wide predictions from toxicity profiling

[Figure: eQTL analysis in yeast segregants. An eQTL module is an eQTL hot spot plus the genes linked to the hot spot. Activity profiles of transcription factors (TFs) and TF binding data are used to identify significant TF regulation effects (DS-linked gene → TF activity → other linked genes)]

Project 3

Development of validated and predictive Quantitative Structure-Toxicity Relationship models that employ both chemical and biological descriptors of molecular structures and take into account genetic diversity between individuals

[Figure: QSAR modeling from high-throughput screening data and molecular properties]

Data resources: v-Liver, DSSTox, ACToR, ToxCast, ToxRefDB
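The Project 2 slide defines an eQTL module as an eQTL hot spot plus the genes linked to it. As a minimal sketch of that definition, assuming simple correlation-based linkage between each marker's genotype and each gene's expression across segregants (real eQTL mapping uses formal association tests and multiple-testing control), the code below reports hot spots and their linked genes. Thresholds, marker names, and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Illustrative eQTL hot-spot sketch; thresholds and toy data are hypothetical.


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def eqtl_modules(genotypes: Dict[str, Sequence[float]],
                 expression: Dict[str, Sequence[float]],
                 r_threshold: float = 0.8,
                 min_genes: int = 2) -> Dict[str, List[str]]:
    """Link a gene to a marker when |r| >= r_threshold; markers linked to at
    least `min_genes` genes are reported as hot spots with their genes."""
    modules: Dict[str, List[str]] = {}
    for marker, geno in genotypes.items():
        linked = sorted(gene for gene, expr in expression.items()
                        if abs(pearson(geno, expr)) >= r_threshold)
        if len(linked) >= min_genes:
            modules[marker] = linked
    return modules


# Genotypes coded 0/1 across six hypothetical segregants.
genotypes = {"m1": [0, 0, 1, 1, 0, 1], "m2": [0, 1, 0, 1, 0, 1]}
expression = {"g1": [1, 1, 3, 3, 1, 3], "g2": [2, 2, 5, 5, 2, 5], "g3": [1, 2, 1, 2, 1, 2]}
print(eqtl_modules(genotypes, expression))  # {'m1': ['g1', 'g2']}
```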

-------
PROJECT 1

Predictive modeling of chemical-perturbed regulatory networks in systems toxicology

Shawn Gomez - co-PI

Assistant Professor, Department of Biomedical Engineering, UNC-Chapel Hill

Timothy Elston - co-PI

Professor, Department of Pharmacology, UNC-Chapel Hill

•	Develop and apply data-driven methods for the inference and high-
level modeling of regulatory network response to chemical
perturbation

•	Develop mechanistic models of nuclear receptor function

•	Integrate and deploy high- and low-level modeling tools
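Mechanistic models of nuclear receptor function build on mass-action binding kinetics. As a minimal sketch of the simplest building block, assuming a single receptor R binding a ligand L held at constant concentration (ligand in excess), the code integrates R + L <-> RL with forward Euler and compares the result with the analytical equilibrium occupancy L / (L + Kd), where Kd = koff / kon. Full nuclear receptor models would add further species and reactions; the function names and rate values are hypothetical.

```python
# Illustrative receptor-ligand binding sketch; names and rates are hypothetical.


def simulate_binding(kon: float, koff: float, ligand: float,
                     receptor_total: float, t_end: float,
                     dt: float = 1e-3) -> float:
    """Forward-Euler integration of d[RL]/dt = kon*(Rtot - [RL])*L - koff*[RL]
    with free ligand L held constant; returns [RL] at t_end."""
    rl = 0.0
    for _ in range(int(round(t_end / dt))):
        rl += dt * (kon * (receptor_total - rl) * ligand - koff * rl)
    return rl


def equilibrium_fraction_bound(ligand: float, kd: float) -> float:
    """Analytical steady-state fraction of receptor bound: L / (L + Kd)."""
    return ligand / (ligand + kd)


bound = simulate_binding(kon=1.0, koff=1.0, ligand=3.0, receptor_total=1.0, t_end=20.0)
print(round(bound, 4), equilibrium_fraction_bound(3.0, 1.0))  # 0.75 0.75
```

Checking a simulation against a closed-form steady state like this is a simple way to validate the numerical core before adding more detailed mechanisms.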

-------
Major Interactions with the US EPA

Exploring toxicity modeling (mechanistic, dose-
response, etc.): with Rory Connolly (EPA-NHEERL)

Extension and integration of mechanistic
metabolism and other models: work relevant to
the v-Liver Project, Imran Shah (EPA-NCCT)

ToxCast: with Richard Judson (EPA-NCCT)

-------
Inference & Modeling of Biological Networks

Short term:

•	Tool in data analysis and interpretation

•	Help establish biological-chemical context

Long term:

•	Components to systems: simplistic wiring

•	Framework for understanding systems properties, pathways and cross-talk

•	Basis for mechanistic models

[Figure: example signaling network with nodes including mitogen, FADD/MORT, CPP32, AKT/PKB, Cdk4, Cdk2, cyclin, Cdc25A, apoptosis, and cell proliferation]

-------
Challenge #1: Data Integration

Static interaction data: TAP data, two-hybrid, other interaction data

Dynamic condition-dependent and other data: expression data, transcription factor binding data, phylogenetic information

Domain data

[Figure: each data source yields a domain-domain interaction probability; these are merged into a combined domain-domain interaction probability and then a protein-protein interaction probability, producing candidate networks that feed model selection/averaging (chemical exposure?)]
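The slide does not state how evidence from the different sources is combined. A minimal sketch, assuming the sources and the domain pairs contribute independently (a noisy-OR model chosen here for illustration, not necessarily the center's method), shows how per-source domain-domain probabilities could be merged and then lifted to a protein-protein interaction probability. Domain names and probabilities are hypothetical.

```python
from math import prod


def combine_sources(source_probs):
    """Noisy-OR: a domain pair is taken to interact unless every source 'misses' it."""
    return 1.0 - prod(1.0 - p for p in source_probs)


def protein_interaction_probability(domains_a, domains_b, ddi):
    """P(A-B interact) = 1 - product over all domain pairs of (1 - P(pair interacts)).

    ddi maps frozenset({domain1, domain2}) to a combined domain-domain probability;
    pairs missing from ddi are treated as probability 0.
    """
    p_no_contact = 1.0
    for da in domains_a:
        for db in domains_b:
            p_no_contact *= 1.0 - ddi.get(frozenset((da, db)), 0.0)
    return 1.0 - p_no_contact


# Hypothetical per-source probabilities (e.g., TAP data and two-hybrid) for two domain pairs
ddi = {
    frozenset(("SH2", "pTyr")): combine_sources([0.5, 0.2]),
    frozenset(("PDZ", "Cterm")): combine_sources([0.3]),
}
```

Here the SH2-pTyr pair gets 1 - 0.5 * 0.8 = 0.6 from the two sources; because the PDZ domain of the first protein has no listed partner in the second, the protein pair `["SH2", "PDZ"]` vs. `["pTyr"]` also scores 0.6.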

-------
PCANS - NMR spectra alignment

[Figure: PCANS workflow: pick peaks; determine alignment pairs by correlation; segmentation of alignment pair; dynamic programming alignment of segments; naive alignment of highly similar peaks; creation of consensus spectrum from alignment, repeated from multiple consensus spectra down to one consensus spectrum; output results from alignment]

Staab J et al. (2009) BMC Bioinformatics (In revision)
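PCANS itself is described in the cited paper; the sketch below is not that implementation. It shows only the generic dynamic-programming step the workflow names: a Needleman-Wunsch-style global alignment of two sorted peak lists, where pairing two peaks costs their shift and leaving a peak unpaired costs a fixed gap penalty. The tolerance `tol` and gap cost `gap` (in ppm) are assumptions.

```python
def align_peaks(a, b, tol=0.02, gap=0.01):
    """Globally align two sorted lists of peak positions (ppm) by dynamic programming.

    cost[i][j] is the cheapest alignment of a[:i] with b[:j]. Pairing peaks costs
    their shift |a - b| and is only allowed within tol; an unpaired peak costs gap.
    Returns (total cost, list of paired index tuples).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                cost[i][j] = min(cost[i][j], cost[i - 1][j] + gap)
            if j > 0:
                cost[i][j] = min(cost[i][j], cost[i][j - 1] + gap)
            if i > 0 and j > 0 and abs(a[i - 1] - b[j - 1]) <= tol:
                cost[i][j] = min(cost[i][j], cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]))
    # Trace back from the bottom-right corner to recover the paired peaks
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        shift = abs(a[i - 1] - b[j - 1])
        if shift <= tol and cost[i][j] == cost[i - 1][j - 1] + shift:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return cost[n][m], pairs[::-1]


spectrum_1 = [1.00, 2.00, 3.50]
spectrum_2 = [1.01, 2.01, 4.00]
total_cost, pairs = align_peaks(spectrum_1, spectrum_2)
```

For these hypothetical peak lists the first two peaks are paired, while the peaks at 3.50 and 4.00 ppm, farther apart than `tol`, stay unpaired.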

-------
Network Context: Traditional ways to create networks

NEMO (Yan et al., 2007): frequent dense vertex set mining algorithm

[Figure: (A) example graphs (1)-(4); (B) summary graph]

-------
Network Context: Subgraph Mining

"Functional Module" in Subgraph Mining

[Figure: modules found at decreasing minimum frequencies, e.g., 70 -> 7 assays; 65 -> 10 assays; 15 assays & 46 chemicals; 13 assays; 22 assays & 24 chemicals]

Mines binary data to find all
frequent 'dense' sub-graphs
(cliques)

•	Nodes: Assay

•	Edges: Set of 'Active' Chemicals
shared between Nodes

•	Finds all unique subgraphs for a
minimum frequency of 'Active'
chemicals

Differs from hierarchical
clustering by focusing on subsets
of the data

Useful for defining composite
assays that might be more
predictive

Useful for associating
Assay/Chemical combinations to
endpoints

CD40 - TNF receptor superfamily member
CD38 - CD38 molecule
SELE - selectin E
CD69 - CD69 molecule
IL8 - interleukin 8
NR1I3 - nuclear receptor subfamily 1, group I, member 3
PPARG - peroxisome proliferator-activated receptor gamma
PPARA - peroxisome proliferator-activated receptor alpha
NR1I2 - nuclear receptor subfamily 1, group I, member 2
CYP27B1 - cytochrome P450, family 27, subfamily B, polypeptide 1
MTF2 - metal response element binding transcription factor 2
JUN - jun oncogene
RORA - RAR-related orphan receptor A
POU2F1 - POU class 2 homeobox 1
BMPR2 - bone morphogenetic protein receptor, type II
CREB3 - cAMP responsive element binding protein 3
CEBPB - CCAAT/enhancer binding protein (C/EBP), beta
GABPA - GA binding protein transcription factor, alpha
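NEMO mines frequent dense subgraphs; the sketch below is a simpler frequent-itemset formulation of the same idea, not the NEMO algorithm. Assuming binary activity calls, it lists every set of assays whose shared 'active' chemicals reach a minimum frequency, which corresponds to the assay cliques described above. Assay and chemical names are hypothetical.

```python
from itertools import combinations


def frequent_assay_modules(actives, min_freq, min_size=2):
    """Return every set of >= min_size assays whose shared 'active' chemicals number >= min_freq.

    actives: dict mapping assay name -> set of chemicals called active in that assay.
    Level-wise (Apriori-style) search: a (k+1)-assay set can only qualify if its k-assay
    subsets do, so candidates are built by joining qualifying k-sets with a common prefix.
    """
    level = {(a,): set(chems) for a, chems in sorted(actives.items()) if len(chems) >= min_freq}
    modules = {}
    while level:
        if len(next(iter(level))) >= min_size:
            modules.update(level)
        next_level = {}
        for x, y in combinations(sorted(level), 2):
            if x[:-1] == y[:-1]:  # same first k-1 assays, differ only in the last one
                shared = level[x] & level[y]
                if len(shared) >= min_freq:
                    next_level[x + (y[-1],)] = shared
        level = next_level
    return modules


# Hypothetical binary activity calls
actives = {
    "A1": {"c1", "c2", "c3", "c4"},
    "A2": {"c1", "c2", "c3"},
    "A3": {"c2", "c3", "c4"},
    "A4": {"c5"},
}
modules = frequent_assay_modules(actives, min_freq=2)
```

With a minimum frequency of 2, assays A1, A2 and A3 form a module sharing chemicals c2 and c3, while A4 (one active chemical) never qualifies.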

-------
Network Context: Subgraph Mining

Endpoint: RatLiver_AnyLesion Minimum Frequency: 40 chemicals (~30%)
Module Found: 10 Assays for a set of 41 chemicals (*Active)

[Figure: 2D hierarchical clustering vs. subgraph mining; x-axis: assays; marked cells: active]

-------
Development of a mechanistic model
of cellular metabolism:

predicting changes in metabolic flux

[Figure: heatmap of liver fluxes, full circuit (log scale), vs. time since feeding]
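The slide does not give the model equations. As a toy illustration of how a chemical perturbation can shift metabolic flux (an assumption, far simpler than a full liver circuit), consider one substrate supplied at a fixed rate and consumed by two Michaelis-Menten enzymes, one of which is competitively inhibited. All parameter values are hypothetical.

```python
def michaelis_menten(vmax, km, s):
    return vmax * s / (km + s)


def branch_point_fluxes(v_in, vmax1, km1, vmax2, km2, inhibitor=0.0, ki=1.0):
    """Steady state of a toy branch point: substrate S enters at rate v_in and leaves
    through enzyme 1 (optionally competitively inhibited) or enzyme 2.

    Solves v1(S) + v2(S) = v_in for S by bisection; returns (S, v1, v2).
    """
    if v_in >= vmax1 + vmax2:
        raise ValueError("uptake exceeds total enzyme capacity: no steady state")
    km1_app = km1 * (1.0 + inhibitor / ki)  # competitive inhibition raises the apparent Km

    def total(s):
        return michaelis_menten(vmax1, km1_app, s) + michaelis_menten(vmax2, km2, s)

    lo, hi = 0.0, 1.0
    while total(hi) < v_in:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total(mid) < v_in:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return s, michaelis_menten(vmax1, km1_app, s), michaelis_menten(vmax2, km2, s)


baseline = branch_point_fluxes(1.0, 2.0, 1.0, 2.0, 1.0)
perturbed = branch_point_fluxes(1.0, 2.0, 1.0, 2.0, 1.0, inhibitor=1.0, ki=1.0)
```

Without inhibitor the symmetric branches each carry half of the uptake; with the inhibitor the substrate pool rises and flux is redirected to the uninhibited branch, while total flux still equals uptake.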

-------
PROJECT 2

Toxico-genetic modeling:
Population-wide predictions from toxicity profiling

Fred Wright - co-PI

Professor, Department of Biostatistics, UNC-Chapel Hill

Ivan Rusyn - co-PI

Assoc. Prof., Dept. of Environmental Sciences & Engineering, UNC-Chapel Hill

•	Develop toxicogenetic expression Quantitative Trait Loci (eQTL)
mapping tools, perform transcription factor network inference and
integrative pathway assessment

•	Perform toxicogenetic modeling of liver toxicity in cultured mouse
hepatocytes

•	Discover chemical-induced regulatory networks using population-
based toxicity phenotyping in human cells

-------
Major Interactions with the US EPA

Developing in vitro tools which will enable testing
for inter-individual susceptibility: with David Dix
(EPA-NCCT) and other Tox21 partners

Developing statistical methodology and
computational tools capable of processing higher-
order multi-dimensional data: work relevant to
future ToxCAST efforts and current Tox21 datasets

ToxCAST: with Richard Judson (EPA-NCCT)

-------
Population-wide predictions from toxicity profiling:
linking toxicology with -omics and genetics

[Figure: data, analysis and knowledge layers linking phenotype, genetics, -omics and patho-biology]

-------
Genome-level analysis of genetic regulation of liver gene expression networks

(eQTL mapping)

-------
Specific Objective 1: Development of Fast and Efficient Toxicogenetic
Expression Quantitative Trait Loci (eQTL) Mapping Tools

[Figure: (A) observed marker profiles and the center of a significance set; (B) permutation p-value estimation (x-axis: log10(pp), -3.5 to -1.5)]

Fast methods to
perform p-value-based
eQTL inference

A geometric view of
permutation p-values

•	For each transcript, we
imagine a hypersphere in the
vicinity of the most significant
possible genotype profile

•	Permutations correspond to
rotations of sets of observed
genotypes within the space

•	Significance thresholds
determined by "volume" of
space occupied by observed
genotypes
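The geometric view above is aimed at making permutation p-values fast to compute. For reference, the sketch below shows the brute-force computation it replaces (our illustration, not FastMap code): for one transcript, the statistic is the largest absolute correlation over all markers, and expression values are shuffled across strains to build its null distribution. Data and permutation count are hypothetical.

```python
import random
from statistics import mean


def abs_corr(x, y):
    """|Pearson correlation|, returning 0 when either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def max_association(expr, genotypes):
    """Genome-wide statistic for one transcript: the best |r| over all markers."""
    return max(abs_corr(g, expr) for g in genotypes)


def permutation_pvalue(expr, genotypes, n_perm=999, seed=0):
    """Shuffle expression across strains (marker profiles stay fixed) and count how
    often the permuted maximum reaches the observed one."""
    rng = random.Random(seed)
    observed = max_association(expr, genotypes)
    shuffled = list(expr)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_association(shuffled, genotypes) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: 12 homozygous strains, two markers coded 0/1
expr = [1.0, 1.2, 0.9, 1.1, 1.0, 0.95, 3.0, 3.1, 2.9, 3.2, 3.0, 2.8]
genotypes = [[0] * 6 + [1] * 6, [0, 1] * 6]
observed, p_value = permutation_pvalue(expr, genotypes)
```

Taking the maximum over markers inside each permutation is what makes the p-value genome-wide rather than per-marker; the cost grows with permutations times markers, which is the expense a faster method targets.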

-------
Specific Objective 1: Development of Fast and Efficient Toxicogenetic
Expression Quantitative Trait Loci (eQTL) Mapping Tools

BIOINFORMATICS ORIGINAL PAPER Vol. 25 no. 4 2009, pages 482-489

doi:10.1093/bioinformatics/btn648

Gene expression

FastMap: Fast eQTL mapping in homozygous populations

Daniel M. Gatti1,†, Andrey A. Shabalin2,†, Tieu-Chong Lam1, Fred A. Wright3, Ivan Rusyn1,* and Andrew B. Nobel2,3,*

1 Department of Environmental Sciences and Engineering, 2 Department of Statistics and Operations Research,
and 3 Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599, USA

[Screenshot: FastMap interface showing SNP loading, association mapping, and a list of transcript probe sets]

-------
Specific Objective 1: Development of Fast and Efficient Toxicogenetic
Expression Quantitative Trait Loci (eQTL) Mapping Tools


-------
Specific Objective 1: Development of Fast and Efficient Toxicogenetic
Expression Quantitative Trait Loci (eQTL) Mapping Tools

Understanding genomic context for expression

OPEN ACCESS Freely available online

PLoS ONE

Dissecting Nucleosome Free Regions by a Segmental
Semi-Markov Model

Wei Sun, Wei Xie, Feng Xu, Michael Grunstein, Ker-Chau Li

1 Department of Biostatistics, Carolina Center for Genome Science, University of North Carolina, Chapel Hill, North Carolina, United States of America, 2 Department of Genetics, Carolina Center for Genome Science, University of North Carolina, Chapel Hill, North Carolina, United States of America, 3 Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California, United States of America, 4 Department of Statistics, University of California Los Angeles, Los Angeles, California, United States of America, 5 Institute of Statistical Science, Genomics Research Center, Academia Sinica, Taipei, Taiwan

BMC Bioinformatics

BioMed Central

Open Access

Methodology article
Improved ChIP-chip analysis by a mixture model approach

Wei Sun, Michael J Buck, Mukund Patel and Ian J Davis

Address: Department of Biostatistics, Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Biochemistry, Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; and Department of Pediatrics, Lineberger Comprehensive Cancer Center
-------
PROJECT 3

Development of validated and predictive Quantitative
Structure-Toxicity Relationship models that employ
both chemical and biological descriptors of molecular
structures and take into account genetic diversity
between individuals

Alexander Tropsha - PI

Chair, Division of Medicinal Chemistry & Natural Products, UNC-Chapel Hill

• Develop rigorous end point toxicity predictors based on the QSAR
modeling workflow and conventional chemical descriptors

• Develop novel computational toxico-genomic models based on
combined chemical and biological descriptors through QSAR
modeling workflow

• Develop novel computational toxico-genetic models based on
combined genetic, chemical and toxicity descriptors through QSAR-
like modeling workflow

-------
Major Interactions with the US EPA

• Integrating chemical descriptors into DSSTox: with
Ann Richard (EPA-NCCT)

• ToxCAST, ToxRefDB and ACToR data analysis: with
Richard Judson (EPA-NCCT)

-------
Predictive Quantitative Structure-Toxicity Relationship Modeling

[Figure: predictive QSAR workflow: split into training, test, and external validation sets; multiple training sets and multiple test sets; Combi-QSPR modeling; Y-randomization; only accept models that have q2 > 0.6, R2 > 0.6, etc.; validated predictive models with high internal & external accuracy; activity prediction and external validation using applicability domain (AD); prediction of potential safety alerts to prioritize for testing; experimental validation of prioritized alerts]
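A minimal sketch of two workflow checks, assuming a kNN regression model with Euclidean distance in descriptor space (the center's Combi-QSPR implementation is not shown on the slide): leave-one-out q2 on the training set, compared with q2 after Y-randomization. Only the q2 > 0.6 cutoff comes from the slide; the data, k, and number of randomization rounds are hypothetical, and the R2 > 0.6 test-set criterion would be computed analogously on held-out compounds.

```python
import math
import random


def knn_predict(X, y, x, k, exclude=None):
    """Mean activity of the k nearest training compounds (Euclidean distance in descriptor space)."""
    ranked = sorted((math.dist(xi, x), i) for i, xi in enumerate(X) if i != exclude)
    neighbors = [i for _, i in ranked[:k]]
    return sum(y[i] for i in neighbors) / len(neighbors)


def loo_q2(X, y, k=3):
    """Leave-one-out q2 = 1 - PRESS / total sum of squares."""
    y_bar = sum(y) / len(y)
    press = sum((y[i] - knn_predict(X, y, X[i], k, exclude=i)) ** 2 for i in range(len(y)))
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    return 1.0 - press / ss_tot


def y_randomization(X, y, k=3, n_rounds=20, seed=0):
    """q2 values obtained after shuffling activities; these should fall far below the real q2."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)
        scores.append(loo_q2(X, shuffled, k))
    return scores


# Hypothetical one-descriptor data set
X = [(float(i),) for i in range(20)]
y = [2.0 * i for i in range(20)]
q2 = loo_q2(X, y, k=2)
random_q2 = y_randomization(X, y, k=2)
accepted = q2 > 0.6 and max(random_q2) < q2
```

Interior compounds are predicted exactly from their two neighbors, so q2 = 1 - 18/2660; shuffled activities give much lower q2 values, which is what the Y-randomization check expects of a non-chance model.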

-------
Compound prioritization using the
ensemble of QSAR models

[Figure: compounds screened by the QSAR model ensemble are classified as non-toxic or flagged as alerts for further testing]
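A minimal sketch of consensus prioritization, assuming each accepted model returns a probability of toxicity and using a hypothetical alert threshold of 0.5: scores are averaged over the ensemble, and compounds at or above the threshold are ranked as alerts for further testing.

```python
from statistics import mean


def consensus_prioritize(compounds, models, threshold=0.5):
    """Average each compound's predicted probability of toxicity over all accepted models,
    then split into alerts (ranked, most likely toxic first) and predicted non-toxic."""
    scores = {c: mean(model(c) for model in models) for c in compounds}
    alerts = sorted((c for c in compounds if scores[c] >= threshold),
                    key=lambda c: scores[c], reverse=True)
    non_toxic = [c for c in compounds if scores[c] < threshold]
    return scores, alerts, non_toxic


# Two hypothetical accepted models, represented as lookup tables of predicted probabilities
model_1 = {"A": 0.9, "B": 0.2, "C": 0.6}.get
model_2 = {"A": 0.7, "B": 0.1, "C": 0.3}.get
scores, alerts, non_toxic = consensus_prioritize(["A", "B", "C"], [model_1, model_2])
```

Compound C is flagged by one model but not the other; averaging places it below the threshold, illustrating how consensus damps single-model disagreements.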

-------
Data Curation

• In vitro assays: 524 -> 353

- Removed one of each pair of highly correlated assays, and low-variance assays (<4 non-zero entries)

• Chemicals: 320 -> 228

- Duplicate structures, mixtures, inorganic compounds, and macromolecules were removed

- Kept only those for which in vivo data are available (i.e., chronic mouse toxicity)
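The curation steps can be sketched as follows. The cutoff for "highly correlated" is not given on the slide, so `max_abs_r = 0.95` is an assumption; the "<4 non-zero entries" rule is from the slide. Which assay of a correlated pair is dropped depends here on input order, a design choice of this sketch.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def curate_assays(data, min_nonzero=4, max_abs_r=0.95):
    """data: assay name -> activity values over the same ordered list of chemicals.

    1. Drop low-variance assays with fewer than min_nonzero non-zero entries.
    2. Greedily keep an assay only if |r| with every already-kept assay is <= max_abs_r,
       so one member of each highly correlated pair is removed.
    """
    informative = [a for a, v in data.items() if sum(1 for x in v if x != 0) >= min_nonzero]
    kept = []
    for a in informative:
        if all(abs(pearson(data[a], data[b])) <= max_abs_r for b in kept):
            kept.append(a)
    return kept


data = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 4, 6, 8, 10, 12],  # perfectly correlated with A
    "C": [0, 0, 1, 0, 0, 0],    # only one non-zero entry
    "D": [6, 1, 5, 2, 4, 3],
}
kept = curate_assays(data)
```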

-------
Focusing on a small subset of data:
Chronic Mouse Toxicity

• Continuity (overlaps with previous ToxRefDB data)

• Manageable (has only 7 assays)

• 3 assays with the highest fraction of actives
chosen for initial studies:

CHR_Mouse_LiverProliferativeLesions (87 actives)
CHR_Mouse_LiverTumors (68 actives)
CHR_Mouse_Tumorigen (88 actives)

-------
Data partitioning based on in vitro-in vivo
correlations as part of the QSAR Modeling workflow

DSSTox

For each in vitro vs. in vivo profile (3 x 353 = 1059 combinations):

[Figure: in vivo vs. in vitro activity quadrants: I. Non-toxic (both); II. Toxic (both); IV. Non-toxic in vivo/active in vitro]

Binary classification QSAR for "baseline" (II & III) vs. off-line (I & IV) using chemical descriptors only

-------
Developing Novel Bio-Descriptors

Pathway-derived

[Figure: pathway-derived assay modules, e.g., 70 -> 7 assays; 65 -> 10 assays; 15 assays & 46 chemicals; 13 assays; 22 assays & 24 chemicals]

Dose-response-derived

[Figure: ROC curves and CCR for Dragon-only vs. hybrid (THR = 15%) models]

CD40 - TNF receptor superfamily member
CD38 - CD38 molecule
SELE - selectin E
CD69 - CD69 molecule
IL8 - interleukin 8
NR1I3 - nuclear receptor subfamily 1, group I, member 3
PPARG - peroxisome proliferator-activated receptor gamma
PPARA - peroxisome proliferator-activated receptor alpha
NR1I2 - nuclear receptor subfamily 1, group I, member 2
CYP27B1 - cytochrome P450, family 27, subfamily B, polypeptide 1
MTF2 - metal response element binding transcription factor 2
JUN - jun oncogene
RORA - RAR-related orphan receptor A
POU2F1 - POU class 2 homeobox 1
BMPR2 - bone morphogenetic protein receptor, type II
CREB3 - cAMP responsive element binding protein 3
CEBPB - CCAAT/enhancer binding protein (C/EBP), beta
GABPA - GA binding protein transcription factor, alpha

-------
• Focus on accurate prediction of external datasets is
much more critical than accurate fitting of existing data:

- consensus (collaborative!) prediction using all acceptable models

- experimental validation of a small number of computational hits

- outcome: decision support tools in selecting future experimental
screening sets

• Neither cheminformatics nor HTS and -omics data alone
is sufficient to achieve the desired accuracy of end
point property prediction

- Integration of cheminformatics and bioinformatics: predictive models of
selected endpoints using integrated short-term biological profiles
(biodescriptors) and chemical descriptors for compound subsets

- New computational approaches (e.g., hybrid and hierarchical QSAR)

- Interpretation of significant chemical and biological descriptors

-------
Center publications in Year 1

• Choi K, Gomez SM (2009) BMC Bioinformatics (in revision)

• Staab J et al. (2009) BMC Bioinformatics (in revision)

• Gatti DM et al. (2009) Bioinformatics 25:482-489

• Sun W et al. (2009) PLoS One 4:e4721

• Zhu H et al. (2009) Environ Health Perspect 117:1257-1264

• Gatti DM et al. (2009) Mamm Genome 20:437-454

• Harrill AH et al. (2009) Toxicol Sci 110:235-243

• Sun W, Wright FA (2009) Ann Appl Stat (accepted)

• Sun W et al. (2009) BMC Bioinformatics 10:173

• Zhu H et al. (2008) Environ Health Perspect 116:506-513

• Zhu H et al. (2009) Chem Res Toxicol (in revision)

• Artemenko AG et al. (2009) Chem Res Toxicol (in revision)

-------
Short-Term Goals for Year 2

Project 1:

• Continue in-depth analysis of ToxCast Phase I data;

• Further refine the methods for integration across data types;

• Investigate the applicability of the metabolism model as a tool for the prediction of the effects of
chemical perturbation of metabolic pathways;

• Integration of the eQTL analyses/approaches with the network-focused methodologies (with Proj. 2);

• Establish the network context for QSAR (with Proj. 3).

Project 2:

• Continue development of FastMap software;

• Construct transcription regulation networks in the Bayesian framework by combining eQTLs,
nucleosome occupancy, and transcriptional regulation data;

• Complete characterization of the mouse hepatocyte cultures and perform experiments with key
toxicants;

• Complete GWAS analyses of the HapMap lymphoblast cell viability and apoptosis data and correlate
the toxicity endpoints with basal gene expression profiles.

Project 3:

• Complete the analysis of ToxCast data;

• Continue to explore other datasets that provide both in vivo and in vitro data for chemicals;

• Build models that could be used by EPA to prioritize the selection of ToxCast Phase 2 compounds.

-------

Carolina Environmental
Bioinformatics Research Center:
Collaborative work with EPA

Presented by Ann M. Richard

UNITED STATES ENVIRONMENTAL PROTECTION AGENCY

This work was reviewed by EPA and approved for presentation but does not
necessarily reflect official Agency policy. Mention of trade names or commercial
products does not constitute endorsement or recommendation by EPA for use.


NC Bioinformatics STAR Center

Project 1:
Biostatistics for
Computational
Toxicology

What statistical techniques are most appropriate for
handling high-output toxicity platforms?


Project 2:

Cheminformatics

How can biological information
& in vitro HTS data be
incorporated into QSAR
models?

Project 3:
Computational
Infrastructure for
Systems Toxicology

What computational tools
are necessary for these and
related questions arising in
model organism toxicity
research?


Project 2:
Cheminformatics

How can biological information
& in vitro HTS data be
incorporated into QSAR
models?

• Ability to generate thousands of QSAR descriptors representing categories of structure-based computed properties (DRAGON):
- Electronic, topological, constitutional, geometrical
- Feature counts, functional groups, 2D fingerprints, etc.

• Sophisticated QSAR workflow:
- kNN & sphere exclusion methods
- Randomized y-variable test
- External test set validation
- Consensus models
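Sphere-exclusion splitting has several published variants; the sketch below is a simplified one (our assumption, not the center's code). It guarantees that each test compound lies within a chosen radius of some training compound, so the test set stays inside the chemical space covered by the training set. Descriptor vectors and radius are hypothetical.

```python
import math


def sphere_exclusion_split(X, radius, start=0):
    """Simplified sphere-exclusion split of compounds (descriptor vectors X).

    Each sphere center goes to the training set; all remaining compounds within
    `radius` of it go to the test set. The next center is the remaining compound
    closest to the current training set (ties broken by index).
    """
    remaining = set(range(len(X)))
    train, test = [], []
    center = start
    while True:
        remaining.discard(center)
        train.append(center)
        inside = sorted(i for i in remaining if math.dist(X[i], X[center]) <= radius)
        test.extend(inside)
        remaining.difference_update(inside)
        if not remaining:
            break
        distance_to_train = {i: min(math.dist(X[i], X[t]) for t in train) for i in remaining}
        center = min(sorted(remaining), key=distance_to_train.__getitem__)
    return sorted(train), sorted(test)


X = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.4, 0.0), (10.0, 0.0)]
train, test = sphere_exclusion_split(X, radius=1.0)
```

Varying `radius` trades test-set size against how closely each test compound resembles the training set; in practice several radii are tried to produce multiple training/test splits.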


Predictive QSAR Workflow*

[Figure: original dataset split into training, test, and external validation sets; Y-randomization; only accept models that pass the acceptance criteria; validated predictive models with high internal & external accuracy; external validation using applicability domain (AD); prediction of potential safety alerts to prioritize for testing; experimental validation of prioritized alerts]

*Tropsha A, Golbraikh A. Predictive QSAR Modeling Workflow, Model Applicability Domains, and Virtual Screening. Curr Pharm Des, 2007, 13, 3494-3504


Chemical Descriptors
(DRAGON):

> Computed from 2D molecular
structures provided in DSSTox SDF
files

> Can use selected categories, or all

> Provide different representations
of chemical space in relation to
activity

> Different degrees of interpretability

Descriptor block	Number of descriptors
constitutional descriptors
topological descriptors	119
walk and path counts
connectivity indices
information indices
2D autocorrelations
edge adjacency indices	107
Burden eigenvalue descriptors
topological charge indices
eigenvalue-based indices
Randic molecular profiles
geometrical descriptors
RDF descriptors	150
3D-MoRSE descriptors	160
WHIM descriptors
GETAWAY descriptors	197
functional group counts	154
atom-centered fragments	120
charge descriptors
molecular properties
2D binary fingerprints	780
2D frequency fingerprints	780
TOTAL	3224


Project 2:
Cheminformatics

How can biological information
& in vitro HTS data be
incorporated into QSAR
models?

• QSAR models based on DSSTox published data files and structure-inventories

• Share processed data files and calculated descriptors with EPA researchers for public release

• Coauthored publications:

> Zhu H, Rusyn I, Richard A, Tropsha A. (2008) Use of cell viability assay data improves the
prediction accuracy of conventional quantitative structure-activity relationship models of
animal carcinogenicity. Environ. Health Perspect. 116:506-513.

> Zhu H, Ye L, Richard A, Golbraikh A, Rusyn I, Tropsha A. (2009) A Two-step Hierarchical
Quantitative Structure Activity Relationship Modeling Workflow for Predicting in vivo Chemical
Toxicity from Molecular Structure. Environ. Health Perspect. 117:1257-1264.

-------
DSSTox: Distributed Structure-Searchable Toxicity
Database Network Project...& NC CEBC

14 current files, >15 substances, >10K structures
External links: PubChem, ChemSpider, Lazar, ACToR

• Publishes high-quality standardized structure-data (SD) files pertaining to toxicology:
- EPA, HPV-IS, IRIS, NTP, FDA, NCBI, EBI...
- SAR-ready summary tox data for modeling
- Public substance/structure ID registry system

• Public forum for SAR file/data sharing
- DSSTox Structure-Browser

EPA

• CPDBAS (rodent carcinogenicity data)

• NTPHTS cytotox assays

• ZEBET acute tox data (to be published)

• ToxCast Phase I chemical inventory (TOXCST)

• ToxRefDB in vivo endpoints for modeling

• Processed data sets (ZEBET acute tox)

• Calculated chemical descriptors (DRAGON) for ToxCast inventory

UNC CEBC

kNN Consensus QSAR Modeling of NTP-HTS Data

NTP HTS 1408

Carcinogenic Potency Database 1481

[Figure: chemical descriptors only (9 models > 0.7 validation cutoff) vs. chemical + 7 HTS "descriptors" (34 models > 0.7 validation cutoff)]

Figure 3. Comparison of the results from kNN QSAR using two types of descriptors.

Zhu H, Rusyn I, Richard A, Tropsha A. (2008) EHP 116:506-513.

Can in vitro IC50 data be used to inform
development of a model for in vivo Rat Oral LD50?

• No obvious correlation

• Can we break the problem into regions of higher correlation?

• Can we use QSAR methods to define those regions based on chemical structure alone?

Zhu H, Ye L, Richard A, Golbraikh A, Rusyn I, Tropsha A. (2009) EHP 117:1257-1264.

Can in vitro IC50 data be used to inform
development of a model for in vivo Rat Oral LD50?

[Figure: in vivo LD50 vs. in vitro IC50 (mmol/l)]

• Use "moving regression" to define regions of higher correlation

• Regions bear some commonalities to "baseline toxicity" representations

• Attempt to distinguish regions based on chemical structure alone

Zhu H, Ye L, Richard A, Golbraikh A, Rusyn I, Tropsha A. (2009) EHP 117:1257-1264.
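The slide does not define "moving regression" precisely. One plausible reading, sketched here as an assumption, is a moving-window correlation: sort compounds by log IC50, compute the IC50-LD50 correlation in each window of consecutive compounds, and flag compounds that fall in high-correlation windows. Window size, threshold, and data are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def moving_correlation(log_ic50, log_ld50, window=5):
    """Sort compounds by in vitro potency and compute the IC50-LD50 correlation
    in each run of `window` consecutive compounds."""
    order = sorted(range(len(log_ic50)), key=lambda i: log_ic50[i])
    results = []
    for start in range(len(order) - window + 1):
        idx = order[start:start + window]
        r = pearson([log_ic50[i] for i in idx], [log_ld50[i] for i in idx])
        results.append((idx, r))
    return results


def high_correlation_compounds(results, r_min=0.8):
    """Compounds that fall in at least one window with correlation >= r_min."""
    return sorted({i for idx, r in results if r >= r_min for i in idx})


log_ic50 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
log_ld50 = [1, 2, 3, 4, 5, 3, 9, 2, 8, 1]
windows = moving_correlation(log_ic50, log_ld50)
region = high_correlation_compounds(windows)
```

In this made-up data the five most potent compounds form a "baseline-like" region with a perfect IC50-LD50 relationship, while the remaining windows show weak or no correlation.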

[Figure: LD50 vs. IC50 with a linear fit for baseline compounds; external compounds overlaid]

• Step 1: Apply Classification QSAR to assign new chemical to Class 1 or Class 2

• Step 2: Apply QSAR 1 or 2 to predict LD50 based on chemical structure alone

• Step 3: Validate approach with external data

[Figure: classification QSAR separating baseline compounds from outliers]

IC50 used to inform construction of QSARs, but not needed for prediction
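The three steps above can be sketched with kNN models standing in for the published classification and regression QSARs (an assumption; the cited paper's models and descriptors differ). Class labels, derived beforehand from the IC50-LD50 relationship, are used only for training; predicting a new compound needs its structure descriptors alone. All data are hypothetical, and Step 3 would apply the same function to held-out external compounds.

```python
import math
from collections import Counter


def nearest(X, x, k):
    """Indices of the k points of X closest to x (ties broken by index)."""
    return [i for _, i in sorted((math.dist(xi, x), i) for i, xi in enumerate(X))[:k]]


def two_step_predict(X, classes, y, x_new, k=3):
    """Step 1: kNN classification assigns x_new to a class (e.g., baseline vs. outlier).
    Step 2: a kNN regression built only on that class predicts the endpoint (e.g., log LD50)."""
    votes = Counter(classes[i] for i in nearest(X, x_new, k))
    cls = votes.most_common(1)[0][0]
    members = [i for i, c in enumerate(classes) if c == cls]
    sub = nearest([X[i] for i in members], x_new, min(k, len(members)))
    prediction = sum(y[members[j]] for j in sub) / len(sub)
    return cls, prediction


# Hypothetical descriptors, class labels and endpoint values
X = [(float(v),) for v in (0, 1, 2, 3, 4, 10, 11, 12, 13, 14)]
classes = [1] * 5 + [2] * 5
y = [0, 1, 2, 3, 4, 110, 111, 112, 113, 114]
```

Routing each compound to a class-specific model keeps the regression from averaging over chemically different groups, which is the motivation for the hierarchical design.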

ToxCast & Tox21: High-Multi-Dimensional Data

[Figure: chemical structures grouped into expert-derived chemical and MOA classes, reactivity & metabolic activity classes, and chemical feature classes; HTS data grouped by sensitivity cutoffs, activity groupings, gene target groups, and pathway groupings]

-------
Structure Class vs Bioactivity Class

Chemical structure class:
• Cluster according to activity and mechanism
• Differences in activity profiles can discriminate within structure class

Bioactivity profile class:
• Can project onto multiple chemical classes
• Potentially broader coverage of chemical space
• Implies mechanistic similarity

[Figure: bioactivity profiles across assays]

November 18, 2009

[Figure: toxico-chemoinformatics linking data mining (relational data models, toxicological description, data standards, data integration, summary activities), predictive toxicology (HTS assays, toxicogenomics, metabolomics, mode-of-action), QSAR modeling (chemical properties, structural descriptors, chemical similarity metrics, statistical associations), chemical genomics (chemical diversity, chemical neighborhoods), and biological profiling]
Biological Profiling

-------

Environmental Bioinformatics and
Computational Toxicology Center (ebCTC):

Research in Multiscale Modeling
of the Effects of Environmental Toxicants

William J. Welsh and Panos G. Georgopoulos
www.ebCTC.org

Presented at USEPA Computational Toxicology Centers Progress Review Workshop
Research Triangle Park, NC - October 1, 2009

Funded by the USEPA Science to Achieve Results (STAR) Grant RD-83272101

Consortium Members

UMDNJ - Robert Wood Johnson Medical School

Computational Chemodynamics Laboratory,
Environmental & Occupational Health Sciences Institute
Department of Environmental & Occupational Medicine

Rutgers

Department of Pharmacology
Informatics Institute

Department of Biomedical Engineering
Department of Chemical & Biochemical Engineering

Department of Environmental Sciences

Department of Statistics

Princeton University

Computer Aided Systems Laboratory,

Department of Chemical Engineering
Department of Chemistry

Program in Applied and Computational Mathematics

U.S. Food and Drug Administration

Center for Toxicoinformatics,

National Center for Toxicological Research

ebCTC objectives and general approach

• Objectives

• To address the toxicant Source-to-Outcome Continuum through development of an
integrated, modular, computational framework

• To develop predictive cheminformatics tools for Hazard Identification and Toxicant
Characterization

• To demonstrate the above tools through applications in Quantitative Risk Assessment

• General Approach

• A computational/engineering/systems perspective

- utilizing a team of computational scientists and engineers, with diverse backgrounds in
bioinformatics, cheminformatics, and enviroinformatics

• New framework and tools build upon an extensive base of past developments

• The research effort emphasizes interaction and collaboration

- among participating scientists in the STAR Bioinformatics Centers

- with USEPA centers and laboratories

- with other centers and institutes of excellence

ebCTC research activities organized in 2 research areas and 5 projects

• Each project is developing a set of "stand-alone" components addressing specific CT problems

• Research Project 1 provides an integrative framework for Investigational Area 1

• Project 4 addresses the core issues of Area 2

Investigational Area I
Source-to-Outcome
Framework

Project 1 -UMDNJ/USFDA:
Multiscale biologically-based
modeling of exposure-to-dose-to-
response processes
P. Georgopoulos, W. Tong

Project 2 - Rutgers: Hepatocyte
Metabolism Model for Xenobiotics
M. Ierapetritou, I. Androulakis

Project 3 - Princeton: Tools for
Optimal Identification of Biological
Networks

Project 4 -UMDNJ:
Chemoinformatics Tools
for Toxicant
Characterization
W. Welsh

Project 5 - Princeton:
Optimization Tools for
in silico Proteomics

C. Floudas

The ebCTC research integration plan is consistent with the "Vision and
Strategy" outlined by NAS in 2007 for Toxicity Testing in the 21st Century

[Figure: chemical characterization; toxicity testing (toxicity pathways, targeted testing); dose-response and extrapolation modeling]

Figure adapted from NAS (2007) Toxicity Testing in the 21st Century: A Vision and a Strategy

The ebCTC research integration plan is consistent with the "Vision and
Strategy" outlined by NAS in 2007 for Toxicity Testing in the 21st Century

[Figure: risk characterization process: hazard identification, dose-response assessment, mode of action]

Figure adapted from NAS (2007) Toxicity Testing in the 21st Century: A Vision and a Strategy

-------
ebCTC pursues an integrative multiscale research approach
(from molecules to cells to tissues to organs to organisms to populations)
recognizing the importance of processes/signals at all levels of biological organization

Primary cultures of human hepatocytes maintained for 6 days under different matrix conditions

Cell morphology depends on the topology and composition of the matrix environment

(A) Rigid collagen, type I, substratum with no overlay.

(B) Rigid collagen, type I, substratum with a Matrigel overlay.

(D) Matrigel substratum. Bar = 50 µm

From Hamilton et al. (2001) Tissue Res 306:85-99

A representative sample of USEPA/ebCTC
project interactions and collaborations

• Toxicogenomic analysis of phthalate exposure data

S. Euling, B. Benson, W. Chiu, L.E. Gray, S. Hester, C. Keshava, N. Keshava, S. Makris, C. Thompson,
V. Wilson (USEPA);

I. Androulakis, M. Ovacik, M. Ierapetritou (ebCTC)

• Toxicogenomic analysis of conazole exposure data and in vitro species extrapolation in primary
hepatocytes

S. Hester, D. Wolf, W. Ward (USEPA);

M. Ierapetritou, P. Georgopoulos, I. Androulakis, W. Welsh, V. Iyer (ebCTC)

• Development of integrated PBPK/PD models for Arsenic and its compounds
E. Kenyon, H. El Masri (USEPA);

S. Isukapalli, P. Georgopoulos, C. Brinkerhoff, A. Sasso, S. Stamatelos, I. Androulakis, M. Ovacik
(ebCTC)

• Computational tools for reconstructing exposures from biomarkers
M. Tornero-Velez, C. Dary, D. Vallero, L. Reiter (USEPA);

S. Isukapalli, P. Georgopoulos, A. Sasso (ebCTC)

• Incorporating the effects of aging on Physiologically Based Toxicokinetic (PBTK) models
M. Tornero-Velez, M. DeVito, E. Kenyon, M. Evans (USEPA);

P. Georgopoulos, S. Isukapalli, A. Sasso (ebCTC)

• Optimal analysis of proteomic data
M. Hemmer, C. Walker (USEPA);

C. Floudas (ebCTC)

• Computational modeling of cellular signaling pathways: Implications for dose-response (modular
infrastructure for virtual organs - liver, skin)

I. Shah, R. Judson (USEPA); P. Georgopoulos, S. Isukapalli, M. Ierapetritou, I. Androulakis,
C. Brinkerhoff, C. Roth, W. Welsh, H. Rabitz (ebCTC)

• Interactions of ToxCast chemicals with liver Nuclear Receptors
R. Judson, D. Dix, I. Shah (USEPA); S. Mani, W. Welsh (ebCTC)

A list of ebCTC's peer-reviewed publications can be found at www.ebctc.org/publications.html

[Figure: number of ebCTC publications by year]

Research Area I:

A Source-to-Outcome Framework
to Support Risk Characterization

A general mathematical framework for environmental health risk analysis
must consider multiscale bionetwork dynamics
(spanning the genome, transcriptome, proteome, metabolome, cytome, physiome)
linked with the dynamics of environmental ("extragenomic") stressor networks

[Figure: genome and epigenetic phenomena; gene expression in cells and tissue; bioregulatory and metabolic networks; cellular protein composition and activities; cellular and tissue metabolic concentrations and fluxes; external inputs: dietary, gut "microbiome", drugs, environmental toxicants]

-------
Connecting genotypes with phenotypes to assess toxicokinetic and toxicodynamic
variability - and associated disease susceptibility - to environmental agents
requires integrating data/information across multiple biological levels


Research to address the toxicant "Source-to-Effect Continuum"
through development of an integrated, modular, computational framework

CERM: Center for Exposure and Risk Modeling

MENTOR: Modeling ENvironment for TOtal Risk studies (development started in 1993 with CDC funding; USEPA funding commenced in 1998)

ebCTC: environmental bioinformatics and Computational Toxicology Center

DORIAN: DOse-Response Information Analysis system (development started in 2006 with USEPA STAR funding; consortium of UMDNJ-RWJMS with
Rutgers, Princeton and USFDA)

[Figure: research focus areas for CERM/MENTOR (environmental information library; physiological & biochemical information library; analysis and merging) and for ebCTC/DORIAN (library; pathway tools)]

A general Bayesian framework for exposure reconstruction
from inversion of biomarker data (individual and population)
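A minimal sketch of biomarker inversion on a grid, assuming a single urinary measurement, a hypothetical steady-state forward model (concentration proportional to daily intake), a lognormal prior on intake, and lognormal measurement error. All parameter values are hypothetical; the framework described here handles populations and far richer toxicokinetics.

```python
import math


def lognormal_logpdf(x, median, gsd):
    """Log density of a lognormal with the given median and geometric standard deviation."""
    mu, sigma = math.log(median), math.log(gsd)
    return (-math.log(x * sigma * math.sqrt(2.0 * math.pi))
            - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2))


def posterior_intake(measured, grid, frac_excreted=0.5, urine_volume=1.5,
                     prior_median=10.0, prior_gsd=3.0, meas_gsd=1.5):
    """Grid posterior over daily intake given one urinary biomarker concentration.

    Forward model (hypothetical, steady state): concentration = frac_excreted * intake / urine_volume.
    Likelihood: lognormal measurement error around the prediction. Prior: lognormal intake.
    """
    log_post = []
    for intake in grid:
        predicted = frac_excreted * intake / urine_volume
        log_post.append(lognormal_logpdf(measured, predicted, meas_gsd)
                        + lognormal_logpdf(intake, prior_median, prior_gsd))
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]  # subtract the max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


grid = [0.5 * i for i in range(1, 81)]  # candidate intakes 0.5 ... 40 (hypothetical units)
posterior = posterior_intake(measured=5.0, grid=grid)
mode = grid[max(range(len(grid)), key=posterior.__getitem__)]
```

The measurement alone would imply an intake of 15; the prior (median 10) pulls the posterior mode somewhat lower, and the full posterior, not just its mode, carries the uncertainty forward.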

Novel methods have been developed that allow the systematic construction of
Fast Equivalent Operational Models (FEOMs);
these include the Stochastic Response Surface Method (SRSM)
and the High Dimensional Model Representation (HDMR)
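As an illustration of the HDMR idea (a sketch assuming the first-order cut-HDMR variant around a reference point; the model function is hypothetical), an expensive model is replaced by a sum of one-input-at-a-time components, which can be tabulated in advance to give a fast surrogate.

```python
def first_order_cut_hdmr(f, reference):
    """First-order cut-HDMR: f(x) ~ f0 + sum_i [f(x_i, others at reference) - f0].

    Each component varies one input at a time, so in practice the components can be
    tabulated on 1D grids ahead of time; here they are evaluated directly for clarity.
    """
    f0 = f(reference)

    def component(i, xi):
        point = list(reference)
        point[i] = xi
        return f(point) - f0

    def surrogate(x):
        return f0 + sum(component(i, xi) for i, xi in enumerate(x))

    return surrogate


def model(x):
    # Hypothetical "expensive" model with a weak x0 * x1 interaction
    return x[0] ** 2 + 3.0 * x[1] + 0.1 * x[0] * x[1]


approx = first_order_cut_hdmr(model, reference=[1.0, 1.0])
```

For this model the surrogate gives 13.4 at (2, 3) against an exact 13.6; the 0.2 gap is the omitted interaction term 0.1 * (2 - 1) * (3 - 1). For a purely additive model the first-order expansion is exact, and second-order terms would be added when interactions matter.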

A "sample" of on-going applications within Research Area I of ebCTC
(including various Risk Assessment Demonstration applications)

Air Contaminant Applications

• urban/local/personal scale inhalation exposures to complex mixtures of co-occurring ozone, PM, other criteria pollutants, and air toxics,
• exposures to contaminant releases from forest and urban fires,
• exposures to contaminant releases from chemical facility accidents,
• exposures to bioaerosols (ranging from anthrax spores to birch and ragweed pollen),
• etc.

Multimedia Applications

• exposures to mixtures of metals and metalloids (Hg, Cd, Cu, As, etc.) and their compounds,
• exposures to pesticides (organophosphates, conazoles),
• exposures to organic solvents,
• exposures to water chlorination by-products,
• exposures to phthalates,
• exposures to PCBs and dioxin-like compounds,
• exposures to CWAs,
• etc.

MENTOR employs an "anthropocentric" (person-oriented) approach, linking
multiple scales of macroenvironmental and local models and information with
microenvironmental conditions and human activities in time/space

Microenvironmental/exposure/dose modeling system

Sources: 3MRA User Guide 2002; Georgopoulos et al., ES&T, 1997, 31(1)

DORIAN aims to provide multiscale integration of toxicokinetic and
toxicodynamic processes (cell to organism) for dose-to-response studies

-------

Human activities determine pathways of exposure

Example: Cumulative distributions of total As (left) and TCE (right) in urine from
MENTOR predictions for Franklin County, OH compared with the measurements
from NHEXAS-V (corresponding percentiles) for different age groups

[value illegible] µg/L is the detection limit for TCE in the NHEXAS data

Comparison of total
As concentration
levels in urine
samples of the
NHANES population
with corresponding
MENTOR predictions

(2009). Manuscript submitted to Environ

MENTOR-3P/DORIAN
provide a new
modular "whole body"
platform for consistent
characterization of
multicontaminant
toxicokinetic and
toxicodynamic processes
in individuals and
populations;
incorporate physiology
databases to account for
intra- and interindividual
variation and variability

Generic compartmental substructure

Individual and population human biology (physiology and biochemistry)
changes non-uniformly with development, aging, disease, drug treatment, diet,
environmental exposures, etc.

Weights of water, fat,
protein, and other
components as a function
of age, from birth to one
year of age. [Figure
reproduced from Fomon
(1966) with permission
from W.B. Saunders Co.]

Hepatic cytochrome CYP1A2
and CYP2E1 in children of
various age groups as a
percentage of adult weights
(from Cresteil, 1998).

Organ weight from birth to
adolescence in boys (based
on Haddad et al. 2001)

Example references:

- WHO (2006). Principles for Evaluating Health Risks in Children Associated with Exposure to Chemicals. World Health Organization. Environmental Health Criteria 237;

- USEPA (2005) Use of PBPK models to quantify the impact of human age and interindividual differences in physiology and biochemistry pertinent to risk;

- Thompson et al. (2009) J Toxicol Environ Health B 12: 1-24.

MENTOR/DORIAN offers a "whole organism"
modular toxicokinetic/toxicodynamic platform for incorporating
organ/tissue representations at various levels of detail
(on-going projects focus on lung, skin and liver)

Data from Jaques & Kim
(2000) and Daigle, et al.
(2003) studies at rest
and during moderate
exercise

Experimental data compared to model predictions using MPPD2, ICRP, and the HUMTRN-derived module of MENTOR-3P; experimental conditions used as model inputs
-------
Lung and skin models are critical for assessing exposure and intake/uptake;
liver is critical for biotransformation and elimination of xenobiotics:
recent/current liver modeling efforts in MENTOR development focus primarily on
computationally efficient representations of the effects of heterogeneity

An overview of different mathematical descriptions of the liver
for simulating toxicokinetics and toxicodynamics

[Table: liver model types, including well-stirred (one-compartment) models; plug flow (PFR) models; representations of heterogeneity through compartments in series and stochastic/fractal models; models with multiple liver regions having different uptake properties; and zonal models with significant back-mixing plus a fixed lag along the sinusoids and cellular space. References cited include Pang et al. (2007) AAPS Journal 9; Weiss (residence time analysis), J Pharm Sci 96, 913-926; Abu-Zahra and Pang (2000) Drug Metab Dispos; and Anissimov and Roberts, J Pharmacokinet Pharmacodyn 29, 131-56.]

An overview of different mathematical descriptions of the liver
for simulating toxicokinetics and toxicodynamics (continued)

[Table (continued): discrete, agent-based models and "higher dimensional" (e.g., 3D) models capturing variation in uptake and metabolic properties across the organ, including a variant that is the same as distributed zones but with intermittent flow. References cited include Anissimov et al. (1997) J Theor Biol 188 and an article in Biomed Eng 35, 474-491.]
-------
Research in-progress: "reconciliation" of biotransformation and transport of
As modeled at both the individual hepatocyte and the whole organ level

(preliminary/unpublished results from Stamatelos et al., 2009 and from Sasso et al., 2009)
Cellular arsenic uptake and metabolism


excretion in human
volunteers after
exposure to iAs

Solid lines: predictions of the MENTOR/DORIAN multiscale whole-body PBTK model. Dashed lines: predictions of the MENTOR-3P implementation of the El-Masri et al. PBTKM.

Data (from El-Masri & Kenyon 2008): controlled exposure experiments of humans ingesting 100 µg AsIII (top) and 100 µg AsV (bottom).

Research in-progress: "reconciliation" of biotransformation and transport of
As modeled at both the individual hepatocyte and the whole organ level

(preliminary/unpublished results from Stamatelos et al., 2009)

Schematic of component interactions of the
TK/TD model; arrow and hammerhead indicate
activation and inhibition respectively

Cellular arsenic uptake and metabolism

hepatocyte

MENTOR/DORIAN TK/TD model predictions of time-dependent
transcription (left) and translation (right) of the GCLC enzyme in
hepatocytes. The measured values represent the fold increase of mRNA
and protein levels of the enzyme after exposure to 10 µM of iAsIII. The
data depict average values ± SEM of at least three independent
experiments from Thompson et al. 2009.

Modeling quantitative metrics of oxidative stress from exposure to TCE

(preliminary/unpublished results from Brinkerhoff et al., 2009)

Toxicokinetic model

Toxicodynamic model


Research Area II:
Hazard Identification

Molecular-scale methods

Characterize molecular components and interactions
at a "local" (e.g. ligand-receptor) scale

Receptor-based Approaches

Ligand-based Approaches
(QSARs, etc.)

Virtual Screening;
Data Analysis

-------
Ligand-Receptor Interactions
Pregnane X Receptor (PXR)

• PXR modulates the transcription of metabolic enzymes and >36 other genes.
• PXR co-regulates the CYP3A4 metabolic gene and the ABCB1 "drug efflux" gene

[Synold TW, et al., Nature Medicine 7:584-590 (2001).]

• Involved in many drug-drug interactions, giving rise to adverse drug effects.
• Many xenobiotics activate or repress the transcriptional machinery of PXR.
• Studies on PCBs show that the responsiveness of PXR to xenobiotics varies
from species to species.

PXR ligands are pervasive and structurally diverse

• bile acids (bile salts, cholesterol metabolites)

• food ingredients, dietary supplements (e.g., isothiocyanate sulforaphane in broccoli)

• prescription drugs (e.g., statins, paclitaxel, antibiotics, azole antifungals, rifampicin)

• herbal components (e.g. hyperforin in St. John's Wort)

• environmental chemicals (EDCs, pesticides, plasticizers, PCBs, PBDEs)

PXR and Xenobiotics

Ory, DS. Nuclear receptor signaling in the control of cholesterol homeostasis: Have the Orphans Found a Home?
Circ. Res. 95:660-670 (2004).

Tabb MM, Kholodovych V, Grün F, Zhou C, Welsh WJ, Blumberg B. Highly chlorinated PCBs inhibit the human
xenobiotic response mediated by the steroid and xenobiotic receptor (SXR). EHP 112:163-169 (2004).

Yu S, Kong AN. Targeting carcinogen metabolism by dietary cancer preventive compounds. Curr Cancer Drug
Targets 7(5):416-24 (2007).

Goetz AK, Dix DJ. Mode of action for reproductive and hepatic toxicity inferred from a genomic study of
triazole antifungals. Toxicol Sci. 110(2):449-62 (2009).

Lin YS, Yasuda K, Assem M, Cline C, Barber J, Li CW, Kholodovych V, Ai N, Chen JD, Welsh WJ, Ekins S, Schuetz
EG. The major human pregnane X receptor (PXR) splice variant, PXR.2, exhibits significantly diminished
ligand-activated transcriptional regulation. Drug Metab Dispos. 37(6): 1295-304 (2009).

Kortagere S, Chekmarev D, Welsh WJ, Ekins S. Hybrid scoring and classification approaches to predict human
pregnane X receptor (PXR) activators. Pharm Res. 26(4):1001-11 (2009).

Unusual PXR antagonist binding site of conazoles

A series of conazoles antagonize PXR (10-20 µM); mutagenesis data indicate
that they bind to the outer surface of PXR at the AF-2 (H12) binding site

Huang et al., Oncogene 26:258 (2007); Wang et al., Clin Cancer Res 13:2488 (2007)
Conventional structural model for nuclear receptor agonist and antagonist action

[Figure: agonist and antagonist binding schematics; pharmacophore feature shown: hydrophobe/aromatic ring.]

Ekins, Welsh, et al., Mol Pharmacol 72:592-603 (2007).

Unusual PXR antagonist binding site of conazoles

• Using ligand-PXR docking simulations, we identified an alternative antagonist
binding site anchored by Lys277 located in the AF-2 site

• Lys277 most likely serves as a "charge clamp" for interaction between the co-
activator SRC-1 (His687) and PXR

• Conazoles compete with binding of co-activator SRC-1 to the AF-2 site

Ekins, Welsh, et al., Mol Pharmacol 72:592-603 (2007)

Methods Development for Data Analysis
Analysis of ToxCast 309 Data Set

Biological Spectra Analysis (BSA):

Link biological activity profiles to molecular structures

• Traditional (Q)SAR methods use the structure-based features
(molecular descriptors) of a collection of chemicals to describe
and compare their biological activities.

molecular structure → bioactivity

• In contrast, BSA uses the biological response profiles of the
chemicals to describe and compare their molecular structures.

bioactivity → molecular structure

Fliri AF et al., PNAS 102(2):261-266 (2005)

-------
Biological Activity Spectra (BSA)
- depicted as a heat map -

[Figure: biological activity spectrum of tioconazole depicted as a heat map; percent inhibition values are shown using a two-color scheme.]

Fliri AF et al., PNAS 102(2):261-266 (2005)

Heat map for a collection of chemicals
and a panel of protein receptors

Two-way Hierarchical Clustering

clusters proteins based on similarity in their bioresponse profile

clusters chemicals
based on similarity
in their ability to
induce bioresponse
profile

Fliri AF et al., PNAS 102(2):261-266 (2005)

BSA study on assay data from Attagene, Inc.

• Transcription Activation (TA) assays

• 309 ToxCast chemicals in 81 assays

• Reported LEL (lowest effective level) values from each assay

• Inactive chemical-assay combinations were assigned LEL =1000000

• Two-way hierarchical (UPGMA) clustering from Bioinformatics Toolbox
v.3.1, MATLAB 7.6

• Analysis employed both Euclidean distance and Cosine metrics

• Assay results and calculated molecular descriptors were pre-processed using
Unsupervised Forward Selection (UFS)
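The two-way clustering step can be illustrated with a small sketch. It is a pure-Python stand-in for the MATLAB Bioinformatics Toolbox routine, not the code used in the study; the log10 transform of LEL values and the helper names (`log_profile`, `upgma`, `two_way`) are assumptions made for illustration. It applies average-linkage (UPGMA) clustering to the chemicals (rows) and, separately, to the assays (columns), using either Euclidean or cosine distance.

```python
import math

INACTIVE_LEL = 1_000_000  # LEL assigned to inactive chemical-assay pairs, as on the slide


def log_profile(lels):
    """log10-transform one chemical's LEL values (assumed preprocessing step)."""
    return [math.log10(v) for v in lels]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; raises ZeroDivisionError for an all-zero profile."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def upgma(rows, metric):
    """Average-linkage (UPGMA) clustering; returns the dendrogram as nested index tuples."""
    n = len(rows)
    size = {i: 1 for i in range(n)}   # cluster id -> number of members
    tree = {i: i for i in range(n)}   # cluster id -> nested tuple of row indices
    dist = {frozenset((i, j)): metric(rows[i], rows[j])
            for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(size) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        new, next_id = next_id, next_id + 1
        for k in size:  # size-weighted mean distance from the merged cluster
            if k not in (a, b):
                dist[frozenset((new, k))] = (
                    size[a] * dist[frozenset((a, k))] + size[b] * dist[frozenset((b, k))]
                ) / (size[a] + size[b])
        for k in size:  # drop distances that involve the two merged clusters
            dist.pop(frozenset((a, k)), None)
            dist.pop(frozenset((b, k)), None)
        size[new] = size.pop(a) + size.pop(b)
        tree[new] = (tree.pop(a), tree.pop(b))
    return tree.popitem()[1]


def two_way(lel_matrix, metric=euclidean):
    """Cluster chemicals (rows) and assays (columns) of an LEL matrix."""
    rows = [log_profile(r) for r in lel_matrix]
    cols = [list(c) for c in zip(*rows)]
    return upgma(rows, metric), upgma(cols, metric)
```

Passing `cosine_distance` instead of `euclidean` groups profiles by their shape rather than their absolute potency, which mirrors the use of both metrics described above.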

BSA study on Attagene Data

81 Attagene Assays

Dark regions - Compounds with measurable reported LEL (lowest effective level).
White regions - Inactive chemical-assay combinations (LEL = 1000000).

Heat map for reduced set of 28 assays

28 Attagene assays (values
standardized along columns)

Hierarchical clustering of ToxCast chemicals in the space of Attagene response
biospectra reveals two major clusters TOX1 (red) and TOX2 (violet-green-lime-blue).

Heat map for space of chemical descriptors

ToxCast* + EPI Suite descriptors

Hierarchical clustering of ToxCast chemicals in chemical descriptor space reveals two major clusters CHEM1
(red-violet) and CHEM2 (blue-green).

*ToxCast combined set of Leadscope, QikProp, and PhysChem-derived descriptors

-------
Connection between similarities
in biospectra and chemical space

[Figure: TOX similarity profile (clusters TOX1, TOX2) compared with the CHEM similarity profile (clusters CHEM1, CHEM2).]

No obvious chemical similarities within individual subclusters.

Cross-mapping of TOX and CHEM spaces

Ligand-based Models, Rapid Virtual Screening & Chemical Prioritization

Shape Signatures

molecules are compared by subtracting their histograms

17β-estradiol

Diff = 0.082

A small Diff value means that
two molecules have similar shapes

Ranked Hit List
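As a rough illustration of comparing molecules by subtracting their shape histograms, the sketch below normalizes two histograms, sums the absolute bin-wise differences, and ranks a small library by that score to form a hit list. The exact Diff metric used by Shape Signatures is not given on the slide, so this L1 distance, the function names, and the toy histograms are assumptions.

```python
def normalize(hist):
    """Scale a histogram so its bins sum to 1 (makes molecules of different size comparable)."""
    total = float(sum(hist))
    return [h / total for h in hist]


def shape_diff(h1, h2):
    """Assumed Diff score: sum of absolute bin-wise differences of normalized histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must share the same binning")
    return sum(abs(a - b) for a, b in zip(normalize(h1), normalize(h2)))


def ranked_hit_list(query_hist, library):
    """Rank library molecules (name -> histogram) by increasing Diff to the query."""
    scores = [(name, shape_diff(query_hist, hist)) for name, hist in library.items()]
    return sorted(scores, key=lambda item: item[1])
```

Molecules at the top of the returned list have the smallest Diff and are therefore the closest shape matches to the query.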

Shape-based QSAR Models for Toxicity Prediction

• Cardiotoxicity

- hERG

- 5HT2B

Chekmarev DS, Kholodovych V, Balakin KV, Ivanenkov Y, Ekins S, Welsh WJ. Chem Res Toxicol. 21(6):1304-14 (2008).

• Neurotoxicity

- blood-brain barrier (BBB) permeability

Kortagere S, Chekmarev D, Welsh WJ, Ekins S. Pharm Res. 25(8):1836-45 (2008).

• Hepatotoxicity

- PXR induction & repression

Ai N, Krasowski MD, Welsh WJ, Ekins S. Drug Discov Today 14(9-10):486-94 (2009).

Lin YS, Yasuda K, Assem M, Cline C, Barber J, Li CW, Kholodovych V, Ai N, Chen JD, Welsh WJ, Ekins S, Schuetz EG.
Drug Metab Dispos. 37(6):1295-304 (2009).

• Pesticides

- acetylcholinesterase inhibitors

Chekmarev D, Kholodovych V, Kortagere S, Welsh WJ, Ekins S. Pharm Res 26(9):2216-24 (2009).

• Fungicides, Herbicides, Insecticides

Analysis of Pesticides

Data on pesticides were collected from The Pesticide Manual.
http://www.pesticidemanual.com/index.htm

Herbicides 300 compounds
Fungicides 169 compounds
Insecticides 277 compounds

Cluster B: 48 Inhibitors of
acetolactate synthase (ALS)

-------
Analysis of pesticides


Molecular properties distribution:

MW, drug likeness, etc.

Enrichment study on Herbicides
• variable descriptor-based techniques:
> Shape Signatures alone
> Shape Signatures + MOE
> Shape Signatures + MACCS keys

Retrieval study with 30,000 compounds: Cluster B is only 0.16% of the entire set

Enrichment factor E > 250

E(random) = 1; E(ideal) = 600
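The enrichment factor compares the hit rate near the top of a ranked list with the hit rate expected by chance. The sketch below uses the common definition (the slide does not state the exact formula, and the function names are hypothetical). If the 48 Cluster B inhibitors make up 0.16% of the screened set, a perfect ranking gives an ideal EF of 1/0.0016 = 625, in line with the E(ideal) of about 600 quoted above.

```python
def enrichment_factor(ranked_is_active, fraction):
    """EF = (actives in top fraction / size of top fraction) / (all actives / library size).

    ranked_is_active lists True/False for each compound, best-scored first."""
    n_total = len(ranked_is_active)
    n_active = sum(ranked_is_active)
    n_top = max(1, round(fraction * n_total))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_active / n_total)


def ideal_enrichment(n_active, n_total):
    """Largest achievable EF when the top fraction is no larger than the active set."""
    return n_total / n_active
```

A random ranking gives EF close to 1 on average, and screening the whole library (`fraction=1.0`) always gives exactly 1.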

Interactions/integration
of ebCTC research projects

Dr. Herschel Rabitz
Project Principal Investigator
(RTO3)
Princeton University

Dr. Christodoulos Floudas
Project Principal Investigator
(PP05)
Princeton University

Acknowledgments

• Funding

• USEPA-funded Environmental Bioinformatics and Computational Toxicology
Center (ebCTC) (STAR Grant RD-83272101)

• Collaborators within UMDNJ-RWJMS, Rutgers, Princeton, USFDA as well as
many other academic institutions, including the Albert Einstein College of
Medicine, University of Pittsburgh, Mount Sinai Medical School, University
of Montreal, etc.

• Numerous collaborators within the USEPA National Center for
Computational Toxicology and other USEPA Laboratories and Centers

Please visit our website
www.ebCTC.org for events, news, publications, contacts

Viewpoints expressed here are the responsibility of the authors and
do not necessarily reflect views of USEPA or its contractors.

-------

NJ Environmental Bioinformatics and
Computational Toxicology Center - EPA
Collaboration on An Approach to Using
Toxicogenomic Data in Risk Assessment:

Dibutyl Phthalate (DBP) Case Study

Susan Y. Euling, Meric Ovacik, and Ioannis P. Androulakis


Comp Tox Center STAR Progress Review Workshop
RTP, NC

October 1, 2009

The views expressed are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency.

STAR Bioinformatics
Center/ebCTC Collaborators

Ioannis P. Androulakis

Meric Ovacik

at Rutgers University

HOW CAN GENOMIC DATA BE
USED EFFECTIVELY IN RISK
ASSESSMENT?

Strengths and Limitations of Using Omic Data in Risk Assessment

STRENGTHS

Powerful because global
Can identify:

PROJECT GOALS:

1. Develop an approach for using toxicogenomic data in risk assessment.

2. Perform a case study using this approach.

CASE STUDY SCOPE:

¦ Use an ongoing or completed assessment as starting point.

¦ Evaluate available data; not a data generation project.

Selected DBP for case study:

• has a relatively large genomic data set and phenotypic
anchoring for some of the observed gene expression
changes.

• case study is separate from IRIS assessment with a
different purpose.

-------
Proposed DBP Mechanism of Action

Toxicogenomics Dataset:

Studies of Male Rat Tissues after in utero DBP Exposure

Case Study Project:

Pathway Analysis of Liu et al. Microarray Study

Issue:

~ Differentiating signal from noise in microarray
studies

Explored use of:

~ Signal-to-noise ratio (SNR) method for
identifying DEGs

~ DEG filter methods comparison: SNR to
Rosetta Error Model (REM)
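The signal-to-noise ratio filter can be sketched as follows. The slide does not give the formula, so this assumes the widely used Golub-style definition, (mean treated - mean control) / (SD treated + SD control); the helper names and the cutoff are hypothetical.

```python
import statistics


def snr(treated, control):
    """Golub-style signal-to-noise ratio (assumed definition): (mean_t - mean_c) / (sd_t + sd_c)."""
    return (statistics.mean(treated) - statistics.mean(control)) / (
        statistics.stdev(treated) + statistics.stdev(control))


def select_degs(expression, cutoff):
    """Return genes with |SNR| >= cutoff, largest |SNR| first.

    expression maps gene -> (treated values, control values); each group needs >= 2 values."""
    scores = {gene: snr(t, c) for gene, (t, c) in expression.items()}
    hits = [gene for gene, s in scores.items() if abs(s) >= cutoff]
    return sorted(hits, key=lambda gene: -abs(scores[gene]))
```

Unlike an error-model filter such as REM, this score needs only replicate means and standard deviations, which makes it simple to compare the two DEG lists side by side.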

NEW ANALYSIS OF LIU et al. DATA:
COMPARISON OF TWO STATISTICAL FILTERS

PROCESSES & PATHWAYS IDENTIFIED IN COMMON (SNR & REM)

BIOLOGICAL PROCESS: PATHWAYS

CELL ADHESION: Cytoskeleton remodeling; ECM remodeling; Endothelial cell contacts by junctional mechanisms; Ephrins signaling; Integrin inside-out signaling; Integrin outside-in signaling; Integrin-mediated cell adhesion; Reverse signaling by ephrin B

CELL SIGNALING: Activation of PKC via G-Protein coupled receptor; CCR3 signaling in eosinophils; ChREBP regulation pathway; G-Protein beta/gamma signaling cascades; G-Proteins mediated regulation p38 and JNK signaling; Regulation of actin cytoskeleton by Rho GTPases; Role of PKA in cytoskeleton reorganisation; Leptin signaling via JAK/STAT and MAPK cascades

DISEASE: NF-AT signaling in Cardiac Hypertrophy; NTS activation of IL-8 in colonocytes

GROWTH & DIFFERENTIATION: WNT signaling pathway; Regulation of acetyl-CoA carboxylase 2 activity in muscle; MAG-dependent inhibition of neurite outgrowth; EPO-induced Jak-STAT pathway; Angiotensin signaling via STATs; Angiotensin activation of ERK

HORMONES: Ligand-dependent activation of the ESR1/SP pathway

IMMUNE RESPONSE: CXCR4 signaling pathway; MIF - the neuroendocrine-macrophage connector

METABOLISM: Androstenedione and testosterone biosynthesis and metabolism p.1; Cholesterol Biosynthesis; Cholesterol metabolism; dATP/dITP metabolism; dGTP metabolism; Estrone metabolism; Fructose metabolism; G-alpha(q) regulation of lipid metabolism; Gamma-aminobutyrate (GABA) biosynthesis and metabolism; Glutathione metabolism; Glycolysis and gluconeogenesis (short map); Glycolysis and gluconeogenesis p.1; Glycolysis and gluconeogenesis p.2; Histamine metabolism; Histidine-glutamate-glutamine and proline metabolism; Leucine, isoleucine and valine metabolism p.2; Lysine metabolism; Mitochondrial ketone bodies biosynthesis and metabolism; Mitochondrial long chain fatty acid beta-oxidation; Mitochondrial unsaturated fatty acid beta-oxidation; Peroxisomal branched chain fatty acid oxidation; Phenylalanine metabolism; PPAR regulation of lipid metabolism; Propionate metabolism p.1; Propionate metabolism p.2; Regulation of fatty acid synthesis: NLTP and EHHADH; Regulation of lipid metabolism by niacin and isoprenaline; Regulation of lipid metabolism via LXR, NF-Y and SREBP; Regulation of lipid metabolism via PPAR, RXR and VDR; Serotonin-melatonin biosynthesis and metabolism; TCA; Triacylglycerol metabolism p.1; Tryptophan metabolism

TRANSCRIPTION: Transcription factor Tubby signaling pathways; Role of VDR in regulation of genes involved in osteoporosis; Brca1 as transcription regulator

Exploratory Methods Development for
Analysis of Genomic Data for
Application to Risk Assessment

Issue:

~ For risk assessment, we're interested in affected pathways;
traditional pathway analysis methods may lose gene and
pathway information

Explored use of:

~ Pathway Activity Level method, with the results used to build
a gene network model.

-------
Pathway Activity Level Approach

•Adapted method of Tomfohr, J; Lu, J; Kepler, TB. (2005) Pathway
level analysis of gene expression using singular value
decomposition. BMC Bioinformatics 6:225.

•Identifies impact on a pathway without first identifying differentially
expressed genes

Advantages:

•Considers all genes (whether DEG or not) in a pathway

•Can compare pathway activity (PA) among pathways
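The core of the pathway activity idea, as we understand Tomfohr et al. (2005), is to summarize a pathway by the leading right singular vector of its standardized gene-by-sample expression matrix, giving one activity value per sample without first calling DEGs. The sketch below finds that vector by power iteration in pure Python; the sign convention (genes load positively on average) and the function names are assumptions, not the case study's implementation.

```python
import math
import statistics


def standardize(row):
    """Center one gene's profile across samples and scale it to unit (population) variance."""
    mu = statistics.mean(row)
    sd = statistics.pstdev(row)
    return [(x - mu) / sd for x in row]


def pathway_activity(gene_rows, iterations=100):
    """Leading right singular vector of the standardized genes x samples matrix M,
    computed by power iteration on M^T M; returns one activity value per sample."""
    m = [standardize(r) for r in gene_rows]
    n = len(m[0])
    gram = [[sum(row[i] * row[j] for row in m) for j in range(n)] for i in range(n)]
    v = list(m[0])  # start inside the row space of M (an all-ones start would vanish)
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The SVD sign is arbitrary; orient v so the pathway genes load positively on average.
    loadings = [sum(a * b for a, b in zip(row, v)) for row in m]
    if sum(loadings) < 0:
        v = [-x for x in v]
    return v
```

Because every gene in the pathway contributes to M, the activity score uses the full gene set, which is the advantage listed above.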

Putative Metabolic Gene Network Based on the Pathway Activity Method:
Liu et al. (2005) Data and KEGG Database

[Figure: putative metabolic gene network for DBP-treated samples; node labels include biosynthesis of steroids, citrate cycle, and pyruvate metabolism.]

Source: I. Androulakis and M. Ovacik

Exploring Methods to Measure Interspecies
Differences in Toxicodynamics

Issue:

~ Need for approaches and metrics to extrapolate
from animal model to human for risk assessment.

Explored use of:

~ Utilizing available data to develop cross-species metrics
for the biosynthesis of steroids pathway:

1) DNA sequence data: Compared predicted amino acid
sequences of proteins

2) Enzyme presence data
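Two simple cross-species metrics of this kind can be sketched as follows: percent identity between pre-aligned amino acid sequences, and the overlap (Jaccard index) of the pathway enzymes reported present in two species. The slide does not specify the alignment method or the exact metrics, so these definitions, the function names, and the example enzyme sets are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def shared_enzyme_fraction(enzymes_a, enzymes_b):
    """Jaccard index of the enzyme sets recorded as present in two species."""
    a, b = set(enzymes_a), set(enzymes_b)
    return len(a & b) / len(a | b)
```

For example, `percent_identity("MKT-A", "MRTGA")` skips the gapped column and scores 3 identical residues out of 4.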

Team Members

U.S. EPA

Susan Makris (NCEA, ORD)

Banalata Sen (formerly NCEA)
Andrea S. Kim (formerly NCEA)
Bob Benson (Region 8)

Channa Keshava (IRIS, NCEA, ORD)
Nagalakshmi Keshava (NCEA, ORD)
Susan Hester (NHEERL, ORD)

Vickie S. Wilson (NHEERL, ORD)
L. Earl Gray Jr. (NHEERL, ORD)
Chad Thompson (formerly NCEA)
Weihsueh Chiu (NCEA, ORD)

THE HAMNER INSTITUTES
for HEALTH SCIENCES

Kevin W. Gaido

NIEHS

Paul M.D. Foster
Lori White

NCER STAR BIOINFORMATICS
CENTER/ebCTC

Ioannis P. Androulakis (Rutgers)
Meric Ovacik (Rutgers)

Marianthi G. Ierapetritou (Rutgers)
Panos G. Georgopoulos (UMDNJ)
William Welsh (UMDNJ)

Final Report Available on the NCEA Website

An Approach to Using Toxicogenomic Data in U.S.
EPA Human Health Risk Assessments:
A Dibutyl Phthalate Case Study

Available at: http://cfpub.epa.gov/ncea/cfm/recordisplay.cfm?deid=205303

-------
The Carolina Environmental
Bioinformatics Center (CEBC)

One of two EPA STAR Centers funded in November 2005,
intended to extend capabilities in computational toxicology

Specific capabilities highlighted included 'omics expertise and
strengths in elucidating genetic variation

Here we describe the Center and highlight recent
collaborations

Organization

Three major Research Projects: (1) Biostatistics, (2)
Cheminformatics, and (3) Computational Infrastructure
for Systems Toxicology
Administrative Unit

Outreach and Translational Activity (POTA)

Each project includes direct collaboration with
environmental scientists

Progress

• Publications

• Collaborations with environmental scientists

• UNC awarded a second STAR Center (2008), The
Carolina Center for Computational Toxicology (CCCT,
Ivan Rusyn, P.I.)

• Software development and web tools

Representative Joint Publications with EPA

•Harrill JA, Li Z, Wright FA, Crofton KM. Transcriptional response of rat frontal
cortex following acute exposure to the pyrethroid insecticide permethrin or
deltamethrin. BMC Genomics, 2008 Nov 18;9(1):546
•Harrill JA, Li Z, Wright FA, Crofton K (2007). Transcriptional response of rat
cerebrocortical tissue following acute exposure to the pyrethroid insecticide
permethrin or deltamethrin, submitted.

•Judson R, Elloumi F, Setzer RW, Li Z, Shah I. (2008) A Comparison of Machine
Learning Algorithms for Chemical Toxicity Classification Using a Simulated Multi-Scale
Data Model. BMC Bioinformatics, Vol. 9, 241.

•Li Z, Wright FA and Royland JE. Age-dependent Variability in Gene Expression in
Fisher 344 Rat Retina. Toxicological Science, 2008 Nov 18;9(1):546.

•Zhu H, Rusyn I, Richard A, Tropsha A. Use of cell viability assay data improves the
prediction accuracy of conventional quantitative structure-activity relationship
models of animal carcinogenicity. Environ. Health Perspect. 2008; (116): 506-513.
•Zhu H, Tropsha A, Fourches D, Varnek A, Papa E, Gramatica P, Oberg T, Dao P,
Cherkasov A, Tetko IV. Combinatorial QSAR Modeling of Chemical Toxicants Tested
against Tetrahymena pyriformis. J. Chem. Inf. Model. 2008; (48): 766-784.

•Zhu H, Ye L, Richard A, Golbraikh A, Rusyn I, Tropsha A. A Two-step Hierarchical
Quantitative Structure Activity Relationship Modeling Workflow for Predicting in
vivo Chemical Toxicity from Molecular Structure. Environ. Health Perspect.
Submitted.

-------
While the CCCT is more highly focused on
biology and mechanistic modeling, the CEBC
focuses on discovery and obtaining valid
statistical conclusions.

[Figure: spectrum from discovery to mechanistic modeling.]

(1) Biostatistics in Computational
Toxicology

Existing emphasis on strengths in
microarray analysis, elucidation
of networks/pathways, eQTL
analysis

New emphasis on dose-response
testing, data mining, and
penalized regression
Analysis of ToxCast Phase I data
from EPA, and development of
related methods, will likely be a
large portion of remaining
activity

[Figure: pathway dose-response profiles.]

(2) Cheminformatics

seeks to establish a universally
applicable and robust predictive
toxicology modeling framework
Focuses on Quantitative
Structure Activity/Property
Relationships (QSAR)

Establishes a modeling
workflow, toxicity prediction
scheme and software
development


(3) Computation and
Systems Toxicology

Uses model for toxicity profiling in
multiple strains of mice to set up
computational infrastructure
Computational methods
development

Develops user-friendly software
tools from methods in Projects 1
and 2

Project 1:

Biostatistics in Computational Toxicology

• Fred Wright, Ph.D. (P.I.) -statistical genetics, genomic
analysis

• Andrew Nobel, Ph.D. - clustering, data dimensional
reduction, genetic pathway analysis

• Other faculty have been phased out

• Zhen Li, M.S. - all of the above

• Partial postdoc and student positions

-------
Project Objectives

•provide biostatistical support to the Center

•perform data analysis and develop methods

•collaborate with EPA and the computational toxicology
community.

Recent Activities

• Direct collaborations and data analysis

- Work with Project 2 investigators on toxicity
prediction/data mining methods

- Work with Project 3 investigators on rodent toxicity
and eQTL mapping

- Analysis of clinical toxicity and metabolomic data to
explore a large number of prediction approaches

- abstracts on ToxCast data and proposed analyses for
prioritization of chemicals

- Expression QTL mapping relevant to toxicity

At any one time, about 3 active analysis projects
-Collaborations inspire new methods development
-A recent example:

Our pathway analysis procedure SAFE used to identify pathways...


...followed by experimental evidence of pyrethroid effects on the total
number of branch points in primary cortical cell cultures exposed to
deltamethrin or permethrin.

[Figure: branch-point results in primary cortical cultures, with panels for deltamethrin and permethrin.]

This experience, in addition to exposure to dose-
response data from NCCT personnel, got us thinking...

• Relatively few methods for dose-response that are tuned to
gene expression studies

• Even fewer that consider "pathways" (gene sets)

• A primary challenge is maintaining appropriate type I error
control for individual transcripts, whether parametric or not

• We would like methods to be fast, for permutation or
bootstrapping.

• How to aggregate evidence across transcripts within a
pathway?

-------
Dose response modeling for gene expression and pathways

$Y_{ij} = f(d_i; \theta) + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2)$

$Y_{ij}$ is the (continuous) response of the j-th subject at the i-th dose
$d_i$; $\theta$ is the vector of parameters for the distribution.

We have performed extensive
investigation of simple
(approximate) two-parameter
logistic fits, establishing
reasonable false positive rates
and power for small sample sizes

[Figure: power of simple dose-response approximations (linear after transformation), comparing the logistic model, a linear model, and a fast sigmoid curve.]
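A minimal sketch of a per-transcript test in this spirit: approximate $f(d_i; \theta)$ by a model that is linear after transforming dose, $\theta_0 + \theta_1 g(d_i)$, and assess the slope with a permutation p-value, since speed under permutation was a stated goal. The transformation $g(d) = \log_{10}(d + 1)$, the function names, and the permutation count are hypothetical choices, not the Center's method.

```python
import math
import random


def transform(dose):
    """Hypothetical dose transformation g(d) = log10(d + 1); keeps the control dose finite."""
    return math.log10(dose + 1.0)


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def permutation_pvalue(doses, responses, n_perm=999, seed=0):
    """Two-sided permutation p-value for a dose trend in one transcript."""
    x = [transform(d) for d in doses]
    observed = abs(ols_slope(x, responses))
    rng = random.Random(seed)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the dose-response pairing
        if abs(ols_slope(shuffled, responses)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Because the fit is closed-form, each permutation costs one pass over the data, which keeps transcript-level permutation or bootstrap analyses cheap enough to aggregate into pathway-level tests.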

So that we can build on top of our existing
gene expression pathway analysis software

Dose-Response Pathway Analysis for Gene Expression Data

[Figure: pathway dose-response profiles.]

•Several other collaborative examples

Prediction of in vivo toxicity endpoints from
ToxCast™ Phase I data using a variety of
machine learning approaches

• 309 chemicals

• over 70 toxicity endpoints to be predicted

• 600+ bioassay results

•1224 Dragon chemical descriptors provided by Drs. Hao Zhu and Alex Tropsha
(Project 2) as additional toxicity predictors.

•Extensive work on cross-validation and ROC area under the curve (AUC)
assessment of 84 (and now nearly 200) prediction models provides a global
view of the strengths and weaknesses of various prediction approaches
(details in talk at ToxCast Data Analysis Summit web site).
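The ROC area under the curve used to compare these models has a convenient rank-based form: it equals the probability that a randomly chosen positive chemical receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation). A minimal sketch follows; the function name is illustrative, and in practice it would be applied to held-out predictions from each cross-validation fold.

```python
def roc_auc(scores, labels):
    """ROC AUC as P(random positive outscores random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts models built on different predictor sets on a common scale.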

-------

Both chemical descriptors and ToxCast assays
remain in the predictive models


Even biologically naive prediction models suggest
improvement for several end points

Endpoints (percent AUC improvement of ToxCast Phase I assays over chemical descriptors):

CHR_Rat_LiverProliferativeLesions
DEV_Rat_Urogenital_Ureteric
CHR_Rat_LiverTumors
DEV_Rat_Urogenital_Re...
CHR_Rat_LiverNecrosis
MGR_Rat_LiveBirthPND1
DEV_Rat_Orofacial_CleftLipPalate
CHR_Rat_TesticularAtrophy
MGR_Rat_Fertility
MGR_Rat_Prostate
MGR_Rat_Epididymis
MGR_Rat_Implantations

A more refined view

Sensitivity/specificity
tradeoffs may be
favorable, with better
prediction for certain
applicability domains

Additional methods development in Project 1
(one example)

• Methods for detecting true "trans-bands" in eQTL
studies

"Real" or not?

Results appear highly unlikely to be due to
chance, but can artificially result from transcript
correlation

We have worked out permutation and analytic
(matrix decomposition) methods to assess

This work is part of a larger effort to get a statistically valid
"snapshot" of eQTL data without the need for resampling.

Related efforts include

(i) transforming transcript data to handle outliers, which can be a
problem for SNPs with low minor allele frequency

(ii) Principal component handling of stratification

[Figure: PC1 vs. PC2 scatter plot.]

PC-based stratification control
for eQTL analysis, mouse data

[Figure: -log10(p) from residual regression vs. -log10(p) from multiple regression.]

Bottom line: these issues matter in identifying eQTLs, and
therefore in elucidating genetic susceptibility.

-------
Project 2:
Cheminformatics

Alex Tropsha, Ph.D. (P.I.) - Quantitative Structure
Activity Relationship (QSAR) modeling, software tools
for chemical descriptor-based prediction
Hao Zhu, Ph.D. - QSAR modeling
Additional postdoctoral researchers, research faculty,
and students

Leverages effort in the Laboratory for Molecular
Modeling, School of Pharmacy, UNC

Project Objectives

•coordinates the compilation and mining of data from
relevant external databases

•performs analysis and methods development for building
statistically significant and externally predictive
Quantitative Structure-Activity Relationship models of
chemical toxicology data

•Performs collaborative work within the Center and with
EPA collaborators

• Recent activity highlighted here

Improved quantitative models of chemical
toxicity based on combined application of
chemical and biological molecular descriptors

• Overall project vision: exploiting the entire
structure - in vitro - in vivo continuum
Predictive QSAR Modeling Workflow
• Applications

- The use of hybrid chemical biological descriptors

- Novel data partitioning approach based on in vitro-in vivo
correlations: Hierarchical QSAR modeling of
rodent toxicity

Quantitative
Structure
Property
Relationships

Predictive QSAR Workflow*

*Tropsha A, Golbraikh A. Predictive QSAR Modeling Workflow, Model
Applicability Domains, and Virtual Screening. Curr. Pharm. Des., 2007, 13, 3494-3504.

-------
Application I. Using Full High-Throughput
Screening Dose Response Curves as Biological
Fingerprints of Organic Compounds in QSAR
Studies

*Zhu H, Rusyn I, Richard A, Tropsha A.* Use of cell viability assay data improves the prediction
accuracy of conventional quantitative structure-activity relationship models of animal
carcinogenicity. Environ Health Perspect 2008;(116):506-513

Using HTS Dose Response Curve to Assist QSAR
Modeling of Carcinogenicity

• Three types of descriptors:

Chemical (300+); Biological (150+); Hybrid
(400+)

• CPDB carcinogenicity data: 328 unique organic
compounds with multi-cell carcinogenicity
calls, 189 actives and 139 inactives

Using HTS data for Carcinogenicity
Modeling

Prediction of the External Validation Set

               kNN-Dragon    kNN-Hybrid
Sensitivity    69%           66%
Specificity    46%           56%
CCR            [illegible]   62%
Coverage       72%           70%
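The metrics in this table can be computed as sketched below, assuming the correct classification rate (CCR) is the mean of sensitivity and specificity and that coverage is the fraction of external compounds falling inside the model's applicability domain (marked here by a `None` prediction). Both the encoding and the function name are illustrative assumptions.

```python
def external_metrics(y_true, y_pred):
    """Sensitivity, specificity, CCR and coverage for a binary external set.

    y_true holds 1 (active) / 0 (inactive); y_pred holds 1, 0, or None when the
    compound falls outside the model's applicability domain."""
    covered = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    tp = sum(1 for t, p in covered if t == 1 and p == 1)
    fn = sum(1 for t, p in covered if t == 1 and p == 0)
    tn = sum(1 for t, p in covered if t == 0 and p == 0)
    fp = sum(1 for t, p in covered if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ccr": (sensitivity + specificity) / 2,  # correct classification rate
        "coverage": len(covered) / len(y_true),
    }
```

Sensitivity, specificity and CCR are computed only over covered compounds, so a model can trade coverage for accuracy.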

Application II:

A Two-step Hierarchical QSAR
Modeling Workflow for Predicting in
vivo Chemical Toxicity*

-------
ZEBET Database* and Data Preparation

361 compounds: cytotoxicity IC50 and rat and/or mouse LD50
291 compounds: inorganics, mixtures and heavy metal salts are removed
230 compounds: both in vitro IC50 values and rat LD50
Random split into a modeling set and a validation set

*ZEBET database provided by Dr. Ann Richard (EPA)

Data partitioning based on the moving regression approach

• IC50 vs. rat LD50 values

[Figure: scatter plot of rat LD50 vs. log(IC50) showing C1 compounds, C2 compounds, and the linear fit to C1]
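The slide partitions compounds into C1 (those whose rat LD50 follows a linear relationship with in vitro IC50) and C2 (those that do not). The exact "moving regression" procedure is not given here, so the sketch below substitutes a plainly simpler technique, iterative outlier trimming: refit an ordinary least-squares line on the current C1 set and move the worst-fitting compound to C2 until all residuals fall under a threshold. The threshold, the minimum C1 size, and all names are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (needs >= 2 distinct x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def partition_c1_c2(log_ic50, log_ld50, threshold=0.5, min_size=3):
    """Split compounds into C1 (follow the IC50-LD50 line) and C2 (do not).

    Simplified stand-in for the moving regression approach: repeatedly refit
    the line on the current C1 set and move the compound with the largest
    absolute residual to C2, until every C1 residual is <= threshold
    (in log units) or C1 shrinks to min_size (min_size must be >= 2).
    """
    c1 = list(range(len(log_ic50)))
    while True:
        slope, intercept = fit_line([log_ic50[i] for i in c1],
                                    [log_ld50[i] for i in c1])
        residuals = {i: abs(log_ld50[i] - (slope * log_ic50[i] + intercept))
                     for i in c1}
        worst = max(residuals, key=residuals.get)
        if residuals[worst] <= threshold or len(c1) <= min_size:
            break
        c1.remove(worst)
    c2 = [i for i in range(len(log_ic50)) if i not in c1]
    return c1, c2, (slope, intercept)


# Six compounds on log LD50 = log IC50 + 1, plus two outliers (made-up data).
x = [0, 1, 2, 3, 4, 5, 1, 4]
y = [1, 2, 3, 4, 5, 6, 6, -3]
c1, c2, line = partition_c1_c2(x, y)
print(c1, c2, line)
```

Removing one compound at a time (rather than all large residuals at once) keeps a few strong outliers from dragging the initial fit so far that genuine C1 compounds are discarded.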

[Figure: modeling workflow and prediction workflow diagrams leading to the final prediction]
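The two-step workflow can be sketched as follows, under assumptions the slide does not spell out: step 1 is a classifier that assigns a new compound to C1 or C2, and step 2 predicts log LD50 from the C1 IC50-LD50 line for C1 compounds and from a separate model for C2 compounds. The classifier and C2 model below are hypothetical placeholders standing in for trained QSAR models.

```python
from typing import Callable, Sequence, Tuple


def hierarchical_predict(
    descriptors: Sequence[float],
    log_ic50: float,
    classify: Callable[[Sequence[float]], str],
    c1_line: Tuple[float, float],
    c2_model: Callable[[Sequence[float]], float],
) -> Tuple[str, float]:
    """Two-step prediction of log LD50 (illustrative sketch).

    Step 1: `classify` assigns the compound to "C1" or "C2".
    Step 2: C1 compounds are predicted from the C1 IC50-LD50 line;
            C2 compounds are passed to a separate model, `c2_model`.
    """
    group = classify(descriptors)
    if group == "C1":
        slope, intercept = c1_line
        return group, slope * log_ic50 + intercept
    return group, c2_model(descriptors)


# Hypothetical stand-ins for trained step-1 and step-2 models.
def toy_classifier(d):
    return "C1" if d[0] < 0.5 else "C2"


def toy_c2_model(d):
    return 2.0 - d[0]


print(hierarchical_predict([0.1], 2.0, toy_classifier, (1.0, 1.0), toy_c2_model))
print(hierarchical_predict([0.9], 2.0, toy_classifier, (1.0, 1.0), toy_c2_model))
```

Passing the models in as callables keeps the routing logic separate from how each model is trained.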

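Classification of the Rat LD50 Values for the External Set of 23 Compounds

[Figure: two confusion matrices of experimental (Exp. C1, Exp. C2) vs. predicted (Pred. C1, Pred. C2) classes, labelled "No" and "with" on the slide; cell counts not recoverable]

Classification rate = 62% ("No") vs. 78% ("with")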

Prediction of the Rat LD50 Values of the External 23 Compounds

• R² = 0.79, MAE = 0.37, Coverage = 74% (17 out of 23)
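The three external-set statistics can be reproduced from observed and predicted log LD50 values as sketched below. Two assumptions are made: R² is taken as the coefficient of determination, 1 - SS_res/SS_tot (some QSAR papers instead report the squared correlation between observed and predicted values), and R² and MAE are computed only on compounds inside the applicability domain.

```python
def regression_summary(y_obs, y_pred):
    """R^2 (coefficient of determination), MAE and coverage.

    Entries of y_pred may be None for compounds outside the applicability
    domain; R^2 and MAE are computed on the covered compounds only.
    """
    covered = [(o, p) for o, p in zip(y_obs, y_pred) if p is not None]
    n = len(covered)
    mean_obs = sum(o for o, _ in covered) / n
    ss_res = sum((o - p) ** 2 for o, p in covered)
    ss_tot = sum((o - mean_obs) ** 2 for o, _ in covered)
    r2 = 1 - ss_res / ss_tot
    mae = sum(abs(o - p) for o, p in covered) / n
    return r2, mae, n / len(y_obs)


r2, mae, cov = regression_summary([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, None])
print(round(r2, 3), round(mae, 3), cov)
print(round(17 / 23, 3))  # coverage reported on the slide: 17 of 23 compounds
```

The last line checks the slide's arithmetic: 17/23 ≈ 0.739, which rounds to the reported 74%.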

-------
Future Studies

• Analyze models to identify significant assay-
chemical combinations that are predictive
of in vivo outcomes

• Explore the entire NTP dataset

• Apply model prospectively to prioritize new
compounds for focused toxicity testing.

Ivan Rusyn (co-PI):
toxicology, genomics

Leonard McMillan (co-PI):
computer science, GUI,
software engineering

Additional programmers
and students

Project Objectives

• Develop and implement algorithms that streamline the analysis of
multi-dimensional data streams in dose-response assessment and
cross-species extrapolation

• Facilitate the development of a standard workflow for (i) analysis
of the -omics data, (ii) linkages to classical indicators of adverse
health effects, and (iii) integration with other types of biological
information such as genome sequences and genetic differences
between species

• Build web-based, open-source and user-friendly graphical
interfaces associated with interoperable computational tools for
data analysis that facilitate incorporation of new data streams into
basic research and decision-making pipelines (methods from
Projects 1 and 2)

• has created a framework for handling emerging -omics data on
genetic susceptibility in model organisms.

• provides programming expertise to create graphical tools that
are used by partners within the Center and in collaboration with
EPA personnel and other environmental scientists

• strengthens and advances the field of computational toxicology
through direct partnerships and the dissemination of tools used
by both bioinformatics and bench scientists.

Driving Biological problem:
Population-wide predictions from toxicity profiling


-------
APPLICATIONS NOTE

Integration of existing and new tools into
a Predictive Toxicology Web Portal (ceccr.unc.edu)

compgen.unc.edu

Computational Genetics

-------
The next year - Project 1

• Finish methodology for open projects and
collaboration

• Finish dose-response pathway analysis method

• ToxCast data analysis - bring to intermediate
conclusion

• ToxCast - go deeper, in terms of choices of endpoints,
sensitivity vs. specificity, domains of applicability

The next year - Project 2

• Continuing work on QSAR modeling of multiple
animal toxicity endpoints

• Developing novel QSAR methodology that uses in
vitro biological information to model in vivo toxicity
endpoints

• QSTR modeling of nanotoxicology data

• For all of these activities we rely on data collected under
the ToxCast, DSSTox, and other projects.

The next year - Project 3

•Continuing integration/support of tools from other
CEBC projects

•continued programming and algorithmic

•improvements to algorithms in tools and applications

•development of specific data-mining algorithms for
genomic databases

•continued biology-driven research that generates
appropriate datasets for testing and implementing
novel computational and biostatistical approaches.

Center-wide

• Emphasis on training other scientists in the tools
developed

• Bringing open-source code and methods to a new stage
in their evolution

EXTRA

-------
Project 2 Acknowledgements

Principal Investigator

Alexander Tropsha

Research Professors

Clark Jeffries, Alexander
Golbraikh, Hao Zhu,
Simon Wang

Postdoctoral Fellows

Georgiy Abramochkin, Lin Ye,
Denis Fourches

Visiting Research Scientist
Aleks Sedykh

Adjunct Members

Weifan Zheng, Shubin Liu

- P20-HG003898

- R21GM076059

- R01-GM66940

- R0-GM068665

UNC: I. Rusyn, F. Wright, S. Gome
EPA: T. Martin, D. Young
A. Richard, R. Judson,

D. Dix, R. Kavlock

Graduate Research Assistants

Christopher Grulke, Nancy Baker,
fejn Wang, Hao Tang, Jui-Hua
Hsieh, Rima Hajjo, Tanarat
Kietsakorn, Tong Ying Wu, Liying
Zhang, Melody Luo, Guiyu Zhao,
Andrew Fant

Research Programmer

Theo Walker

System Administrator

Mihir Shah

Chemical Toxicity Prediction for Toxicogenomics Studies Using
an Example Dataset

Zhen Li, Fathi Elloumi, Fred A Wright

e of different classifiers

Chemicals (13 in total, single dose):

1,5-naphthalenediamine, 2,3-benzofuran, 4-nitroanthranilic acid,
N-(1-naphthyl)ethylenediamine dihydrochloride, benzene, coumarin,
pentachloronitrobenzene, 2,2-bis(bromomethyl)-1,3-propanediol,
1,2-dibromoethane, 2-chloromethylpyridine hydrochloride,
N-methylolacrylamide, diazinon, and malathion.

Controls: corn oil, water, rodent chow

Age-Dependent Variability in Gene Expression in Male Fischer 344
Rat Retina

Need: a method for quantifying changes in variability with high
statistical power
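The slide states the need for a method to quantify changes in expression variability (for example, between young and old animals); the Center's own method is not described here. As a familiar baseline only, the Brown-Forsythe test compares variances by running a one-way ANOVA on absolute deviations from each group's median. The sketch computes just the F statistic for one gene; SciPy's `scipy.stats.levene` with `center='median'` computes the same statistic together with a p-value.

```python
from statistics import mean, median


def brown_forsythe_f(groups):
    """Brown-Forsythe F statistic for equality of variances across groups.

    Each value is replaced by its absolute deviation from the group median,
    and a one-way ANOVA F statistic is computed on those deviations.
    """
    k = len(groups)
    z = [[abs(v - median(g)) for v in g] for g in groups]
    n_total = sum(len(zg) for zg in z)
    grand_mean = sum(sum(zg) for zg in z) / n_total
    group_means = [mean(zg) for zg in z]
    between = sum(len(zg) * (m - grand_mean) ** 2
                  for zg, m in zip(z, group_means))
    within = sum((v - m) ** 2 for zg, m in zip(z, group_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical expression values of one gene in young vs. old animals.
young = [1.0, 2.0, 3.0]
old = [0.0, 2.0, 4.0]
print(brown_forsythe_f([young, old]))
```

Applied gene by gene across an expression array, this yields one statistic per gene, which would then need multiple-testing control.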

Project Objectives, cont.

• Provide an interdisciplinary computer science resource to the
environmental sciences and toxicology community

• Longer-term objectives include new software engineering
methods for better execution and maintenance of above, and
sharing and disseminating results

-------

Overview of Carolina Center for
Computational Toxicology
STAR Program

October 1, 2009

UNITED STATES ENVIRONMENTAL PROTECTION AGENCY

The EPA's Task

RfD

EPA nomination held up amid debate
over formaldehyde risks

September 24, 2009

Environmental Protection Agency Administrator Lisa
Jackson visited Sen. David Vitter, R-La., in his
office Thursday to ask him to release his hold on the
nomination of Paul Anastas to be the EPA's assistant
administrator in charge of its Office of Research and
Development. Vitter wants the EPA to agree to have
the National Academy of Sciences review its
assessment of the risks posed by
formaldehyde, which is best known to folks in the
Gulf Coast because of respiratory complaints
lodged by people who lived in FEMA trailers with
elevated levels of formaldehyde.

MEDICAL MEMORANDA

Formalin Asthma in Hospital Staff

D. J. HENDRICK, D. J. LANE
-------
Carolina Computational Toxicology Center
STAR Program: Project 2

• Development of Fast and Efficient Toxicogenetic
Expression Quantitative Trait Loci (eQTL)
Mapping Tools

• Discovery of the chemical-induced regulatory
networks using the population-based toxicity
phenotyping in human cells

• Prioritization

• Mechanism of Action

• Dose-Response Modeling

• Susceptible Populations

Susceptible Populations
Mechanism of Action
Specific Objective 1: Development of Fast and Efficient Toxicogenetic
Expression Quantitative Trait Loci (eQTL) Mapping Tools

ORIGINAL PAPER

FastMap: Fast eQTL mapping in homozygous populations

D. M. Gatti, A. A. Shabalin, T.-C. Lam, et al.
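FastMap targets eQTL mapping in panels of homozygous (inbred) strains, where each SNP splits the strains into two allele groups. The sketch below shows only the brute-force baseline that fast tools are designed to accelerate: every gene is tested against every SNP with a pooled-variance two-sample t statistic, and the best SNP is kept per gene. It does not reproduce FastMap's algorithm; the data layout and names are illustrative assumptions.

```python
import math
from statistics import mean, variance


def two_group_t(expr, genotype):
    """Pooled-variance t statistic comparing strains with allele 1 vs. allele 0."""
    g0 = [e for e, g in zip(expr, genotype) if g == 0]
    g1 = [e for e, g in zip(expr, genotype) if g == 1]
    n0, n1 = len(g0), len(g1)
    if n0 < 2 or n1 < 2:
        return 0.0  # too few strains in one allele group to test
    pooled = ((n0 - 1) * variance(g0) + (n1 - 1) * variance(g1)) / (n0 + n1 - 2)
    if pooled == 0:
        return 0.0 if mean(g1) == mean(g0) else math.inf
    return (mean(g1) - mean(g0)) / math.sqrt(pooled * (1 / n0 + 1 / n1))


def eqtl_scan(expression, genotypes):
    """Exhaustive gene-by-SNP scan; returns the best SNP and |t| for each gene.

    expression: {gene: [expression value per strain]}
    genotypes:  {snp: [0 or 1 per strain]} (homozygous strains, two alleles)
    """
    best = {}
    for gene, expr in expression.items():
        scores = {snp: abs(two_group_t(expr, geno))
                  for snp, geno in genotypes.items()}
        top = max(scores, key=scores.get)
        best[gene] = (top, scores[top])
    return best


genotypes = {"snpA": [0, 0, 0, 1, 1, 1], "snpB": [0, 1, 0, 1, 0, 1]}
expression = {"geneX": [1.0, 1.2, 0.9, 3.0, 3.1, 2.8]}
print(eqtl_scan(expression, genotypes))
```

The cost of this scan grows with genes × SNPs × strains, which is why whole-genome eQTL mapping needs faster methods.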

Carolina Computational Toxicology Center
STAR Program: Project 3

Develop rigorous end point toxicity predictors based on
QSAR modeling workflow and conventional chemical
descriptors

Develop novel computational toxico-genomic models
based on combined chemical and biological descriptors
through QSAR modeling workflow

Develop novel computational toxico-genetic models based
on combined genetic, chemical and toxicity descriptors
through QSAR-like modeling workflow

• Prioritization

• Mechanism of Action
• Dose-Response Modeling

• Susceptible Populations

Prioritization

Compound prioritization using the ensemble of QSAR models

Alerts: further testing
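Prioritization with an ensemble of QSAR models can be illustrated by a simple consensus rule: average the outputs of all models that can predict a compound, rank compounds by that score, and flag those above a threshold as alerts for further testing. The averaging rule, the 0.5 threshold, and the names below are assumptions for illustration, not the Center's published consensus scheme.

```python
def consensus_priority(predictions, alert_threshold=0.5):
    """Rank compounds by the mean output of an ensemble of QSAR models.

    predictions: {compound: [model outputs in [0, 1], or None when the
                  compound is outside that model's applicability domain]}
    Returns (compound, consensus score, alert flag) tuples, highest score
    first; compounds that no model can predict are skipped.
    """
    ranked = []
    for compound, outputs in predictions.items():
        valid = [p for p in outputs if p is not None]
        if not valid:
            continue
        score = sum(valid) / len(valid)
        ranked.append((compound, score, score >= alert_threshold))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


ensemble_outputs = {
    "compound_a": [1, 1, 0],
    "compound_b": [0, 0, None],
    "compound_c": [None, None, None],
    "compound_d": [1, 1, 1],
}
for compound, score, alert in consensus_priority(ensemble_outputs):
    print(compound, round(score, 2), "alert: further testing" if alert else "")
```

Skipping compounds that no model covers, rather than scoring them as zero, keeps "unknown" distinct from "predicted inactive".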

-------
Summary

Carolina Center for Computational Toxicology is
developing promising new approaches to address
EPA CompTox research areas of:

-Prioritization
-Mechanism of Action
-Susceptible Populations

Can some of these methods be extended to help
understand dose-response relationships?

-------
The Texas-Indiana Virtual STAR center;

Data-Generating in vitro and in silico Models of Developmental
Toxicity in Embryonic Stem Cells and Zebrafish

Jan-Ake Gustafsson, Richard H. Finnell and James A. Glazier
University of Houston, Texas A&M, Indiana University

November 2009-October 2012

-------
Background

Birth defects

Birth defects affect about one in every 33 babies born in the United States each
year (about 3%; 6% worldwide). They are the leading cause of infant deaths,
accounting for more than 20% of all infant deaths. Babies born with birth defects
have a greater chance of illness and long-term disability than babies without birth
defects.

Heart defects: 1 in every 100 to 200 babies

Neural tube defects: defects of the spine (spina bifida) and brain (anencephaly).
1 in 1,000 pregnancies (2.6 per 1,000 worldwide)

Orofacial clefts: include cleft lip, cleft palate, and combined cleft lip and cleft
palate.

1 in 700 to 1,000 babies

Reasons

Genetic and environmental factors
Methylmercury:

Associated birth defects include small head size,
cerebral palsy, developmental delay and/or
mental retardation, blindness, muscle
weakness, and seizures.

Knowledge gap!

-------
Research objective

New screening models for developmental toxicity

From Biological Models of Developmental Toxicity to Computer Simulations

-------
Main research goals

1. Generate developmental models based on mouse embryonic
stem cells and zebrafish suitable for high-throughput screening.

2. Generate high-information-content models on development
and differentiation using mouse embryonic stem cells and
zebrafish.

3. Develop computational models for developmental toxicity with
the ultimate aims of first recreating normal development (in
wild-type) and then classifying possible mechanisms by which
chemical perturbations cause experimentally observed
developmental defects.

4. Perform proof-of-concept experiments of the in vitro and in
silico test platforms with a blind test of chemicals.

-------
Investigational Areas

Three Investigational Areas:

1. Zebrafish as a model to elucidate the morphological and
mechanistic effects of environmental pollutants.

PI Jan-Ake Gustafsson

2. The effects of environmental contaminants on mouse embryonic
stem cell differentiation.

PI Richard H. Finnell

3. Development of computer simulations facilitating assessment of
toxicity based on perturbed development in zebrafish and mouse
embryonic stem cells.

PI James A. Glazier

-------
Management

US-EPA


TIVS board

1 representative from each IA
Main decisions

Center Director
Project Manager

Investigational Areas
1,2,and 3

Center Director/Project Manager

Operational management

Reporting

Fiscal responsibility

Quality Control Manager

Donald P. McDonnell, Duke University

Advisory board

Advise and evaluate

George Daston, Procter and Gamble

Nadine Peyrieras, CNRS, Paris

Helen Hakansson, Karolinska Institute, Stockholm

Menghang Xia, NCGC, NIH

Bart van der Burg, ChemScreen (EC-funded project on ENV.2009.3.3.1.1)
STAR Center representatives

-------
Teaching and information

Courses

Three courses for PhD students
and post docs:

1. Zebrafish development

2. Embryonic stem cells

3. Computer simulations

Posted on our website
www.cnrcs.uh.edu/TIVS-Center

Information
Develop public web
Internal web

Meetings, workshops, newsletters

Collaboration with stakeholders and other projects

OECD, WHO, ChemTRUST
STAR Centers

Chemscreen, Cascade, Crescendo, Ceasar, Carcinogenomics, SafeFoods,
Rainbow, RA-Courses, TRISK

-------
Zebrafish as a model to elucidate the morphological and
mechanistic effects of environmental pollutants

Zebrafish, Danio rerio

• Small size, small test volumes

• Transparent embryos/fish

• External rapid embryonic development

• Hundreds of eggs weekly/pair

• Genome sequenced; 75% of genes have human homologues

• Conserved developmental processes and signaling pathways

• Many mutants

• Morpholino knockdown

• Cost efficient

• Adaptable to medium- to high-throughput screening

-------
Generation of screening models for teratogens

10 transgenic fish expressing fluorescent markers to follow development
and patterning.

Endpoints:

•Gastrulation and early embryonic cell movements
•Patterning of CNS and neurogenesis
•Hematopoiesis and angiogenesis

•Yolk utilization and morphological effects on somitogenesis

Morphology and GFP/RFP expression will be recorded during normal development.

Is development changed by teratogenic chemicals?

Scale up and automate for high throughput screening

-------
Transgenic fish for screening

| Gene | Developmental process | Reporter status | Readout | Start time of expected expression (hpf) |
| goosecoid | Early patterning, epiboly, early cell movements and developmental delay | RFP, to be made | Time of appearance/disappearance, spatial distribution of expression domain, intensity of expression | 3.5 |
| dharma | Early patterning, epiboly, early cell movements and developmental delay | GFP, to be made | Time of appearance/disappearance, spatial distribution of expression domain, intensity of expression | 3.5 |
| bmp2b | Patterning (anterior-posterior symmetry), early cell movements | GFP, to be made | Total length of expression domain, time of appearance/disappearance, spatial distribution of expression domain, intensity of expression | 0 (1-cell stage; maternally contributed) |
| wnt8 | Patterning (anterior-posterior symmetry), early cell movements | GFP, to be made | Total length of expression domain, time of appearance/disappearance, spatial distribution of expression domain, intensity of expression | 0 (1-cell stage; maternally contributed) |
| bmp4 | Patterning (left-right symmetry) | GFP, to be made | Total length of expression domain, time of appearance/disappearance, spatial distribution of expression domain, intensity of expression | 10 |
| ngn1 | Neurogenesis, axon guidance, early developmental delay | GFP/RFP, available | Time of expression, region of expression, intensity, cell numbers, axonal length and pathfinding | 10 |
| fli1 | Angiogenesis and blood vessel remodeling, heart morphology and function | EGFP, available with us | Time of expression, region of expression, intensity, angiogenesis, blood flow, heart size, rate of heartbeat, number and size of trunk vessels | 11 |
| flk1 | Angiogenesis and blood vessel remodeling, heart morphology and function; expressed in tip cells | GFP, available with us | Time of expression, region of expression, intensity, angiogenesis, blood flow, heart size, rate of heartbeat, number and size of trunk vessels | 11 |
| unc5b | Blood vessel formation; expressed in tip cells at the forefront of arterial and venous sprouts | RFP, to be made | Time of expression, region of expression, intensity, angiogenesis | 9 |
| unc45b | Muscle development and somitogenesis | GFP, available | Somite formation, somite size, time of appearance, muscle formation, intensity, spontaneous movements, time and region of appearance | 9 |

-------
Generation of high-information-content models

• Somite formation
• Blood-vessel formation
• Axonal pathfinding

Map expression of crucial factors (adhesion factors, repulsion factors) by immunostaining and in situ hybridization

Knockdown of crucial factors by morpholino knockdown

Simulations in silico

-------
Test chemicals

37 CERCLA chemicals known or expected to be teratogens and associated with developmental malformations.

Rank number indicates the potential threat to human health of these environmental pollutants as determined by ATSDR and the EPA (CERCLA: The Comprehensive Environmental Response, Compensation, and Liability Act).

Abbreviations: Chemical Abstracts Service (CAS), central nervous system (CNS), gastrointestinal (GI), genitourinary (GU), musculoskeletal (MS).

[Per-substance rank numbers not recoverable from the source]

| Substance name | CAS# |
| arsenic | 007440-38-2 |
| lead | 007439-92-1 |
| mercury | 007439-97-6 |
| vinyl chloride | 000075-01-4 |
| polychlorinated biphenyls | 001336-36-3 |
| benzene | 000071-43-2 |
| cadmium | 007440-43-9 |
| chloroform | 000067-66-3 |
| trichloroethylene | 000079-01-6 |
| dieldrin | 000060-57-1 |
| aldrin | 000309-00-2 |
| pentachlorophenol | 000087-86-5 |
| carbon tetrachloride | 000056-23-5 |
| nickel | 007440-02-0 |
| endosulfan | 000115-29-7 |
| methoxychlor | 000072-43-5 |
| toluene | 000108-88-3 |
| naphthalene | 000091-20-3 |
| methylene chloride | 000075-09-2 |
| hydrazine | 000302-01-2 |
| hexachlorobenzene | 000118-74-1 |
| 2,4-dinitrotoluene | 000121-14-2 |
| parathion | 000056-38-2 |
| selenium | 007782-49-2 |
| carbon disulfide | 000075-15-0 |
| phenol | 000108-95-2 |
| carbon monoxide | 000630-08-0 |
| 2,4-dichlorophenol | 000120-83-2 |
| arsenic trioxide | 001327-53-3 |
| dichlorvos | 000062-73-7 |
| sodium arsenite | 007784-46-5 |
| formaldehyde | 000050-00-0 |
| diuron | 000330-54-1 |
| methyl parathion | 000298-00-0 |
| styrene | 000100-42-5 |
| carbaryl | 000063-25-2 |
| acrylonitrile | 000107-13-1 |

[Affected organ-system columns (CNS, Eye, Heart, GI, GU, MS): per-substance entries not recoverable]

-------
Mouse embryonic stem cells as a model to elucidate the morphological
and mechanistic effects of environmental pollutants

House Mouse, Mus musculus

• 99% of mouse genes have homologues in humans

• Relatively short gestation period

Mouse Embryonic Stem Cells
• Small size, small test volumes

• Conserved developmental processes and signaling pathways
• Mimic in vivo development
• Amenable to genetic manipulation
• Cost efficient

• Adaptable to medium- to high-throughput screening

-------
Embryonic Stem Cell Differentiation

In the Beginning...

ES cells must be isolated and maintained, or else...

ES cells differentiate into epiblast

Epiblast gives rise to
embryoid body &
germ layer cells

Germ layer cells
differentiate into specific cell
types

-------
Genetic Manipulation of Mouse ES Cells: Gene Trap

C57BL/6 Gene Trap Library

• Retrovirus inserts transgenic construct

• >350,000 ES cell clones produced

• >10,000 genes contain inserts

• ROSA β-geo gene trap vector (marker)

[Figure: retroviral gene trapping vector; wild-type locus; marker fusion transcript]

-------
Selection and Generation of ES Based Screening Models

16 transgenic mouse ES cell lines expressing a reporter (β-geo) thawed and
cultured:

Selected Genes:

Follow developmental and patterning processes.

Including:

Gastrulation and early embryonic cell movements
Patterning of CNS and neurogenesis
Hematopoiesis and angiogenesis

Expected Results:

Documentation of morphology and β-geo expression during:

• normal development

• teratogenic chemical exposure

Scale up and automate for high throughput screening

-------
Selected Transgenic β-geo Mouse ES cells for Screening

| Gene | Name | Function/Expression |
| Nodal | Nodal | Interacts with type I receptor complexes (ALK4 and ALK7) and type II receptors (activin receptor 2a or 2b) |
| Wnt3 | wingless-related MMTV integration site 3 | Wnt signaling ligand |
| Fgf4 | fibroblast growth factor 4 | FGF signaling ligand |
| Gsc | Goosecoid | Homeodomain transcription factor, executor of cell migration during gastrulation |
| Cdh1 | cadherin 1 (E-cadherin) | Calcium ion-dependent cell adhesion molecule in epithelial cells |
| Pou5f1 | POU domain, class 5, transcription factor 1 | Regulation of pluripotency during normal development |
| Meox1 | mesenchyme homeobox 1 | Homeobox gene expressed in mesoderm of primitive streak and somites |
| Bmp4 | bone morphogenetic protein 4 | Bone and cartilage development |
| Mapt | tau | Neuronal microtubule-associated protein |
| Syn1 | synapsin I | Synaptic vesicle glycoprotein present in cells involved in synaptic transmission |
| ABCG2 | ATP-binding cassette superfamily G member 2 | Stem cell and hematopoietic stem cell marker |
| Tie1 | tyrosine kinase with immunoglobulin-like and EGF-like domains 1 | Angiopoietin receptors and endothelial marker |
| Pecam1 | platelet/endothelial cell adhesion molecule 1 | Cell adhesion molecule and endothelial marker |
| GATA3 | GATA binding protein 3 | Transcription factor in myocytes |
| Mef2a | Myocyte-Specific Enhancer Factor 2a | Transcription factor in myocytes |
| Myl2 | myosin light chain 2V | Regulatory light chain associated with cardiac myosin beta |

-------
Alternative Transgenic β-geo Mouse ES cells for Screening

In the event that selected clones do not pass quality control, or are not responsive
to chemical insults, alternative genes/clones are also available, e.g.:

| Gene | Name | Function/Expression |
| Smad1 | MAD homolog 1 | Proteins that modulate the activity of TGF-β ligands |
| Prdm14 | PR-domain containing protein 14 | Functions in PGC specification |
| Spred | Sprouty-related protein with an EVH1 domain | Regulates Ras-ERK signaling pathway |
| Zic family (Zic2, Zic5) | Zinc finger protein of the cerebellum | Neural development |
| VEGF (Vegfb, Vegfc) | Vascular endothelial growth factor | VEGF signaling ligand |
| Notch1 | Notch gene homolog 1 | Functions in vascular remodeling during development |

-------
Generation of High Throughput/Information Content
Models

Detection of transgenic ES cell β-geo (lacZ) expression:

In Vivo (ImaGene Green, Invitrogen)

[Figure panels: ImaGene Green staining of ES cell-derived spontaneously contracting cardiac myocytes; ImaGene Green and propidium iodide staining of in vitro endothelial differentiation]

-------
Generation of High Throughput/Information Content
Models

Standardized
Embryoid Body
Production

Differentiation & Detection
of β-geo expression

www.cmhd.ca/genetrap/database/search_expression.html

Application of teratogen
or test chemical(s)

Simulations in silico

-------
Development of computer simulations facilitating assessment of toxicity based
on perturbed development in zebrafish and mouse embryonic stem cells

Multi-cell modeling provides a platform to go from molecule to cell behavior to development.

http://amazingphotos4all.blogspot.com/2009_03_01_archive.html

http://nomadlife.org/dna.jpg

http://www.stanford.edu/group/Urchin/LP/
[Lauren Palumbi]

lowestrogensymptoms.com

http://www.kvarkadabra.net/images/articles/Regeneracija-
organov_l_original.jpg

-------
Multi-cell Modeling as a Bridge from in vitro to Organ/Organism

There is still a huge gap between the level of molecular data and observed developmental
patterns.

Multi-cell Models separate two questions:

How do molecular processes drive cell phenomenology?

How does cell phenomenology drive tissue-level patterning?

Why useful?

Brute-force (molecule-to-organism) modeling is computationally intractable.

Allows focus on key molecular pathways and cell-cell interaction
mechanisms.

Most mammalian cells are fairly limited in their behaviors, which simplifies model
construction.

Rapidly developing tools and standards.

-------
Data Inputs for Multi-cell Modeling

Organ/Organism level:

Qualitative selection of model developmental systems.

Quantitative study of normal and perturbed development of these.

Cell tracking (in vivo).

Expression mapping (in vivo and in vitro).

Identification of key ECM & extracellular signals (in vivo and in vitro).

Cell level:

Quantitative identification of key cell types.

Quantitative descriptions of their phenomenology in vivo and in vitro.
Molecular level:

Qualitative identification of key regulatory pathways (in vivo and in vitro).
Quantitative description of these pathways and their perturbations (in vitro).

-------
Multi-Modeling Tools (I)—CompuCell3D

CompuCell3D (Indiana University, Bloomington)
Multi-Cell Modeling Environment

Open-Source, Multi-Platform Simulation Environment:
Simulations Based on Cell Behaviors
Simulation Specification in High-Level Language

(CC3DML, Python)

Fast Simulation Development
Reuse of Simulation Components
Connects to Systems Biology Workbench for
Pathway Modeling
http://www.compucell3d.org/
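CompuCell3D is built on the Cellular Potts Model (also called the Glazier-Graner-Hogeweg model), in which each cell is a set of lattice sites sharing an index and the configuration evolves by Metropolis "index-copy" attempts. The pure-Python sketch below shows one such update on a 2D lattice, using an energy with a contact (adhesion) term J between neighbouring sites of different cells and a volume constraint lam * (V - target)^2. It is not CompuCell3D's API or CC3DML; periodic boundaries, the parameter values, and all names are illustrative assumptions, and cell-connectivity constraints are omitted.

```python
import math
import random

NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def init_lattice(n, squares):
    """n x n lattice; spin 0 is medium, each (spin, x0, y0, size) square is a cell."""
    lattice = [[0] * n for _ in range(n)]
    for spin, x0, y0, size in squares:
        for x in range(x0, x0 + size):
            for y in range(y0, y0 + size):
                lattice[x][y] = spin
    return lattice


def count_volumes(lattice):
    volumes = {}
    for row in lattice:
        for spin in row:
            volumes[spin] = volumes.get(spin, 0) + 1
    return volumes


def adhesion(lattice, x, y, spin, J, types):
    """Contact energy of site (x, y) if it held `spin` (periodic boundaries)."""
    n = len(lattice)
    energy = 0.0
    for dx, dy in NEIGHBOURS:
        other = lattice[(x + dx) % n][(y + dy) % n]
        if other != spin:
            energy += J[(types[spin], types[other])]
    return energy


def metropolis_step(lattice, volumes, types, J, target, lam, temperature, rng):
    """One index-copy attempt: a neighbour's spin tries to invade site (x, y)."""
    n = len(lattice)
    x, y = rng.randrange(n), rng.randrange(n)
    dx, dy = rng.choice(NEIGHBOURS)
    source = lattice[(x + dx) % n][(y + dy) % n]
    old = lattice[x][y]
    if source == old:
        return False
    # Only contacts involving site (x, y) change.
    d_h = (adhesion(lattice, x, y, source, J, types)
           - adhesion(lattice, x, y, old, J, types))
    for spin, change in ((old, -1), (source, +1)):
        if spin != 0:  # medium has no volume constraint
            v = volumes[spin]
            d_h += lam * ((v + change - target) ** 2 - (v - target) ** 2)
    if d_h <= 0 or rng.random() < math.exp(-d_h / temperature):
        lattice[x][y] = source
        volumes[old] -= 1
        volumes[source] += 1
        return True
    return False


rng = random.Random(1)
lattice = init_lattice(20, [(1, 3, 3, 4), (2, 10, 10, 4)])
volumes = count_volumes(lattice)
types = {0: "medium", 1: "cell", 2: "cell"}
J = {("cell", "cell"): 2.0, ("cell", "medium"): 8.0,
     ("medium", "cell"): 8.0, ("medium", "medium"): 0.0}
accepted = sum(metropolis_step(lattice, volumes, types, J, target=16, lam=1.0,
                               temperature=4.0, rng=rng) for _ in range(5000))
print(accepted, volumes)
```

Cell behaviours such as differential adhesion, growth, or chemotaxis enter this framework as extra terms in the energy change, which is what allows simulations to be specified in terms of cell behaviours.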

-------
Multi-Modeling Tools (II)—Systems Biology Workbench

Systems Biology Workbench (U. Washington, Seattle)
Reaction-Kinetics Modeling Environment

Open-Source, Multi-Platform Simulation Environment:
Simulations Based on Molecular Reactions
Simulation Specification in High-Level Language
Fast Simulation Development
Reuse of Simulation Components
Connects to CompuCell3D for Multi-Cell Modeling

http://www.sys-bio.org/

-------
Multi-Modeling Tools (III)—Cell Behavior Ontology/CBMSL

Cell Behavior Ontology/ Cell Behavior Model
Specification Language (Under Development)

Community-Oriented Language Development

Implementation-Independent Specification of Multi-
Cell Models

Improved Annotation of Microscopy Data for High-
Throughput Experiments and Model Generation
Unification of SBML and CC3DML

http://bioportal.bioontology.org/ontologies/39336

-------
Information Flow

[Figure: information-flow diagram connecting in vitro pathway identification and quantification, SBML pathway models, SBML prediction and validation data sets, in vitro pathway perturbation studies, CC3D cell models (library of cell types), in vivo cell identification and fate mapping, in vivo/in vitro interaction identification, in vitro cell phenomenology quantification, initial conditions, CC3D organogenesis models, CC3D prediction and validation data sets, and in vivo perturbation studies; regions labelled "Molecule and Cell Focus" and "Cell and Tissue Behavior Focus"]

-------
Existing CC3D Applications (I) Role of VE-Cadherin in Angiogenesis

Vasculogenesis

- The formation of an early vascular plexus from in situ differentiated endothelial cells (ECs)

[Figure: mesodermal precursor, angioblast, and endothelial cell (vasculogenesis); primary capillary plexus; sprouting angiogenesis, non-sprouting angiogenesis (intussusception), pruning and remodelling; 'juvenile' vascular system; maturation and remodelling; mature vascular system. Werner Risau, Nature 386, 671-674 (1997)]

Angiogenesis

- The formation of new blood vessels from pre-existing ones

• Sprouting angiogenesis

• Non-sprouting angiogenesis (intussusceptive angiogenesis)

In Vitro HUVEC Model

D. Ambrosi et al., Phys. Rev. Letters 90, 118101

-------
Existing Applications (I) Role of VE-Cadherin in Angiogenesis

• VE-Cadherin (an adhesion molecule) clusters at adherens junctions between
endothelial cells and suppresses chemotaxis at cell-cell interfaces

[Figure panels: 0 h, 7 h, 21 h]

Anti-VE-cadherin antibody inhibits de novo blood-vessel growth in mouse
allantois cultures. (Roeland M. H. Merks, Erica D. Perryn, Abbas Shirinifard, and James A.
Glazier, PLoS Computational Biology 2008)
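The slide summarizes the mechanism as VE-cadherin suppressing chemotaxis at cell-cell interfaces. In a Cellular Potts setting this can be sketched as a chemotaxis term in the energy change that is applied only when a cell extends into medium. The sign convention, the gating rule, and the `contact_inhibited` switch (one hypothetical way to mimic the VE-cadherin knockout simulation shown next) are assumptions here; the published model's exact formulation may differ.

```python
def chemotaxis_delta_h(chi, c_source, c_target, target_is_medium,
                       contact_inhibited=True):
    """Chemotactic contribution to the energy change when a cell copies its
    index from the source site into the target site.

    A negative value favours extension up the chemoattractant gradient.
    With contact inhibition, the term is applied only when the cell extends
    into medium, i.e. it is switched off at cell-cell interfaces.
    """
    if contact_inhibited and not target_is_medium:
        return 0.0
    return -chi * (c_target - c_source)


# Extension into medium, up the gradient: favoured in both cases.
print(chemotaxis_delta_h(10.0, 1.0, 2.0, target_is_medium=True))
# Extension into a neighbouring cell: suppressed only with contact inhibition.
print(chemotaxis_delta_h(10.0, 1.0, 2.0, target_is_medium=False))
print(chemotaxis_delta_h(10.0, 1.0, 2.0, target_is_medium=False,
                         contact_inhibited=False))
```

Without the gate, cells can also push into each other up the gradient, which tends to make them aggregate into clumps rather than elongated, network-forming sprouts.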

-------
Existing Applications (I) Role of VE-Cadherin in Angiogenesis

Wild Type Simulation

VE-Cadherin Knockout Simulation

-------
Existing Applications (II) Role of N-Cadherin in Somitogenesis

[Figure: somite formation; older cells more anterior, younger cells more posterior]

-------
Existing Applications (II) Role of N-Cadherin in Somitogenesis

[Figure: posterior FGF level, adhesion strength, and presumptive somite boundary (90 min); expression strength of Lfng, Axin2, and Dusp6 vs. time (600-1000 min). Sources: Lewis et al. 2003; Goldbeter & Pourquie 2008; Dequeant et al. 2006 (microarray time series of mRNA in mouse), showing cyclic genes including Hes1, Lfng, Nrarp, Nkd1, Spry2, Efna1, Hspg2, Egr1, Axin2, Dact1, Myc, Has2, Hes5, Hey1, Bcl9l, Dusp6, Bcl2l11, Shp2, Dkk1, Sp5, Tnfrsf19, and Phlda1]
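Lewis et al. (2003), cited above, modelled the zebrafish somitogenesis clock as delayed autorepression: a her1/her7-like gene product represses its own transcription after a transcriptional delay Tm and a translational delay Tp. The sketch below integrates that scheme with a fixed-step Euler method and a delay buffer. The parameter values are of the order Lewis used for her1 but should be treated as illustrative assumptions, and fixed-step Euler is a simplification of a proper delay-differential-equation solver.

```python
def simulate_delayed_oscillator(a=4.5, b=0.23, c=0.23, k=33.0, p0=40.0,
                                tm=12.0, tp=2.8, dt=0.01, t_end=300.0):
    """Euler integration of a delayed autorepression oscillator (times in min).

    dm/dt = k / (1 + (p(t - tm) / p0)**2) - c * m(t)   (transcription delay tm)
    dp/dt = a * m(t - tp) - b * p(t)                  (translation delay tp)
    m: mRNA copies, p: protein copies; both start at zero with zero history.
    """
    n_steps = int(round(t_end / dt))
    lag_m = int(round(tm / dt))
    lag_p = int(round(tp / dt))
    m = [0.0] * (n_steps + 1)
    p = [0.0] * (n_steps + 1)
    for i in range(n_steps):
        p_delayed = p[i - lag_m] if i >= lag_m else 0.0
        m_delayed = m[i - lag_p] if i >= lag_p else 0.0
        dm = k / (1.0 + (p_delayed / p0) ** 2) - c * m[i]
        dp = a * m_delayed - b * p[i]
        m[i + 1] = m[i] + dt * dm
        p[i + 1] = p[i] + dt * dp
    return m, p


m, p = simulate_delayed_oscillator()
dt = 0.01
start = int(100 / dt)  # skip the initial transient
peaks = [i * dt for i in range(start, len(m) - 1) if m[i - 1] < m[i] >= m[i + 1]]
periods = [t2 - t1 for t1, t2 in zip(peaks, peaks[1:])]
print(len(peaks), [round(T, 1) for T in periods])
```

In this kind of model the period is set mainly by the delays (roughly twice Tm + Tp when mRNA and protein are short-lived), so the measured peak spacing is a quick check of how sensitive the clock is to changes in transcription or translation time.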

-------
Existing Applications (II) Role of N-Cadherin in Somitogenesis

[Figure: simulated cell types defined by combinations of cPSM, NCAM, N-cadherin, Eph receptor, and ephrin expression; N-cadherin knockout simulation]

-------
Multi-Cell Modeling as a Predictive Tool

Multi-cell modeling in CompuCell3D+SBW will integrate molecular,
cellular and whole-organ level data to predict developmental effects of
pathway disruption.

Allows construction of standard libraries for reuse of information.

We lack quantitative experimental data to build and validate simulations:

— Cell Tracking

— Mechanics

— Pathways

— Interactions

— Morphology

TIVS will provide these data.

-------
Collaborations:

Texas-Indiana Virtual STAR Center

Thomas B. Knudsen, PhD

National Center for Computational Toxicology

UNITED STATES ENVIRONMENTAL PROTECTION AGENCY

Disclaimer: views are those of the presenter and do not necessarily
reflect Agency policy

Developmental Toxicity
RFA-EPA-G2008-STAR-W:

Computational Toxicology Research Centers: in vitro
and in silico models of developmental toxicity pathways

exposures that perturb biological events during
formative stages of the reproductive cycle affecting:

embryo and fetal development
postnatal development
fertility and reproduction
children's health

Some key research issues ...

1. TIMING: morphogenesis and differentiation require
precisely timed genetic signals and responses

2. SENSITIVITY: metabolic and regulatory pathways are
prone to genetic errors and environmental disruptions

3. COMPLEXITY: simple lesions propagated to complex
phenotypes or complex lesions -> simple phenotypes

4. MATERNAL FACTORS: impact of maternal exposure
biology during prenatal and lactational stages

Cellular dynamics

Zebrafish tracked with H2B-EGFP
by DSLM at 90 s intervals over 18 h

Source: Keller et al. (2008) Science 322:1065-69

Fundamental processes

Core developmental processes

~ patterning (sets up future events)
~timing (clocks and oscillators)

~ differentiation (cell diversification)

~ morphogenesis (tissue organization)

Cellular primitives

• growth (proliferation)

• death (apoptosis)

• differentiation (function)

• adhesion (DAH)

• shape (geometry)

• motility (cell migration)

• ECM (remodeling)

Morphogenetic movements


• convergent extension

• branching morphogenesis

• cell condensation

• cell sorting

• trans-differentiation

• cavitation

• involution

• fractional forces

After: Bard (2005) J Anat 206:1-16

TIVS Project 1: zebrafish development

Zebrafish as a model to elucidate the morphological and
mechanistic effects of environmental pollutants

Padilla (EPA): pathways linking to
developmental & neurodevelopmental
endpoints

data sharing: same compounds to
confirm (+)ves and (-)ves across labs
and fish strains

resource sharing: reporter fish lines,
existing (vegF) and new (STAR), for
functional analysis of specific pathways


-------
TIVS project 2: Embryonic Stem Cells

The effects of environmental contaminants on mouse
embryonic stem cell differentiation

Hunter (EPA): pathways that control cell
signaling and specify cell fate

data sharing: same compounds to
confirm (+)ves and (-)ves across labs
and ES lines (human, murine)

resource sharing: TIGM gene trap
resources for functional analysis of
specific pathways; genomic profiling

Profiling DevTox target-pathways

SOURCE: T Knudsen, NCCT; SOURCE: H Mortensen, NCCT

TIVS project 3: Agent-based models

Development of computer
simulations facilitating
assessment of toxicity
based on perturbed
development in zebrafish
and mouse embryonic stem
cells

In silico model, CompuCell3D software
SOURCE: Glazier et al. (2008) Cur Top Dev Biol 81:205

Hes1-EGFP time-lapse (3 h), clock-wavefront
SOURCE: Masamizu et al. (2006) PNAS USA 103:1313-18

Opportunities for collaboration

testing chemicals using developmentally-competent in
vitro assays (ES cell and ZF embryos) and targets

use predictive associations from ToxCast™ HTS data to
build hypotheses about mechanisms of action

studies to generate data testing hypotheses and
improving predictive models

improve virtual tissue models to a level that can help
prioritize chemicals for quantitative risk assessment

Toward a Virtual Embryo

v-Embryo

information from
literature mining

predictions from
machine learning

epidemiology and
exposure monitoring

knowledgebase development (VT-KB)

simulation engine (VT-SE)

-------
An introduction to

ChemScreen

Bart van der Burg

Outline

• What is ChemScreen?

• Background

• Approach

ChemScreen

Chemical substance in vitro/in silico screening system
to predict human- and ecotoxicological effects

• EU Framework Programme 7 (FP7)

• Collaborative project

• 9 partners from 5 countries

• Not yet started

• Start: 1 month after signing of the contract (December 1, 2009?)

• 4-year program, with the majority of practical work in the first three years

ChemScreen

Background

[Chart legend: very little/no data; tested]

100,106 chemicals on the market in 1981 ("existing substances");
1% tested for hazardous properties

EU White Paper: Strategy for a future Chemicals Policy, 2001

ChemScreen Background

Most of the 100,000 chemicals on the market are largely untested: REACH

• Registration, Evaluation, Authorisation and Restriction of Chemicals:
program to catch up

• Start: June 1, 2007

ChemScreen Background

General features of REACH

• Supply chain to provide data

• Shift of responsibilities from authorities towards industry

• Registration of all compounds >1 ton/year
  - At the central European Chemicals Agency (ECHA)
  - Data sharing obligatory (One Substance, One Registration: OSOR)
  - Substance Information Exchange Forum (SIEF)

• Evaluation of dossiers by ECHA/public authorities
  - May request additional data, with animal testing kept to the absolute minimum

• Authorisation required for harmful compounds, taking into account risk, benefits, alternatives, etc.

-------
ChemScreen Background

Which effects are prioritized in REACH?

• CMRs: Carcinogenic, mutagenic or toxic to
reproduction

• PBTs: Persistent, bio-accumulative and toxic

• vPvBs: Very persistent, very bio-accumulative

ChemScreen Background

How many chemicals?

All chemicals >1 ton per year: ~30,000

ChemScreen Background

Estimated costs of REACH:

• Costs: €2.8-5.2 bn (EU) (Hartung 2009: x6)

• Carcinogenicity, mutagenicity and reproductive toxicity (CMR): ca. 90% of costs

Estimated benefit:

• Health improvement: €50 bn (EU)

ChemScreen Background

Phases

Registration:

• Pre-registration: June 2008

• Higher risk (e.g., CMR): December 2010

• >1000 tons (HPV): December 2010

• Remaining >100 tons: June 2013

• Remaining >1 ton: June 2018

Evaluation:

• Completed: June 2022

Majority of testing 2011-2017
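The registration phases above amount to a lookup from annual tonnage and hazard class to a deadline. The Python sketch below encodes only what this slide lists; the function name is hypothetical, the slide's ">" thresholds are assumed to mean "at least", and the actual REACH rules contain further criteria that are not modeled here.

```python
# Simplified REACH registration deadlines as listed on the slide (an assumption-laden sketch,
# not the legal text). Thresholds are read as "at least" this many tonnes per year.
TONNAGE_DEADLINES = [
    (1000, "December 2010"),  # high production volume (HPV)
    (100, "June 2013"),
    (1, "June 2018"),
]
HIGHER_RISK_DEADLINE = "December 2010"  # e.g., CMR substances


def registration_deadline(tonnes_per_year, is_cmr=False):
    """Return the slide's registration deadline, or None below the 1 t/y threshold."""
    if tonnes_per_year < 1:
        return None  # below the registration threshold shown on the slide
    if is_cmr:
        return HIGHER_RISK_DEADLINE  # higher-risk substances register early
    for threshold, deadline in TONNAGE_DEADLINES:
        if tonnes_per_year >= threshold:
            return deadline
    return None


for tonnes, cmr in [(5000, False), (250, False), (10, False), (10, True), (0.5, False)]:
    print(tonnes, cmr, registration_deadline(tonnes, cmr))
```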

ChemScreen Background

If traditional animal tests are used, progress of REACH will be
seriously hampered by:

1. Ethics: resistance to the excessive use of animals.

2. Costs: particularly those linked to labour-intensive animal testing.

3. Capacity: lack of capacity to carry out these tests.

4. Speed: the use of the same traditional methods will not allow
major advances in the speed of the process to be made.

> In order to be successful, cost-effective, rapid in vitro tests need
to be adopted

ChemScreen Background

Incentives for the use of alternative (non-vertebrate) tests in REACH:

• The Agency (ECHA) will publish test proposals (by chemical
manufacturers) and invite third parties to submit alternative
proposals

• Explicit allowance for alternatives to in vivo tests, including in
vitro and non-testing methods (QSAR, grouping, exposure, read-
across)

• Accepts "suitable methods"

• Regular reporting by the Agency and Commission on the use of
alternative methods

-------
ChemScreen Background

Why reprotox?

• Prioritised in REACH

• Reproductive toxicity is important to assess both human
and environmental toxicity

• Uses the most animals in toxicity testing

• Unfortunately, there are very few alternative methods

ChemScreen Approach

Our approach:

• Identify sensitive parameters for reproductive toxicity

• Identify critical mechanisms involved in perturbation
of these parameters

• Build a high-throughput system using these modules

• Expand step-wise

• Integrate with bioinformatics/data interpretation

• Build integrated testing strategies, including non-
testing methods

ChemScreen Approach

Work packages

1. Establish in silico prescreening and toxicity prediction methods to prioritize in vitro
toxicity testing (WP1, leading partner: DTU)

2. Establish a database and an in silico prescreen to identify potential reproductive
toxicants (WP2, FhG)

3. Establish sensitive parameters and a medium-throughput 'minimal essential'
in vitro assay panel (WP3, RIVM)

4. Establish a high-throughput mechanistic pathway screen for reproductive toxicants
(WP4, EKUT)

5. Integrative methods to predict in vivo reprotoxicity, allowing informed decisions on
prioritization for eventual further testing (WP5, TNO)

6. Integration into one user-friendly tool (WP6, P&GEN)

7. Dissemination (WP7, BDS)

ChemScreen Approach

[Figure: reporter gene assay principle. Receptor binding elements control both an endogenous gene (biological effect) and a LUCIFERASE reporter (LUCIFERASE mRNA, LUCIFERASE protein); the light signal is proportional to the amount of biologically active chemical in the sample.]

ChemScreen Approach

[Figure: in vitro concentration-response data from ERα CALUX and AR CALUX reporter assays; x-axis ticks -3 to 0]

Sonneveld et al. 2006, Toxicol. Sci. 89:173-87
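Because the luciferase light signal scales with the amount of biologically active chemical, reporter assay readouts are typically normalized and summarized with a concentration-response model. The sketch below assumes a four-parameter Hill model and a simple log-linear interpolation for the EC50; it is a generic illustration, not the CALUX analysis procedure, and all function names and the simulated values are hypothetical.

```python
import math


def hill(conc, bottom, top, ec50, n):
    """Four-parameter Hill (log-logistic) concentration-response model."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** n)


def relative_activity(rlu, rlu_blank, rlu_ref_max):
    """Express luciferase relative light units (RLU) as % of a reference agonist's maximum."""
    return 100.0 * (rlu - rlu_blank) / (rlu_ref_max - rlu_blank)


def ec50_by_interpolation(concs, responses):
    """Estimate EC50 by log-linear interpolation at the midpoint of the observed response range."""
    lo, hi = min(responses), max(responses)
    half = lo + 0.5 * (hi - lo)
    points = sorted(zip(concs, responses))
    for (c1, r1), (c2, r2) in zip(points, points[1:]):
        # The midpoint is crossed between these two adjacent concentrations.
        if (r1 - half) * (r2 - half) <= 0 and r1 != r2:
            frac = (half - r1) / (r2 - r1)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None


# Simulated (hypothetical) data: EC50 = 0.01 in arbitrary units, Hill slope 1.
concs = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
responses = [hill(c, 0.0, 100.0, 0.01, 1.0) for c in concs]
print(ec50_by_interpolation(concs, responses))  # approximately 0.01
```

In practice a nonlinear least-squares fit of the Hill parameters would usually replace the interpolation step; interpolation is used here only to keep the example dependency-free.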

-------
ChemScreen Approach

Overestimation of the predictive power of animal data: poor predictions

[Table, after Brent (2004): major means of discovery of human teratogens ("Physician"), with rubella, hydantoins (1963), etretinate (1984), coumarin (1968), alcohol (1967), and diethylstilbestrol as examples; legend fragments: "+ = strong", "nuclear receptor ligands"]

ChemScreen Approach

Screening systems

• Panel (15-50) reporter gene assays in human cells (nuclear
receptors, dioxin receptor, signaling/stress/developmental
pathways)

• Reporter gene assays in mouse ES cells (ReProGlow;
developmental pathways)

• Wildtype ES/transcriptomics

• Metabolising cell systems

• Zebrafish/transcriptomics

• Others for critical endpoints reprotoxicity (e.g. spermatogenesis)

ChemScreen Approach

In silico tools

• Exposure module

• Toxicity screening tool (>70 QSARs)

• In vivo reprotoxicity database (FeDTex, RepDose)

• Automated decision tool

ChemScreen Partners

BioDetection Systems (BDS), Bart van der Burg, Netherlands

Fraunhofer Institute for Toxicology and Experimental Medicine (FhG), Inge Mangelsdorf, Germany

Netherlands Organization for Applied Scientific Research (TNO), Dinant Kroese, Netherlands

Simpple (SIM), Eduard Paune, Spain

National Institute for Public Health and the Environment (RIVM), Aldert Piersma, Netherlands

Danish Technical University Food Institute (DTU), Jay Niemelä, Denmark

Procter & Gamble Eurocor (P&GEN), Joanna Jaworska, Belgium

Eberhard Karls University of Tübingen (EKUT), Michael Schwarz, Germany

University of Konstanz (UKON), Daniel Dietrich, Germany

ChemScreen Results

• Sorry, no results yet!

-------
Computational Toxicology Centers Science To Achieve Results (STAR) Progress Review Workshop

U.S. Environmental Protection Agency
Office of Research and Development
National Center for Environmental Research
Computational Toxicology Centers Science To Achieve Results (STAR)

Progress Review Workshop

U.S. Environmental Protection Agency
Research Triangle Park, NC

October 1, 2009

MEETING SUMMARY

Overview

The U.S. Environmental Protection Agency (EPA) Office of Research and Development's (ORD)
National Center for Environmental Research (NCER) Computational Toxicology Centers Science To
Achieve Results (STAR) Progress Review was held October 1, 2009, in Research Triangle Park, North
Carolina. The workshop was sponsored by ORD's NCER. Scientists from academia, government, and
nongovernmental organizations assembled to discuss recent computational toxicology research and plan
for future needs. The meeting provided an opportunity for grantees in the EPA-funded STAR Program to
present their research and interact with EPA staff and others conducting computational toxicology
research. Approximately 60 individuals attended the meeting.

Welcome, Introduction, and Review of Meeting Goals

Deborah Segal, EPA, ORD, NCER; and Robert Kavlock, EPA, ORD, National Center for
Computational Toxicology (NCCT)

Ms. Deborah Segal explained that ORD provides leadership in science and conducts the majority of
EPA's research and development. NCER is ORD's extramural research arm, with a research budget of
$440 million, of which $65.5 million is allocated for competitive extramural grants and fellowships, such
as the STAR, Small Business Innovation Research (SBIR), and Greater Research Opportunities (GRO)
Programs. ORD works with other EPA offices to select research topics for the STAR Program, which was
established in 1995 as part of a reorganization of ORD. STAR aims to include the country's universities
and nonprofit centers in EPA's research program to ensure the highest quality science in areas of highest
risk and greatest importance to the Agency. STAR issues approximately 25 Requests for Applications
(RFAs) and awards approximately $65 to $100 million annually.

The STAR Research Program in Computational Toxicology aims to integrate computational methods and
advanced molecular biology techniques and develop the use of computational approaches to provide tools
for quantitative risk assessment and more efficient strategies for prioritizing chemicals for screening and
testing. Five RFAs have been issued under this program. A new RFA is in development for Fiscal Year
2010.

Dr. Robert Kavlock noted that the grand challenge is predicting human toxicity, moving from exposure
conditions to impacts on molecular targets that result in cell changes and ultimately in toxicity to the
organism. Tools that allow scientists to interrogate different levels of this biological complexity now are
being released. These range from high-throughput screening biochemical assays to cell-based assays to
modeling systems. The STAR Center researchers presenting at this progress review are actively involved
in various phases of this work.

The Office of Research and Development's National Center for Environmental Research 1


A variety of reports guide the Computational Toxicology Research Program (CTRP), including the
National Academy of Sciences 2007 report, Toxicity Testing in the 21st Century: A Vision and a Strategy.
Other reports that have informed the Program in terms of the challenges of the current testing paradigm
and the opportunities available to use innovative technologies to address these important issues include
Applications of Toxicogenomic Technologies to Predictive Toxicology and Risk Assessment; Phthalates
and Cumulative Risk Assessment: The Task Ahead; and Science and Decisions: Advancing Risk
Assessment. Toxicity Testing in the 21st Century: A Vision and a Strategy discusses biological processes
and the changes caused by exposure. At lower doses, cellular changes begin to manifest, but there still is
an adaptive response. At higher doses, the result can be cell injury and morbidity and mortality.
Understanding and developing assays for signaling systems involved in the induction of toxicities will
help researchers to better understand toxicity.

The CTRP's mission is to integrate modern computing and information technology with molecular
biology to improve Agency prioritization of data requirements and risk assessment of chemicals. The
Program provides decision-support tools for high-throughput screening, risk assessment, and risk
management and is committed to transparency and public release of all data. The Program operates under
tight deadlines; it initially was given 5 years to prove that this type of approach is effective. The recently
completed Board of Scientific Counselors (BOSC) review recommends that the Program be renewed for
an additional 5 years.

The Program supports EPA's strategic plan by focusing on its goals of identifying and screening toxicity
pathways, conducting toxicity-based risk assessment, and providing the information to EPA's regulatory
arm. EPA Administrator Lisa Jackson's priorities include managing chemical risks; she has stressed the
importance of assessing and managing risks of chemicals in consumer products, the workplace, and the
environment as well as the importance of protecting vulnerable subpopulations. The Essential Principles
for Reform of Chemicals Management Legislation includes the review of chemicals against safety
standards based on sound science, reflecting the risk-based criteria protective of human health and the
environment. An initial list of chemicals that EPA is considering for action plan development under these
principles includes bisphenol A, perfluorinated chemicals, and phthalates.

Computational toxicology research is conducted via the NCCT, ORD projects, and the Computational
Toxicology STAR Centers. The STAR Centers are housed at the New Jersey Environmental Bioinformatics
and Computational Toxicology Center, Carolina Environmental Bioinformatics Research
Center, Carolina Center for Computational Toxicology, and Texas-Indiana Virtual STAR Center.
If successful, this research will further close the toxicological information gap, provide mode-of-action
information for risk assessment, make more effective use of animal and human resources in the
evaluation of hazard and risk, and support ancillary applications related to mixtures, chirals,
nanomaterials, green chemistry, and lot variations. This meeting will provide an opportunity for introductions,
reflections on the work accomplished to date, integration of the work, and discussion of next steps.

Carolina Center for Computational Toxicology
Ivan Rusyn, University of North Carolina

Computational toxicology is a synthesis of chemistry, high-throughput screening, in vivo data, and
molecular pathways to generate new knowledge. With increasing amounts of data becoming available,
risk assessors now are better able to understand the risks to human health and the environment. As it is an
interdisciplinary science, computational toxicology represents a tremendous opportunity for incorporating
other disciplines into traditional toxicology research and for training new researchers. Researchers need to
recognize that this should not be simply an academic exercise; it is very important that the value and the
early results of computational toxicology research be communicated to the general public, industry, and
other stakeholders.


The Carolina Center for Computational Toxicology consists of an administrative core and three research
projects and is directed by an internal steering committee assisted by an external advisory board. The
administrative core serves a number of functions, including management, integration, public
outreach/translation, and quality control. Project 1 is focused on predictive modeling of chemical-
perturbed regulatory networks in systems toxicology. Objectives of this project include: developing and
applying data-driven methods for the inference and high-level modeling of regulatory network response
to chemical perturbation, developing mechanistic models of nuclear receptor function, and integrating and
deploying high- and low-level modeling tools. Interactions with EPA have been centered on exploring
toxicity pathways, extending and integrating mechanistic metabolism and other models, and working with
ToxCast™ data. For inference and modeling of biological networks, short-term goals include developing
tools for data analysis and interpretation and helping to establish the biological-chemical context in high-
throughput screening assay datasets. Long-term goals include developing components to systems
(simplistic wiring); developing a framework for understanding systems' properties, pathways, and cross-
talk; and providing a basis for mechanistic models. The first major challenge of this project involves the
integration of different types of data, from genome data to phenotype data. The individual data streams
are not well-defined, and the network context can be viewed in a number of different ways. A software
package that will stratify data for subgraph mining to study various pathways is under development; this
is an innovative approach, as it can define composite assays that will be more predictive than individual
assays. Also under development is a mechanistic model of cellular metabolism that will predict changes
in metabolic flux.
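The notion of a composite assay that is more predictive than its component assays can be illustrated with a toy example. The sketch below uses a simple "k-of-n" rule over hypothetical binary hit calls and scores predictions against hypothetical in vivo labels with balanced accuracy; it is not the Center's subgraph-mining method, and the assay names and data are invented for illustration.

```python
def balanced_accuracy(pred, truth):
    """Mean of sensitivity and specificity for binary predictions."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    return 0.5 * (tp / positives + tn / negatives)


def composite_call(hits, assays, min_hits):
    """Call a chemical active if it is a hit in at least `min_hits` of the given assays."""
    return [sum(row[a] for a in assays) >= min_hits for row in hits]


# Hypothetical hit calls for six chemicals in three assays (A, B, C) of one pathway.
hits = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 0, "B": 1, "C": 1},
    {"A": 1, "B": 0, "C": 1},
    {"A": 1, "B": 0, "C": 0},
    {"A": 0, "B": 1, "C": 0},
    {"A": 0, "B": 0, "C": 1},
]
in_vivo = [1, 1, 1, 0, 0, 0]  # hypothetical in vivo outcome labels

for assay in "ABC":
    print(assay, balanced_accuracy([row[assay] for row in hits], in_vivo))
print("2-of-3", balanced_accuracy(composite_call(hits, "ABC", 2), in_vivo))
```

In this contrived case each single assay misclassifies one active and one inactive, whereas requiring agreement of two assays separates the classes; real data would need cross-validation to show such a gain.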

Project 2 is focused on toxicogenetic modeling: population-wide predictions from toxicity profiling. This
project is exploring the promises and challenges of incorporating the knowledge of interindividual genetic
variability as an important dimension of toxicity testing. Objectives of the project include developing
toxicogenetic expression quantitative trait loci (eQTL) mapping tools; performing transcription factor
network inference and integrative pathway assessment; performing toxicogenetic modeling of liver
toxicity in cultured mouse hepatocytes; and discovering chemical-induced regulatory networks using
population-based toxicity phenotyping in human cells. Interactions with EPA have included developing
and testing novel in vitro tools that will enable testing for interindividual susceptibility, developing
statistical methodology and computational tools capable of processing higher order multidimensional
data, and working on future ToxCast™ efforts and current Tox21 datasets. This project is combining
multiple streams of data and adding a level of genetic variability. One basic idea for combining genetic
diversity and biology is through eQTL mapping. The challenge, however, is determining true genetic
susceptibility and doing so in a timely fashion. This project also aims to determine whether this type of
mapping can reveal how genetic polymorphisms control the molecular pathways perturbed
by environmental exposures. Another aim is to understand the genomic context of gene expression.
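At its simplest, eQTL mapping tests whether a gene's expression level varies with genotype at a marker. The sketch below shows a textbook single-marker test, an ordinary least-squares regression of expression on allele dosage (0, 1, 2) with a t-statistic for the slope; it is not the Center's FastMap algorithm, and the data are hypothetical. A genome-wide scan would repeat this over many marker-gene pairs and correct for multiple testing.

```python
import math


def eqtl_test(genotypes, expression):
    """Single-marker eQTL test: regress expression on allele dosage (0/1/2).

    Returns (slope, t_statistic) from ordinary least squares.
    """
    n = len(genotypes)
    mx = sum(genotypes) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom gives the slope's standard error.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


# Hypothetical data: six individuals (e.g., mouse strains) genotyped at one marker.
geno = [0, 0, 1, 1, 2, 2]
expr = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
print(eqtl_test(geno, expr))
```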

Project 3 is focused on the development of validated and predictive quantitative structure-toxicity
relationship models that employ chemical and biological descriptors of molecular structures and take into
account genetic diversity among individuals. Objectives of the project are to develop rigorous endpoint
toxicity predictors based on the quantitative structure-activity relationship (QSAR) modeling workflow
and conventional chemical descriptors, develop novel computational models based on combined chemical
and biological descriptors through QSAR modeling workflow, and develop novel computational
toxicogenetic models based on combined genetic, chemical, and toxicity descriptors through QSAR-like
modeling workflow. Interactions with EPA have focused on integrating chemical descriptors into the
Distributed Structure-Searchable Toxicity (DSSTox) Database Network, ToxCast™, Toxicity Reference
Database (ToxRefDB), and Aggregated Computational Toxicology Resource (commonly known as
ACToR) data analysis. This project integrates chemical descriptors and high-throughput screening
biological descriptors with the QSAR modeling paradigms to predict animal in vivo endpoints and,
hopefully, human disease endpoints. This work has shown that a focus on accurate prediction of external
datasets is much more critical than accurate fitting of existing data. Also, neither cheminformatics, high-throughput
screening, nor omics data alone is sufficient to achieve the desired accuracy of the endpoint
property prediction.
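The point that external predictivity matters more than fit to existing data can be made concrete with a deliberately overfitting model. In the sketch below, a 1-nearest-neighbor predictor (a stand-in chosen for illustration, not the Center's QSAR workflow) reproduces its training set perfectly, yet scores lower on compounds held out from model building. The single descriptor and all values are hypothetical, and the external score uses the test-set mean as its baseline (one of several conventions).

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, using the mean of y_true as the baseline."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_x, train_y, x, k=1):
    """Predict activity as the mean of the k training compounds nearest in descriptor space."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical one-descriptor dataset: descriptor values and measured activities.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.1, 1.3, 1.8, 3.4, 3.9, 5.2]
test_x = [0.4, 2.6, 4.4]  # external compounds, never used to build the model
test_y = [0.5, 2.5, 4.5]

fit_r2 = r_squared(train_y, [knn_predict(train_x, train_y, x) for x in train_x])
ext_q2 = r_squared(test_y, [knn_predict(train_x, train_y, x) for x in test_x])
print(fit_r2, ext_q2)
```

The training score of 1.0 says nothing about how well the model generalizes; only the held-out score does.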

In the first year of the Center's operation, 12 research papers have been produced and are in various
stages of the publication process. Project 1 short-term goals for Year 2 are to continue in-depth analysis of
ToxCast™ Phase I data, further refining the methods for integration across data types, investigate the
applicability of the metabolism model as a tool for the prediction of the effects of chemical perturbation
of metabolic pathways, integrate the eQTL analyses/approaches with the network-focused methodologies,
and establish the pathway-based biological network context for QSAR. Project 2 short-term goals for
Year 2 are to continue development of FastMap software; construct transcription regulation networks in
the Bayesian framework by combining eQTLs, nucleosome occupancy, and transcriptional regulation
data; complete characterization of the mouse hepatocyte cultures and perform experiments with key
toxicants; and complete genome-wide association studies of the HapMap lymphoblast cell viability and
apoptosis data and correlate the toxicity endpoints with basal gene expression profiles. Project 3 short-
term goals are to complete the analysis of the ToxCast™ data, continue to explore other datasets that
provide both in vivo and in vitro data for chemicals, and build models that could be used by EPA to
prioritize the selection of ToxCast™ Phase II compounds.

Dr. Kavlock asked whether the researchers had identified gaps in pathway coverage for which new assays
are needed. Dr. Rusyn responded that for Project 1, the focus is on current ToxCast™ assays, whereas
Project 2 is searching for the genes and pathways that are most susceptible to interindividual variability;
after those genes and pathways are identified, the next step will be to consider the assays needed.

Dr. David Dix referred to Project 2, asking if there was value in focusing on more specific molecular
endpoints. He asked Dr. Rusyn for his thoughts on moving this type of approach forward. Dr. Rusyn
stated that some of the Center's work has involved taking a leap of faith and moving forward with the
most commonly used assays; he would like to complete this analysis before determining the next steps.

A participant noted that dose-response information for individual assays was missing and asked whether
the researchers had considered using a composite dose-response. Dr. Rusyn replied that the current binary
classification does not necessarily take into account all of the dose-response information. Dose responses
differ between different datasets, making it difficult to align the information. The Center is testing a
number of different approaches to determine the meaning of the dose-response information. Dr. Rusyn
welcomed suggestions on the best features of dose-response to study.

Collaborative Work With EPA
Richard Judson, EPA, ORD, NCCT

EPA studies individual chemicals and determines maximum safe doses for human exposure. The Tox21
Priority List includes 19,000 chemicals, and there is an enormous data gap for many of these chemicals,
so it is imperative that the testing be prioritized and performed in a timely manner. Priority areas for
research methodology and development include prioritization, mechanism of action determination, dose-
response modeling, and susceptible populations.

The Carolina Center for Computational Toxicology's Project 1 is developing and applying data-driven
methods for the inference and high-level modeling of regulatory network response to chemical
perturbation, developing mechanistic models of nuclear receptor function, and developing methods for
integrating and deploying high- and low-level modeling tools. An important issue for NCCT has been
selection of assays to be developed for ToxCast™ and Tox21. The Carolina Center's work will help EPA
with this task. Project 2 is developing fast and efficient toxicogenetic eQTL mapping tools and working to
better understand chemical-induced regulatory networks using population-based toxicity phenotyping in
human cells. The Carolina Center is in the early stages of this work. Project 3 is developing rigorous
endpoint toxicity predictors based on QSAR modeling workflow using conventional chemical descriptors.


In addition, the Center is developing novel computational toxicogenomic models based on combined
chemical and biological descriptors. This project is addressing mechanism of action and should help EPA
to prioritize chemicals for further study. In summary, the Carolina Center is developing promising new
approaches to address EPA computational toxicology research areas of prioritization, mechanism of
action determination, and susceptible population study methodology. The question is whether some of
these methods can be extended to help understand dose-response relationships.

New Jersey Environmental Bioinformatics and Computational Toxicology Center

Panos Georgopoulos and William Welsh, University of Medicine and Dentistry of New Jersey

The objectives of the New Jersey Environmental Bioinformatics and Computational Toxicology Center
are to address the toxicant source-to-outcome continuum through the development of an integrated
modular computational framework, develop predictive cheminformatics tools for hazard identification
and toxicant characterization, and demonstrate the above tools through applications in quantitative risk
assessment. The Center takes a computational/engineering/systems perspective, utilizing a team of
computational scientists and engineers with diverse backgrounds in bioinformatics, cheminformatics, and
enviroinformatics. New frameworks and tools build on an extensive base of past developments. This
research effort emphasizes interaction and collaboration among participating scientists in the STAR
Bioinformatics Centers and with EPA centers and laboratories and other centers and institutes of
excellence. The research is divided into two major areas. Investigational Area I focuses on a source-to-
outcome framework to support risk characterization, and Investigational Area II focuses on hazard
identification. There are three projects under Investigational Area I. The first project involves multiscale
biologically based modeling of exposure-to-dose-to-response processes, the second project involves
hepatocyte metabolism modeling for xenobiotics, and the third project focuses on tools for optimal
identification of biological networks. Under Investigational Area II, a fourth project develops
cheminformatics tools for toxicant characterization, and a fifth project develops optimization tools for in
silico proteomics. The Center's research integration plan is consistent with the 2007 NAS report, Toxicity
Testing in the 21st Century: A Vision and a Strategy. The Center pursues an integrative multiscale
research approach—from molecules to cells to tissues to organs to organisms to populations—recognizing
the importance of processes/signals at all levels of biological organization. Additionally, the Center's
close interaction with EPA has resulted in several publications.

Dr. Georgopoulos described Investigational Area I in further depth, noting that computational toxicology
emphasizes chemicals, pathways, and toxicity, but it also must inform the science of risk assessment. In
addition to biology, risk also depends on the environment, behavior, and time (development and aging).
A general mathematical framework for environmental health risk analysis must consider multiscale
bionetwork dynamics (spanning the genome, transcriptome, proteome, metabolome, cytome, and
physiome) linked with the dynamics of environmental stressor networks in food, air, water, and soil. The
Center has studied how these networks are coupled with the regulatory and metabolic bionetworks using
complex, multiscale modeling. Dr. Georgopoulos displayed a graphic depicting the sequence from
source/stressor formation to dose to toxicokinetic effects to modifications of the environmental agent by
the organism to biological effects to health outcomes. This includes a key element that is missing from
most representations of source-to-effect continuum approaches; this element allows using biological data
and biomarkers to evaluate assessments of exposure, locate source contributions, and perform accountability
studies. Thus, a general Bayesian framework is being developed to reconstruct exposure from
inversion of biomarker data for individuals and populations.
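Reconstructing exposure by inverting biomarker data can be sketched with Bayes' rule on a grid. The example below assumes a hypothetical forward model in which the biomarker is proportional to exposure (B = k·E) with lognormal measurement error, and a lognormal prior on exposure; it is a minimal illustration of the inversion idea, not the Center's framework or the MENTOR/DORIAN models. All parameter values are invented.

```python
import math


def exposure_posterior(biomarker, grid, k, sigma_log, prior_mu, prior_sigma):
    """Normalized posterior weights over a log-spaced grid of candidate exposures E.

    Assumed prior: log E ~ Normal(prior_mu, prior_sigma^2), evaluated on the log scale.
    Assumed likelihood: log B ~ Normal(log(k * E), sigma_log^2).
    """
    log_weights = []
    for e in grid:
        log_prior = -0.5 * ((math.log(e) - prior_mu) / prior_sigma) ** 2
        log_lik = -0.5 * ((math.log(biomarker) - math.log(k * e)) / sigma_log) ** 2
        log_weights.append(log_prior + log_lik)
    peak = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(w - peak) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


grid = [math.exp(-10 + 0.01 * i) for i in range(2001)]  # log E from -10 to 10
post = exposure_posterior(2.0 * math.exp(2.0), grid, k=2.0, sigma_log=1.0,
                          prior_mu=0.0, prior_sigma=1.0)
mean_log_e = sum(w * math.log(e) for w, e in zip(post, grid))
print(mean_log_e)
```

With these normal-on-log-scale assumptions the posterior mean of log E has the closed form (prior_mu/prior_sigma² + (log B − log k)/sigma_log²) / (1/prior_sigma² + 1/sigma_log²), which equals 1.0 here; the grid approach generalizes to nonlinear forward models where no closed form exists.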

The Modeling ENvironment for TOtal Risk Studies (MENTOR) employs an anthropocentric (person-
oriented) approach, linking multiple scales of macroenvironmental and local models and information with
microenvironmental conditions and human activities in time/space. It has been applied to study exposures
to a wide variety of contaminants in different media (e.g., metals, dioxins and polychlorinated biphenyls,
air toxics), selecting in particular arsenic and trichloroethylene (TCE) as "model contaminants" for
comprehensive source-to-dose-to-response studies. These studies showed close agreement of predictions
with measurements of population biomarkers. The Center is working to further refine the MENTOR
system and integrate it with the Dose-Response Information Analysis (DORIAN) system.

MENTOR with Physiologically Based Pharmacokinetic Modules for Populations (MENTOR-3P)
combined with the DORIAN system provides a new modular "whole body" platform for consistent
characterization of multicontaminant, toxicokinetic, and toxicodynamic processes in individuals and
populations. This approach incorporates physiology databases to account for intra- and interindividual
variation and variability. Major ongoing research efforts of MENTOR/DORIAN focus on a library of
software modules for "virtual organs" (with primary focus on the liver) that account for heterogeneities
(in metabolism and biological response) within an organ. One case study focused on the spectrum of
cytochrome P450 induction by dioxin within the liver and was able to account for and explain observed
biochemical variability. Research in progress is using arsenic and TCE as model contaminants and aims
to reconcile biotransformation and transport at both the individual hepatocyte and the whole-organ
scales, as well as to model quantitative metrics of oxidative stress resulting from exposure to these
contaminants. The computational models are being used in collaborations with EPA scientists to
perform sensitivity analyses, study the effects of aging, and assess population exposures from biomarkers.
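One common way to represent heterogeneity within an organ is to divide it into well-mixed zones in series, each with its own metabolic capacity. The sketch below assumes first-order metabolism in each hypothetical liver zone and computes steady-state concentrations along the flow path; it is a generic "tanks-in-series" illustration, not the MENTOR-3P virtual organ modules, and the parameter values are invented.

```python
def zonal_liver_concentrations(c_in, flow, zone_volumes, zone_rate_constants):
    """Steady-state concentrations in well-mixed liver zones arranged in series.

    Each zone i removes chemical by first-order metabolism with rate constant k_i,
    so the steady-state mass balance is flow * (c_prev - c_i) = k_i * V_i * c_i.
    """
    concentrations = []
    c = c_in
    for volume, k in zip(zone_volumes, zone_rate_constants):
        c = flow * c / (flow + k * volume)
        concentrations.append(c)
    return concentrations


# Hypothetical three-zone liver with metabolic activity increasing along the flow path.
zones = zonal_liver_concentrations(c_in=1.0, flow=1.0, zone_volumes=[1.0, 1.0, 1.0],
                                   zone_rate_constants=[0.0, 1.0, 3.0])
print(zones)  # [1.0, 0.5, 0.125]
```

With purely first-order kinetics the outflow concentration does not depend on the order of the zones, but the concentration each zone experiences does, which is why zonal models matter for local responses such as enzyme induction.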

Dr. Welsh further described Investigational Area II, noting that any multiscale enterprise must
address the molecular scale, for which there are three different approaches. Receptor-based
approaches study the protein structure of a receptor associated with a pathway or some aspect of a
toxicological event. Ligand-based approaches gather data about the ligands to determine the
commonalities among them that give rise to a certain biological effect. The third approach is
virtual screening.

Receptor-based approaches figure prominently in computational toxicology. Pregnane X receptor (PXR)
is a hepatic nuclear receptor that is responsible, along with other nuclear receptors and proteins, for
modulating a number of metabolic enzymes and more than 36 other genes. PXR ligands are pervasive and
structurally diverse. They come from dietary products and supplements, hormones, prescription drugs,
herbal components, and environmental chemicals. Thus, humans are exposed to PXR ligands constantly.
Published experimental data show that when certain conazoles bind to PXR, they turn off the
transcriptional machinery. Based on this observation, the researchers performed computational docking
studies that show that the conazoles do not competitively bind with the agonist site but instead appear to
bind on an outer surface. This is an important finding that can inform the development of new hypotheses.

Analysis of the ToxCast™ 309 dataset helped the researchers to develop and adapt various new
computational models for data analysis. Traditional QSAR techniques use the structure-based features
(molecular descriptors) of a collection of chemicals to describe and compare their biological activities.
Biological spectra analysis, a newer technique, works in the opposite direction: it uses the biological
response profiles of the chemicals to describe and compare them and then relates these profiles to
molecular structure. Panels of chemicals and protein receptors were assayed, and the numerical values were
depicted as heat-map intensity bars. Chemicals were clustered based on their similar ability to induce a
biological response across all of the proteins, and proteins were clustered based on the similarity of their
bioresponse profiles. Cross-mapping of the toxicological and chemical similarity profiles showed that 74
percent of the compounds in the TOX1 cluster also were in the CHEM1 cluster and 61 percent of the
compounds in the TOX2 cluster also were in the CHEM2 cluster. The overall association between the major
clusters of the two spaces was 69 percent.
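
The cross-mapping step can be illustrated with a small Python sketch. It assumes that each compound already carries one cluster label in the toxicological (bioresponse) space and one in the chemical (structure) space, and it takes the overall association to be the share of compounds, across the paired clusters, that fall in the matching cluster. That definition, the toy labels, and the function names are illustrative assumptions rather than the Center's published procedure. Pearson correlation between response profiles is included as one plausible similarity measure for the clustering step.

```python
from statistics import mean, pstdev


def profile_correlation(a, b):
    """Pearson correlation between two bioresponse profiles (one value per assayed protein)."""
    ma, mb = mean(a), mean(b)
    cov = mean((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (pstdev(a) * pstdev(b))


def cluster_overlap(tox_labels, chem_labels, pairs):
    """For each (tox_cluster, chem_cluster) pair, the percentage of compounds in the tox
    cluster that also fall in the paired chem cluster, plus the overall percentage across
    all compounds in the paired tox clusters (an assumed definition of "association")."""
    per_pair = {}
    matched = total = 0
    for tox, chem in pairs:
        members = [c for c, label in tox_labels.items() if label == tox]
        hits = sum(1 for c in members if chem_labels.get(c) == chem)
        per_pair[(tox, chem)] = 100.0 * hits / len(members)
        matched += hits
        total += len(members)
    return per_pair, 100.0 * matched / total


# Hypothetical cluster assignments for six compounds.
tox_labels = {"c1": "TOX1", "c2": "TOX1", "c3": "TOX1", "c4": "TOX1", "c5": "TOX2", "c6": "TOX2"}
chem_labels = {"c1": "CHEM1", "c2": "CHEM1", "c3": "CHEM1", "c4": "CHEM2", "c5": "CHEM2", "c6": "CHEM1"}
per_pair, overall = cluster_overlap(tox_labels, chem_labels, [("TOX1", "CHEM1"), ("TOX2", "CHEM2")])
print(per_pair, round(overall, 1))
```

With these toy labels, 75 percent of TOX1 maps to CHEM1 and 50 percent of TOX2 maps to CHEM2; bookkeeping of this kind is one way to arrive at percentages like those reported above.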

The Center also has developed a novel technique for comparing molecules, called shape signatures. A
software program sketches the molecule, and a special algorithm converts the three-dimensional molecule
into a small, compact representation based on its molecular shape and surface charge distribution, the two
features predominantly associated with receptor-ligand binding. The resulting shape signatures are
histograms, so different molecules can be compared by subtracting their histograms: the smaller the
difference between the histograms, the more similar the molecules. The Center has created a shape
signature library that houses more than 3 million compounds, and a number of shape-based QSAR models
for toxicity prediction have been developed.
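
The comparison step reduces to a distance between histograms. The following is a minimal sketch, assuming each shape signature is stored as a histogram with the same binning and that the difference is measured as the sum of absolute bin differences (L1 distance) after normalization. The actual metric and binning in the Center's software may differ, and the library entries below are hypothetical.

```python
def normalize(hist):
    """Scale a histogram so its bins sum to 1, so molecules of different size are comparable."""
    total = float(sum(hist))
    return [h / total for h in hist]


def signature_distance(h1, h2):
    """L1 distance between two normalized shape-signature histograms (0.0 means identical)."""
    if len(h1) != len(h2):
        raise ValueError("signatures must use the same binning")
    return sum(abs(a - b) for a, b in zip(normalize(h1), normalize(h2)))


def rank_library(query, library):
    """Library entry names sorted from most to least similar to the query signature."""
    return sorted(library, key=lambda name: signature_distance(query, library[name]))


# Hypothetical 4-bin signatures; real signatures use many more bins.
library = {
    "mol_A": [2, 8, 6, 4],
    "mol_B": [9, 1, 0, 0],
    "mol_C": [1, 4, 3, 2],
}
query = [1, 4, 3, 2]
print(rank_library(query, library))
```

Because the histograms are normalized first, mol_A (same shape as the query at twice the counts) is treated as identical to it; whether the Center's software normalizes in this way is not stated in the source.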

New Jersey Environmental Bioinformatics and Computational Toxicology Center - EPA
Collaboration on an Approach to Using Toxicogenomic Data in Risk Assessment: Dibutyl Phthalate
Case Study

Susan Euling, EPA, ORD, National Center for Environmental Assessment (NCEA)

How can genomic data be used effectively in risk assessment? Answering this question requires
collaboration between mathematicians and biologists. Genomics technologies are powerful because they are
global, or genome-wide, and toxicogenomic data can identify precursor events, biomarkers of effect or
exposure, and mechanisms and modes of action. Strengths of microarray data include the ability to build
gene networks and to identify affected processes, pathways, and networks. Challenges include the size and
complexity of the datasets and the fact that statistical significance cutoffs do not necessarily indicate
biological significance. Limitations of toxicogenomics technologies have included reproducibility issues,
the need to link affected pathways and genes to an adverse outcome, and the cost of performing
dose-response microarray studies.

The overall project goals were to develop an approach for using toxicogenomic data in risk assessment
and to perform a case study applying this approach. Dibutyl phthalate (DBP) was selected for the case
study because it has a relatively large genomic dataset and because there is phenotypic anchoring for a
number of the observed gene expression changes. DBP has two well-characterized modes of action for its
male reproductive developmental effects: a decrease in Insl3 and a decrease in fetal testicular testosterone.
Two questions were identified to direct the case study evaluation: whether the toxicogenomic data could
inform additional modes and mechanisms of action for the DBP male reproductive developmental effects,
and whether the genomic dataset could inform interspecies differences in the reduced testicular
testosterone mode of action. To explore modes of action, consensus pathways were identified from two
different pathway analysis approaches applied to a selected microarray study of testes after in utero DBP
exposure.

There is concern that the traditional method, which first identifies differentially expressed genes and then
maps them to pathways, might result in a loss of information. Thus, the STAR Center collaborators took a
different approach to identifying significantly affected pathways: they considered all of the genes in each
pathway and calculated a pathway activity level for each pathway. Advantages of this approach include the
consideration of all genes in a pathway and the ability to compare activity among pathways.
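
To make the contrast with the gene-list-first approach concrete, here is a minimal sketch of a pathway-level score that uses every gene in a pathway. It assumes a simple definition, in which a sample's pathway activity is the mean of its genes' z-scored expression values. This is a stand-in for illustration, not the collaborators' pathway activity level method, and the gene names and values are invented.

```python
from statistics import mean, pstdev


def zscore_genes(expr):
    """Standardize each gene across samples; expr maps gene -> list of values, one per sample."""
    out = {}
    for gene, values in expr.items():
        m, s = mean(values), pstdev(values)
        out[gene] = [(v - m) / s if s > 0 else 0.0 for v in values]
    return out


def pathway_activity(expr, pathway_genes):
    """Per-sample activity: mean z-score over every pathway gene present in the data.
    No differential-expression cutoff is applied before pathway mapping."""
    z = zscore_genes(expr)
    genes = [g for g in pathway_genes if g in z]
    n_samples = len(next(iter(expr.values())))
    return [mean(z[g][i] for g in genes) for i in range(n_samples)]


def activity_shift(activity, treated):
    """Mean activity in treated samples minus mean activity in controls."""
    t = [a for a, flag in zip(activity, treated) if flag]
    c = [a for a, flag in zip(activity, treated) if not flag]
    return mean(t) - mean(c)


# Hypothetical data: two control samples followed by two treated samples.
expr = {"GeneA": [1, 1, 3, 3], "GeneB": [2, 2, 4, 4], "GeneC": [5, 5, 5, 5]}
treated = [False, False, True, True]
activity = pathway_activity(expr, ["GeneA", "GeneB", "GeneC", "GeneX"])
print(activity, activity_shift(activity, treated))
```

GeneC, which does not change, still contributes to (and dampens) the score; a gene-list-first approach would have filtered it out before pathway mapping. Because every pathway gets a score on the same scale, activity can also be compared among pathways.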

Methods to inform interspecies differences in mode of action were explored. There is a need for
approaches and metrics to extrapolate from animal model findings to humans for risk assessment.
Available data were used to develop cross-species metrics for the biosynthesis-of-steroids pathway, one
of the pathways that underlies the decrease in fetal testicular testosterone mode of action. Three different
data sources were used to assess rat-to-human pathway similarity, and each of the three approaches showed
approximately 85 percent similarity. A remaining issue in applying any or all of these methods to risk
assessment is determining whether such values represent a "low" or "high" degree of similarity; this issue
can be explored further to develop a basis for comparison.
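
One simple way to express rat-to-human pathway similarity as a percentage is sketched below. It assumes the metric is the share of rat pathway genes whose human ortholog is also annotated to the same human pathway, with a symmetric Jaccard variant for comparison. The ortholog map and gene identifiers are hypothetical, and the three data sources used in the case study are not reproduced here.

```python
def pathway_similarity(rat_pathway, human_pathway, orthologs):
    """Percent of rat pathway genes whose human ortholog is annotated to the human pathway.
    Rat genes with no known ortholog count as not conserved."""
    conserved = sum(1 for g in rat_pathway if orthologs.get(g) in human_pathway)
    return 100.0 * conserved / len(rat_pathway)


def mapped_jaccard(rat_pathway, human_pathway, orthologs):
    """Symmetric alternative: Jaccard overlap (percent) between the ortholog-mapped rat
    pathway and the human pathway."""
    mapped = {orthologs[g] for g in rat_pathway if g in orthologs}
    human = set(human_pathway)
    return 100.0 * len(mapped & human) / len(mapped | human)


# Hypothetical gene identifiers and ortholog map (not real annotations).
rat_pathway = ["r1", "r2", "r3", "r4"]
human_pathway = {"h1", "h2", "h3", "h5"}
orthologs = {"r1": "h1", "r2": "h2", "r3": "h3", "r4": "h4"}
print(pathway_similarity(rat_pathway, human_pathway, orthologs))
print(mapped_jaccard(rat_pathway, human_pathway, orthologs))
```

The two metrics give different answers (75 and 60 percent) on the same toy data. This illustrates the point above: a similarity percentage needs a reference scale before it can be called low or high.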

Case study findings include the identification of additional functions (e.g., cell adhesion) and pathways
(e.g., Wnt signaling) affected after in utero DBP exposure that may inform modes of action responsible
for the "unexplained" endpoints. Hypothesis testing studies are needed. Other accomplishments include
the development of a systematic approach for evaluating toxicogenomics data for use in future risk
assessments; the development of microarray analytical methods for risk assessment and the exploration of
their application, including the pathway activity method, the gene network model over time, and the exploration
of methods to assess cross-species conservation on a given pathway; and the identification of research
needs for toxicity and genomics studies for use in risk assessment.

Recommendations based on the case study are to evaluate genomic and other gene expression data for
consistency of findings across studies for affected genes and pathways; to perform benchmark dose-response
modeling when high-quality reverse transcription-polymerase chain reaction (RT-PCR) data are available for
genes known to be in the causal pathway for a mechanism of action or outcome; and to reanalyze genomic
data when reanalysis is expected to yield new information useful to risk assessment.
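
The benchmark dose idea can be shown with a deliberately simple sketch: fit a dose-response model to gene-level data, then solve for the dose at which the modeled response departs from the control level by a chosen benchmark response (BMR). The exponential model, the log-scale least-squares fit, the 10 percent BMR, and the data are all assumptions for illustration. In practice, benchmark dose analysis compares several models fitted by maximum likelihood (for example, with EPA's Benchmark Dose Software) and reports a lower confidence bound (BMDL).

```python
import math


def fit_exponential(doses, responses):
    """Least-squares fit of log(response) = log(a) + b * dose, i.e. response = a * exp(b * dose)."""
    logs = [math.log(r) for r in responses]
    n = len(doses)
    mx, my = sum(doses) / n, sum(logs) / n
    sxx = sum((x - mx) ** 2 for x in doses)
    b = sum((x - mx) * (y - my) for x, y in zip(doses, logs)) / sxx
    a = math.exp(my - b * mx)
    return a, b


def benchmark_dose(b, bmr=0.10):
    """Dose at which the modeled response differs from the modeled control level a by a
    relative benchmark response `bmr`: a * (1 - bmr) for a decreasing trend, a * (1 + bmr)
    for an increasing one. For this model the result does not depend on a."""
    if b == 0:
        raise ValueError("flat dose-response: no benchmark dose")
    target = (1 - bmr) if b < 0 else (1 + bmr)
    return math.log(target) / b


# Hypothetical mean relative expression (RT-PCR) of a gene that declines with dose.
doses = [0, 10, 50, 100]
responses = [2.0 * math.exp(-0.01 * d) for d in doses]
a, b = fit_exponential(doses, responses)
print(round(a, 3), round(b, 4), round(benchmark_dose(b), 2))
```

Restricting this step to genes known to be in the causal pathway, as recommended above, keeps the benchmark dose tied to a biologically meaningful change rather than to any statistically detectable one.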

Dr. Kavlock asked the STAR Center researchers in general whether STAR funding had been useful in
obtaining other grant funding, including stimulus funding. The consensus among the group was that the
STAR funding had been useful for leveraging additional funding.

Carolina Environmental Bioinformatics Research Center
Fred Wright, University of North Carolina

The Carolina Environmental Bioinformatics Center (CEBC) was funded to extend capabilities in
computational toxicology. Specific capabilities include omics expertise and strengths in elucidating
genetic variation. The Center's three research projects focus on biostatistics, cheminformatics, and
computational infrastructure for systems toxicology; each project collaborates directly with environmental
scientists. The Center also includes an administrative unit and an outreach and translational activity unit.
The Center has collaborated extensively with EPA; seven joint papers are in various stages of publication,
and 14 joint abstracts/posters have been accepted at scientific meetings. Whereas the Carolina Center for
Computational Toxicology is more highly focused on biology and mechanistic modeling, the CEBC
focuses on statistical discovery and on drawing valid statistical conclusions.

Project 1, the biostatistics in computational toxicology project, builds on strengths in microarray
analysis, elucidation of networks/pathways, and expression quantitative trait locus (eQTL) analysis. There
is a new emphasis on dose-response testing, data mining, and penalized regression. Analysis of ToxCast™ Phase I data from
EPA and development of related methods likely will be a large portion of the remaining activity. Project
objectives include providing biostatistical support to the Center, performing data analysis and developing
methods, and collaborating with EPA and the computational toxicology community. Recent activities
include direct collaborations via data analysis work with Project 2 investigators on toxicity prediction and
data mining methods and work with Project 3 investigators on rodent toxicity modeling. In addition, the
project is performing analysis of clinical toxicity and metabolomic data to explore a large number of
prediction approaches, analysis of ToxCast™ data, and expression QTL mapping relevant to toxicity.
Collaborations have inspired the development of new methods. For example, CEBC scientists worked
with EPA scientists on a microarray dose-response study. This work led to new considerations for using
dose-response data; there currently are relatively few dose-response methods tuned to gene
expression studies and even fewer that consider pathways (gene sets). An important question that arose
from this work was how to aggregate evidence across transcripts within a pathway. For dose-response
modeling for gene expression and pathways, the researchers have performed extensive investigation of
simple (approximate) two-parameter logistic fits, establishing reasonable false positive rates and power
for small sample sizes. A new tool that will perform dose-response pathway analysis for gene expression
data is under development. Other collaborations with EPA include comparing machine learning
algorithms in a simulated model for chemical toxicity and various efforts to predict chemical toxicity.
Another example of methods development is work on detecting true trans-bands in eQTL studies and on
the importance of principal component (PC)-based stratification control for eQTL analysis. In the
next year, Project 1 will focus on completing the methodology for open projects and collaboration,
completing the dose-response pathway analysis method, bringing the ToxCast™ data analysis to an
intermediate conclusion, and deepening the ToxCast™ data analysis in terms of choices of endpoints,
sensitivity versus specificity, and domains of applicability.
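
The "simple (approximate) two-parameter logistic fits" mentioned above can be sketched as follows. The sketch assumes the expression response has already been scaled to lie strictly between 0 and 1 (for example, relative to control and maximal levels), so that the curve y = 1 / (1 + exp(-(b0 + b1 * log10(dose)))) becomes a straight line after a logit transform and both parameters can be estimated by ordinary least squares. This linearization is one approximate approach chosen for illustration; the Center's actual fitting procedure and the function names here are not taken from the source.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_two_param_logistic(doses, scaled_response):
    """Approximate fit of y = 1 / (1 + exp(-(b0 + b1 * log10(dose)))) by ordinary least squares
    of logit(y) on log10(dose). Doses must be positive and responses strictly inside (0, 1)."""
    if any(d <= 0 for d in doses) or any(not 0 < r < 1 for r in scaled_response):
        raise ValueError("need positive doses and responses strictly between 0 and 1")
    x = [math.log10(d) for d in doses]
    y = [logit(r) for r in scaled_response]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    return b0, b1


def ec50(b0, b1):
    """Dose at which the fitted curve reaches 0.5 (where logit(y) = 0)."""
    return 10 ** (-b0 / b1)


# Hypothetical transcript responses already scaled to (0, 1), generated from b0 = -2, b1 = 2.
doses = [1, 3, 10, 30, 100]
scaled = [1 / (1 + math.exp(-(-2.0 + 2.0 * math.log10(d)))) for d in doses]
b0, b1 = fit_two_param_logistic(doses, scaled)
print(round(b0, 3), round(b1, 3), round(ec50(b0, b1), 2))
```

A per-transcript fit like this yields a slope and a midpoint for every gene; how to aggregate such evidence across the transcripts in a pathway is the open question noted above, and the sketch does not address it.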

The objectives of Project 2 (cheminformatics) include coordinating the compilation and mining of data
from relevant external databases, performing analysis and methods development for building statistically
significant and externally predictive QSAR models of chemical toxicology data, and performing joint
work within the Center and with EPA collaborators. Under this project, one subproject works to improve
quantitative models of chemical toxicity through the use of hybrid chemical and biological descriptors.
The Center is working with EPA scientists, using high-throughput screening dose-response curves to
assist QSAR modeling of carcinogenicity. In this work, more than 300 chemical descriptors, 150
biological descriptors, and 400 hybrid descriptors are being used to predict carcinogenicity. Also under
development is a two-step hierarchical QSAR modeling workflow for predicting in vivo chemical
toxicity. Future studies include analyzing the models to identify significant assay-chemical combinations
that are predictive of in vivo outcomes, exploring the entire National Toxicology Program (NTP) dataset,
and applying modeling prospectively to prioritize new compounds for focused toxicity testing. In the next
year, Project 2 will focus on continuing work on QSAR modeling of multiple animal toxicity endpoints
and developing novel QSAR methodology by using in vitro biological information to model in vivo
toxicity endpoints. For all of these activities, the project will continue to use data collected under
ToxCast™, DSSTox, and other EPA projects.
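
A minimal sketch of QSAR classification with hybrid descriptors follows. Each compound's chemical descriptors and in vitro (biological) descriptors are concatenated into one vector, each column is scaled to [0, 1] using the training set so that neither block dominates the distance, and a k-nearest-neighbor vote predicts the in vivo class. k-nearest neighbors is a common QSAR method, but its use here, the two chemical and one biological descriptor values, and the function names are assumptions for illustration; the Center's models use hundreds of descriptors and formal external validation.

```python
import math
from collections import Counter


def hybrid(chem, bio):
    """Concatenate chemical and biological (in vitro) descriptors into one hybrid vector."""
    return list(chem) + list(bio)


def fit_minmax(rows):
    """Learn per-column min and max on training rows; return a function that scales a row to [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]

    def scale(row):
        return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]

    return scale


def knn_predict(train_x, train_y, query, k=3):
    """Majority class among the k training compounds closest to the query (Euclidean distance)."""
    ranked = sorted(zip((math.dist(x, query) for x in train_x), train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: (chemical descriptors, biological descriptors, in vivo class).
train = [
    ((1.0, 150.0), (0.90,), "toxic"),
    ((1.2, 160.0), (0.80,), "toxic"),
    ((1.1, 155.0), (0.85,), "toxic"),
    ((3.0, 300.0), (0.10,), "nontoxic"),
    ((3.2, 310.0), (0.05,), "nontoxic"),
    ((2.9, 290.0), (0.20,), "nontoxic"),
]
raw = [hybrid(chem, bio) for chem, bio, _ in train]
labels = [label for _, _, label in train]
scale = fit_minmax(raw)
train_x = [scale(r) for r in raw]
print(knn_predict(train_x, labels, scale(hybrid((1.05, 152.0), (0.88,)))))
```

Scaling with the training set's minima and maxima, rather than including the query, keeps the model fixed when it is later applied prospectively to new compounds.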

Project 3, the computational infrastructure for systems toxicology project, is using a model for toxicity
profiling in multiple strains of mice to inform and develop an appropriate computational infrastructure,
with a focus on computational methods development and the development of user-friendly software tools
from methods in Projects 1 and 2. Project objectives include developing and implementing algorithms
that aid the analysis of multidimensional data streams in dose-response assessment and cross-species
extrapolation; facilitating the development of a standard workflow for analysis of the omics data, linkages
to classical indicators of adverse health effects, and integration with other types of biological information
such as genome sequences and genetic differences between species; and building Web-based open source
and user-friendly graphical interfaces associated with interoperable computational tools for data analysis
that facilitate the incorporation of new data streams into basic research and decision-making pipelines
(methods from Projects 1 and 2). This project has created a framework for handling emerging omics data
on genetic susceptibility in model organisms, provides programming expertise to create graphical tools
that are used by partners within the Center and in collaboration with EPA personnel and other
environmental scientists, and works to strengthen and advance the field of computational toxicology through
direct partnerships and the dissemination of tools used by both bioinformatics and bench scientists. The
driving biological problem is how to make population-wide predictions from toxicity profiling. Efforts
toward integrating varying types of biological information have been informed by examples such as the
study of the genetic factors underlying interindividual susceptibility to acetaminophen toxicity. In this
unique human-to-mouse-to-human work, the researchers have shown that the power of mouse genetics
can be extremely useful in discovering susceptibility genes, even when human data are available from
very small cohorts. Project 3 also is developing software tools, including a graphical interface for the
Significance Analysis of Function and Expression (SAFE) software, which assesses the significance of
biological categories in microarray studies while properly accounting for the effects of correlations
among genes. Investigators in this project also are key players in the integration of existing and new tools
into the Predictive Toxicology Web Portal (http://ceccr.unc.edu). Papers on the algorithms used are in
various stages of publication. In the next year, Project 3 will continue integration/support of tools from
other CEBC projects, continue programming and algorithmic developments, further improve algorithms
in tools and applications, develop specific data-mining algorithms for genomic databases, and continue
biology-driven research that generates appropriate datasets for testing and implementing novel
computational and biostatistical approaches.
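
The key idea behind SAFE, as described above, is that the significance of a gene category is judged by permuting sample labels rather than genes, so correlations among genes are preserved in the null distribution. A minimal sketch of that idea follows. It assumes a difference in group means as the per-gene statistic and the mean of those values as the category statistic (SAFE itself supports several local and global statistics); it is not SAFE's code or interface, and the expression values are invented.

```python
import random
from statistics import mean


def gene_stats(expr, labels):
    """Per-gene difference in mean expression: treated (label 1) minus control (label 0)."""
    out = {}
    for gene, values in expr.items():
        treated = [v for v, lab in zip(values, labels) if lab == 1]
        control = [v for v, lab in zip(values, labels) if lab == 0]
        out[gene] = mean(treated) - mean(control)
    return out


def category_stat(expr, labels, category):
    """Category-level statistic: average of the per-gene statistics for genes in the category."""
    stats = gene_stats(expr, labels)
    return mean(stats[g] for g in category if g in stats)


def category_pvalue(expr, labels, category, n_perm=1000, seed=0):
    """One-sided permutation p-value. Sample labels are shuffled as a whole, so every
    gene in a permuted dataset keeps its correlation with the others."""
    rng = random.Random(seed)
    observed = category_stat(expr, labels, category)
    perm = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if category_stat(expr, perm, category) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical data: four control samples (0) followed by four treated samples (1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expr = {
    "g1": [1, 2, 1, 2, 8, 9, 8, 9],
    "g2": [2, 1, 2, 1, 9, 8, 9, 8],
    "g3": [1, 1, 2, 2, 7, 8, 8, 9],
    "g4": [5, 4, 6, 5, 5, 6, 4, 5],
}
print(category_pvalue(expr, labels, ["g1", "g2", "g3"]))
print(category_pvalue(expr, labels, ["g4"]))
```

Permuting genes instead would treat g1, g2, and g3 as independent draws, which overstates significance when genes in a category are co-regulated; permuting samples avoids that.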

Across the Center, there will be more emphasis on dissemination of information and training other
scientists in the use of the tools developed and on bringing open source code and methods to a new stage
in their evolution.

Collaborative Work With EPA
Ann Richard, EPA, ORD, NCCT

The work of the CEBC and the Carolina Center for Computational Toxicology overlaps well in terms of
methods development and moving that work into predictive model-building. The recent Board of Scientific
Counselors (BOSC) review emphasized the importance of EPA maintaining an ongoing dialogue with academia;
the biostatistics capability at the University of North Carolina has brought high standards of statistical
analysis to help EPA evaluate the new data streams arriving via ToxCast™ and Tox21.

CEBC's cheminformatics project can generate thousands of QSAR descriptors representing categories of
structure-based computed properties (DRAGON), and the project has developed a sophisticated predictive
QSAR workflow. EPA defines the problems and provides data and guidance on how to approach them. The
DRAGON descriptors span many descriptor categories, that is, many different ways of describing chemicals,
which allows flexibility in determining how best to approach a problem. CEBC has developed QSAR models
based on DSSTox-published data files and structure inventories. The processed data files and calculated
descriptors then are shared with EPA researchers for public release. EPA and CEBC have co-authored
several publications.

DSSTox has published structure-annotated toxicity data, which have been used in the cheminformatics
work. A major objective of this project is to curate high-quality structure annotations and to publish
datasets whose representations of activity are particularly amenable to structure-activity modeling.
EPA's contribution to the Project 2 work has been through the ToxCast™ Phase I Chemical Inventory
and the ToxRefDB in vivo endpoints for modeling. CEBC used this information to process datasets
(ZEBET Acute Tox) and to calculate chemical descriptors (DRAGON) for the ToxCast™ Inventory.
CEBC's cheminformatics project matched published data for 1,408 compounds from the NTP
High-Throughput Screening Program with data from a carcinogenicity potency database. The aim was to
determine the ability of the NTP high-throughput assays to predict carcinogenicity. Data generated to date
show that the in vitro assays used have some ability to enhance modeling capabilities. The idea is that if
in vitro assays that presumably are unrelated to the endpoint can enhance modeling, in vitro assays that
are related to the endpoint should prove even more useful.

For years, there has been an effort to replace in vivo assays with in vitro screening methods. Many efforts
have been made to correlate in vitro half-maximal inhibitory concentration (IC50) with in vivo rat oral
median lethal dose (LD50), but none have been successful. It is important to consider new ways of
incorporating the IC50 data. Two key questions arose: Can the problem be broken into regions of higher
correlation? Can QSAR methods be used to define those regions based on chemical structure alone?
Moving regression was used to define regions of higher correlation, and a classification QSAR was
applied to assign the chemicals to one of three groups. The LD50 then was predicted for each group.
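
The two-stage idea (find regions where the IC50 to LD50 relationship is tighter, then fit that relationship within each region) can be sketched as below. Assumptions: moving regression is approximated by a sliding-window correlation between log IC50 and log LD50 after sorting chemicals by log IC50; the group labels that the classification QSAR would assign from structure are simply given; and one least-squares line is fitted per group. The window size, data, and function names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def moving_correlation(log_ic50, log_ld50, window):
    """Correlation within sliding windows of chemicals ordered by log IC50."""
    pairs = sorted(zip(log_ic50, log_ld50))
    out = []
    for start in range(len(pairs) - window + 1):
        xs, ys = zip(*pairs[start:start + window])
        out.append(pearson(xs, ys))
    return out


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def fit_group_models(log_ic50, log_ld50, groups):
    """One regression line per group; group labels would come from a structure-based classifier."""
    models = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        models[g] = fit_line([log_ic50[i] for i in idx], [log_ld50[i] for i in idx])
    return models


def predict_ld50(models, group, log_ic50):
    slope, intercept = models[group]
    return slope * log_ic50 + intercept


# Hypothetical data: two structural groups with different IC50 to LD50 relationships.
log_ic50 = [0.0, 1.0, 2.0, 0.5, 1.5, 2.5]
log_ld50 = [2.0, 2.5, 3.0, 0.5, 1.5, 2.5]
groups = ["A", "A", "A", "B", "B", "B"]
models = fit_group_models(log_ic50, log_ld50, groups)
print(moving_correlation(log_ic50, log_ld50, window=3))
print(models, predict_ld50(models, "B", 2.0))
```

Pooled together, the toy chemicals show only a weak overall relationship, while each group alone is fitted exactly; that is the situation the group-wise approach is meant to exploit.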

The Texas-Indiana Virtual STAR Center: Data-Generating In Vitro and In Silico Models of
Developmental Toxicity in Embryonic Stem Cells and Zebrafish

Maria Bondesson Bolin, University of Houston; Richard Finnell, Texas A&M University; James
Glazier, Indiana University

Approximately one in every 33 U.S. infants has a congenital anomaly. Heart defects are the most
common anomalies; others include neural tube defects and orofacial clefts. Although the causes of
congenital anomalies are both genetic and environmental, there is major concern about environmental
compounds as causative agents. In some cases, it is known that specific compounds cause anomalies. For
example, methyl mercury and other heavy metals have been shown to be teratogenic. There remains,
however, a large knowledge gap in terms of which compounds cause congenital anomalies.

The Center's objective is to develop new screening models for developmental toxicity. The aim is to
move from biological models of developmental toxicity to computer simulations. The main research goals
are to generate developmental models based on mouse embryonic stem cells and zebrafish suitable for
high-throughput screening, generate high-information content models on development and differentiation
using mouse embryonic stem cells and zebrafish, develop computational models for developmental
toxicity with the aim of first re-creating normal development (in wild-type) and then classifying possible
mechanisms by which chemical perturbations cause experimentally observed developmental defects, and
perform proof-of-concept experiments of the in vitro and in silico test platforms with a blind test of
chemicals.

The project has been divided into three investigational areas: (1) zebrafish as a model to elucidate the
morphological and mechanistic effects of environmental pollutants, (2) the effects of environmental
contaminants on mouse embryonic stem cell differentiation, and (3) the development of computer
simulations facilitating assessment of toxicity based on perturbed development in zebrafish and mouse
embryonic stem cells. Courses on zebrafish development, embryonic stem cells, and computer
simulations for doctoral students and postdoctoral fellows have been developed. The Center regularly
collaborates with stakeholders and other researchers. For all three projects, 37 chemicals that are known
or expected to be teratogenic have been chosen for study. The chemicals have been ranked by potential
threat to human health as determined by the Agency for Toxic Substances and Disease Registry and EPA.

The first investigational area uses zebrafish models to elucidate the morphological and mechanistic
effects of environmental pollutants. Zebrafish were chosen for a number of reasons: they are small; their
embryos are transparent, and transparent adult fish also are available; embryonic development is rapid and
external, and the fish produce hundreds of eggs weekly; the zebrafish genome shares extensive homology
with the human genome; developmental pathways in fish and mammals are similar; many zebrafish mutants
exist; it is relatively easy to knock down gene expression in zebrafish; and they are cost-efficient and
adaptable to medium- to high-throughput screening.

Transgenic fish embryos will be produced, with the transgenes marking certain cell types during
development. The Center plans to construct 10 transgenic fish expressing fluorescent markers to follow
development and patterning. The endpoints include gastrulation and early embryonic cell movements,
patterning of the central nervous system and neurogenesis, hematopoiesis and angiogenesis, and yolk
utilization and morphological effects on somitogenesis. Morphology and green fluorescent protein/red
fluorescent protein expression will be recorded during normal development, and the embryos will be
treated with different toxicants to determine whether development is altered by teratogenic chemicals. At
the end of the project, the goal is to scale up and automate the system for high-throughput screening.
High-information-content models based on the transgenic fish will be developed.

The second investigational area uses mouse embryonic stem cells as a model to elucidate the
morphological and mechanistic effects of environmental pollutants. A recently created gene trap library
contains more than 350,000 embryonic stem cell clones and between 10,000 and 13,000 inactivated genes.
The aim is to use these embryonic stem cell resources to study specific markers of differentiation and
patterns to determine how environmental agents affect development. Genes have been selected primarily
based on their role in early embryonic patterning, particularly those involving gastrulation and cell
movements. Expected results include documentation of morphology and β-geo expression during normal
development and after teratogenic chemical exposure.

The third investigational area will develop computer simulations facilitating the assessment of toxicity
based on perturbed development in zebrafish and mouse embryonic stem cells. This work still is in its
infancy. The major research question is how to move from cell phenomenology to tissue-level patterning
and structure. The goal is to build simulations that address some of the missing pieces in the
understanding of increased developmental defects. Related projects include the CompuCell3D Multi-Cell
Modeling Environment project, which is developing an open source, multiplatform modeling environment
that allows the building of multicell simulations of developmental phenomena and diseases. Another
project is the Systems Biology Workbench (SBW) Reaction-Kinetics Modeling Environment, which provides a
standard for performing reaction-kinetic modeling of subcellular regulatory and metabolic networks.
Multicell modeling in CompuCell3D and SBW will integrate molecular-, cellular-, and whole-organ-level data
to predict developmental effects of pathway disruption.
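
CompuCell3D builds multicell simulations on the Glazier-Graner-Hogeweg (cellular Potts) model, in which each cell is a domain of lattice sites and the lattice evolves through Metropolis updates that favor lower effective energy. The sketch below is a deliberately tiny, stand-alone illustration of that update rule, assuming a single contact energy J between unlike neighbors and a quadratic volume constraint. It does not use CompuCell3D's API, and the lattice size, energies, and temperature are arbitrary choices.

```python
import math
import random


def neighbors(x, y, n):
    """Four nearest neighbors on an n x n lattice with periodic boundaries."""
    return [((x + 1) % n, y), ((x - 1) % n, y), (x, (y + 1) % n), (x, (y - 1) % n)]


def contact_energy(grid, x, y, cell_id, J, n):
    """Energy of the four bonds at site (x, y) if it belonged to cell_id: J per unlike neighbor."""
    return sum(J for i, j in neighbors(x, y, n) if grid[i][j] != cell_id)


def volume_energy(volume, target, lam):
    """Quadratic penalty for deviating from the target cell volume."""
    return lam * (volume - target) ** 2


def count_volumes(grid):
    counts = {}
    for row in grid:
        for cell_id in row:
            counts[cell_id] = counts.get(cell_id, 0) + 1
    return counts


def metropolis_step(grid, volumes, target, J, lam, T, rng):
    """One copy attempt: a random site tries to take the id of a random neighbor."""
    n = len(grid)
    x, y = rng.randrange(n), rng.randrange(n)
    i, j = rng.choice(neighbors(x, y, n))
    old, new = grid[x][y], grid[i][j]
    if old == new:
        return False
    dH = contact_energy(grid, x, y, new, J, n) - contact_energy(grid, x, y, old, J, n)
    for cell_id, dv in ((old, -1), (new, 1)):
        if cell_id != 0:  # id 0 is the surrounding medium, which has no volume constraint
            v = volumes[cell_id]
            dH += volume_energy(v + dv, target, lam) - volume_energy(v, target, lam)
    if dH <= 0 or rng.random() < math.exp(-dH / T):
        grid[x][y] = new
        volumes[old] -= 1
        volumes[new] += 1
        return True
    return False


# Two 3 x 3 cells (ids 1 and 2) in medium on a hypothetical 12 x 12 lattice.
n = 12
grid = [[0] * n for _ in range(n)]
for x in range(2, 5):
    for y in range(2, 5):
        grid[x][y] = 1
for x in range(7, 10):
    for y in range(7, 10):
        grid[x][y] = 2
volumes = count_volumes(grid)
rng = random.Random(1)
accepted = sum(metropolis_step(grid, volumes, target=9, J=2.0, lam=1.0, T=2.0, rng=rng) for _ in range(5000))
print(accepted, volumes)
```

Chemical perturbation could, under this framing, be represented as changes to parameters such as J or the target volume, which is one route from pathway-level effects to tissue-level shape changes.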

The BOSC review that took place immediately prior to this workshop included discussion of platforms for
many possible approaches to developmental toxicology. Questions yet to be answered include: Is it more
useful to develop models of systems that already are NCCT foci (e.g., liver, limb, gastrulation) or of
novel ones (e.g., vasculogenesis)? What are the best ways to integrate the biological approach of the
Texas-Indiana Virtual STAR (TIVS) Center team with the prioritization outcomes for EPA? Should the focus
be on one or two classes of perturbation agents? Should the focus be on tool and data development or on
model development?

Collaborative Work With EPA
Thomas Knudsen, EPA, ORD, NCCT

The request for applications (RFA) under which the TIVS Center was funded encouraged a different
approach: studying exposures that perturb biological events during formative stages of the reproductive
cycle and thereby affect embryo and fetal development, postnatal development, fertility and reproduction,
and children's health. Key research issues illustrate the complexity of this work, including the timing of
cellular interactions, the sensitivity of these systems, the complexity of interactions, and maternal
influence. Many developmental patterns can be tracked and studied in vitro to define how chemicals disrupt
fundamental control of patterning, timing, differentiation, and morphogenesis.

TIVS will use in vitro models (zebrafish embryos, mouse embryonic stem cells) and in silico
computational models to elucidate the morphological and mechanistic effects of environmental pollutants. For
the research on zebrafish embryos, there are a number of opportunities for connections between EPA and
TIVS, including data and resource sharing to evaluate developmental signaling pathways. Some chemical
effects in this system already are represented in ToxCast™ datasets, and the pathways identified by this
project will be added. TIVS can help EPA to prioritize the most important pathways in developmental
toxicity. For the research on the effects of environmental contaminants on mouse embryonic stem cell
differentiation, there also are opportunities for collaboration and data and resource sharing. The gene trap
studies are a nice adjunct to the zebrafish project. It will be important for TIVS and the other three STAR
centers to collaborate and share data analysis methods.

NCCT is interested in extending the predictive capacity of ToxCast™ chemical data to developmental
impacts. The development of computer simulations facilitating the assessment of toxicity based on
perturbed development in zebrafish and mouse embryonic stem cells will provide a means of incorporating
chemical and biological information into systems complex enough to be relevant but not so complex that
they are intractable. NCCT is interested in developing virtual embryo systems to validate or invalidate
predictions generated by researchers. The opportunities for collaboration with TIVS include merging data
from developmentally competent in vitro assays with cellular and molecular assay targets, using predictive
associations from ToxCast™ high-throughput screening data to build hypotheses about mechanisms of
action, conducting studies that generate data to test these hypotheses and improve predictive models, and
advancing virtual tissue models to a level that can help prioritize chemicals for quantitative risk
assessment.

A Proposal From the European Commission's Complementary Research Program
Bart van der Burg, BioDetection Systems B.V.

The chemical substance in vitro/in silico screening system to predict human and ecotoxicological effects
(ChemScreen) is a collaborative project involving nine partners in five countries. It has not yet begun but
will span 4 years, with the majority of the practical work to be performed within the first 3 years.

Most of the 100,000 chemicals currently on the market have undergone little or no testing. To address this,
the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) Program began in June
2007. Under REACH, industry is responsible for providing data on chemicals. Manufacturers and importers
of a compound in quantities of 1 tonne or more per year must register it with the European Chemicals
Agency (ECHA), and ECHA may request additional data as needed. Authorization is required for harmful
compounds. Approximately 30,000 chemicals are covered by REACH. Priorities under REACH include
chemicals that are carcinogenic, mutagenic, or toxic to reproduction; chemicals that are persistent,
bioaccumulative, and toxic; and chemicals that are very persistent and very bioaccumulative. REACH is
estimated to cost between 2.8 and 5.2 billion euros over 11 years but to save 50 billion euros over 30
years as a result of health improvements.

If traditional animal tests are used, the progress of REACH will be seriously hampered by ethical concerns,
costs, capacity, and speed. For REACH to succeed, cost-effective, rapid in vitro tests need to be adopted.
REACH offers incentives for the use of alternative (nonvertebrate) tests. ECHA publishes test proposals
(by chemical manufacturers) and invites third parties to submit alternative proposals. There is an explicit
allowance for alternatives to in vivo tests, including in vitro and nontesting methods (QSAR, grouping,
exposure, read across). ECHA accepts the use of suitable methods and regularly reports on the use of
alternative methods. ChemScreen focuses on reproductive toxicity because it is important for assessing
both human and environmental toxicity and because it is prioritized under REACH. Reproductive toxicity
testing uses more animals than any other area of toxicity testing, and unfortunately few alternative
methods exist.

The ChemScreen approach is to identify sensitive parameters for reproductive toxicity, identify the
critical mechanisms involved in perturbing these parameters, build a high-throughput system using these
mechanisms as modules, expand the system step-wise, integrate it with bioinformatics and data
interpretation, and build integrated testing strategies that include nontesting methods. Work under
ChemScreen is divided as follows: (1) establishment of in silico prescreening and toxicity prediction
methods to prioritize in vitro toxicity testing, (2) establishment of a database and an in silico prescreen to
identify potential reproductive toxicants, (3) establishment of sensitive parameters and a medium-throughput
minimal essential in vitro assay panel, (4) establishment of a high-throughput mechanistic pathway screen
for reproductive toxicants, (5) development of integrative methods to predict in vivo reprotoxicity,
allowing informed decisions on prioritization for eventual further testing, (6) integration into one
user-friendly tool, and (7) dissemination.

Reporter gene assays that have been shown to reasonably predict the in vivo potency of compounds will
be used in ChemScreen. Dr. van der Burg displayed a table from a review in Pediatrics showing that most
compounds that are teratogenic in humans were not identified in animal studies. Screening systems will
include a panel of 15 to 50 reporter gene assays in human cells (nuclear receptors, dioxin receptor, and
signaling/stress/developmental pathways); reporter gene assays in mouse embryonic stem cells
(ReProGlow, developmental pathways); wild-type embryonic stem cells with transcriptomics; metabolizing
cell systems; zebrafish with transcriptomics; and other assays for critical reprotoxicity endpoints (e.g.,
spermatogenesis). In silico tools include an exposure module, a toxicity screening tool, in vivo
reprotoxicity databases, and an automated decision tool.


Discussion on Research Needs
Maggie Breville, EPA, ORD

Ms. Maggie Breville facilitated the discussion on computational toxicology research needs, using the
questions on a handout distributed to participants entitled "Research Needs to Advance the Field of
Computational Toxicology."

Question 1. Can the same techniques used by ToxCast™ to identify chemicals with a high likelihood of
being harmful also be used to identify and/or inform the design of safe chemicals that can be
manufactured and used (i.e., green chemistry)? What additional research is needed to make this happen?

Dr. van der Burg said that there is a great opportunity to develop cost-effective screening methods, but
there is also a danger in relying too heavily on any single screening method, because one method may not
identify all toxic chemicals. Research should focus on cost-effectiveness. Dr. Ann Richard noted that
green chemistry products should be recognized as safer alternatives and that applying a battery of
high-throughput screens to both the chemicals and their alternatives would be helpful. A participant added
that green chemistry could help guide modeling efforts to develop safer alternatives. Dr. Dix said that
green chemistry needs to be repositioned to serve as a resource for chemical screening and testing.

Question 2. What type of information can we expect toxicity signatures developed through ToxCast™ and
other computational methods to provide regarding dose-response, chronic exposures, and potency?

There were no comments on this question. Participants were asked to send their answers via e-mail to
Ms. Segal.

Question 3. Is the ultimate goal of computational toxicology research to develop a virtual organism?

Dr. Kavlock said that the ultimate goal is to protect human health and the environment. A virtual
organism is a tool to achieve this goal, but it is not the ultimate goal. Dr. Glazier noted that a virtual
organism could be used to address Question 2. Dr. Kavlock added that more sophisticated tools are
needed to address toxicology in the risk assessment context.

Question 4. For results developed using computational techniques to be used in risk assessments, what
research and regulatory questions need to be answered?

A participant observed that this is an important question for regulatory decision-making support and that
this issue has been addressed in Europe. It might be helpful for EPA to develop guidelines for
computational science, especially regarding metrics and questions such as "Why are we doing this?"
Dr. Richard said that a standard for methods development is needed before all of the methods in
ToxCast™ can be incorporated. Dr. Glazier noted that the answer depends on the goal. If the goal is to
replace in vivo and in vitro testing with in silico models, those models will produce a different set of false
positives and negatives than in vivo and in vitro testing do. In the medical device field, changing methods
raises legal liability issues: even if the new method yields fewer false positives and negatives overall,
some people who would not have been harmed under the older method are inevitably harmed under the
new one. Dr. Glazier added that researchers using new approaches must be prepared for misses. A
participant noted that some models and data are not suited for regulatory purposes.

Question 5. What additional research needs should be addressed?

Dr. Kavlock said that a point raised in the BOSC review was that the current system under which EPA
manages the STAR Centers does not encourage or allow the renewal of the Centers. Much time and effort
are spent developing synergies and tools, but then it all comes to an end. The BOSC suggested examining
ways to evaluate the success of the STAR Centers and to keep the research moving forward through
renewal of the grants. A participant noted that there is some redundancy among the Centers and asked
whether it would be possible to hold an annual retreat for the Center leaders to allow for more
collaboration. Ms. Segal said that the Centers are asked to set aside funding for attending the progress
reviews; holding retreats in place of progress reviews is an alternative that could be considered in the
future.

Ms. Breville thanked the presenters and attendees for their contributions to the workshop and the support
contractors for their logistical assistance. She adjourned the meeting at 4:29 p.m.

U.S. Environmental Protection Agency (EPA) and National Center for Environmental Research (NCER)

Computational Toxicology Centers STAR Progress Review Workshop

October 1, 2009

EPA Main Campus
Building C, Auditorium C111A/B
109 TW Alexander Drive
Research Triangle Park, NC 27711

POST-MEETING PARTICIPANTS LIST

Cal Baier-Anderson

Environmental Defense Fund

Marianne Barrier

U.S. Environmental Protection Agency
Timothy Barzyk

U.S. Environmental Protection Agency

Maria Bondesson Bolin

University of Houston

Carole Braverman

U.S. Environmental Protection Agency

Maggie Breville

U.S. Environmental Protection Agency
Kelly Chandler

U.S. Environmental Protection Agency
Chris Corton

U.S. Environmental Protection Agency
Alva Daniels

U.S. Environmental Protection Agency
Sally Darney

U.S. Environmental Protection Agency
Jimena Davis

U.S. Environmental Protection Agency
Rob DeWoskin

U.S. Environmental Protection Agency

David Dix

U.S. Environmental Protection Agency
Peter Egeghy

U.S. Environmental Protection Agency
Susan Euling

U.S. Environmental Protection Agency
Richard Finnell

Institute of Biosciences and Technology
Elaine Francis

U.S. Environmental Protection Agency

Panos Georgopoulos

University of Medicine and Dentistry of New Jersey

James Glazier

Indiana University

Najwa Haykal-Coates

U.S. Environmental Protection Agency

David Herr

U.S. Environmental Protection Agency
Ross Highsmith

U.S. Environmental Protection Agency
Keith Houck

U.S. Environmental Protection Agency

Elaine Cohn Hubal

U.S. Environmental Protection Agency

Sid Hunter

U.S. Environmental Protection Agency
Lora Johnson

U.S. Environmental Protection Agency
Bonnie Joubert

U.S. Environmental Protection Agency
Richard Judson

U.S. Environmental Protection Agency
Robert Kavlock

U.S. Environmental Protection Agency
Thomas Knudsen

U.S. Environmental Protection Agency
Robert MacPhail

U.S. Environmental Protection Agency
Matthew Martin

U.S. Environmental Protection Agency

Catherine McCollum

University of Houston

Larry McMillan

U.S. Environmental Protection Agency
Leonard Mole

U.S. Environmental Protection Agency
Holly Mortensen

U.S. Environmental Protection Agency
Michael Morton

U.S. Environmental Protection Agency
Stephanie Padilla

U.S. Environmental Protection Agency
Heidi Paulsen

U.S. Environmental Protection Agency

James Rabinowitz

U.S. Environmental Protection Agency

Joseph Retzer

U.S. Environmental Protection Agency

Ann Richard

U.S. Environmental Protection Agency

Michael Rountree

Rountree Consulting

Ivan Rusyn

University of North Carolina at Chapel Hill
Paul Schlosser

U.S. Environmental Protection Agency
Deborah Segal

U.S. Environmental Protection Agency
Imran Shah

U.S. Environmental Protection Agency
Linda Sheldon

U.S. Environmental Protection Agency
Amar Singh

Lockheed Martin Contractor

Richard Spencer

Lockheed Martin Contractor

Bart van der Burg

BioDetection Systems

John Vandenberg

U.S. Environmental Protection Agency
Vikrant Vijay

National Institutes of Health
John Wambaugh

U.S. Environmental Protection Agency
William Ward

U.S. Environmental Protection Agency
William Welsh

University of Medicine and Dentistry of New Jersey

ClarLynda Williams-Devane

U.S. Environmental Protection Agency

Maritja Wolf

Lockheed Martin Contractor
Fred Wright

University of North Carolina at Chapel Hill

Contractor Support

Ramona Spencer

The Scientific Consulting Group, Inc.