United States Environmental Protection Agency
Office of Research and Development
Center for Computational Toxicology and Exposure
Toxicity Reference Database Version 2.1
User Guide
EPA601B22001 | August 2022 | www.epa.gov/research
-------
EPA Report Number 601B22001
August 2022
Toxicity Reference Database
Version 2.1
User Guide
by
Madison Feshuk, Sean Watford, Lori Kolaczkowski,
Katie Paul Friedman
US Environmental Protection Agency
Office of Research and Development
Center for Computational Toxicology and Exposure
Research Triangle Park, North Carolina
-------
Purpose
This document describes how to access and use the Toxicity Reference Database
(ToxRefDB) version 2.1. The latest data can be accessed through EPA's Clowder site
(https://clowder.edap-cluster.com/datasets/6 7 747fefe4b0856fdc65639b#folderld=62c5cfebe4b0 Id27e3b2d85 7).
More information about ToxRefDB version 2.0 and its development can be found in
the publications below.
Watford, S., Pham, L.L., Wignall, J., Shin, R., Martin, M.T., and Friedman, K.P. (2019).
ToxRefDB version 2.0: Improved utility for predictive and retrospective toxicology
analyses. Reproductive Toxicology, 89, 145-158. DOI: 10.1016/j.reprotox.2019.07.012
Pham, L.L., Watford, S., Friedman, K.P., Wignall, J.A., and Shapiro, A.J. (2019).
Python BMDS: A Python interface library and web application for the canonical EPA
dose-response modeling software. Reproductive Toxicology. DOI:
10.1016/j.reprotox.2019.07.013
This user guide does not necessarily reflect U.S. EPA policy.
-------
Abstract
ToxRefDB contains in vivo study data from over 5900 guideline or guideline-like studies for
over 1100 chemicals. The database largely comprises curated animal study data from repeat dose
studies conducted according to Health Effects Series 870 guidelines. Many of these
studies (over 3000) come from registrant-submitted toxicity studies known as data
evaluation records (DERs) from the U.S. EPA's Office of Pesticide Programs (OPP). By
employing a controlled vocabulary for enhanced data quality, ToxRefDB serves as a resource
for study design, quantitative dose response, and endpoint testing status information given
guideline specifications. The database can aid in the validation of in vitro high throughput
screening of chemicals and serve as a resource for retrospective and predictive toxicology
applications.
-------
Table of Contents
Purpose
Abstract
Overview
Summary of v2.1 Update
Table 1: v2.1 Summary Statistics
Figure 1: Study-Level Data Landscape
Figure 2: Chemical-Level Data Landscape
Changes between v2.0 and v2.1
Table 2: Changes between v2.0 and v2.1
Accessing information in ToxRefDB
Installing MySQL and loading ToxRefDB
Example queries using MySQL
Programmatic Access
Python
R
Database Structure
Figure 3: ToxRefDB v2.1 ERD
Figure 4: Schema Overview
Data Curation Process
Figure 5: Data Extraction and Review Workflow
Quality Assurance in Data Extraction
Efforts to Reduce Error Rate
Unit Standardization
Study Reliability with ToxRTool
Table 3: ToxRTool Guideline Adherence Score
Guideline Profiles
Table 4: Guideline Profile Coverage
Endpoint Terminology
Figure 6: Hierarchical endpoint terminology example
Ontology mappings
Figure 7: Cross-referenced Terminology Sources
Negative Endpoints and Effects
Table 5: Endpoint Observation Status
Figure 8: Decision tree for identification of negative endpoints and effects
Figure 9: Example Observation Status Interpretation
Ongoing Work
Data Dictionary
-------
Overview
The Toxicity Reference Database (ToxRefDB) serves as a resource for structured animal
toxicity data for many retrospective and predictive toxicology applications. ToxRefDB
contains in vivo study data from over 5900 guideline or guideline-like human health relevant
studies for over 1100 chemicals.
The study types covered in ToxRefDB include the following repeat dose study designs utilizing
various administration routes (predominantly oral): chronic (CHR; 1-2 year exposures
depending on species and study design) conducted predominantly in rats, mice, and dogs;
subchronic (SUB; 90 day exposures) conducted predominantly in rats, mice, and dogs;
subacute (SAC; 14-28 day exposures depending on the source and guideline) conducted
predominantly in rats, mice, and dogs; prenatal developmental (DEV) conducted
predominantly in rats and rabbits; multigeneration reproductive toxicity studies (MGR)
conducted predominantly in rats; reproductive (REP) toxicity studies conducted largely in rats;
developmental neurotoxicity (DNT) studies conducted predominantly in rats; and a small
number of studies with designs characterized as acute (ACU), neurological (NEU), or "other"
(OTH).
Many of the studies (over 3,000) come from registrant-submitted toxicity studies known as data
evaluation records (DERs) from the U.S. EPA's Office of Pesticide Programs (OPP). Since
2009, continued curation efforts have expanded ToxRefDB to include toxicity studies from ten
additional sources, including the National Toxicology Program (NTP), peer-reviewed primary
research articles (OpenLit), and pharmaceutical pre-clinical toxicity studies (Pfizer, Sanofi,
GSK, Merck), among others (RIVM, PMRA, unpublished and unassigned sources). Of the
studies with completed curation (processed=1), 90% correspond to pesticide actives and inerts.
Although most studies in the database correspond to pesticides, curation of other study
sources incorporated additional functional use types of chemicals.
ToxRefDB serves as a resource for study design, quantitative dose response, and endpoint
testing status information given guideline specifications from the US Environmental Protection
Agency (US EPA) and the National Toxicology Program (NTP) headquartered at the National
Institute of Environmental Health Sciences. The legacy and current data curation workflow is
described in more detail in later sections. An important component of ToxRefDB is its
controlled vocabulary for studies and effects observed for enhanced data quality.
The first version of ToxRefDB (ToxRefDB 1.0) was initially released as a series of
spreadsheets, which are still available on EPA's FTP site and referenced in FigShare
(https://doi.org/10.23645/epacomptox.6062545.v1). ToxRefDB underwent significant updates
that are described in the recent publication (Watford et al., 2019) and was released as
ToxRefDB v2.0. ToxRefDB v2.0 and associated summary files can be found
here: https://doi.org/10.23645/epacomptox.6062545.v3.
ToxRefDB v2.1 is a minor update of ToxRefDB v2.0 that corrects issues discovered in the
compilation script, which caused some extracted values (for example, some effects) to import
incorrectly from the AccessDB curation files. The .sql export of ToxRefDB v2.1 is
available for public download here: https://doi.org/10.23645/epacomptox.6062545. Although
the overall number of studies and chemicals remains unchanged, the v2.1 update includes
additional data because dose treatment groups and effects extracted from previously curated
studies are now fully accessible. These added data improve the utility of ToxRefDB as a resource
for curated legacy in vivo information by providing more complete information on the
animal studies conducted. Moving forward, an application-driven workflow with the Data
Collection Tool (DCT) will be used to create a more sustainable process for loading curated
information into a database and to support a more regular release cycle.
In addition to being accessible via SQL downloads, ToxRefDB information is
summarized with calculated point-of-departure values at the chemical and study level for
inclusion in the summary-level database, the Toxicity Value Database (ToxValDB), which is
accessible via the CompTox Chemicals Dashboard. This list aggregates chemicals associated
with curations in ToxRefDB v2.0: https://comptox.epa.gov/dashboard/chemical-lists/TOXREFDB2.
ToxRefDB v2.1 values will be incorporated in the next ToxValDB release.
-------
Summary of v2.1 Update
ToxRefDB v2.1 is a minor update to ToxRefDB v2.0 to correct issues discovered with the
compilation script that caused some extracted values to not import properly from AccessDB
curation files, such as failure to import some effects. ToxRefDB v2.1 contains summary
information from 5986 studies for 1143 chemicals.
For ToxRefDB v2.0, quantitative (i.e. dose-response) data was extracted. This curation was
completed for 3871 studies with plans to extract and release the remaining data in subsequent
data releases. No additional curation was performed for the v2.1 update. To provide the reader
with a summary of the scope and coverage of the database, ToxRefDB was filtered to present
only data where a full curation with guideline profile observations was complete. This is
achieved using a 'processed' flag set to 1 within the study table.
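The filtering described above can be sketched in Python. The example below is illustrative only: it builds a small in-memory SQLite table with invented rows in place of the real MySQL study table, and it assumes the column names used in the example queries later in this guide (study_id, chemical_id, study_type, study_source, species, processed). The same SELECT statement should apply to the MySQL instance.

```python
import sqlite3

# Toy stand-in for the ToxRefDB study table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE study (
    study_id INTEGER PRIMARY KEY,
    chemical_id INTEGER,
    study_type TEXT,
    study_source TEXT,
    species TEXT,
    processed INTEGER
);
INSERT INTO study VALUES
    (1, 10, 'CHR', 'NTP', 'rat', 1),
    (2, 10, 'CHR', 'NTP', 'rat', 1),
    (3, 11, 'CHR', 'NTP', 'mouse', 1),
    (4, 12, 'SUB', 'OPP DER', 'dog', 0),
    (5, 12, 'SUB', 'OPP DER', 'rat', 1);
""")

# Keep only fully curated studies (processed = 1), then count studies and
# chemicals for each study type, study source, and species.
SUMMARY_SQL = """
SELECT study_type, study_source, species,
       COUNT(DISTINCT study_id) AS n_studies,
       COUNT(DISTINCT chemical_id) AS n_chemicals
FROM study
WHERE processed = 1
GROUP BY study_type, study_source, species
ORDER BY study_type, study_source, species
"""
summary_rows = conn.execute(SUMMARY_SQL).fetchall()
for row in summary_rows:
    print(row)
```

In this toy data, the dog study with processed = 0 is excluded from the summary, which mirrors how unprocessed studies are left out of the summary statistics.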
Table 1 is a summary table of the number of chemicals and number of studies for each study
source, study type, and species. Study type abbreviations are as follows: CHR = Chronic, DEV
= Prenatal-Developmental, MGR = Multigeneration Reproductive, SAC = Subacute, SUB =
Subchronic.
Table 1: v2.1 Summary Statistics
Study type    Study source    Species    Number of studies    Number of chemicals
CHR           NTP             mouse      178                  173
CHR           NTP             rat        169                  164
CHR           OpenLit         mouse      4                    4
CHR           OpenLit         rat        5                    5
CHR           OPP DER         dog        331                  298
CHR           OPP DER         hamster    4                    3
CHR           OPP DER         mouse      342                  303
CHR           OPP DER         primate    1                    1
CHR           OPP DER         rat        398                  328
Total CHR                                1432                 557
DEV           NTP             mouse      1                    1
DEV           NTP             rabbit     3                    3
DEV           NTP             rat        6                    6
DEV           OpenLit         rat        1                    1
DEV           OPP DER         mouse      18                   16
DEV           OPP DER         rabbit     431                  372
DEV           OPP DER         rat        508                  433
DEV           Other           mouse      1                    1
DEV           Other           rabbit     1                    1
DEV           Other           rat        4                    4
Total DEV                                974                  486
MGR           OpenLit         rat        1                    1
MGR           OPP DER         mouse      2                    2
MGR           OPP DER         rat        339                  310
MGR           Other           rat        19                   19
Total MGR                                361                  331
SAC           NTP             mouse      29                   26
SAC           NTP             rat        30                   29
SAC           OPP DER         dog        1                    1
SAC           OPP DER         mouse      3                    3
SAC           OPP DER         rabbit     6                    6
SAC           OPP DER         rat        15                   13
Total SAC                                84                   51
SUB           NTP             hamster    1                    1
SUB           NTP             mouse      119                  107
SUB           NTP             rat        127                  114
SUB           OpenLit         mouse      2                    2
SUB           OpenLit         rat        4                    4
SUB           OPP DER         dog        214                  195
SUB           OPP DER         hamster    4                    4
SUB           OPP DER         mouse      123                  112
SUB           OPP DER         primate    3                    3
SUB           OPP DER         rabbit     5                    4
SUB           OPP DER         rat        418                  335
Total SUB                                1020                 498
Database totals                          3871                 748
Figure 1 depicts a breakdown of studies by study source, study type, and species.
Figure 1: Study-Level Data Landscape
[Figure: bar charts of the number of studies, with panels for Type, Species, and Source; x-axis: study type (CHR, SUB, DEV, MGR, SAC).]
-------
Figure 2 depicts a breakdown of chemicals by study source, study type, and species.
Figure 2: Chemical-Level Data Landscape
[Figure: bar charts of the number of chemicals, with panels for Type, Species, and Source; x-axis: study type (CHR, SUB, DEV, MGR, SAC).]
-------
Changes between v2.0 and v2.1
The following table summarizes the differences between ToxRefDB v2.0 and v2.1.
ToxRefDB v2.1 is a minor update to recover thousands of extracted values that failed to import
properly from the original AccessDB curation files, as described in the Data Curation Process
section. Although the overall number of studies and chemicals remains unchanged, the v2.1
update makes previously curated data fully accessible: 594 more studies now have extracted
effects, 5226 more dose treatment groups have effects, and 21756 more effects are available.
These added data improve the utility of ToxRefDB as a resource for curated legacy in vivo
information by providing more complete information on the animal studies conducted.
Table 2: Changes between v2.0 and v2.1
Output                                                              v2.0      v2.1      Change
Total number of studies with complete curation                      3882      3871      -11
Number of studies with extracted effects                            3068      3662      594
Total number of chemicals                                           748       748       0
Total database rows, including studies with no extracted effects    328623    344868    16245
Total effects extracted                                             313525    335281    21756
Dose treatment groups with effects                                  35679     40905     5226
Unique effects: Cholinesterase endpoint category                    5323      6008      685
Unique effects: Developmental endpoint category                     8502      9640      1138
Unique effects: Reproductive endpoint category                      4691      5775      1084
Unique effects: Systemic endpoint category                          284352    302674    18322
Unique critical effects: Cholinesterase endpoint category           713       796       83
Unique critical effects: Developmental endpoint category            1118      1276      158
Unique critical effects: Reproductive endpoint category             488       645       157
Unique critical effects: Systemic endpoint category                 18757     20989     2232
-------
Accessing information in ToxRefDB
A MySQL database export and summary files of ToxRefDB v2.1 are available for public
download here. The summary spreadsheet contains study and chemical-level
information for reference. ToxRefDB information is also summarized with calculated
point-of-departure values at the chemical and study level for inclusion in the summary-level database,
the Toxicity Value Database (ToxValDB), which is accessible via the CompTox Chemicals
Dashboard. ToxRefDB v2.1 values will be incorporated in the next ToxValDB release.
Below is documentation on how to install MySQL, load ToxRefDB, and access the data
with SQL or programmatically with Python or R. Another useful tool for accessing
the data is MySQL Workbench, which provides a user interface to interact with any MySQL
database.
Installing MySQL and loading ToxRefDB
Steps to install MySQL and load ToxRefDB are detailed below. More comprehensive
documentation for using MySQL can be found online.
• Download the ToxRefDB MySQL database
• Download the latest version of the MySQL community server.
• Select the appropriate installer for your operating system
o For Windows, download the MSI installer
o For macOS, download the DMG installer; for Linux, download the package for your
distribution
• The installer will walk you through the installation. During the installation, be sure to
copy the temporary root password. You will need it later.
o For Windows, MySQL should automatically be added to your PATH
o For macOS and Linux, if MySQL was not added to your PATH automatically you
will have to add it manually. Open the terminal and type:
echo 'export PATH=/usr/local/mysql/bin:$PATH' >> ~/.bash_profile
• Open the command line (Windows) or terminal (macOS and Linux) and log in to the MySQL
server with the command:
mysql -u root -p
• Enter the temporary root password when prompted for a password. Change the root
password following instructions detailed here.
• Create the ToxRefDB database, select it as the default database, and load the dump file
following instructions detailed here:
mysql> CREATE DATABASE IF NOT EXISTS toxrefdb_2_0;
mysql> USE toxrefdb_2_0;
mysql> source toxrefdb_2_0.sql
-------
Example queries using MySQL
Once the ToxRefDB instance is established, the user is ready to begin querying the database.
These example queries can be tailored for exploratory data analysis, specific research
questions based on the individual's use case, or risk assessment workflows.
# Get number of studies per study type
SELECT study_type, COUNT(study_id) FROM study
GROUP BY study_type;
# Get number of studies per study type and species
SELECT study_type, species, COUNT(study_id) FROM study
GROUP BY study_type, species;
# Get number of studies per source
SELECT study_source, COUNT(study_id) FROM study
GROUP BY study_source;
# Get all study information for chronic studies
SELECT * FROM study WHERE study_type="CHR";
# Get all treatment group and dosing information for a single chemical
SELECT * FROM chemical
INNER JOIN study ON chemical.chemical_id=study.chemical_id
INNER JOIN tg ON tg.study_id=study.study_id
INNER JOIN dose ON dose.study_id=study.study_id
INNER JOIN dtg ON dtg.tg_id=tg.tg_id AND dose.dose_id=dtg.dose_id
WHERE casrn="42509-80-8";
# Get number of studies per endpoint
SELECT endpoint_category, endpoint_type, endpoint_target,
COUNT(DISTINCT study.study_id) AS "number of studies" FROM study
INNER JOIN tg ON study.study_id=tg.study_id
INNER JOIN tg_effect ON tg.tg_id=tg_effect.tg_id
INNER JOIN effect ON effect.effect_id=tg_effect.effect_id
INNER JOIN endpoint ON endpoint.endpoint_id=effect.endpoint_id
GROUP BY endpoint_category, endpoint_type, endpoint_target;
# Get all study-level LELs and LOAELs for effect profile 2
SELECT * FROM pod WHERE effect_profile_id=2 AND study_id IS NOT NULL AND pod_type IN("loael","lel");
# Get chemical-level PODs for effect profile 2
SELECT * FROM pod WHERE effect_profile_id=2 AND study_id IS NULL;
# Get study-level PODs for effect profile 2 and for a specific endpoint
SELECT DISTINCT pod.* FROM pod
INNER JOIN pod_tg_effect ON pod.pod_id=pod_tg_effect.pod_id
INNER JOIN tg_effect ON tg_effect.tg_effect_id=pod_tg_effect.tg_effect_id
INNER JOIN effect ON effect.effect_id=tg_effect.effect_id
INNER JOIN endpoint ON endpoint.endpoint_id=effect.endpoint_id
WHERE effect_profile_id=2 AND study_id IS NOT NULL
AND endpoint_target LIKE "thyroid%";
# Get all dose-response data for a study
SELECT * FROM chemical
INNER JOIN study ON study.chemical_id=chemical.chemical_id
INNER JOIN tg ON tg.study_id=study.study_id
INNER JOIN dose ON dose.study_id=study.study_id
INNER JOIN dtg ON dtg.tg_id=tg.tg_id AND dose.dose_id=dtg.dose_id
INNER JOIN tg_effect ON tg.tg_id=tg_effect.tg_id
INNER JOIN effect ON effect.effect_id=tg_effect.effect_id
INNER JOIN endpoint ON endpoint.endpoint_id=effect.endpoint_id
INNER JOIN dtg_effect ON tg_effect.tg_effect_id=dtg_effect.tg_effect_id AND dtg.dtg_id=dtg_effect.dtg_id
WHERE study.study_id=687;
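Several of the queries above join dtg on both tg_id and dose_id. A minimal sketch of why both conditions matter is given below. It uses an in-memory SQLite database with invented rows (two treatment groups and three dose levels for a toy study numbered 687) and only the key columns; the real tables have more fields.

```python
import sqlite3

# Toy versions of the tg, dose, and dtg tables; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tg   (tg_id INTEGER PRIMARY KEY, study_id INTEGER, sex TEXT);
CREATE TABLE dose (dose_id INTEGER PRIMARY KEY, study_id INTEGER, dose_level INTEGER);
CREATE TABLE dtg  (dtg_id INTEGER PRIMARY KEY, tg_id INTEGER, dose_id INTEGER);
""")
conn.executemany("INSERT INTO tg VALUES (?, ?, ?)", [(1, 687, "M"), (2, 687, "F")])
conn.executemany("INSERT INTO dose VALUES (?, ?, ?)", [(1, 687, 0), (2, 687, 1), (3, 687, 2)])
# One dtg row for every (treatment group, dose) pair.
pairs = [(tg_id, dose_id) for tg_id in (1, 2) for dose_id in (1, 2, 3)]
conn.executemany("INSERT INTO dtg (tg_id, dose_id) VALUES (?, ?)", pairs)

BASE = """
SELECT COUNT(*) FROM tg
INNER JOIN dose ON dose.study_id = tg.study_id
INNER JOIN dtg ON dtg.tg_id = tg.tg_id {extra}
WHERE tg.study_id = 687
"""
# Joining dtg on both keys returns one row per dose treatment group.
strict_count = conn.execute(BASE.format(extra="AND dtg.dose_id = dose.dose_id")).fetchone()[0]
# Dropping the dose_id condition pairs every dtg row with every dose of the study.
loose_count = conn.execute(BASE.format(extra="")).fetchone()[0]
print(strict_count, loose_count)
```

With both join conditions the toy study yields 6 rows, one per dose treatment group; without the dose_id condition the same data yields 18 rows, because each dose treatment group is repeated for every dose level.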
Programmatic Access
The user is not limited to SQL queries in MySQL Workbench to access ToxRefDB. The data can
also be accessed programmatically from several languages. Below are examples of
loading the data into datasets for further work in Python and R. You will still have to connect
to the database through the language-specific connector.
Python
In the example below, the Python packages sqlalchemy, pandas, and pymysql are required.
You can, however, use any type of connector. Any SQL query can replace the one provided in
this example.
# Load libraries
import sqlalchemy as sa
import pandas as pd
# Establish connection (fill in your own credentials)
username = ""
password = ""
host = ""
database = ""
engine = sa.create_engine(f"mysql+pymysql://{username}:{password}@{host}/{database}")
# Get guideline profiles
results = pd.read_sql(
    """
    SELECT guideline.guideline_id,
    guideline.guideline_number,
    guideline.name,
    guideline.profile_name,
    guideline.description,
    guideline_profile.guideline_profile_id,
    guideline_profile.obs_status,
    guideline_profile.description,
    endpoint.endpoint_id,
    endpoint.endpoint_category,
    endpoint.endpoint_type,
    endpoint.endpoint_target FROM guideline
    INNER JOIN guideline_profile ON guideline.guideline_id=guideline_profile.guideline_id
    INNER JOIN endpoint ON endpoint.endpoint_id=guideline_profile.endpoint_id
    """,
    engine)
# Export to Excel (pandas needs an Excel engine such as openpyxl installed);
# the context manager saves and closes the file on exit
with pd.ExcelWriter("guideline_profiles.xlsx") as writer:
    results.to_excel(writer, index=False, merge_cells=False)
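One practical point about the connection string: a password containing characters such as "@", "/", or ":" will break the URL unless it is percent-encoded. The sketch below shows one common way to do this with the Python standard library. The helper name build_mysql_url and the example credentials are hypothetical and not part of ToxRefDB or SQLAlchemy.

```python
from urllib.parse import quote

def build_mysql_url(username, password, host, database, driver="mysql+pymysql"):
    """Return a database URL with the username and password percent-encoded.

    Hypothetical helper for illustration; the result can be passed to
    sqlalchemy.create_engine in place of the hand-built f-string.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{driver}://{user}:{pwd}@{host}/{database}"

# Invented credentials, used only to show the encoding.
example_url = build_mysql_url("toxref_reader", "p@ss/word:1", "localhost", "toxrefdb_2_0")
print(example_url)
# prints mysql+pymysql://toxref_reader:p%40ss%2Fword%3A1@localhost/toxrefdb_2_0
```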
R
In the example below, the R package RMySQL is required. Any SQL query can replace the one
provided in this example.
# Load library
library(RMySQL)
# Establish connection (fill in your own credentials)
con <- dbConnect(drv = RMySQL::MySQL(), user = "",
password = "",
host = "", database = "")
# Get all ToxRefDB information for subchronic studies
output <- dbGetQuery(con, "SELECT chemical.casrn,
chemical.preferred_name,
study.study_id,
study.study_type,
study.study_year,
study.study_source,
study.species,
study.strain_group,
study.admin_route,
study.admin_method,
endpoint.endpoint_category,
endpoint.endpoint_type,
endpoint.endpoint_target,
endpoint.endpoint_id,
tg_effect.life_stage,
tg_effect.tg_effect_id,
effect.effect_id,
effect.effect_desc,
tg.sex,
tg.generation,
dose.dose_level,
dtg.dose_adjusted,
dtg.dose_adjusted_unit,
dtg_effect.treatment_related,
dtg_effect.critical_effect,
tested_status,
reported_status FROM chemical
INNER JOIN study ON chemical.chemical_id=study.chemical_id
LEFT JOIN dose ON dose.study_id=study.study_id
LEFT JOIN tg ON tg.study_id=study.study_id
LEFT JOIN dtg ON tg.tg_id=dtg.tg_id AND dose.dose_id=dtg.dose_id
LEFT JOIN tg_effect ON tg.tg_id=tg_effect.tg_id
LEFT JOIN dtg_effect ON tg_effect.tg_effect_id=dtg_effect.tg_effect_id AND dtg.dtg_id=dtg_effect.dtg_id
LEFT JOIN effect ON effect.effect_id=tg_effect.effect_id
LEFT JOIN endpoint ON endpoint.endpoint_id=effect.endpoint_id
LEFT JOIN obs ON obs.study_id=study.study_id AND obs.endpoint_id=endpoint.endpoint_id
WHERE study_type='SUB'")
# Close the connection when finished
dbDisconnect(con)
-------
Database Structure
This entity-relationship diagram (ERD) can be used to understand the relationships between
tables. BMDExpress software (Pham et al, 2019) was not run to calculate benchmark dose
values for v2.1, therefore BMD tables were dropped from the v2.1 schema.
Figure 3: ToxRefDB v2.1 ERD
[Figure: entity-relationship diagram. The tables and fields shown in the diagram are listed below; relationship lines are not reproduced.]
toxrefdb_dd: toxrefdb_table TEXT; toxrefdb_field TEXT; description TEXT
chemical: chemical_id INT(11); dsstox_substance_id VARCHAR(45); casrn VARCHAR(255); preferred_name VARCHAR(255)
study: study_id INT(11); chemical_id INT(11); study_source_id VARCHAR(255); study_citation VARCHAR(1024); study_year INT(11); study_source VARCHAR(255); study_type VARCHAR(255); study_type_guideline VARCHAR(255); species VARCHAR(255); strain_group VARCHAR(255); strain VARCHAR(255); admin_route VARCHAR(255); admin_method VARCHAR(255); substance_source_name VARCHAR(255); substance_purity VARCHAR(255); substance_lot_batch VARCHAR(255); substance_comment VARCHAR(255); dose_start INT(11); dose_start_unit VARCHAR(255); dose_end INT(11); dose_end_unit VARCHAR(255); study_comment VARCHAR(2048); guideline_id INT(11); processed TINYINT(11)
study_clowder: study_clowder_id INT(11); study_id INT(11); filename VARCHAR(128); filetype VARCHAR(3); clowder_uid VARCHAR(128)
study_toxrtool: study_toxrtool_id INT(11); toxrtool_id INT(11); study_id INT(11); score INT(11); toxrtool_comment VARCHAR(1024); filename VARCHAR(128)
toxrtool: toxrtool_id INT(11); criteria_group VARCHAR(256); criteria INT(11); question VARCHAR(1024)
dose: dose_id INT(11); study_id INT(11); dose_level INT(11); conc DOUBLE; conc_unit VARCHAR(255); vehicle VARCHAR(255); dose_comment VARCHAR(1024)
tg: tg_id INT(11); study_id INT(11); sex VARCHAR(8); generation VARCHAR(16); dose_period VARCHAR(32); dose_duration INT(11); dose_duration_unit VARCHAR(16); n FLOAT; tg_comment VARCHAR(1024)
dtg: dtg_id INT(11); dose_id INT(11); tg_id INT(11); dose_adjusted DOUBLE; dose_adjusted_unit VARCHAR(32); dtg_comment VARCHAR(1024); mg_kg_day_value DOUBLE
endpoint: endpoint_id INT(11); endpoint_category VARCHAR(255); endpoint_type VARCHAR(255); endpoint_target VARCHAR(255)
effect: effect_id INT(11); endpoint_id INT(11); effect_desc VARCHAR(255); cancer_related TINYINT(1)
tg_effect: tg_effect_id INT(11); tg_id INT(11); life_stage VARCHAR(32); effect_desc_free VARCHAR(255); target_site VARCHAR(64); direction TINYINT(1); effect_comment VARCHAR(1024); effect_id INT(11); no_quant_data_reported TINYINT(1)
dtg_effect: dtg_effect_id INT(11); tg_effect_id INT(11); dtg_id INT(11); treatment_related TINYINT(1); critical_effect TINYINT(1); sample_size VARCHAR(32); effect_val DOUBLE; effect_val_unit VARCHAR(128); effect_var DOUBLE; effect_var_type VARCHAR(32); time DOUBLE; time_unit VARCHAR(64); dtg_effect_comment VARCHAR(1024)
obs: obs_id INT(11); study_id INT(11); endpoint_id INT(11); status VARCHAR(64); default TINYINT(1); tested_status TINYINT(1); reported_status TINYINT(1); guideline_profile_id INT(11); obs_comment VARCHAR(1024)
negative_endpoint: negative_endpoint_id INT(11); endpoint_id INT(11); study_id INT(11)
negative_effect: negative_effect_id INT(11); study_id INT(11); endpoint_id INT(11); effect_id INT(11)
guideline: guideline_id INT(11); guideline_number VARCHAR(64); name VARCHAR(512); profile_name VARCHAR(64); description VARCHAR(1024)
guideline_profile: guideline_profile_id INT(11); endpoint_id INT(11); guideline_id INT(11); obs_status VARCHAR(64); description VARCHAR(1024)
pod: pod_id INT(11); pod_type VARCHAR(45); sex VARCHAR(8); admin_route VARCHAR(255); species VARCHAR(255); qualifier VARCHAR(8); pod_value DOUBLE; pod_unit VARCHAR(45); mg_kg_day_value DOUBLE; dose_level INT(11); max_dose_level INT(11); staggered_dosing TINYINT(1); chemical_id INT(11); study_id INT(11); effect_profile_id INT(11); group_id INT(11)
pod_tg_effect: pod_tg_effect_id INT(11); pod_id INT(11); tg_effect_id INT(11)
effect_profile: effect_profile_id INT(11); effect_profile_name VARCHAR(128); effect_profile_description VARCHAR(2048)
effect_profile_group: effect_profile_group_id INT(11); group_id INT(11); group_name VARCHAR(123); group_description VARCHAR(2048); effect_profile_id INT(11)
effect_profile_group_toxrefdb: effect_profile_group_toxrefdb_id INT(11); group_id INT(11); effect_profile_id INT(11); tg_effect_id INT(11)
ontology: ontology_id INT(11); ontology_name VARCHAR(64); uid VARCHAR(45); uid_type VARCHAR(45); label VARCHAR(256); description VARCHAR(2048); uri VARCHAR(45)
ontology_toxrefdb: ontology_toxrefdb_id INT(11); ontology_id INT(11); toxrefdb_id INT(11); toxrefdb_table VARCHAR(64); toxrefdb_field VARCHAR(45)
unit_standardization: unit_standardization_id INT(11); original_unit VARCHAR(255); corrected_unit VARCHAR(255)
-------
Figure 4: Schema Overview
[Figure: schema overview flowchart. The text recoverable from the figure is listed below.]
... metadata, dosing, and significant treatment-related and critical effects.
Part 2: Observation status for ToxRefDB endpoints
• Reported status: Was the endpoint described in the study literature?
• Tested status: Were data collected for the endpoint? Tested status is "assumed" based on the
default from the guideline profile.
o No (not tested): No effect data recorded for the endpoint in the database
o Yes: Effect data information
• Treatment-related endpoint effects: Was the data collected described as at least
one of the following?
1. Toxicologically significant
2. Biologically significant
3. Statistically significant
4. Used to derive LOEL/NOEL
5. Treatment-related or dose-related
6. Quantitative data suggests trend across doses
• Effect data information: Method information describing the data collected for each applicable
endpoint's effect
• Treatment group effect data: life stage; direction of net change across all doses
(increase/decrease)
• Qualitative: Treatment related? Critical effect?
Part 2 provides more context about the data entry method. Portion of ToxRefDB 1.0 that
carried over to version 2.0 unchanged. The previously extracted information from ToxRefDB v1
was checked for accuracy and modified or added to for QA purposes.
A. The curator assigns endpoint testing status according to the guideline profile, using a
decision tree to classify 400 standardized endpoints as described in study reports. Guideline
profiles were developed that match language found in the studies. These guideline
profiles were used for inference of negative endpoints/effects.
B. Observed endpoints classified as "tested" are evaluated for treatment-related effects.
Treatment-related effects are indexed by endpoint and method information pertaining to
the data collected.
C. Where available, complete qualitative and/or quantitative dose-response effect data for
each dose were extracted.
-------
Data Curation Process
Initially, ToxRefDB v1.0 provided only summary effect levels and lacked quantitative dose
response information. Extraction of this information initially proceeded using Excel files;
however, the process required manual corrections after study extractions were uploaded to the
ToxRefDB MySQL database, for problems including inconsistent comments, different numbers of
animals for the same treatment group, and added effects outside of the controlled terminology. The
quantitative information and its application in ToxRefDB v2.0 served as a strong impetus to
re-extract the studies.
An Access database file was generated from the MySQL database for each study in v1.0, and
this approach offered several improvements including standardized options for more consistent
reporting in some fields, such as the units on time and dose, dose-treatment group, and effect
information; checkbox reporting for observation status on each endpoint and effect; and a log
for tracking changes and facilitating QA. Nearly 32% of the studies were extracted using the
Excel-based approach, with the remaining studies extracted using the Access database
approach. Switching to Access database files from Excel files significantly reduced errors and
increased standardization of reporting items such as units, endpoints, and effects.
Figure 5: Data Extraction and Review Workflow
[Figure: workflow diagram; recoverable labels include "Generate Access database files for ..." and "ToxRefDB (MySQL)".]
Figure 5 details the workflow of the overall data extraction process for ToxRefDB v2.0. Access
database files were generated for each study in ToxRefDB v1.0 and bundled with the
corresponding source files for data extraction. The data in the Access databases are curated
with additional data extracted from the source files, with up to three levels of review. The
Access databases are returned by the reviewers and the data are imported back into the MySQL
database with the study table designation of processed=1.
-------
ToxRefDB v2.0 curation also included the implementation of guideline profiles to guide
curation. Endpoints were annotated (e.g. "required", "not required") according to guidelines for
subacute, subchronic, chronic, developmental, and multigenerational reproductive designs,
distinguishing negative responses from untested endpoints. Implementation of controlled vocabulary
improved data quality; standardization to guideline requirements and cross-referencing with the
Unified Medical Language System (UMLS) connect ToxRefDB v2.0 observations to
vocabularies linked to UMLS, including PubMed medical subject headings (MeSH). The
endpoint terminology and its hierarchical nature are described in later sections.
Moving forward, an application-driven workflow with the Data Collection Tool (DCT) will be
utilized to create a more sustainable process for loading curated information to a database.
The DCT improves upon the legacy ToxRefDB curation workflow to provide document
allocation, curation and workflow management among users, and management review with
data conflict resolution, resulting in records that directly link quality-controlled curations to
source documents. The DCT offers flexibility via its modular workflow for curating the
heterogeneous and complex in vivo study designs.
A multi-layer review process will continue to be implemented with the DCT to ensure data
integrity and minimize data entry error.
Quality Assurance in Data Extraction
Guidance for data extraction was stratified first according to study type (e.g., CHR, SUB, DEV,
MGR) then by study source (e.g., OPP DER and NTP) because of the differences in both
study design and adverse effects required for reporting as stated in guidelines. The process
used to extract study information was also an important aspect of QA efforts for ToxRefDB
v2.0. First, a primary reviewer extracted study, dose, treatment group, effect, and endpoint
observation information. The instructions detailed how to review the toxicological data and
extract it from the original data sources consistently across reviewers using the Access
database. This extraction was reviewed by a second, senior reviewer, who was asked to review all
extracted information as if they were extracting it again and also to review the comment log
from the primary reviewer. Finally, if either the primary or secondary reviewer noted that it was
necessary, an additional senior toxicologist reviewed the comment logs and extracted information
and resolved any conflicts or questions prior to finalization of the extraction. This final, tertiary
review occurred for approximately 10% of the studies. Review by a manager to resolve any
differences between the primary and secondary reviewer serves to inform any training needs
or gaps for the reviewers. During this process, subject matter experts can also be consulted to
resolve questions. For release of ToxRefDB v2.0, the full quantitative data extraction for all
CHR and SUB studies was completed, with quantitative data extraction completed for many
other study types and sources as well.
Efforts to Reduce Error Rate
Error rate is an inherent problem for legacy databases because much of the source information was
entered manually, and human errors resulting from transcription are impossible to completely
avoid. However, as part of the ToxRefDB v2.0 curation effort, more robust QA processes were
implemented to promote greater fidelity of the information extracted, along with numerous quality
control (QC) checks to verify data integrity.
First, studies were extracted using a defined QA process, with multiple levels of review and
Access form-based entry (described previously) to prevent extraction errors. Upon upload
into ToxRefDB v2.0, these extractions were required to pass specific QC checks because,
although the Access database files enforce the MySQL database constraints and minimize
data entry error by standardizing the vocabulary used, logical errors can persist. After the
extracted data were uploaded through the import script, a series of potential logical errors was
identified through unit tests where the curated value could be assumed. Flagged logical
errors that have been corrected included:
• Dose level numbering did not correspond to the total number of doses;
• Duplication of concentration/dose values, including two control doses;
• No concentration and no dose adjusted value for a reported effect (possible extraction
error or possibly that the effect was qualitatively reported);
• The critical effect level is at a dose below where treatment-related effects were
observed; and/or,
• The control was incorrectly identified as a critical effect level.
Any of these issues that could not be resolved systematically were flagged to undergo a
second round of extraction and review to correct. Though QC is an ongoing and evolving
process, these QC checks are serving as an improvement to the overall database and
database development process.
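The logical checks listed above can be sketched as a small function over extracted dose and effect rows. This is only an illustration under stated assumptions, not the ToxRefDB import script or its unit tests: the field names (dose_level, conc, dose_adjusted, critical_effect, treatment_related) are borrowed from the data dictionary, while the function name qc_flags, the dict-based row layout, and the flag wording are hypothetical. The sketch also assumes contiguous dose levels and does not handle dose levels that are staggered by sex.

```python
def qc_flags(doses, effects):
    """Return a list of QC flag messages for one study (illustrative only).

    doses:   list of dicts with keys dose_level, conc, dose_adjusted
    effects: per-dose rows for an effect, as dicts with keys
             dose_level, critical_effect, treatment_related (0 or 1)
    """
    flags = []

    # Dose levels are expected to run 0..n-1 (assumption: no staggering).
    levels = sorted(d["dose_level"] for d in doses)
    if levels != list(range(len(doses))):
        flags.append("dose level numbering does not match number of doses")

    # A repeated concentration/dose pair also catches two control doses.
    seen = set()
    for d in doses:
        key = (d["conc"], d["dose_adjusted"])
        if key in seen:
            flags.append(f"duplicate concentration/dose value {key}")
        seen.add(key)

    by_level = {d["dose_level"]: d for d in doses}
    treated = [e["dose_level"] for e in effects if e["treatment_related"]]
    lowest_treated = min(treated) if treated else None

    for e in effects:
        level = e["dose_level"]
        dose = by_level.get(level)
        # An effect row whose dose has neither conc nor dose_adjusted.
        if dose and dose["conc"] is None and dose["dose_adjusted"] is None:
            flags.append(f"effect at dose level {level} has no dose value")
        if e["critical_effect"]:
            if level == 0:
                flags.append("control identified as a critical effect level")
            elif lowest_treated is None or level < lowest_treated:
                flags.append(
                    f"critical effect at dose level {level} is below "
                    "treatment-related effects"
                )
    return flags
```

A study that returns an empty list passes these particular checks; any returned message would correspond to a row flagged for a second round of extraction and review.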
Unit Standardization
An additional ongoing problem for reporting quantitative data from clinical or related laboratory
findings is unit standardization. No guidance is provided on how to report findings in the
OCSPP guidelines nor from any other sources, so units were extracted exactly as they were
presented in the reports. The units were standardized by eliminating duplicate entries for the
same units that were originally entered differently or with typographical errors. Units were only
standardized, and no conversions were introduced in the current database. Ongoing efforts
include further standardization of units and defining conversions that cannot be systematically
automated.
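As a rough illustration of standardizing unit labels without converting values, the sketch below collapses variant spellings of the same unit into one canonical label. The variant spellings in the mapping are invented for the example and are not taken from ToxRefDB; the names CANONICAL_UNITS and standardize_unit are likewise hypothetical.

```python
import re

# Hypothetical variant spellings mapped to one canonical label.
# Only labels are changed; no numeric values are converted.
CANONICAL_UNITS = {
    "mg/dl": "mg/dL",
    "g": "g",
    "gm": "g",
    "gram": "g",
    "grams": "g",
    "%": "%",
    "percent": "%",
}


def standardize_unit(raw):
    """Return the canonical label for a unit string.

    Unknown units are returned unchanged apart from trimmed whitespace,
    so nothing is silently dropped or converted.
    """
    key = re.sub(r"\s+", "", raw).lower()
    return CANONICAL_UNITS.get(key, raw.strip())
```

Leaving unrecognized units untouched keeps the verbatim report value available for later, manually defined conversions.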
Study Reliability with ToxRTool
Most studies referenced within ToxRefDB were extracted via summaries from OPP DERs, and
these studies typically follow OCSPP 870 series Health Effects Testing Guidelines. As
ToxRefDB was expanded, additional studies needed to be assessed for reliability and
guideline adherence.
The Toxicological Data Reliability Assessment Tool (ToxRTool) was adapted for reliability
assessment. ToxRTool is an Excel application that includes questions across 5 criteria with
numerical responses that are summed to lead to a Klimisch score: a score ranging from 1-4
that captures an overall assessment of reliability.
A total of 522 OpenLit studies were assessed with the ToxRTool with scores ranging from 8 to
23. As explained in the table below, most studies reviewed for ToxRefDB v2.0 corresponded to
Klimisch quality scores of 1 (ToxRTool score of > 18) or 2 (ToxRTool score of 13-18). The
ToxRTool scores could be used as a quality flag both to qualify and prioritize studies for the
extraction process, or by users who are performing reviews of information on a single chemical
basis.
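The correspondence stated above between ToxRTool scores and Klimisch scores can be written as a short lookup. This sketch encodes only the two ranges given in this section (greater than 18, and 13 to 18); the function name is hypothetical, and because cutoffs for lower scores are not stated here, the function returns None for them rather than guessing.

```python
def klimisch_from_toxrtool(score):
    """Map a ToxRTool score to a Klimisch score using the ranges in the text.

    Returns None for scores below 13, since this section does not state
    which Klimisch score those correspond to.
    """
    if score > 18:
        return 1
    if 13 <= score <= 18:
        return 2
    return None
```

A user filtering studies by quality flag might, for example, keep only records where this function returns 1 or 2.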
Table 3: ToxRTool Guideline Adherence Score
Score | Description
5 | Adheres to modern* OECD/EPA guideline for repeat-dose toxicity studies (explicitly stated by authors; broad endpoint coverage and ability to assess dose-response)
4 | Adheres to an existing or previous guideline (explicitly stated by authors; previous version of OECD/EPA guidelines or FDA guidelines)
3 | Not stated to adhere to guideline but guideline-like in terms of endpoint coverage and ability to assess dose-response (e.g., NTP). Please see Quick Guide to EPA Guidelines for chronic and subchronic studies. In this table, you can easily assess whether the study was guideline-like in terms of the animals used (species, sex, age, number), dosing requirements, and reporting recommendations.
2 | Unacceptable adherence to guideline (intended to adhere to guideline but had major deficiencies)
1 | Unacceptable (no intention to be run as a guideline study, purely open literature or specialized study)
* A study is considered as adhering to "modern" OECD/EPA guidelines if it was published after
1998, which is the date that many Health Effect 870 series guidelines were re-published. Note
that many of the studies extracted, particularly from sources like the NTP and OpenLit, were
never intended to adhere to a guideline and as such "unacceptable" in this case only refers to
their guideline adherence and not the study design itself.
Guideline Profiles
Within a curation, study records are linked to a guideline profile. OPP DERs follow the Series
870 - Health Effects Test Guidelines, described here. NTP reports follow NTP specifications.
Other subsources cannot be uniformly mapped, but some curations may be assigned a
guideline profile based on how closely the study design adheres to a guideline.
Guideline profiles for study endpoints were created from the Office of Chemical Safety and
Pollution Prevention (OCSPP) series 870 Health Effects Testing Guidelines and NTP
specifications (Table 2). This allows for analysis of guideline adherence for both guideline and
non-guideline studies.
Table 4: Guideline Profile Coverage
Additional efforts are underway to develop new profiles. The Guideline Profile column is a
concatenated entry of ToxRefDB's guideline id, guideline number (usually OCSPP Guideline
No. or NA for NTP specifications), guideline name, and abbreviated guideline profile name.
Study Type
Guideline Profile
Guideline Profile Description
CHR-
Carcinogenicity
• 9 | 870.42 | Carcinogenicity
| CHR_carc
The objective of a long-term carcinogenicity study is to observe test
animals for a major portion of life span for development of neoplastic
lesions during or after exposure to test substance by an appropriate
route of administration. The dose period generally lasts a year or
longer, typically 12, 18, or 24 months, and observations will exclude
developmental and neurological effects. See OPPTS 870.4200
Carcinogenicity.
CHR - Chronic
Toxicity
• 17 | NA| 2-Year Toxicity |
CHR_ntp
• 8 | 870.41 | Chronic
Toxicity| CHR_chr_tox
The objective of a chronic toxicity study is to determine the effects of
a substance in a mammalian species following prolonged and
repeated exposure. A chronic toxicity study should generate data to
identify chronic effects and define long-term dose-response
relationships. The dose period generally lasts a year or longer,
typically 12, 18, or 24 months, and observations will exclude
developmental and neurological effects. See OPPTS 870.4100
Chronic Toxicity.
CHR-
Combined
Chronic Toxicity
/ Carcinogenicity
• 10 | 870.43 | Combined
Chronic Toxicity /
Carcinogenicity |
CHR_chr_canc
The objective of a combined chronic toxicity/carcinogenicity study is
to determine the effects of a substance in a mammalian species
following prolonged and repeated exposure. Following updates to the
870 Series Health Effects Guidelines in 1998, this combined study
was preferred to separate submissions of 870.4100 and 870.4200.
The design and conduct should allow for the detection of neoplastic
effects and a determination of the carcinogenic potential as well as
general toxicity. The dose period generally lasts a year or longer,
typically 12, 18, or 24 months, and observations will exclude
developmental and neurological effects. See OPPTS 870.4300
Combined Chronic Toxicity/Carcinogenicity.
DEV - Prenatal
Developmental
Toxicity Study
• 6 | 870.37 | Prenatal
Developmental Toxicity
Study | DEV_pren_dev
This guideline for developmental toxicity testing is designed to
provide general information concerning the effects of exposure of the
pregnant test animal on the developing organism; this may include
death, structural abnormalities, or altered growth and an assessment
of maternal effects. The dose period is usually gestational (in utero)
and the animal is sacrificed prior to delivery. See OPPTS 870.3700
Prenatal Developmental Toxicity Study
MGR - Multi-
generational
reproductive
toxicity study
• 7 | 870.38 | Reproduction
and Fertility Effects |
MGR_rep_fert
• 13 | 870.38 |
Reproduction and Fertility
Effects |
MGR_rep_fert_pre98
Note: There are two guideline
profiles due to a 1998 guideline
change. The post-1998
guideline was likely used for
MGR studies that started in
1996.
This guideline for two-generation reproduction testing is designed to
provide general information concerning the effects of a test
substance on the integrity and performance of the male and female
reproductive systems, including gonadal function, the estrous cycle,
mating behavior, conception, gestation, parturition, lactation, and
weaning, and on the growth and development of the offspring. The
study may also provide information about the effects of the test
substance on neonatal morbidity, mortality, target organs in the
offspring, and preliminary data on prenatal and postnatal
developmental toxicity and serve as a guide for subsequent tests.
Additionally, since the study design includes in utero as well as
postnatal exposure, this study provides the opportunity to examine
the susceptibility of the immature/neonatal animal. The dose period
begins in adolescent F0 males and females and continues until the
terminal generation. Some of the litters deliver their pups, while
others may be sacrificed prior to delivery. See OPPTS 870.3800
Reproduction and Fertility Effects.
REP - Fertility
(Segment 1)
• 5 | 870.355 |
Reproduction/Development
Toxicity Screening Test |
REP_rep_dev
This guideline is designed to generate limited information concerning
the effects of a test substance on male and female reproductive
performance such as gonadal function, mating behavior, conception,
development of the conceptus, and parturition. This screening test
guideline can be used to provide initial information on possible
effects on reproduction and/or development, either at an early stage
of assessing the toxicological properties of chemicals, or on
chemicals of high concern, focused on early postnatal evaluation,
with sacrifice of dams and offspring at postnatal day 4. See OPPTS
870.3550 Reproduction and Fertility Effects.
REP - Peri- and
post-natal
toxicity study
(Segment III)
• 5 | 870.355 |
Reproduction/Development
Toxicity Screening Test |
REP_rep_dev
The study may provide information about the effects of the test
substance on neonatal morbidity, mortality, target organs in the
offspring, and preliminary data on prenatal and postnatal
developmental toxicity and serve as a guide for subsequent tests.
Additionally, since the study design includes in utero as well as
postnatal exposure, this study provides the opportunity to examine
the susceptibility of the immature/neonatal animal (F1 generation).
See OPPTS 870.3550 Reproduction and Fertility Effects.
REP-
Reproductive /
developmental
toxicity
screening test
• 5 | 870.355 |
Reproduction/Development
Toxicity Screening Test |
REP_rep_dev
This guideline is designed to generate limited information concerning
the effects of a test substance on male and female reproductive
performance such as gonadal function, mating behavior, conception,
development of the conceptus, and parturition. This screening test
guideline can be used to provide initial information on possible
effects on reproduction and/or development, either at an early stage
of assessing the toxicological properties of chemicals, or on
chemicals of high concern. See OPPTS 870.3550 Reproduction and
Fertility Effects.
SAC - Sub-
acute dermal
toxicity
• 3 | 870.325 | 90-day Dermal
Toxicity | SUB_sub_derm
A 21/28 day repeated dose dermal study will provide information on
possible health hazards likely to arise from repeated dermal
exposure to a test substance for a period of 21/28 days. Dose period
is typically 21-28 days with dermal exposure route, and observations
will exclude developmental and neurological effects. See OPPTS
870.3200 21/28-Day Dermal Toxicity.
SAC - Sub-
acute repeat
dose toxicity
• 14 | 870.305 | 28-day Oral
Toxicity in Rodents |
SAC_oral_rode_28
• 15 || 14-day Toxicity in
Rodents | SAC_ntp
The objective of a sub-acute repeat dose toxicity study is to
determine the adverse effects of a substance in a mammalian
species occurring after short-term dosing duration. Determination of
acute toxicity is usually an initial step in the assessment and
evaluation of the toxic characteristics of a substance. Dose period is
typically 21-28 days with varied exposure routes, and observations
will exclude developmental and neurological effects. See
https://www.regulations.gov/document/EPA-HQ-OPPT-2009-0156-
0009
SUB-
Subchronic
dermal toxicity
• 16 | | 13-Week Toxicity in
Rodents | SUB_ntp
• 3 | 870.325 | 90-day Dermal
Toxicity | SUB_sub_derm
The subchronic dermal study has been designed to permit the
determination of the no-observed-effect level (NOEL) and toxic
effects associated with continuous or repeated exposure to a test
substance for a period of 90 days. It can provide useful information
on the degree of percutaneous absorption, target organs, the
possibilities of accumulation, and can be of use in selecting dose
levels for chronic studies and for establishing safety criteria for
human exposure. The dose period is typically 90 days or 13 weeks,
but may be as long as 6 months, via dermal routes of exposure.
Observations will exclude developmental and neurological effects.
See OPPTS 870.3250 90-Day Dermal Toxicity.
SUB-
Subchronic
inhalation
toxicity
• 4 | 870.3465 | 90-Day
Inhalation Toxicity |
SUB_sub_inha
The subchronic inhalation study has been designed to permit the
determination of the no-observed-effect level (NOEL) and toxic
effects associated with continuous or repeated exposure to a test
substance for a period of 90 days. It will provide information on target
organs and the possibilities of accumulation, and can be used to
select concentration levels for chronic studies and establishing safety
criteria for human exposure. The dose period is typically 90 days
or 13 weeks, but it may be as long as 6 months, via inhalation routes
of exposure. Observations will exclude developmental and
neurological effects. See OPPTS 870.3465 90-Day Inhalation
Toxicity.
SUB-
Subchronic oral
toxicity in
nonrodent
• 2 | 870.315 | 90-day Oral
Toxicity in Nonrodents |
SUB_oral_nonr
The subchronic oral study has been designed to permit the
determination of the no-observed-effect level (NOEL) and toxic
effects associated with continuous or repeated exposure to a test
substance for a period of 90 days. It provides information on target
organs, the possibilities of accumulation, and can be of use in
selecting dose levels for chronic studies and for establishing safety
criteria for human exposure. The dose period is typically 90 days or
13 weeks, but it may be as long as 6 months, via oral routes of
exposure in any nonrodent species. Observations will exclude
developmental and neurological effects. See OPPTS 870.3150
90-Day Oral Toxicity in Nonrodents.
SUB-
Subchronic oral
toxicity in
rodents
• 1 | 870.31 | 90-day Oral
Toxicity in Rodents |
SUB_oral_rode
• 16 | | 13-Week Toxicity in
Rodents | SUB_ntp
The subchronic oral study has been designed to permit the
determination of the no-observed-effect level (NOEL) and toxic
effects associated with continuous or repeated exposure to a test
substance for a period of 90 days. It provides information on target
organs, the possibilities of accumulation, and can be of use in
selecting dose levels for chronic studies and for establishing safety
criteria for human exposure. The dose period is typically 90 days or
13 weeks, but may be as long as 6 months, via oral routes of
exposure in rodent species, typically rats and mice. Observations will
exclude developmental and neurological effects. See OPPTS
870.3100 90-Day Oral Toxicity in Rodents.
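Because each Guideline Profile entry in Table 4 is a pipe-delimited concatenation of guideline id, guideline number, guideline name, and abbreviated profile name, it can be split back into its parts. The following is a minimal sketch assuming that four-field layout; the class and function names are hypothetical, and treating an empty or "NA" guideline number as missing is an assumption made for the example.

```python
from typing import NamedTuple, Optional


class GuidelineProfile(NamedTuple):
    guideline_id: int
    guideline_number: Optional[str]  # None for "NA" or an empty field
    guideline_name: str
    profile_name: str


def parse_guideline_profile(entry):
    """Split 'id | number | name | profile' into a GuidelineProfile."""
    # Drop a leading bullet and spaces, then split on the pipe delimiter.
    parts = [p.strip() for p in entry.lstrip("• ").split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {entry!r}")
    gid, number, name, profile = parts
    if number in ("", "NA"):
        number = None
    return GuidelineProfile(int(gid), number, name, profile)
```

The guideline number is kept as a string so that values such as 870.42 are preserved exactly as they appear in the entry.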
Endpoint Terminology
ToxRefDB employs controlled terminology standardized to better reflect both the OCSPP
Health Effects 870 series guidelines and DER summary reporting. This hierarchical
relationship of effects and endpoints was adapted from the vocabulary developed for earlier
versions of ToxRefDB based on the data types curated. Novel values can be added when
found during a curation.
Figure 6: Hierarchical endpoint terminology example
[Figure: three levels of the terminology hierarchy. Observation (endpoint category, endpoint
type, and endpoint target): reproductive | reproductive performance | postimplantation loss.
Effect (specific condition associated with the endpoint target): Effect Description =
postimplantation loss. Treatment Group Effect (life stage, location, verbatim text): Life Stage =
adult pregnancy; Target Site = uterus; Effect Description Free = postimplantation site loss: mean.]
An example of the terminology hierarchy is demonstrated for an effect described as
"postimplantation loss". The finding is recorded as is in the "effect description free" field, which
is the verbatim wording used in the study report. The remaining fields are part of the ToxRefDB
controlled terminology. The endpoint category is reproductive, the endpoint type is
reproductive performance, the endpoint target is postimplantation loss, the effect description is
postimplantation loss, and the specific observation of "postimplantation loss" was made in the
adult pregnancy life-stage at the specific target site, the uterus.
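The hierarchy in this example can be pictured as nested records. The sketch below is only a way of visualizing the relationship, not the database schema: endpoint_category, endpoint_type, endpoint_target, and effect_desc follow the data dictionary field names, whereas the class names and the fields life_stage, target_site, and effect_desc_free are assumed names chosen for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    endpoint_category: str
    endpoint_type: str
    endpoint_target: str


@dataclass(frozen=True)
class EffectRecord:
    observation: Observation
    effect_desc: str        # controlled-vocabulary effect description
    life_stage: str         # assumed field name
    target_site: str        # assumed field name
    effect_desc_free: str   # verbatim wording from the study report

    def path(self):
        """Join the controlled terms from broadest to most specific."""
        o = self.observation
        return " | ".join(
            [o.endpoint_category, o.endpoint_type,
             o.endpoint_target, self.effect_desc]
        )


example = EffectRecord(
    observation=Observation(
        "reproductive", "reproductive performance", "postimplantation loss"
    ),
    effect_desc="postimplantation loss",
    life_stage="adult pregnancy",
    target_site="uterus",
    effect_desc_free="postimplantation loss",
)
```

Calling example.path() walks the controlled terms in the same order as the paragraph above, from endpoint category down to effect description.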
Ontology mappings
It is increasingly apparent that many toxicology research questions will require the integration
of public data resources, both with those containing the same types of information, as well as
with other databases to connect different kinds of information. ToxRefDB v2.0 allows for
increased connections to other resources, which has greatly enhanced its quantitative and
qualitative utility for predictive toxicology.
For example, efforts linking in vitro effects in ToxCast to in vivo outcomes using predictive
models may help to identify rapid, more efficient chemical screening alternatives. To connect
the ToxRefDB endpoint and effect terminology with other resources, the ToxRefDB
terminology was standardized and cross-referenced to the Unified Medical Language System
(UMLS). UMLS cross-references enable mapping of in vivo pathological effects from
ToxRefDB to PubMed (via Medical Subject Headings or MeSH terms), which may be relevant
for toxicological research and systematic review. This enables linkage to any resource that is
also connected to PubMed or indexed with MeSH.
Figure 7: Cross-referenced Terminology Sources
Over 1,800 UMLS concept codes were mapped to endpoints and effects in ToxRefDB via a
manual process. Only 500 of those concept codes are a part of the CDISC-SEND terminology.
All of the concept codes are a part of vocabularies within both National Cancer Institute
Thesaurus (NCIt) as well as UMLS.
Additionally, the Entity MeSH Co-occurrence Network (EMCON) consists of ranked lists of
genes for a given topic. This resource can be used to identify genes related to adverse effects
observed in ToxRefDB. Subsequently, ToxCast can be integrated since the intended targets
are mapped to Entrez gene IDs.
The result of updating the ToxRefDB terminology and linking to the UMLS concepts is that
ToxRefDB may be used to better anchor or compare to new approach method (NAM)
information, including data from ToxCast or structure-activity relationship models, as well as
other in vivo databases of toxicological information, such as eChemPortal and e-TOX.
Integration of these data resources is a major hurdle toward evaluating the reproducibility and
biological meaning of both traditional, legacy toxicity information and the data from NAMs.
Additional work may be performed to link to other ontologies and to assist stakeholders in
mapping their ontologies to the ToxRefDB and UMLS ontologies.
Negative Endpoints and Effects
As part of the v2.0 update to ToxRefDB, negative endpoints and effects can be inferred from
guideline profiles and the testing and reporting statuses of endpoints. Given the list of all
observations required for the relevant guideline profile, the curator indicates which endpoints
were missing (meaning not tested) or negative (meaning tested with no effect observed) by
setting tested and reported status accordingly. Endpoint observation status enables automated
distinction of true negatives and a better understanding of false negative effects. Users can
access the current inferred negatives and calculate inferences for a specific subset.
The MySQL database has inferred study-level negative effects and negative endpoints
available in two tables: "negative_effect" and "negative_endpoint". These tables were created
from stored procedures (repopulate_negative_effect and repopulate_negative_endpoint) that
are also available with the full MySQL database. The logic for the stored procedures follows
the inference workflow seen in Figure 8. Endpoint Observation Status distinguishes negative
and missing (not tested) effects based on the study's specific guideline requirements. An effect
is negative if the study has gone through the data extraction process, the effect was tested
(regardless of being reported), and no effect was seen in the study. An endpoint is negative for
a study if all effects for that endpoint are also negative in the study.
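The inference rule in the paragraph above can be expressed compactly. This is a sketch of the stated logic only and is not the repopulate_negative_effect or repopulate_negative_endpoint stored procedure; the function names and the dict keys endpoint, tested, and observed are hypothetical.

```python
def effect_is_negative(extracted, tested, observed):
    """An effect is negative if the study went through data extraction,
    the effect was tested (reported or not), and no effect was seen."""
    return bool(extracted and tested and not observed)


def negative_endpoints(extracted, effects):
    """Return endpoints for which every effect in the study is negative.

    effects: list of dicts with keys endpoint, tested, observed
    """
    by_endpoint = {}
    for e in effects:
        neg = effect_is_negative(extracted, e["tested"], e["observed"])
        by_endpoint.setdefault(e["endpoint"], []).append(neg)
    return sorted(ep for ep, negs in by_endpoint.items() if all(negs))
```

Note that an untested effect is treated as missing rather than negative, so a single untested effect keeps its endpoint out of the negative list.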
Table 5: Endpoint Observation Status
Tested Status | Reported Status | Assumption
Yes | Yes | The text of the study document explicitly stated the endpoint was measured, or data was presented in tables for the endpoint. This is the combination if required by the guideline for that study type and data is provided within the document, even if the effects measured were not significant.
No | Yes | This is the combination if the study document explicitly states the endpoint was not measured or data was not collected, even though the endpoint was required by the study guidelines.
Yes | No | The text of the study document does not state the endpoint was measured and data for the endpoint is not present. However, other evidence suggests that the endpoint was measured. This is the default for endpoints required by the study guideline and should only be changed in the face of direct evidence from the document.
No | No | Within the long table of observations from all study guidelines, this is the default setting for the endpoints not required by the alternative study guidelines and they should not be changed. Interpret these observations as irrelevant since they are not serving the selected guideline, therefore not required to be tested nor reported.
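The defaults described in Table 5 can be summarized as a lookup keyed on the (tested, reported) pair. This is a reading aid under assumptions: the short labels paraphrase the table, and the names STATUS_MEANING and default_status are hypothetical rather than part of ToxRefDB.

```python
# (tested, reported) -> short paraphrase of the Table 5 assumption
STATUS_MEANING = {
    (True, True): "tested and reported",
    (False, True): "explicitly reported as not tested",
    (True, False): "assumed tested, not reported",
    (False, False): "not required by the selected guideline",
}


def default_status(required_by_guideline):
    """Default (tested, reported) pair before the curator reviews the
    study document, following the defaults described in Table 5."""
    return (True, False) if required_by_guideline else (False, False)
```

A curator would move an endpoint away from its default only when the document gives direct evidence, for example to (True, True) when data tables are present.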
Figure 8: Decision tree for identification of negative endpoints and effects
Negative endpoints and effects can only be identified in studies that have gone through data
extraction and any subsequent QA processes because this ensures confidence in decisions
made about the adherence and/or deviations from the corresponding guideline profiles. We
can infer negatives based on whether or not an endpoint was tested and no treatment group-
related effects were seen. The example below shows how reported results are interpreted
given the study's guideline profile.
Figure 9: Example Observation Status interpretation
[Figure: annotated excerpt of a scanned study report listing urinalysis parameters, weighed
organs, and gross necropsy procedures. Annotations assign reported (R) and tested (T) status
values: R=1, T=1 for organs with histopathology results; R0, T1 for the "Required by Guideline"
organs, unless there are results on gross pathology for a given organ, which would make that
organ R1, T1.]
Ongoing Work
Moving forward, an application-driven workflow with the Data Collection Tool (DCT) will be
utilized to create a more sustainable process for loading curated information to a database.
The technical requirements of this application are that it:
• Replicate extraction of all of the data fields from ToxRefDB's legacy AccessDB curation
system;
• Include a "wizard" to walk the data curator through entry of study meta-data, chemical
composition information, dose information, dose-treatment group information,
quantitative data extraction for dose-treatment groups, and evaluation of the endpoint
observation status according to guideline specification;
• Offer flexibility for curating the heterogeneous and complex in vivo study designs via a
modular workflow;
• Continue to implement and improve controlled vocabularies for experimental design
elements as well as endpoint and effect language;
• Provide document allocation, curation and workflow management among users (internal
and external) with manager review and data conflict resolution for data provenance and
progress tracking;
• Link a quality-controlled curation to Clowder source documents; and
• Create a sustainable pipeline for data integration.
There are several critical advantages inherent in the success of this application. Automating
the data extraction creates a new more systematic and sustainable workflow. Following data
curation, ETL could be managed using Pentaho for direct loading to a database. Overall, this
effort would allow for the continued expansion of the ToxRefDB resource by providing a more
efficient process for curation of study information.
Following conclusion of the initial development phase, curation of developmental toxicity (DEV)
data evaluation records (DERs) from recent pesticide submissions was the selected focus for
Phase I DCT extraction. Future curation efforts were prioritized from an initial web scrape of
DER documents, selecting those that were published since 2008, adhere to existing
guideline profiles, and are not currently captured in ToxRefDB. Additional extraction may include
studies previously extracted using Excel and AccessDB files followed by comparison of the
results to look for accuracy, as well as new study types following the generation of new
guideline profiles and vocabularies. Feedback from data curators will help inform further
development enhancements.
Future versions with expanded chemical and study data collected via its new application-driven
curation workflow (DCT) and the creation of a ToxRefDB dashboard will increase ToxRefDB's
utility. Standardization efforts will continue to provide more detailed effect and study-level
information and will allow for more streamlined interoperable database efforts.
Without a user interface, ToxRefDB information is only accessible from the MySQL database
download or via ToxValDB hazard summary section. Complete ToxRefDB information will
soon be integrated into the CompTox Chemicals Dashboard and available via batch search
functionality.
Data Dictionary
A data dictionary is found in the database in the toxrefdb_dd table.
ToxRefDB Table
Field
Field Description
chemical
chemical_id
PK: Autoincremented unique identifier for a
chemical
dsstox_substance_id
Unique identifier from DSSTox
casrn
CAS Registry Number
preferred_name
Preferred name of the chemical substance tested
in the study.
dose
dose_id
PK: Autoincremented unique identifier for a dose
study_id
FK: A unique numeric identifier for each study in
the database.
conc
Concentration of a test chemical, typically
reported in ppm within the exposure matrix (e.g.,
feed or water).
conc_unit
Unit associated with a concentration of a test
chemical, typically reported as ppm.
dose_comment
This field can be used to explain any differences
in dosing over the dosing interval or provide
clarifying comments on how the dose was
administered. Specific concentrations of the
vehicle should be listed here when relevant. For
example, if methylcellulose was used as a
vehicle, the concentration of methylcellulose may
be included in the comment field (e.g., 0.5% w/v
aqueous methylcellulose).
dose_level
Numeric rank indicating the level of dose
administered to test animals, with lower dose
levels indicating lower concentrations of a
chemical (e.g., 0 = vehicle, 1 = lowest dose, etc.).
The dose level for some studies may be
staggered since concentrations may vary by sex
(e.g., male treatment group: 0 = vehicle, 1 =
lowest dose, 3 = second lowest dose, etc.).
vehicle
The media used in administration of chemical
dtg
dtg_id
PK: Autoincremented unique identifier for a
dosed-treatment group
dose_id
FK: A unique numeric identifier for each dose in
the database.
tg_id
FK: A unique numeric identifier for each
treatment group in the database.
dose_adjusted
The amount of the chemical administered in
mg/kg of body weight/day (mg/kg/day). This
value is typically different between male and
female groups receiving the same dose
concentration (conc) due to differences in
bodyweight. If dose_adjusted values were not
provided in a study, then they were calculated
using species scaling factors (FAO/WHO, 2000).
dose_adjusted_unit
Unit associated with the adjusted dose of a
chemical, typically reported in mg/kg/day.
dtg_comment
NULL if no additional comment needed; explains
any difference in the dose-treatment-group over
the course of the study (i.e., interim sacrifice or
changes due to toxicity and/or morbidity); quality
assurance (QA) flags indicate discrepancies
between the reported and correct values for the
study; differences in any dose_adjusted
calculations are provided.
mg_kg_day_value
The mg/kg/day species-specific, converted value
from ppm concentration
dtg_effect
dtg_effect_id
PK: Autoincremented unique identifier for a
dosed-treatment group effect
dtg_id
FK: A unique numeric identifier for each dosed
treatment group in the database.
tg_effect_id
FK: A unique numeric identifier for each
treatment group effect in the database.
critical_effect
Binary description (0,1) for an effect by dose
treatment group. "1" corresponds to a toxic or
adverse effect denoted in the study summary or
via expert judgement using a weight-of-evidence
approach. "0" indicates that although an effect is
produced at this level, it is not considered
adverse, nor immediate precursors to specific
adverse effects. If there are several critical
effects, the no observed adverse effect level
(NOAEL) is determined from the highest dose
level without critical effects. The lowest dose
level at which the critical effect was observed in a
study is the lowest observed adverse effect level
(LOAEL).
dtg_effect_comment
NULL if no additional comment needed; provides
additional explanation of the dose-treatment-
group-effect row in the table, including statistical
significance.
effect_val
Numeric value of a measured effect, can be
continuous or dichotomous (incidence) data.
effect_val_unit
Unit associated with the effect value.
effect_var
Measurement of the variance for a set of data
associated with a measured effect, generally
reported as the standard deviation (SD) or
standard error (SE).
effect_var_type
Name of the variance metric used to determine
the effect variance, typically the standard
deviation (SD) or standard error (SE). Other
effect_var types include: interquartile range, 95%
confidence limit, and none.
sample_size
Number of animals used for an examination for a
particular effect.
time
Numeric value associated with the duration of the
exposure at which a particular effect was
measured or observed, typically reported in
hours, days, weeks, or months.
treatment_related
Binary description (0,1) for an effect by dose
treatment group. "1" indicates there was a
statistically significant difference from the control
group for the effect; "0" indicates there was no
difference from control group. The highest dose
level at which no significant observable adverse
effects were observed corresponds to the no
effect level (NEL). The lowest effect level (LEL)
can be inferred from treatment-related effects.
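As a minimal sketch of how the two binary flags relate to effect levels, the function below derives a no-effect dose and a lowest-effect dose from (dose, flag) pairs. The function name, the sample rows, and the rule that the no-effect dose is the highest tested dose below the lowest flagged dose are assumptions made for illustration; they are not the procedure ToxRefDB itself uses to populate the pod table.

```python
from typing import Optional, Sequence, Tuple


def effect_levels(
    rows: Sequence[Tuple[float, int]]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (no_effect_dose, lowest_effect_dose) from (dose, flag) pairs.

    flag is 1 when the effect is flagged at that dose (for example
    treatment_related or critical_effect) and 0 otherwise.
    """
    doses = sorted({dose for dose, _ in rows})
    flagged = sorted({dose for dose, flag in rows if flag == 1})
    if not flagged:
        # Assumption: with no flagged dose, the highest tested dose is
        # reported as the no-effect dose and no lowest-effect dose exists.
        return (doses[-1] if doses else None, None)
    lowest = flagged[0]
    # Assumption: the no-effect dose is the highest tested dose that is
    # strictly below the lowest flagged dose.
    below = [d for d in doses if d < lowest]
    return (below[-1] if below else None, lowest)


# Hypothetical (mg/kg/day, flag) rows for one study; several effects per dose.
treatment_related = [(10, 0), (50, 1), (50, 0), (250, 1)]
critical_effect = [(10, 0), (50, 0), (250, 1)]

nel, lel = effect_levels(treatment_related)    # from treatment_related flags
noael, loael = effect_levels(critical_effect)  # from critical_effect flags
```

With these sample rows the treatment_related flags give an NEL and LEL one dose level lower than the NOAEL and LOAEL given by the critical_effect flags, because an effect can be treatment related without being considered adverse.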
effect
effect_id
PK: Autoincremented unique identifier for an
effect
endpoint_id
FK: A unique numeric identifier for each endpoint
in the database.
effect_desc
More specific description for an effect than
endpoint_category, usually detailing a specific
condition associated with an endpoint_target
(e.g. dysplasia, atrophy, necrosis, etc.).
effect_profile
effect_profile_id
PK: Autoincremented unique identifier for an
effect profile
effect_profile_description
Description of the effect profile
effect_profile_name
Name of the effect profile
effect_profile_group
effect_profile_id
FK: A unique numeric identifier for each effect
profile in the database.
group_id
Unique identifier for a group
group_description
The description of a group
group_name
The name of a group
endpoint
endpoint_id
PK: Autoincremented unique identifier for an
endpoint
endpoint_category
The broadest descriptive term for an endpoint.
Possible endpoint categories include: systemic,
developmental, reproductive, and cholinesterase.
endpoint_target
Describes more specific information than
endpoint_type, indicating where/how the sample
was collected to supply data for a particular
endpoint. Typically describes an organ/tissue or
metabolite/protein measured.
endpoint_type
The subcategory for endpoint_category, which is
more descriptive for a particular endpoint (e.g.
pathology gross, clinical chemistry, reproductive
performance, etc.)
guideline
guideline_id
PK: Autoincremented unique identifier for a
guideline
description
Information pertinent to a study guideline. For
example, MGR studies conducted post-1998
required the testing of developmental landmarks,
which is notable for observation status.
guideline_number
Number associated with the particular guideline
that a study adheres to or most closely adheres
to. OPPTS/OCSPP guideline numbers are
differentiated by the distinct number following
870, as dictated by the Office of Chemical Safety
and Pollution Prevention (OCSPP).
name
Name of the particular Office of Chemical Safety
and Pollution Prevention (OCSPP) guideline that
a study adheres to or most closely adheres to.
profile_name
Abbreviated name of the particular Office of
Chemical Safety and Pollution Prevention
(OCSPP) guideline that a study adheres to or
most closely adheres to. See abbreviations
section for profile name list.
guideline_profile
guideline_profile_id
PK: Autoincremented unique identifier for a
guideline profile
endpoint_id
FK: A unique numeric identifier for each endpoint
in the database.
guideline_id
FK: A unique numeric identifier for each guideline
in the database.
description
Provides a description of the rationale for an
endpoint observation status.
obs_status
Indicates whether or not an endpoint is required
to be tested according to the particular guideline
a study adheres to. The observation status for an
endpoint can be required, not required, or
triggered.
obs
status
The status regarding whether or not an endpoint
was tested and reported in a study. Assumes that
an endpoint was tested if the guideline the study
adheres to requires that endpoint to be tested.
default
An endpoint is considered tested and reported if
the endpoint appears in the text of the study
source indicating that data was collected. If an
endpoint is required to be tested by the guideline,
tested and reported are the defaults.
tested_status
Indicates if an endpoint was tested (1) or not
tested (0). If an endpoint was tested, it was
examined or measured.
reported_status
Indicates if an endpoint was reported (1) or not
reported (0). If an endpoint was reported, it
appears somewhere in the text of the report.
ontology
ontology_id
PK: Autoincremented unique identifier for an
ontology class
description
The associated description for the identifier
label
The associated label for the identifier
uid
Unique identifier from respective terminology
resource
uid_type
Type of identifier
uri
Uniform resource identifier
ontology_toxrefdb
ontology_toxrefdb_id
PK: Autoincremented unique identifier for an
ontology class associated with a concept in
ToxRefDB
ontology_id
FK: A unique numeric identifier for each ontology
class in the database.
toxrefdb_table
The associated table in ToxRefDB
toxrefdb_field
The associated field from toxrefdb_table linked to
a term
toxrefdb_id
Primary key from associated toxrefdb_table
pod
pod_id
PK: Autoincremented unique identifier for a point
of departure or associated effect level
chemical_id
FK: A unique numeric identifier for each chemical
in the database.
effect_profile_id
FK: A unique numeric identifier for each effect
profile in the database.
group_id
FK: A unique numeric identifier for each effect
profile group in the database.
study_id
FK: A unique numeric identifier for each study in
the database.
dose_level
Dose level at which the POD was seen
max_dose_level
Maximum dose level tested with relation to where
the POD was captured
mg_kg_day_value
Converted mg/kg/day value
qualifier
<, <=, >, >=, or =
pod_type
LEL, NEL, LOAEL, or NOAEL
pod_value
Value of the POD or associated effect level
pod_unit
Corresponding unit of the POD or associated
effect level
pod_tg_effect
pod_tg_effect_id
PK: Autoincremented unique identifier for a POD
associated with a treatment group effect
pod_id
FK: A unique numeric identifier for each POD or
associated effect level in the database.
tg_effect_id
FK: A unique numeric identifier for each
treatment group effect in the database.
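The relationship between pod and pod_tg_effect can be illustrated with a small join. The sketch below builds toy versions of the two tables in an in-memory SQLite database; the use of SQLite, the reduced column set, and every inserted value are assumptions for illustration only and do not reproduce the released database or its data.

```python
import sqlite3

# Toy in-memory tables using only the column names listed in this dictionary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pod (pod_id INTEGER PRIMARY KEY, study_id INTEGER, pod_type TEXT,
                  qualifier TEXT, pod_value REAL, pod_unit TEXT);
CREATE TABLE pod_tg_effect (pod_tg_effect_id INTEGER PRIMARY KEY,
                            pod_id INTEGER, tg_effect_id INTEGER);
-- Hypothetical rows: one LOAEL supported by two effects, one NOAEL with none.
INSERT INTO pod VALUES (1, 100, 'LOAEL', '=', 250.0, 'mg/kg/day'),
                       (2, 100, 'NOAEL', '=', 50.0, 'mg/kg/day');
INSERT INTO pod_tg_effect VALUES (1, 1, 7), (2, 1, 8);
""")

# LEFT JOIN keeps PODs that have no linked treatment group effect.
rows = conn.execute("""
SELECT p.pod_type, p.qualifier, p.pod_value, p.pod_unit, pte.tg_effect_id
FROM pod AS p
LEFT JOIN pod_tg_effect AS pte ON pte.pod_id = p.pod_id
WHERE p.study_id = ?
ORDER BY p.pod_id, pte.tg_effect_id
""", (100,)).fetchall()

for row in rows:
    print(row)
```

A LEFT JOIN is used here so that a POD without any linked treatment group effect still appears in the result, with tg_effect_id returned as NULL (None in Python).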
study
study_id
PK: Autoincremented unique identifier for a study
chemical_id
FK: A unique numeric identifier for each chemical
in the database.
guideline_id
FK: A unique numeric identifier for each guideline
in the database.
admin_method
Describes specifically how the chemicals were
administered via the route (e.g., capsule, diet,
gavage, topical, etc.)
dose_end
Time during an animal's life that the
administration of a test substance stopped.
dose_end_unit
Unit of time associated with the end of the dose
(dose_end).
dose_start
Time during an animal's life that the
administration of a test substance began.
dose_start_unit
Unit of time associated with the start of the dose
(dose_start).
species
Species of the animal test subject used in a
study.
strain
Intraspecific description of group of animals used
in a study; generally, a stock of animals that
share a uniform morphological or physiological
character, or group that is genetically uniform.
strain_group
Descriptive category for a group of test animals
that is more general than the strain.
study_comment
Pertinent information the curator deemed helpful
to be noted about the study in general, such as
poor document quality (e.g., poor scan), missing
pages, etc.
study_type
Classification to describe animal toxicity testing
that was conducted.
ACU (acute): Dose period is typically a day or less.
Excludes developmental and neurological studies.
SAC (subacute): Dose period is typically 21-28 days.
Excludes developmental and neurological studies.
SUB (subchronic): Dose period is typically 13 weeks,
but may be as long as 6 months. Excludes
developmental and neurological studies.
CHR (chronic): Dose period is typically 12, 18, or 24
months (generally any dosing lasting a year or
longer). Excludes developmental and neurological
studies.
DEV (developmental): Gestational (in utero) dose
period. Sacrificed prior to delivery.
MGR (multigenerational reproductive): Dose period
begins in adolescent F0 males and females and
continues until terminal generation. At least some
of the litters deliver their pups; some may be
sacrificed prior to delivery.
NEU (neurological): Study contains functional
observation battery or other battery of behavioral
testing that occurs during or after dosing.
Pathology has specific interest in the brain (e.g.,
regions, morphology, biochemistry, etc.). Excludes
developmental studies.
DNT (developmental neurotoxicity): Dose period
occurs anytime during development (e.g., in utero,
lactational, adolescent [after weaning, before
adulthood]). Study contains functional observation
battery or other battery of behavioral testing that
occurs during or after dosing, typically during
adulthood. Pathology has specific interest in the
brain (e.g., regions, morphology, biochemistry, etc.).
study_type_guideline
Description that combines the study_type and
guideline name for a study.
substance_comment
Pertinent information regarding a substance's
origin (generally the manufacturer/importer that
produced the substance), purity, or other notable
information about the substance in general.
substance_lot_batch
Identifier specific to the origin of a batch of the
test substance used in a study.
substance_purity
Percentage of the administered solution that is
composed of the chemical to be tested after
dilution.
substance_source_name
Name of the supplier that provided the chemical
substance for testing during the study.
tg
tg_id
PK: Autoincremented unique identifier for a
treatment group
study_id
FK: A unique numeric identifier for each study in
the database.
dose_duration
Amount of time a group is dosed. This varies
within studies depending on the dose period of a
particular treatment group.
dose_duration_unit
Unit of time associated with the dose duration.
Typically in days or months.
dose_period
Time point that best characterizes when the
treatment group was evaluated for effects.
Interim: Group sacrificed and examined within the
dosing period. Terminal: Group sacrificed and
examined at study completion and after the
dosing period. These animals are not mated.
Recovery: Group examined after a recovery
period that followed the dosing period at the
study end. Post first mating: Group examined
after first mating. Post second mating: Group
examined after second mating. Post third mating:
Group examined after third mating. Satellite:
Group of animals included in the design and
conduct of a toxicity study, treated, and housed
under conditions identical to those of the main
study animals, but used primarily for some
separate purpose to be defined as needed in the
Comment section. Other: Group of animals that
may have deviated from the full study design, to
be defined as needed in the Comment section.
generation
Generation of the test animal group. F0 is the
default choice for animals exposed in
non-reproductive studies (chronic CHR, subchronic
SUB, subacute SAC), dams in reproductive DEV
studies, and the first-generation mating group for
multigenerational MGR studies. F1 is the second
generation, born to F0. F2 is the third generation,
born to F1. F3 is the fourth generation, born to
F2. The fetal generation is the group produced by
F0 matings in DEV studies, typically removed
from a female via cesarean section. Pups from
live births are not fetal.
sex
Sex of a test animal group. The sex of fetal
groups is denoted as MF for both males and
females.
tg_comment
NULL if no additional comment needed; contains
information that the extractor/curator found
helpful in describing issues related to a
treatment-group (e.g. animals dosed via capsule
so concentration not reported, added recovery
groups, etc.).
tg_effect
tg_effect_id
PK: Autoincremented unique identifier for a
treatment group effect
effect_id
FK: A unique numeric identifier for each effect in
the database.
tg_id
FK: A unique numeric identifier for each
treatment group in the database.
direction
Description of the net change across all doses
that indicates whether the numerical data
increased, decreased, or stayed the same. This
can also be used to describe effects that did not
have numerical data, but were still described in
the study source.
effect_comment
NULL if no additional comment needed; contains
information that the extractor/curator found
helpful in describing issues related to a
treatment-group-effect (e.g. units not reported,
effect only reported for certain treatment groups,
etc.).
effect_desc_free
Brief verbatim text from study file that was
entered if the effect description differed from
predetermined endpoint terminology.
life_stage
Stage of life at which a measurement was taken.
CHR, SUB, and SAC studies typically only have
adult for life_stage, whereas DEV and MGR
studies will always be characterized by multiple
life stages. The different life stages in the
database include: fetal, juvenile, adult, adult-
pregnancy and pregnancy.
target_site
A more specific description than endpoint_target.
Can describe a specific tissue within an organ,
type of cell, etc.
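The keys described above chain a study to its treatment groups, the effects observed in those groups, and the endpoint each effect belongs to. The sketch below walks that chain with toy tables in an in-memory SQLite database; the use of SQLite, the reduced column set, and all inserted rows are hypothetical and serve only to show how the PK and FK fields line up.

```python
import sqlite3

# Toy in-memory tables using column names from this dictionary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE study (study_id INTEGER PRIMARY KEY, study_type TEXT, species TEXT);
CREATE TABLE tg (tg_id INTEGER PRIMARY KEY, study_id INTEGER, sex TEXT,
                 generation TEXT);
CREATE TABLE endpoint (endpoint_id INTEGER PRIMARY KEY, endpoint_category TEXT,
                       endpoint_type TEXT, endpoint_target TEXT);
CREATE TABLE effect (effect_id INTEGER PRIMARY KEY, endpoint_id INTEGER,
                     effect_desc TEXT);
CREATE TABLE tg_effect (tg_effect_id INTEGER PRIMARY KEY, effect_id INTEGER,
                        tg_id INTEGER, life_stage TEXT);
-- Hypothetical rows for a single chronic rat study.
INSERT INTO study VALUES (1, 'CHR', 'rat');
INSERT INTO tg VALUES (10, 1, 'M', 'F0'), (11, 1, 'F', 'F0');
INSERT INTO endpoint VALUES (5, 'systemic', 'pathology gross', 'liver');
INSERT INTO effect VALUES (50, 5, 'necrosis');
INSERT INTO tg_effect VALUES (500, 50, 10, 'adult'), (501, 50, 11, 'adult');
""")

# study -> tg -> tg_effect -> effect -> endpoint
query = """
SELECT s.study_id, g.sex, e.endpoint_category, e.endpoint_type,
       e.endpoint_target, ef.effect_desc, te.life_stage
FROM study AS s
JOIN tg AS g ON g.study_id = s.study_id
JOIN tg_effect AS te ON te.tg_id = g.tg_id
JOIN effect AS ef ON ef.effect_id = te.effect_id
JOIN endpoint AS e ON e.endpoint_id = ef.endpoint_id
WHERE s.study_type = ?
ORDER BY te.tg_effect_id
"""
effects = conn.execute(query, ("CHR",)).fetchall()

for row in effects:
    print(row)
```

Each result row describes one treatment group effect, from the broadest term (endpoint_category) down to the specific effect_desc and the life_stage at which it was measured.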
toxrtool
toxrtool_id
PK: Autoincremented unique identifier for a
toxrtool question
criteria
The ToxRTool comprises a list of evaluation
criteria to assess study reliability that are
subdivided into five groups: test substance
identification, test system characterization, study
design description, study results documentation,
and plausibility of study design and data.
question
Question used as part of the ToxRTool
evaluation criteria to assess study reliability.
question_number
Number indicating the question as part of the
ToxRTool evaluation criteria to assess study
reliability.
study_toxrtool
study_toxrtool_id
PK: Autoincremented unique identifier for a
ToxRTool question associated with a study
toxrtool_id
FK: A unique numeric identifier for each
ToxRTool question in the database.
study_id
FK: A unique numeric identifier for each study in
the database.
score
The associated score for the ToxRTool question
toxrtool_comment
The corresponding comment further describing
the score