EPA/600/R-21/232
Sequence Alignment to Predict Across
Species Susceptibility
(SeqAPASS)
VERSION 6.0
User Guide

-------
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS): User Guide
Updated 09/14/2021; Contact Carlie LaLone with Questions: LaLone.Carlie@epa.gov
Quick Notes: Use Chrome for optimal performance, and PLEASE DO NOT submit more than 10 Level 1
queries at a time. Wait until those jobs have run to completion before submitting more.
Table of Contents
Background	page 2
Accessing SeqAPASS	page 3-4
Returning Users (page 3)
First Time Users (page 4)
Messages from the SeqAPASS Development Team	page 4
SeqAPASS Home Tab	page 5
Request SeqAPASS Run Tab	page 5-11
Identify a Protein Target (page 6)
Query "By Species" (page 7)
Query "By Accession" (page 10)
SeqAPASS Run Status	page 12-13
View SeqAPASS Reports	page 14-19
View Report (page 15)
Save Report(s) (page 15)
Level 1: Primary Amino Acid Sequence Alignment	page 20-26
Primary Report Settings (page 22)
Susceptibility Cutoff Box for Level 1	page 26-29
No Orthologs Detected (page 28)
ECOTOX Widget	page 30-31
Level 2: Functional Domain(s) Alignment	page 31-33
View Level 2 Data Page	page 33-38
Primary Report Settings (page 36)
Susceptibility Cutoff Box for Level 2	page 39-42
No Orthologs Detected (page 41)
Level 1 and Level 2 Data Visualization	page 42-51
Level 1 and 2 Information Page (page 44)
Level 1 and 2 BoxPlot Page - Controls (page 45)
Level 3: Individual Amino Acid Residue Alignment	page 52-61
View Level 3 Individual Amino Acid Query and Data Page	page 62-67
Level 3 Data - Primary Report (page 65)
Level 3 Data - Full Report (page 66)
Heat Map	page 68-72
Decision Summary Report	page 72-75
Download DS Report as PDF	page 75-78
Moving Between Level 1, Level 2, and Level 3 Data Pages	page 78-79
Search, View, and Download Data Tables	page 79-80
Log out	page 80
Pop-up Messages	page 80-83
SeqAPASS Documentation	page 83-92
Background
The SeqAPASS tool was developed to predict relative intrinsic susceptibility across species to chemicals
with known molecular targets (e.g., pharmaceuticals, pesticides). It can also evaluate the conservation of
molecular targets from high-throughput screening assays (i.e., the U.S. Environmental Protection Agency
ToxCast Program) and of molecular initiating events (MIEs) and early key events in the adverse outcome
pathway framework, as a means to extrapolate such knowledge across species. The term "relative" is used
because molecular target similarity, though an important consideration, is only one consideration when
predicting susceptibility to a chemical. Other important factors not evaluated by the SeqAPASS
methodology include how well a chemical is absorbed, distributed, metabolized, and eliminated; life
stage; and other life history traits. "Relative" also indicates that sequence similarity between proteins is
determined by comparison to a single protein sequence from a specific species. Finally, we define
"intrinsic susceptibility" as the vulnerability (or lack thereof) of an organism to chemical perturbation due
to its inherent biological composition.
Cross-species comparisons of proteins can be conducted through examination of sequence and
structural information, depending on how well the protein has been characterized and what is known
about a chemical-protein interaction. SeqAPASS allows the user to assess various levels of protein
sequence detail across species including comparisons of primary amino acid sequence (including ortholog
detection), functional domain(s), and individual amino acid residue positions. Each level requires a
greater understanding of the protein and its interaction with a chemical of interest (or similar ligand).
Because human and veterinary drugs, as well as pesticides, are designed to act specifically on
well-characterized molecular targets, these chemical classes have proven useful for demonstrating the
utility of the SeqAPASS tool and its application to various hazard assessment and research scenarios.
The information needed to begin a SeqAPASS query is the identity of one (or more) query species and of
a query protein, that is, the molecular target(s) of interest (e.g., a receptor or enzyme).
The SeqAPASS algorithms mine, collect, and collate information from the National Center for
Biotechnology Information (NCBI) protein database (http://www.ncbi.nlm.nih.gov/protein/), conserved
domains database (http://www.ncbi.nlm.nih.gov/cdd/), and taxonomy database
(http://www.ncbi.nlm.nih.gov/taxonomy/), and strategically utilize the Stand-Alone Basic Local Alignment
Search Tool for proteins (BLASTp)
(http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download)
and the Constraint-based Multiple Alignment Tool (COBALT)
(http://www.st-va.ncbi.nlm.nih.gov/tools/cobalt/re_cobalt.cgi).
Accessing SeqAPASS
For optimal SeqAPASS performance, use Chrome.
Access SeqAPASS using the following URL: https://www.seqapass.epa.gov/seqapass/
Returning Users
Click "Login."
[Screenshot: "Log In to SeqAPASS" page (Version 6.0) showing the "New to SeqAPASS Version 6" announcement, the "Login" button, the "Want an account? Click here for instructions." link, and the EPA Enterprise Authentication login options]
Select either "Login with EPA LAN User ID & Password" or "Login with Single Sign-On."
First Time Users
To request a username and password to access the SeqAPASS tool, select "here" below the login and
follow the directions on the next page. The directions differ for internal EPA users and external non-EPA
users; however, the user type does not limit access to the tool. Everyone who requests an account will be
given one in a timely manner. An individual account allows users to store all previous SeqAPASS runs.
Once they have obtained their username, external users select "Login with EPA LAN User ID and
Password."
EPA Users
1.	Go to https://waa.epa.gov and log in with your existing EPA LAN ID and password.
2.	Under the "Community Access" menu, select "Request Web Community Access."
3.	Select the "SeqAPASS Users" community and click "Submit."
4.	Return to the SeqAPASS login page to access SeqAPASS.
External Users
1.	Go to https://waa.epa.gov and click on the "Self Register" link.
2.	Fill out the form using the following EPA Contact information:
	• EPA Contact Name: Carlie LaLone
	• EPA Contact's Email Address: LaLone.Carlie@epa.gov
	• EPA Contact's Phone Number: 218-529-5038
3.	Select the "SeqAPASS Users" community from the dropdown menu at the bottom of the page.
4.	Once you submit the form, you will receive an email confirming your request and a follow-up email with your username once
your account has been activated.
On the login screen, the user provides the necessary login information:
EPA User: EPA LAN User ID & Password, or Login with Windows Single Sign-On
External User: Username and Password
After creating your password, log in to SeqAPASS as described above for Returning Users. To change a
password at any time, go to waa.epa.gov and select "User Profile" to reset it; then use the new password
to log in.
Messages from the SeqAPASS development team
Look for messages about planned version releases, data updates, and/or fixes to the SeqAPASS tool.
These will occasionally be displayed below the SeqAPASS banner when the development team has
information to share with SeqAPASS users.
Example message:
New to SeqAPASS Version 6 (See user guide for more details)
• A widget has been developed to rapidly connect SeqAPASS sequence-based predictions of chemical susceptibility to existing curated empirical toxicity data for terrestrial and aquatic species. The ECOTOXicology Knowledgebase (https://cfpub.epa.gov/ecotox/) is a publicly available resource for single chemical environmental toxicity data on aquatic life, terrestrial plants and wildlife. Therefore, an ECOTOX Widget is now available within SeqAPASS for users to readily select species from SeqAPASS output on the Level 1 results page and chemical(s) of interest to pass to the ECOTOX Knowledgebase Explore feature and identify relevant toxicity data.
SeqAPASS Home Tab
The "Home" tab indicates who is logged in to the tool (right-hand side of the screen) and contains links to
information about the SeqAPASS tool (About SeqAPASS), including contact information for support and
references to published articles describing the SeqAPASS tool and its applications. References to other
relevant databases and tools are also provided. A link to the SeqAPASS User Guide can also be found on
this page. To submit a comment or question, click the "Submit Comment/Question" link to email the
developer. The "Log out" icon in the upper right-hand corner of the screen can be clicked at any time to
log out. "Information" buttons are present throughout SeqAPASS to give the user additional information
or instruction regarding the features and functionality of the tool. "Exit" buttons are also present next to
each external (non-EPA) link that takes the user to a page NOT maintained by the EPA.
[Screenshot: SeqAPASS Home tab (Version 6.0) with the Home, Request SeqAPASS Run, SeqAPASS Run Status, View SeqAPASS Reports, and Settings tabs, the "Log out" link, and the About SeqAPASS, SeqAPASS User Guide, and Submit Comment/Question or Report a Problem links]
Request SeqAPASS Run Tab
Clicking the "Request SeqAPASS Run" tab opens a page for entering the query information needed for a
SeqAPASS run. Each section of the "Request SeqAPASS Run" page is described below.
[Screenshot: "Request Level 1 SeqAPASS Run" page (Version 6.0) opened from the "Request SeqAPASS Run" tab]
Identify a Protein Target
SeqAPASS is designed to predict cross-species chemical susceptibility. Protein targets are often chosen
based on the chemical, adverse outcome pathway (AOP), or high-throughput screening (HTS) assay target
of interest. Links to resources that can help the user find appropriate protein targets are provided in the
drop-downs of the "Identify a Protein Target" box.
[Screenshot: "Identify a Protein Target" box]
The box lists the following resources; click the help buttons in the box for descriptions of how to find relevant protein target information in each one. All links open in a new tab and exit the EPA site.
• Pharmaceutical protein targets: https://www.drugbank.ca, http://sitem.herts.ac.uk/aeru/vsdb/index.htm, http://bidd.nus.edu.sg/group/cjttd/TTD_HOME.asp
• Pesticides and other chemical protein targets: http://www.t3db.ca
• AOP chemical initiators: https://aopwiki.org
• ToxCast HTS results by chemical: https://comptox.epa.gov/dashboard
Select Search
There are two options for entering query information: "By Species" or "By Accession" (see the radio
buttons to the right of "Select Search"). Selecting "By Species" allows the user to type text, select a
species from a drop-down list, and then select a protein from any sequence available for that species in
the NCBI protein database. Selecting "By Accession" allows the user to enter an NCBI protein accession.
[Screenshot: "Request Level 1 SeqAPASS Run" page with the "Identify a Protein Target" and "Compare Primary Amino Acid Sequences" boxes and the "Select Search" radio buttons ("By Species" selected; "By Accession")]
Query "By Species"
Type the name of the query species of interest in the "Query Species Search" text box. The species
common name, scientific name, or Taxid (the ID number from the NCBI taxonomy database) may be
typed into the search bar. This is the species to which all other species will be compared. The search bar
has an auto-complete function that generates a list of species with their corresponding Taxids. When text
is typed into the search bar, the auto-complete function queries the database in the order "starts with,"
then "contains." If an integer is typed into the search bar, the auto-complete function queries the database
in the order "Taxid," "starts with," then "contains."
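The tiered auto-complete ordering can be pictured with a short Python sketch. It is not SeqAPASS source code: the Taxon class, the rank_suggestions function, the case-insensitive matching, and matching against a "name (Taxid:N)" label are all assumptions made for illustration. Within each tier this sketch keeps the input order; the tool's own ordering inside a tier is not described in this guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taxon:
    name: str   # species name as listed (scientific or common)
    taxid: int  # NCBI Taxonomy ID

    def label(self) -> str:
        # Display string, e.g. "Homo sapiens (Taxid:9606)" (assumed format)
        return f"{self.name} (Taxid:{self.taxid})"


def rank_suggestions(query: str, taxa: List[Taxon]) -> List[Taxon]:
    """Order suggestions in the tiers the guide describes (hypothetical helper).

    Text query:    "starts with" matches, then "contains" matches.
    Integer query: exact Taxid match, then "starts with", then "contains".
    """
    q = query.strip().lower()
    tiers: List[List[Taxon]] = []
    if q.isdigit():
        tiers.append([t for t in taxa if t.taxid == int(q)])
    tiers.append([t for t in taxa if t.label().lower().startswith(q)])
    tiers.append([t for t in taxa if q in t.label().lower()])

    ranked: List[Taxon] = []
    for tier in tiers:
        for taxon in tier:
            if taxon not in ranked:  # each species is listed once, in its best tier
                ranked.append(taxon)
    return ranked


taxa = [
    Taxon("Pan troglodytes", 9598),
    Taxon("Homo sapiens neanderthalensis", 63221),
    Taxon("Homo sapiens", 9606),
    Taxon("Sapajus apella", 9515),
]
print([t.name for t in rank_suggestions("sap", taxa)])
# ['Sapajus apella', 'Homo sapiens neanderthalensis', 'Homo sapiens']
```

Typing "9606" in this sketch returns Homo sapiens first because the exact Taxid tier is checked before any text matching.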
[Screenshot: "Query Species Selection" box with "homo sap" typed in the "Query Species Search" box; the auto-complete list includes Homo sapiens Linnaeus, 1758 (Taxid:9606), Homo sapiens neanderthalensis (Taxid:63221), and Homo sapiens ssp. Denisova (Taxid:741158)]
Note: The user can also use the NCBI taxonomy database to identify query species using the NCBI link
on the right-hand side of the "Add Query Species" button.
Select the species of interest by clicking its name in the drop-down list. Once the species is selected,
click the "Add Query Species" button. This moves the species to the "Query Species" box and fills the
"Query Proteins" box with all available protein sequences for that species from the NCBI protein
database (although the box displays only the first 200 proteins for the species, ordered by lowest
numerical accession number). The protein list includes the NCBI protein accession, the protein name, and
the species scientific name.
[Screenshot: Homo sapiens (Taxid:9606) added to the "Query Species" box, with the "Query Protein Selection" box listing human proteins beginning with [NP_000005.2] alpha-2-macroglobulin isoform a precursor]
To filter the query protein list, type the query protein name or a partial name in the "Query Protein
Search" box and click the "Filter Protein" button. This filters the "Query Proteins" box to display only
proteins whose names contain the user-defined text (this search box has no auto-fill feature because of
the filter feature). Proteins are listed in alphabetical order by NCBI accession. For example, typing
"estrogen" retrieves all proteins that contain the word "estrogen" in the protein name (the user can scroll
to identify proteins of interest).
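The filter behavior described here (substring match on the protein name, results sorted by accession) can be sketched in Python. The filter_proteins function and the case-insensitive comparison are assumptions for illustration, not the SeqAPASS implementation; the accessions and names are taken from the example protein lists in this guide.

```python
from typing import Dict, List, Tuple


def filter_proteins(proteins: Dict[str, str], text: str) -> List[Tuple[str, str]]:
    """Return (accession, name) pairs whose protein name contains `text`.

    Results are sorted alphabetically by NCBI accession. Case-insensitive
    matching is an assumption made for this sketch.
    """
    needle = text.strip().lower()
    hits = [(acc, name) for acc, name in proteins.items() if needle in name.lower()]
    return sorted(hits)  # tuples sort by their first element, the accession


human_proteins = {
    "NP_001035365.1": "estrogen receptor beta isoform 2",
    "NP_000005.2": "alpha-2-macroglobulin isoform a precursor",
    "NP_000116.2": "estrogen receptor isoform 1",
    "NP_001035055.1": "G-protein coupled estrogen receptor 1",
}
for accession, name in filter_proteins(human_proteins, "estrogen"):
    print(f"[{accession}] {name}")
# [NP_000116.2] estrogen receptor isoform 1
# [NP_001035055.1] G-protein coupled estrogen receptor 1
# [NP_001035365.1] estrogen receptor beta isoform 2
```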
[Screenshot: "Query Protein Selection" box filtered with "estrogen," listing [NP_000116.2] estrogen receptor isoform 1, [NP_001035055.1] G-protein coupled estrogen receptor 1, [NP_001035365.1] estrogen receptor beta isoform 2, and other matches above the "Add Selected Protein(s)" button]
Note: To explore details associated with a protein of interest, click the "NCBI Protein Database" link to
the right of the "Filter Protein" button to open the NCBI protein database (see the SeqAPASS
Documentation section of the user guide for details about searching for query proteins using the NCBI
database).
Highlight the protein or proteins of interest in the "Query Proteins" box (Ctrl + left-click to select
multiple proteins) and click the "Add Selected Protein(s)" button. This moves the protein(s) of interest to
the "Final Query Protein(s)" box. To remove proteins from the "Final Query Protein(s)" box, highlight
those to be removed and click the "Remove Selected Protein(s)" button. Select "Remove All Proteins" to
discard all proteins from the "Final Query Protein(s)" box. The "Clear" button removes all information
previously entered on the "Request SeqAPASS Run" page.
[Screenshot: estrogen receptor proteins selected in the "Query Proteins" box and added to the "Final Query Protein(s)" box in the "SeqAPASS Submission" section, with the "Remove Selected Protein(s)" and "Remove All Proteins" buttons]
Once the user has identified the protein(s) to be queried, click "Request Run." A message will appear in
the upper right-hand corner of the screen for 10 seconds to alert the user to the request status.
[Screenshot: "Success" message in the upper right-hand corner of the "Request Level 1 SeqAPASS Run" page confirming that the requested run was submitted]
Multiple proteins can be added to the final list for multiple SeqAPASS runs. If another query species is
desired, return to "Query Species Search" to select the next species. Follow the process described above
for selecting the proteins associated with this species. The proteins populated in the "Query Proteins" box
will always be associated with the species highlighted in the "Query Species" box.
Note: In the current version of SeqAPASS, PLEASE do not request more than 10 query proteins at a
time to avoid longer wait times for the completion of a run.
[Screenshot: "Query Species" box containing Homo sapiens (Taxid:9606) and Bos taurus (Taxid:9913), with the proteins for the highlighted species listed in the "Query Proteins" box]
Note: The user may check the progress of a run by clicking the "SeqAPASS Run Status" tab (see the
SeqAPASS Run Status section of the user guide for more information).
Query "By Accession"
Users familiar with the NCBI database can utilize NCBI protein accessions (e.g., NP_000116.2) to query
the SeqAPASS tool. This is done by selecting the "By Accession" radio button to the right of the "Select
Search" text on the "Request SeqAPASS Run" page.
[Screenshot: "Request Level 1 SeqAPASS Run" page with the "By Accession" radio button to the right of "Select Search"]
Upon selecting the "By Accession" radio button, a new query page is displayed. Type the NCBI protein
accession (e.g., NP_000116.2) for the protein of interest in the "NCBI Protein Accession" box (this
accession comes from the NCBI protein database; see "SeqAPASS Documentation" for details). If desired,
more than one NCBI accession may be entered into the "NCBI Protein Accession" box by pressing the
Enter key after each accession.
When the user clicks in the "NCBI Protein Accession" text box, a pop-up message appears in the middle
of the box showing an example of the proper format for entering accessions.
[Screenshot: "SeqAPASS Submission" box with the "NCBI Protein Accession" text box, the "Request Run" and "Clear" buttons, and the "NCBI Protein Database" link]
Note: To avoid longer wait times for the completion of a run, in the current version of SeqAPASS, please
do not request more than 10 NCBI Accessions at a time.
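Users who prepare accession lists outside SeqAPASS may find it helpful to check them before pasting them into the box. The following is a minimal Python sketch of such a pre-check, assuming one accession per line; parse_accession_box, MAX_ACCESSIONS, and the accession pattern are hypothetical and are not part of SeqAPASS, and the guide does not say that the tool itself enforces the 10-accession limit.

```python
import re
from typing import List

MAX_ACCESSIONS = 10  # the guide asks for no more than 10 accessions per request

# Assumed accession shape (e.g., NP_000116.2, BAE92310.1): 1-3 capital letters,
# an optional underscore, digits, and an optional ".version" suffix.
ACCESSION_PATTERN = re.compile(r"[A-Z]{1,3}_?\d+(?:\.\d+)?")


def parse_accession_box(text: str) -> List[str]:
    """Split the box contents (one accession per line) and check them."""
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    malformed = [e for e in entries if not ACCESSION_PATTERN.fullmatch(e)]
    if malformed:
        raise ValueError(f"Unrecognized accession format: {malformed}")
    if len(entries) > MAX_ACCESSIONS:
        raise ValueError(
            f"{len(entries)} accessions entered; please request at most {MAX_ACCESSIONS} at a time"
        )
    return entries


print(parse_accession_box("NP_000116.2\n\n  BAE92310.1 \nNP_000116\n"))
# ['NP_000116.2', 'BAE92310.1', 'NP_000116']
```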
[Screenshot: "By Accession" query page with NP_000116 entered in the "NCBI Protein Accession" box]
After the NCBI accession(s) of interest have been typed in the "NCBI Protein Accession" box, click the
"Request Run" button. To remove proteins from the "NCBI Protein Accession" box click the "Clear"
button. A message will briefly appear in the upper right-hand corner of the screen to alert the user of their
run request status.
[Screenshot: "Success" message confirming the submitted run request (NP_001230447.1)]
Note: NCBI accessions may include the version number (the digit after the decimal point, e.g.,
NP_000116.2). If the version is not included, the most recent version of the accession is queried
automatically.
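The version rule in this note can be expressed as a small Python sketch: split off the ".version" suffix if present, and otherwise fall back to the latest version on record. The function names and the versions_on_record lookup are hypothetical; SeqAPASS resolves versions against its own copy of NCBI data, which this sketch does not model.

```python
from typing import Dict, List, Optional, Tuple


def split_accession(accession: str) -> Tuple[str, Optional[int]]:
    """Split "NP_000116.2" into ("NP_000116", 2); a missing version gives None."""
    base, sep, version = accession.strip().partition(".")
    return base, (int(version) if sep else None)


def resolve_accession(accession: str, versions_on_record: Dict[str, List[int]]) -> str:
    """Return a fully versioned accession, defaulting to the most recent version."""
    base, version = split_accession(accession)
    if version is None:
        version = max(versions_on_record[base])  # latest version wins
    return f"{base}.{version}"


# Hypothetical version history, for illustration only
versions_on_record = {"NP_000116": [1, 2]}
print(resolve_accession("NP_000116", versions_on_record))    # NP_000116.2
print(resolve_accession("NP_000116.1", versions_on_record))  # NP_000116.1
```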
SeqAPASS Run Status
The status of Level 1 SeqAPASS runs (primary amino acid sequence comparisons) is displayed by default.
The accession in the "Level 1 Query Accession" column is the one selected and queried by the user. A
query is finished when it displays "complete" in the "BLASTp" column, 100% in the "Common Domains"
column, and 100% in the "Ortholog Candidate" column. The "Common Domains" column displays the
percent completion of Reverse Position Specific (RPS)-BLAST (default E-value of <0.01) run on the
accessions from the Level 1 Full Report. RPS-BLAST, and therefore the "Common Domains" status,
takes the longest to complete. The "Ortholog Candidate" column displays the percent completion of a
reciprocal best hit BLAST evaluation for each hit sequence. The status in the "BLASTp" column is
described as "started," "analyzing," or "complete." If the user's successfully submitted query has entered
the run queue, its position in the queue is indicated in the column (e.g., 2nd in queue). The "Common
Domains" and "Ortholog Candidate" columns also show the position of the submitted query in the run
queue; once processing begins, they display the percent completed for RPS-BLAST and reciprocal best
hit BLAST, respectively. See the example below:
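The three-part completion rule can be written as a small check. This Python sketch assumes the column values are available as the strings shown in the status table; Level1Status and percent_done are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


def percent_done(cell: str) -> Optional[float]:
    """Return the percent shown in a column, or None while queued (e.g., "2nd in queue")."""
    cell = cell.strip()
    if cell.endswith("%"):
        return float(cell[:-1])
    return None


@dataclass
class Level1Status:
    blastp: str              # "started", "analyzing", "complete", or a queue position
    common_domains: str      # RPS-BLAST progress, e.g. "45%", or a queue position
    ortholog_candidate: str  # reciprocal best hit progress, or a queue position

    def is_complete(self) -> bool:
        # All three conditions from the guide must hold at the same time
        return (
            self.blastp.strip() == "complete"
            and percent_done(self.common_domains) == 100.0
            and percent_done(self.ortholog_candidate) == 100.0
        )


print(Level1Status("complete", "45%", "100%").is_complete())   # False
print(Level1Status("complete", "100%", "100%").is_complete())  # True
```

Because RPS-BLAST usually finishes last, a run in this sketch typically stays incomplete on the "Common Domains" check after the other two columns are done.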
[Screenshot: "SeqAPASS Level 1 Run Status" table (Level 1 Status selected; "Level 2 Status," "Level 3 Status," and "Refresh Data" controls above the table). Columns: SeqAPASS Run Id, Data Version, User, Level 1 Query Accession, BLASTp, Common Domains, Ortholog Candidate, Start Date, Date Completed, SeqAPASS Run Duration. The example runs show "complete," 100%, and 100%, with durations ranging from 1 second to about 25 minutes; a "Download Table" option appears below the table.]
The user can view the status of requested SeqAPASS runs. Each run is assigned a unique "SeqAPASS
Run Id." A run is a query requested either individually or as a batch in the "Request SeqAPASS Run"
tab. The user can view the run start and completion dates/times and the duration of the run (see the
Search, View, and Download Data Tables section of the user guide for more information). The "Data
Version" column indicates which version of NCBI data is being used (see the "About" page for details on
data versions).
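The "SeqAPASS Run Duration" column is the difference between the start and completion times; for example, one Level 1 run in this guide's status table started at 12:12:43 and completed at 12:32:42 on 2021-08-26, a duration of 19 minutes 59 seconds. A minimal Python sketch of that arithmetic follows; the timestamp format and the run_duration function are assumptions, not the tool's code.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format for this sketch


def run_duration(start: str, completed: str) -> str:
    """Format the time between the start and completion timestamps."""
    elapsed = datetime.strptime(completed, TIMESTAMP_FORMAT) - datetime.strptime(start, TIMESTAMP_FORMAT)
    minutes, seconds = divmod(int(elapsed.total_seconds()), 60)
    if minutes:
        return f"{minutes} minute(s) {seconds} second(s)"
    return f"{seconds} second(s)"


print(run_duration("2021-08-26 12:12:43", "2021-08-26 12:32:42"))  # 19 minute(s) 59 second(s)
```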
The user can also view the status of Level 2 (functional domain alignment) and Level 3 (individual amino
acid residue alignment) runs.
View Level 2 status by selecting the "Level 2 Status" radio button. While viewing the page, the user can
click the "Refresh Data" button to refresh the data. The "Level 1 Query Accession" column displays the
NCBI accession selected and queried by the user. See the example below:
[Screenshot: "SeqAPASS Level 2 Run Status" table (Level 2 Status selected). Columns: SeqAPASS Run Id, Data Version, User, Level 1 Query Accession, NCBI Accession, Domain Type, BLASTp, Start Date, Date Completed, SeqAPASS Run Duration. Example rows include domain types such as NR_LBD_ER, PTPc, and FN3, each with BLASTp "complete."]
View SeqAPASS Reports Tab
The "View SeqAPASS Reports" tab provides a table of completed SeqAPASS runs. From this page, the
user can choose either "View Report" or "Save Report(s)."
[Screenshot: "SeqAPASS Reports" page with the "Partial Protein Sequence" legend, the "Request Selected Report" and "Refresh Available Reports" buttons, and the "View Report" and "Save Report(s)" options]
By default, completed runs are listed in the order in which they were completed, with the most recent
runs at the top. The table includes information for each run, such as the SeqAPASS Run ID (unique for
every run, even if the same protein/species combination is run twice), Data Version, Ortholog Count
(number of orthologs detected among the aligned hit sequences in Level 1; see Detailed Documentation
page 79), NCBI Accession, Query Protein Name, taxonomy information for the query species, and the
date/time of run completion.
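The orthologs behind the Ortholog Count come from the reciprocal best hit BLAST evaluation described in the SeqAPASS Run Status section. The Python sketch below shows only the reciprocal check itself, under stated assumptions: each hit sequence has already been searched back against the query species' proteins, and "best" means the highest bit score. The function names and the hit_A/hit_B/hit_C data are hypothetical; the criteria SeqAPASS actually applies are given in the SeqAPASS Documentation section.

```python
from typing import Dict, List, Tuple

# (subject accession in the query species, bit score) pairs from a reverse search
ReverseHits = List[Tuple[str, float]]


def best_hit(hits: ReverseHits) -> str:
    """Highest bit score wins; max() keeps the first entry when scores tie."""
    return max(hits, key=lambda hit: hit[1])[0]


def ortholog_candidates(query_accession: str, reverse_results: Dict[str, ReverseHits]) -> List[str]:
    """Keep hit sequences whose best reverse hit is the original query protein."""
    return [
        hit_accession
        for hit_accession, hits in reverse_results.items()
        if hits and best_hit(hits) == query_accession
    ]


# Hypothetical reverse-search results for three hit sequences
reverse_results = {
    "hit_A": [("NP_000116.2", 850.0), ("NP_001035365.1", 400.0)],
    "hit_B": [("NP_001035365.1", 700.0), ("NP_000116.2", 650.0)],
    "hit_C": [],  # no reverse hit found
}
candidates = ortholog_candidates("NP_000116.2", reverse_results)
print(candidates, len(candidates))  # ['hit_A'] 1
```

Here hit_B aligns to NP_000116.2 in the reverse search but matches a different human protein more strongly, so it is not counted as an ortholog of NP_000116.2.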
While viewing the page, the user can click the "Refresh Available Reports" button to refresh the table
with additional completed runs. Partial protein sequences are highlighted in yellow as illustrated in the
example below. (See Search, View, and Download Data Tables section of user guide for more
information).
[Screenshot: "Available Reports" table with columns SeqAPASS Run Id, Data Version, Ortholog Count, Level 1 Query Accession, Query Protein Name, NCBI Taxonomy ID, Query Species Name, and Query Common Name. Example rows include NP_001230448.1, estrogen-related receptor gamma isoform 2 (Homo sapiens, 192 orthologs), and the partial sequence ABP98939.1, voltage-gated sodium channel alpha type V (Oncorhynchus mykiss, 0 orthologs), highlighted in yellow; a "Download Table" option appears below the table.]
View Report
To select a completed run and view its Level 1 data, select the corresponding radio button in the first column of the table and click "Request Selected Report." This opens the Level 1 page, where the user can view the Level 1 data and set up queries for Level 2 and Level 3.
Note: The user MUST select a radio button BEFORE clicking "Request Selected Report." If the user clicks "Request Selected Report" without selecting a radio button, a spinning wheel will briefly appear and disappear, and no completed run will be opened. No pop-up message indicates that a radio button was not selected.
[Screenshot: SeqAPASS Reports page with the "View Report" option and the "Request Selected Report" button]
Save Report(s)
To download completed Level 1, 2, and/or 3 data, select the "Save Report(s)" radio button. The user can then select which accession(s) to download by checking the box in the first column of the table for each desired accession and clicking "Save Selected Report(s)."
[Screenshot: SeqAPASS Reports page with the "Save Report(s)" option and the "Save Selected Report(s)" button]
The user can also deselect data that is not wanted in the download by scrolling to the far right of the table
and deselecting the checkboxes for the different levels of the SeqAPASS analysis. By default, all
available data for the selected accession will be downloaded in a zip file.
[Screenshot: Save Report(s) table with Level 1, Level 2, and Level 3 checkboxes in the rightmost columns]
A seqapass.zip file containing the data files for each selected report should then appear as a download. Within it, each report is stored in a folder named with the NCBI protein accession and the data version (e.g., AAG31441.2_v6).
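Because each report is stored in a folder named with the accession and data version, the contents of a downloaded archive can also be inventoried with a short script. The following is a minimal Python sketch using the standard-library `zipfile` module. It assumes the folder layout seen in the screenshots of this guide (`<accession>_v<data version>/` folders at the top level of the zip); the helper names `group_entries` and `inventory_seqapass_zip` are hypothetical and are not part of SeqAPASS.

```python
import re
import zipfile
from collections import defaultdict

# Assumed folder naming: "<NCBI protein accession>_v<data version>", e.g. "AAG31441.2_v6".
FOLDER_PATTERN = re.compile(r"^(?P<accession>.+)_v(?P<version>\d+)$")


def group_entries(names):
    """Group zip entry names by the (accession, data version) of their top-level folder."""
    inventory = defaultdict(list)
    for name in names:
        top, _, rest = name.partition("/")
        match = FOLDER_PATTERN.match(top)
        # Skip entries outside a report folder and bare directory entries.
        if match is None or not rest or rest.endswith("/"):
            continue
        key = (match.group("accession"), int(match.group("version")))
        inventory[key].append(rest)
    return dict(inventory)


def inventory_seqapass_zip(path):
    """Hypothetical helper: list the files in each report folder of a saved zip."""
    with zipfile.ZipFile(path) as archive:
        return group_entries(archive.namelist())
```

For example, `inventory_seqapass_zip("seqapass.zip")` would return a dictionary keyed by tuples such as `("AAG31441.2", 6)`, each mapped to the paths inside that report folder.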
[Screenshot: seqapass.zip opened in WinZip, containing one folder per selected report (e.g., AAG31441.2_v2, AAK85198.1_v2, P68279.2_v2)]
Opening the folder for a protein accession and data version gives access to all available files for each level of the SeqAPASS evaluation.
Note: This download includes default settings only. If the susceptibility cut-off or any other default settings were changed on the Level 1 or Level 2 pages, those results will NOT be included here and can ONLY be downloaded directly from the Level 1 or Level 2 page where the setting was changed. Also, data visualizations can ONLY be downloaded from the Level 1 and 2 pages; they DO NOT appear in the zip file folders.
[Screenshot: folder AAB53939.1_v2 in seqapass-2.zip, containing the Level1Reports, Level2Reports, and Level3Reports folders]
By selecting "Level1Reports," the user can access both the full and primary reports as csv files, as well as a graphic of the density plot used to determine the susceptibility cut-off.
[Screenshot: Level1Reports folder containing AAB53939.1_Full_v2.csv, AAB53939.1_Full_v2_cutoff.png, AAB53939.1_Primary_v2.csv, and AAB53939.1_Primary_v2_cutoff.png]

By selecting "Level2Reports," the user can access all completed domain comparisons, each named by the NCBI domain accession followed by the starting amino acid residue position of the domain in parentheses (e.g., pfam00001(54)).
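The domain folder names can be split back into the domain accession and the starting residue position. A minimal Python sketch, assuming the `<domain accession>(<start position>)` pattern shown in the example above; `parse_domain_folder` is a hypothetical helper, not part of SeqAPASS.

```python
import re

# Assumed pattern: "<NCBI domain accession>(<starting residue position>)", e.g. "pfam00001(54)".
DOMAIN_FOLDER = re.compile(r"^(?P<domain>.+?)\((?P<start>\d+)\)$")


def parse_domain_folder(name):
    """Return (domain accession, starting residue position) for a Level 2 folder name."""
    match = DOMAIN_FOLDER.match(name)
    if match is None:
        raise ValueError(f"not a Level 2 domain folder name: {name!r}")
    return match.group("domain"), int(match.group("start"))


for folder in ["pfam00001(54)", "pfam10320(54)", "pfam13853(54)"]:
    print(parse_domain_folder(folder))
```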
[Screenshot: Level2Reports folder containing the domain folders pfam00001(54), pfam10320(54), and pfam13853(54)]

Upon selecting a domain file to view, both full and primary reports are available as csv files as well as a
graphic of the density plot for determining the susceptibility cut-off.
[Screenshot: pfam00001(54) folder containing pfam00001(54)_Full_v2.csv, pfam00001(54)_Full_v2_cutoff.png, pfam00001(54)_Primary_v2.csv, and pfam00001(54)_Primary_v2_cutoff.png]


By selecting "Level3Reports," the user can access all user-defined Level 3 alignments as csv files.
Note: These csv files show the alignments across the entire sequence, not just those amino acid residues
selected by the user.
[Screenshot: Level3Reports folder containing one csv file per user-defined Level 3 query, named with the query title, a number in parentheses, and the data version (e.g., four(316)_v2.csv, multi part test(313)_v2.csv, repeat of 301(311)_v2.csv)]
Level 1: Primary Amino Acid Sequence Alignment
From the "View SeqAPASS Reports" tab, selecting a radio button and clicking "Request Selected Report" displays the Level 1 data.
The "Level 1 Query Protein Information" box contains the SeqAPASS Run ID, Query Accession, Ortholog Count (the number of hits identified as ortholog candidates to the query species protein sequence), NCBI data updates ("Protein and Taxonomy Data:" displays the date that the NCBI databases were downloaded and incorporated into the SeqAPASS database; "BLAST Version:" and "Software Version:" display the versions used by the SeqAPASS tool for the selected data), Query Species, and Query Protein. Other information in this box is described below.
[Screenshot: Level 1 page for SeqAPASS ID 2295 (query NP_000116.2, estrogen receptor isoform 1, Homo sapiens; Ortholog Count 656; Protein and Taxonomy Data 04/28/2021; BLAST Version 2.11.0; Software Version 5.1), showing the Level 1 Query Protein Information box and the Susceptibility Cut-off, Level 2, Level 3, Primary Report Settings, and Visualization boxes]
The default table displayed at the bottom of the page is the "Primary Report," which lists the query protein in the first row below the column titles, followed by the hit proteins whose sequences aligned with the query protein. Hit proteins are ordered from highest to lowest percent similarity (maximum percent similarity = 100%). For each hit protein, the table provides the data version, NCBI accession, and species information, including the "Protein Count" (the number of protein records for that species in the NCBI protein database), taxonomic information (see the Primary Report Settings section below for more detail on the "Taxonomic Group" versus "Filtered Taxonomic Group" columns), and species names. Also included are the NCBI protein accession, protein name, BLASTp bitscore (which describes the overall quality of the alignment; see the NCBI BLASTp tutorials), and percent similarity ([hit bitscore / query bitscore] × 100). If the hit protein has been identified as an ortholog candidate (using the reciprocal best hit BLAST method), it is marked "Y" for yes; otherwise, it is marked "N" for no. If the hit protein is predicted to be susceptible according to the susceptibility cut-off criteria, it is likewise marked "Y"; otherwise, "N." The date the analysis was completed is also shown. The data also include a column giving the number of ortholog candidates identified using the reciprocal best hit BLAST method, and a column listing the susceptibility cut-off. The cut-off is determined by identifying local minimums in the density plot of the percent similarity values for the primary report data set and evaluating the ortholog candidates. An additional column identifies whether the species is a eukaryote ("Y" for yes or "N" for no). Links to the NCBI Protein Database, NCBI Taxonomy Database, and ECOTOX Knowledgebase (specific to each data row) are embedded in the Level 1 data table in the "NCBI Accession," "Species Tax ID," "Scientific Name," "Protein Name," and "ECOTOX" columns. (See the Search, View, and Download Data Tables section of the user guide for more information.)
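The percent similarity calculation can be written out as a short Python function. This is a sketch of the formula given above, not SeqAPASS source code; rounding to two decimals mirrors how values appear in the table, and the bitscore values in the usage lines are illustrative rather than taken from a real run.

```python
def percent_similarity(hit_bitscore, query_bitscore):
    """Percent similarity = (hit bitscore / query bitscore) x 100, rounded to 2 decimals.

    Values above 100 occur when a hit's bitscore exceeds the query's
    (commonly synthetic constructs).
    """
    if query_bitscore <= 0:
        raise ValueError("query bitscore must be positive")
    return round(hit_bitscore / query_bitscore * 100, 2)


# Illustrative values only.
print(percent_similarity(850.0, 1000.0))   # 85.0
print(percent_similarity(1000.0, 1000.0))  # 100.0
```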
Default highlights identify partial protein sequences; sequences with a bitscore higher than that of the query sequence, and therefore a percent similarity greater than 100% (commonly synthetic constructs); and cases where zero ortholog candidates are identified (in which case the user should consider a different query sequence or check the full report). See the Susceptibility Cutoff Box for Level 1 section of the user guide for details on what to do when no orthologs are detected. Additionally, if the query protein is from a eukaryote, the report by default shows only eukaryote data: the "Show Only Eukaryotes" checkbox is checked and prokaryote data are excluded from the table. To view prokaryote data, deselect this checkbox. If the query protein is from a prokaryote, the default setting includes both eukaryote and prokaryote data, and the "Show Only Eukaryotes" checkbox is not selected. To limit the data to eukaryotes only, check the "Show Only Eukaryotes" checkbox.
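The default filtering and row-highlighting rules described above can be summarized in code. The Python sketch below is illustrative only: the `Level1Row` record and its field names are hypothetical stand-ins for rows of the Level 1 table, and the functions restate the rules in this section rather than reproduce how SeqAPASS is implemented.

```python
from dataclasses import dataclass


@dataclass
class Level1Row:
    accession: str
    eukaryote: bool            # "Eukaryote" column
    partial: bool              # partial protein sequence
    percent_similarity: float


def default_show_only_eukaryotes(query_is_eukaryote):
    """Checked by default for a eukaryote query, unchecked for a prokaryote query."""
    return query_is_eukaryote


def visible_rows(rows, show_only_eukaryotes):
    """Drop prokaryote rows when "Show Only Eukaryotes" is checked."""
    return [r for r in rows if r.eukaryote or not show_only_eukaryotes]


def highlight_reasons(row):
    """Reasons a row would be highlighted under the default rules."""
    reasons = []
    if row.partial:
        reasons.append("partial sequence")
    if row.percent_similarity > 100:
        reasons.append("bitscore higher than query")
    return reasons
```

Used together, `visible_rows(rows, default_show_only_eukaryotes(True))` keeps only eukaryote rows, matching the default for a eukaryote query.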
Columns on the left side of the table:

[Screenshot: left side of the Level 1 Data - Primary table for query NP_000116.2 (Homo sapiens), showing the Data Version, NCBI Accession, Protein Count, Species Tax ID, Taxonomic Group, Filtered Taxonomic Group, Scientific Name, Common Name, and Protein Name columns, together with the Primary Report/Full Report options and the View Level 1 Summary Report, Show Only Eukaryotes, Push Level 1 To DS Report, Download Current Level 1 Report Settings, and ECOTOX Widget controls]

Columns on the right side of the table:
[Screenshot: right side of the Level 1 Data - Primary table, showing the Protein Name, BLASTp Bitscore, Ortholog Candidate, Ortholog Count, Cut-off, Percent Similarity, Susceptibility Prediction, Analysis Completed, Eukaryote, and ECOTOX columns; in this example the Ortholog Count is 656 and the cut-off is 34.43]
Level 1: Primary Report Settings
Default settings
The "Primary Report Settings" drop-down allows the user to view the default settings applied to the table below and to change certain settings. "Primary Report Settings" are only available on the "Primary Report" display, not the "Full Report." By default, the table shows data for hits whose E-value is ≤ 0.01 and that have been identified to have ≥ 1 domain in common with the query sequence. The default setting for "Sorted by Taxonomic Group" is "class"; therefore, the "Filtered Taxonomic Group" column in the table reports the "class" taxonomic lineage from the NCBI Taxonomy Database. However, if class is not identified in the NCBI taxonomic hierarchy associated with the hit accession, the algorithm reports the next available taxonomic group, moving from class to subclass, to superorder, to order, to suborder, to superfamily, to family, to subfamily, to genus. Finally, the susceptibility predictions are set using species read-across. (See the SeqAPASS Documentation section of the user guide for details on read-across settings.) Briefly, species read-across sets the susceptibility prediction as follows: all ortholog candidates are Susceptible = Y; all species above the susceptibility cut-off are Susceptible = Y; all species below the cut-off that belong to the same taxonomic group as one or more species above the cut-off are Susceptible = Y; and species below the cut-off that are not ortholog candidates and do not belong to a taxonomic group above the cut-off are Susceptible = N.
[Screenshot: Primary Report Settings box with the default values E-value: 0.01, Sorted by Taxonomic Group: class, and Common Domains: 1, the Species Read-Across drop-down, and the Update Report and Use Default Settings buttons]
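The species read-across rules above can be expressed as a small function. This Python sketch is a hedged illustration of the rules as written, not SeqAPASS source code. The `Hit` record, its field names, and the sample values are hypothetical, and the sketch assumes that a percent similarity equal to the cut-off counts as "above" it. The `read_across=False` branch corresponds to turning Species Read-Across off, where only ortholog candidates and hits above the cut-off are predicted susceptible.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    species: str
    filtered_group: str        # value of the "Filtered Taxonomic Group" column
    percent_similarity: float
    ortholog_candidate: bool


def predict_susceptibility(hits, cutoff, read_across=True):
    """Return "Y"/"N" for each hit, in the same order as `hits`."""

    def above(hit):
        # Assumption: a value equal to the cut-off counts as above it.
        return hit.percent_similarity >= cutoff

    # Taxonomic groups with at least one species above the cut-off.
    groups_above = {h.filtered_group for h in hits if above(h)}
    predictions = []
    for h in hits:
        susceptible = h.ortholog_candidate or above(h)
        if not susceptible and read_across:
            susceptible = h.filtered_group in groups_above
        predictions.append("Y" if susceptible else "N")
    return predictions


hits = [
    Hit("Homo sapiens", "Mammalia", 100.0, True),
    Hit("Species B", "Mammalia", 30.0, False),    # below cut-off, read across from Mammalia
    Hit("Species C", "Actinopteri", 20.0, False),
    Hit("Species D", "Aves", 25.0, True),         # ortholog candidate below the cut-off
]
print(predict_susceptibility(hits, cutoff=34.43))                     # ['Y', 'Y', 'N', 'Y']
print(predict_susceptibility(hits, cutoff=34.43, read_across=False))  # ['Y', 'N', 'N', 'Y']
```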
Changing Default Settings
The "E-value" and "Common Domains" settings can be manipulated by the user by entering the desired
E-value or number of Common Domains in the respective text boxes and clicking "Update Report." The
table and data visualization will automatically be updated after a few seconds. The user may choose to
change the level of the taxonomic hierarchy that is used for the susceptibility prediction. From the "Sorted
by Taxonomic Group" dropdown the user may choose to display a different taxonomic group in the
"Filtered Taxonomic Group" column of the data table.
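Conceptually, the "E-value" and "Common Domains" settings act as two thresholds on the hits that appear in the Primary Report. A minimal Python sketch, assuming inclusive thresholds (E-value at or below the setting, common domains at or above it); `BlastHit` and `primary_report` are hypothetical names, and the sample hits are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    accession: str
    e_value: float
    common_domains: int


def primary_report(hits, max_e_value=0.01, min_common_domains=1):
    """Keep hits that pass both thresholds; defaults mirror the Level 1 defaults."""
    return [
        h for h in hits
        if h.e_value <= max_e_value and h.common_domains >= min_common_domains
    ]


hits = [
    BlastHit("hit_A", 1e-50, 3),
    BlastHit("hit_B", 0.05, 2),    # fails the E-value threshold
    BlastHit("hit_C", 1e-10, 0),   # no domains in common with the query
]
print([h.accession for h in primary_report(hits)])                        # ['hit_A']
print([h.accession for h in primary_report(hits, min_common_domains=0)])  # ['hit_A', 'hit_C']
```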
[Screenshot: "Sorted by Taxonomic Group" drop-down set to "order," with the options class, subclass, superorder, order, suborder, superfamily, family, subfamily, and genus]
If the user chooses "order," for example, the "Filtered Taxonomic Group" column in the data table reports the "order" taxonomic lineage from the NCBI Taxonomy Database, and species read-across for the susceptibility prediction is based on order instead of class. The data visualization also updates. As described previously, if order is not identified in the NCBI taxonomic hierarchy associated with the hit accession, the algorithm reports the next available taxonomic group, moving from order to suborder, to superfamily, to family, to subfamily, to genus. Upon selecting the taxonomic group from the drop-down and clicking "Update Report," the Level 1 Data for the Primary Report updates to the selected taxonomic level.
Example Level 1 Data - Primary table with "Sorted by Taxonomic Group" set to "order" (first 10 rows; page 1 of 137):
Data Version | NCBI Accession | Protein Count | Species Tax ID | Taxonomic Group | Filtered Taxonomic Group | Scientific Name | Common Name
6 | NP_000116.2 | 2603582 | 9606 | Mammalia | Primates | Homo sapiens | Human
6 | ABY64717.1 | 1708 | 9593 | Mammalia | Primates | Gorilla gorilla | Western gorilla
6 | XP_003311596.1 | 171683 | 9598 | Mammalia | Primates | Pan troglodytes | Chimpanzee
6 | XP_030868114.1 | 52137 | 9595 | Mammalia | Primates | Gorilla gorilla gorilla | Western lowland gorilla
6 | XP_003811544.1 | 71982 | 9597 | Mammalia | Primates | Pan paniscus | Pygmy chimpanzee
6 | ABY64718.1 | 1609 | 9600 | Mammalia | Primates | Pongo pygmaeus | Bornean orangutan
6 | XP_002817538.1 | 141069 | 9601 | Mammalia | Primates | Pongo abelii | Sumatran orangutan
6 | XP_011751932.1 | 68712 | 9545 | Mammalia | Primates | Macaca nemestrina | Pig-tailed macaque
6 | XP_011922091.1 | 66421 | 9531 | Mammalia | Primates | Cercocebus atys | Sooty mangabey
6 | XP_005552209.1 | 98680 | 9541 | Mammalia | Primates | Macaca fascicularis | Crab-eating macaque
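The fallback from the selected rank to the next available rank can be sketched as a lookup over an ordered list of ranks. In this Python sketch the `lineage` dictionary is a hypothetical stand-in for a hit's NCBI Taxonomy lineage, the rank order is the one listed in this guide, and `filtered_taxonomic_group` is an illustrative name rather than a SeqAPASS function.

```python
# Rank order used for the fallback, as listed in this guide.
RANK_ORDER = ["class", "subclass", "superorder", "order", "suborder",
              "superfamily", "family", "subfamily", "genus"]


def filtered_taxonomic_group(lineage, selected_rank="class"):
    """Return (rank, taxon) for the selected rank, or the next available rank after it.

    `lineage` maps rank names to taxon names; returns None if no listed rank is found.
    """
    start = RANK_ORDER.index(selected_rank)
    for rank in RANK_ORDER[start:]:
        taxon = lineage.get(rank)
        if taxon:
            return rank, taxon
    return None


human = {"class": "Mammalia", "order": "Primates", "family": "Hominidae", "genus": "Homo"}
print(filtered_taxonomic_group(human))           # ('class', 'Mammalia')
print(filtered_taxonomic_group(human, "order"))  # ('order', 'Primates')
```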
Level One Summary Report
The user can view a summary of the data for each taxonomic group by clicking the "View Level 1 Summary Report" button. The summary includes the number of species, mean percent similarity, median percent similarity, and susceptibility prediction for each group, and it can also be downloaded.

[Screenshot: Level 1 page controls, including the View Level 1 Summary Report button]
Example Level One Summary Report (page 1 of 6):
Taxonomic Group | Filtered Taxonomic Group | Number of Species | Mean Percent Similarity | Median Percent Similarity | Susceptibility Prediction
Mammalia | Mammalia | 195 | 73.47 | 87.25 | Y
Testudines | Testudines | 13 | 67.66 | 79.16 | Y
Aves | Aves | 122 | 67.00 | 78.40 | Y
Crocodylia | Crocodylia | 7 | 69.23 | 78.29 | Y
Lepidosauria | Lepidosauria | 25 | 63.76 | 74.50 | Y
Amphibia | Amphibia | 25 | 48.39 | 64.98 | Y
Chondrichthyes | Chondrichthyes | 8 | 41.11 | 39.30 | Y
Dipnoi | Dipnoi | 3 | 43.11 | 57.01 | Y
Coelacanthimorpha | Coelacanthimorpha | 2 | 46.56 | 46.56 | Y
Actinopteri | Actinopteri | 204 | 36.19 | 40.90 | Y
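A per-group summary like the Level One Summary Report can be approximated from Level 1 rows with Python's standard `statistics` module. This is a sketch under stated assumptions: the mean and median are taken over all hits in a group, and a group is reported "Y" if any of its hits is predicted susceptible. SeqAPASS may aggregate differently, and the `level1_summary` helper and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def level1_summary(rows):
    """Summarize (filtered_group, species, percent_similarity, prediction) rows by group."""
    groups = defaultdict(lambda: {"species": set(), "similarity": [], "predictions": []})
    for group, species, similarity, prediction in rows:
        g = groups[group]
        g["species"].add(species)
        g["similarity"].append(similarity)
        g["predictions"].append(prediction)
    return [
        {
            "group": group,
            "number_of_species": len(g["species"]),
            "mean_percent_similarity": round(mean(g["similarity"]), 2),
            "median_percent_similarity": round(median(g["similarity"]), 2),
            # Assumption: "Y" if any hit in the group is predicted susceptible.
            "susceptibility_prediction": "Y" if "Y" in g["predictions"] else "N",
        }
        for group, g in groups.items()
    ]


rows = [
    ("Mammalia", "Homo sapiens", 100.0, "Y"),
    ("Mammalia", "Species B", 80.0, "Y"),
    ("Mammalia", "Species B", 70.0, "Y"),   # second hit for the same species
    ("Aves", "Species D", 60.0, "N"),
]
for line in level1_summary(rows):
    print(line)
```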
The user may also turn species read-across off by selecting "No" from the "Species Read-Across" drop-down and clicking "Update Report." When "No" is selected, the susceptibility prediction in the table below is "Y" only if the percent similarity is above the cut-off or the hit is identified as an ortholog candidate ("Y"). Any other hit below the cut-off yields a susceptibility prediction of "N."
[Screenshot: Primary Report Settings box with Sorted by Taxonomic Group set to "order" and the Species Read-Across drop-down open, showing the "No" and "Yes" options]
The user can select the "Full Report" on the "Level 1" page, which includes the same information as the "Primary Report" plus additional information about the BLASTp alignment of the protein sequence. This additional information includes the number of amino acid residues in the hit sequence (Hit Length), the number of exact amino acid matches between the hit and query sequences (Identity), the number of exact and similar amino acid matches between the hit and query sequences (Positives), the expect value (E-value), which describes the number of different alignments expected to occur by chance in the database search, and the conserved domain count. The conserved domain count is based on all domains associated with the query protein in the NCBI Conserved Domain Database (specific hits, non-specific hits, superfamilies, and multi-domains; see the NCBI Conserved Domain Database documentation for details). SeqAPASS algorithms record the query sequence coverage of each curated domain and compare that coverage to that of the hit sequence. If the hit sequence covers the curated domain to an extent greater than or equal to that of the query sequence, the domain is considered a common domain between the hit and query. The number of common domains between each hit sequence and the query sequence is summed and reported. This column displays "0" when the hit protein and query protein have no common domains. (See the Search, View, and Download Data Tables section of the user guide for more information.)
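The common-domain rule can be illustrated with a short Python function. In this sketch the coverage values are assumed to be fractions of each curated domain covered by the sequence, the dictionaries are hypothetical inputs, and the domain accessions are illustrative only; it does not query the NCBI Conserved Domain Database.

```python
def common_domain_count(query_coverage, hit_coverage):
    """Count domains that the hit covers at least as well as the query does.

    Both arguments map a conserved-domain accession to the fraction (0-1)
    of that domain covered by the sequence; domains missing from the hit
    are treated as uncovered.
    """
    return sum(
        1
        for domain, query_fraction in query_coverage.items()
        if hit_coverage.get(domain, 0.0) >= query_fraction
    )


# Illustrative accessions and coverage values.
query = {"pfam00104": 0.98, "pfam00105": 1.00, "pfam02159": 0.90}
hit = {"pfam00104": 1.00, "pfam00105": 0.95}
print(common_domain_count(query, hit))    # 1
print(common_domain_count(query, query))  # 3
```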
The user can also download the currently applied report settings by selecting the "Download Current
Level 1 Report Settings." This csv allows the user to track which settings were used or changed by the
user when downloading a data table.
[Screenshot: example Level 1 report settings csv opened in a spreadsheet, listing Analysis TimeStamp, SeqAPASS version, Query Species, Query Protein, Query Accession, Ortholog Count, L1 Cutoff, L1 Cutoff Value, E-value, Sorted by Taxonomic Group, Common Domains, Species Read Across, Show Only Eukaryotes, and Report (e.g., Query Accession NP_000116.2, L1 Cutoff Default, L1 Cutoff Value 33.93221513, E-value 0.01, Sorted by Taxonomic Group CLASS, Report Primary)]
When the current Level 1 report settings are downloaded, the csv file contains the information shown above. If the user changes the default settings, the csv file provides a quick record of the settings used, even after the SeqAPASS page is closed.
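Because the settings file is a plain csv, it can also be read back programmatically. A minimal Python sketch using the standard `csv` module: it assumes each setting occupies one row with the name in the first column and the value in the second (as in the example above), and it skips title and blank rows. `read_level1_settings` is a hypothetical helper, and the sample text reproduces part of the example file.

```python
import csv
import io


def read_level1_settings(text):
    """Parse Level 1 report settings csv text into a {setting: value} dict."""
    settings = {}
    for row in csv.reader(io.StringIO(text)):
        # Rows with fewer than two non-empty cells (title, blank lines) are skipped.
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            settings[row[0].strip()] = row[1].strip()
    return settings


sample = """Level 1 Report Settings

Analysis TimeStamp,2019 05 16 11:04:08
SeqAPASS version,3.2
Query Species,Homo sapiens
Query Accession,NP_000116.2
L1 Cutoff,Default
L1 Cutoff Value,33.93221513
E-value,0.01
Sorted by Taxonomic Group,CLASS
Common Domains,1
Species Read Across,Y
Show Only Eukaryotes,Checked
Report,Primary
"""

settings = read_level1_settings(sample)
print(float(settings["L1 Cutoff Value"]))  # 33.93221513
```

For a downloaded file, the text could be read with `open(path, newline="")` and passed to the same function.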
Susceptibility Cutoff Box for Level 1
The susceptibility prediction is determined by identifying ortholog candidates, sequences above a defined susceptibility cut-off, and species below the susceptibility cut-off that belong to an organism class represented above the cut-off. The default susceptibility cut-off is set by plotting the distribution of the percent similarities calculated for each hit protein. From this plot, the critical points are identified and the local minimums and maximums are reported. Using the ortholog candidate data, a susceptibility cut-off is then determined automatically by identifying the first ortholog candidate with a percent similarity equal to or higher than the first local minimum. The user can view this graph by clicking the "Cutoff Settings" button in the "Susceptibility Cut-off" box, which opens a new tab in the web browser. The "Select Cut-off" drop-down allows the user to choose between the default cut-off, the 2nd local minimum, or a user-defined cut-off. The 2nd susceptibility cut-off is identified in the density plot by finding the 1st ortholog candidate with a percent similarity equal to or higher than the 2nd local minimum. Upon selecting the user-defined cut-off from the drop-down, the user can closely examine the density plot and adjust the cut-off: the "Enter Cut-off" text box becomes active, and the user can enter a number from 1 to 100. To update the cut-off in the Level 1 data report and/or close the cut-off tab and return to the Level 1 page, click the "Update Cut-off" button.
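The default cut-off procedure can be sketched in three steps: estimate the density of the percent similarities, find its local minimums, and take the first ortholog candidate at or above each minimum. The Python sketch below uses a simple Gaussian kernel density estimate with a normal-reference bandwidth rule. The kernel, bandwidth rule, and grid step are assumptions, since the exact density settings used by SeqAPASS are not described here, so results will not necessarily match the tool's values. The function names and sample data are hypothetical.

```python
import math
from statistics import stdev


def gaussian_kde(values, grid, bandwidth=None):
    """Gaussian kernel density estimate of `values` evaluated at each grid point."""
    n = len(values)
    if bandwidth is None:
        # Assumption: normal-reference rule of thumb for the bandwidth.
        bandwidth = 1.06 * stdev(values) * n ** -0.2
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def local_minima(grid, density):
    """Interior grid points lower than both neighbours, in ascending order."""
    return [
        grid[i]
        for i in range(1, len(grid) - 1)
        if density[i] < density[i - 1] and density[i] < density[i + 1]
    ]


def candidate_cutoffs(similarities, is_ortholog, step=0.1):
    """For each local minimum, the first ortholog candidate at or above it.

    The first entry plays the role of the "Default" cut-off and the second
    that of the "2nd local minimum" option.
    """
    upper = max(100.0, max(similarities))
    grid = [i * step for i in range(int(upper / step) + 1)]
    density = gaussian_kde(similarities, grid)
    ortholog_values = sorted(s for s, o in zip(similarities, is_ortholog) if o)
    cutoffs = []
    for minimum in local_minima(grid, density):
        nxt = next((s for s in ortholog_values if s >= minimum), None)
        if nxt is not None and nxt not in cutoffs:
            cutoffs.append(nxt)
    return cutoffs


# Invented data: a low-similarity cluster of non-orthologs and a high cluster of orthologs.
similarities = [20, 22, 24, 25, 26, 28, 30, 85, 88, 90, 92, 95, 98, 100]
is_ortholog = [False] * 7 + [True] * 7
print(candidate_cutoffs(similarities, is_ortholog))
```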
[Screenshot: Susceptibility Cut-off box with the "Cutoff Settings" button, which opens in a separate tab]
Note: The user should have a justification for changing the susceptibility cut-off, based either on evaluation of the ortholog-candidate cut-offs in the data visualization or on empirical evidence.
[Figure: "Level 1 Susceptibility Cut-off: Primary Report" page for SeqAPASS ID 1290 (Query Accession
NP_000116.2, Homo sapiens estrogen receptor isoform 1; Ortholog Count 348; Protein and Taxonomy Data
02/28/2019; BLAST Version 2.8.1; Software Version 3.2). "Select Cut-off" is set to "Default: Identify 1st
local minimum and find next ortholog candidate". The density plot ("Cut-off Based on Ortholog
Candidates") shows density against Percent Similarity, with local maximums, local minimums, and
inflection points marked.]

Cut-off # | Susceptibility Cut-off
1 | 33.93
2 | 51.64
3 | 61.97
4 | 71.68
5 | 85.11
6 | 96.53

All potential susceptibility cut-offs generated from the data distribution and ortholog candidate
identification are reported in the table with the columns "Cut-off #" and "Susceptibility Cut-off." The user
can use these numbers to define a cut-off if empirical evidence suggests that the "Default" or "2nd
minimum" cut-off is not supported.
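The cut-off logic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SeqAPASS implementation: the Gaussian kernel, the 5-point bandwidth, the 0.5-point evaluation grid, and the function names are hypothetical choices. It estimates a density over the hit percent similarities, finds the local minimums, and for each minimum takes the first ortholog candidate at an equal or higher percent similarity; when there are no ortholog candidates it falls back to the local minimums themselves, as described for "Ortholog Count: 0".

```python
import math
from typing import List, Tuple

# Each hit is (percent_similarity, is_ortholog_candidate).
Hit = Tuple[float, bool]


def gaussian_density(values: List[float], grid: List[float], bandwidth: float) -> List[float]:
    """Gaussian kernel density estimate of `values`, evaluated at every grid point."""
    scale = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        scale * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def local_minimums(grid: List[float], density: List[float]) -> List[float]:
    """Grid positions where the density is strictly lower than both neighbours."""
    return [
        grid[i]
        for i in range(1, len(grid) - 1)
        if density[i] < density[i - 1] and density[i] < density[i + 1]
    ]


def candidate_cutoffs(hits: List[Hit], bandwidth: float = 5.0, step: float = 0.5) -> List[float]:
    """Potential cut-offs in order: #1 is the default, #2 the "2nd minimum" option.

    Assumption: bandwidth and grid step are illustrative, not SeqAPASS settings.
    """
    if not hits:
        return []
    sims = [s for s, _ in hits]
    upper = max(100.0, max(sims))  # percent similarity can exceed 100 (e.g., synthetic constructs)
    grid = [i * step for i in range(int(upper / step) + 1)]
    minimums = local_minimums(grid, gaussian_density(sims, grid, bandwidth))

    orthologs = sorted(s for s, is_ortholog in hits if is_ortholog)
    if not orthologs:
        # No ortholog candidates: only the local minimums are reported.
        return minimums

    cutoffs = []
    for minimum in minimums:
        # First ortholog candidate at an equal or higher percent similarity.
        at_or_above = [s for s in orthologs if s >= minimum]
        if at_or_above:
            cutoffs.append(at_or_above[0])
    return cutoffs


hits = [(28.0, False), (30.0, False), (32.0, False), (88.0, True), (90.0, True), (92.0, False)]
print(candidate_cutoffs(hits))  # [88.0]
```

With this toy input the only local minimum lies at 60, so the default cut-off is 88.0, the lowest-scoring ortholog candidate above it. A real run would typically yield several minimums and therefore several numbered cut-offs, as in the table above.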
No Orthologs Detected
[Figure: "Level 1 Query Protein Information" page for the Pimephales promelas cytochrome P450 aromatase
query (Query Accession CAC38767.1), showing "Ortholog Count: 0" above the "Level 1 Data - Primary" table.]
If no ortholog candidates are detected by the reciprocal best hit BLAST analysis, the "Ortholog Count" at
the top of the "Level 1 Query Protein Information" page will be "0". The cut-off is then set by the local
minimums only, so the susceptibility prediction does NOT take ortholog candidates into account. It is
recommended that the user check the Full Report for ortholog candidates or choose a different query
sequence for the susceptibility predictions. In this case, the susceptibility predictions are highlighted in
dark pink in the Level 1 data table to indicate that 0 orthologs were detected and that the susceptibility
cut-off was determined by plotting the distribution of percent similarities and identifying the local
minimums.
[Figure: "Level 1 Query Protein Information" header for SeqAPASS ID 1299 (Query Accession APQ40848.1,
Poa annua PsbA, partial (plastid); BLAST Version 2.8.1; Software Version 3.2) showing "Ortholog Count: 0".]
Note: Deselect the "Show Only Eukaryotes" checkbox to see whether prokaryotes were identified as orthologs.
When no orthologs are detected, the "Cut-off #" and "Susceptibility Cut-off" columns on the page opened by
the "Cutoff Settings" button report only the local minimum values.
[Figure: "Level 1 Susceptibility Cut-off: Primary Report" page for SeqAPASS ID 2297 (Query Accession
CAC38767.1, Pimephales promelas cytochrome P450 aromatase; Ortholog Count 0; Protein and Taxonomy Data
04/28/2021; BLAST Version 2.11.0; Software Version 5.1), with the density plot of Percent Similarity marking
density, local maximums, local minimums, and inflection points.]

Cut-off # | Susceptibility Cut-off
1 | 25.00
2 | 39.00
3 | 59.00
4 | 79.00
From the "Level 1" page the user can return to the list of completed SeqAPASS runs by clicking the
"Main" button on the upper left-hand side of the "Level 1 Query Protein Information" page.
[Figure: "Level 1 Query Protein Information" page for SeqAPASS ID 1203 (Query Accession NP_001230448.1,
Homo sapiens estrogen-related receptor gamma isoform 2; Ortholog Count 57; Software Version 4.0), with the
"Main" button at the upper left, followed by the "Susceptibility Cut-off", "Primary Report Settings",
"Visualization", "Level 2 Query Domain", "Reference Explorer", and "Level 3 Query Amino Acid Residues"
boxes.]
ECOTOX Widget
The ECOTOX widget lets the user create a species and chemical filter that links out to ECOTOX. The
widget provides rapid access to curated empirical toxicity data from the ECOTOXicology (ECOTOX)
Knowledgebase (https://cfpub.epa.gov/ecotox/), which can be compared with the sequence-based predictions
of chemical susceptibility from SeqAPASS results. Clicking the "ECOTOX Widget" button in the "Level 1
Data" table header opens a widget populated with all the taxonomic groups and species from the Level 1
Data table.

[Figure: "Level 1 Data - Primary" table header showing the Partial Hit Protein Sequence legend, the
"Primary Report"/"Full Report" options, the "Show Only Eukaryotes" checkbox, the "View Level 1 Summary
Report", "Push Level 1 To DS Report", and "Download Current Level 1 Report Settings" options, and the
"ECOTOX Widget" button.]
Select Species
Taxonomic groups in the "Select Species" section of the ECOTOX widget are those found in the Level 1
Data table and Boxplot. By default, the taxonomic groups and species in common with ECOTOX are selected.
The user can select or deselect taxonomic groups of interest in the "Select Taxonomic Groups (CLASS)"
box, and individual species in the "Select Species" box. Taxonomic groups and species whose selection boxes
are greyed out are not found in ECOTOX. A maximum of 500 species can be pushed to the ECOTOX filter.
(Note: species in common are those present in the ECOTOX database, which does NOT mean they have
toxicity records associated with them in ECOTOX.) After selecting species for comparison in ECOTOX, the
user clicks the "Push NCBI Tax IDs" button to advance to the "Select Chemicals" feature of the widget.
[Figure: "Select Species" panel of the ECOTOX widget, with the "Select Taxonomic Groups (CLASS)" list
(including Aves, Testudines, Lepidosauria, Crocodylia, Coelacanthiformes, Actinopteri, Cladistia, and
Hyperoartia), the "Select Species" list with Common Name/Scientific Name display options, the counters
"Max number of species: 500" and "Number of species selected: 366", and the "Push NCBI Tax IDs" button.]
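The species-selection rules above (only species found in ECOTOX can be pushed, and at most 500 of them) can be expressed as a small filter. This is a hypothetical Python sketch, not the widget's code: the row dictionaries with `tax_id` and `group` keys, the set of ECOTOX tax IDs, and the choice to reject rather than truncate an oversized selection are all assumptions.

```python
from typing import Any, Iterable, List, Mapping, Set

MAX_SPECIES = 500  # limit stated for the ECOTOX widget


def tax_ids_to_push(
    rows: Iterable[Mapping[str, Any]],
    ecotox_tax_ids: Set[int],
    selected_groups: Set[str],
) -> List[int]:
    """Collect NCBI Tax IDs from Level 1 rows (hypothetical keys "tax_id" and "group")
    that belong to a selected taxonomic group and are also present in ECOTOX."""
    chosen: List[int] = []
    seen: Set[int] = set()
    for row in rows:
        tax_id = int(row["tax_id"])
        if tax_id in seen or tax_id not in ecotox_tax_ids:
            continue  # duplicate species, or greyed out (not found in ECOTOX)
        if row["group"] in selected_groups:
            seen.add(tax_id)
            chosen.append(tax_id)
    if len(chosen) > MAX_SPECIES:
        # Assumption: an oversized selection is rejected so the user can narrow it.
        raise ValueError(f"{len(chosen)} species selected; at most {MAX_SPECIES} can be pushed")
    return chosen
```

Keeping the first-seen order means the pushed list follows the order of the Level 1 Data table.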
Select Chemicals (Optional)
The "Select Chemicals" feature of the ECOTOX widget is optional and can be skipped by clicking the
"Open in ECOTOX" button. Chemicals can be searched by typing 3 letters of the chemical name, which
populates a list of the top 100 hits containing those 3 letters. Each chemical is displayed with its CASRN
following the name. Up to 5 chemicals can be included in the ECOTOX filter. Unwanted chemicals can be
removed individually by selecting the chemical and clicking "Remove Selected Chemical," or all chemicals
can be removed by clicking "Remove All Chemicals." Links to the "CompTox Chemical Dashboard" and
"ECOTOX Chemicals" open the respective databases in separate browser tabs to help the user find chemicals
of interest. To push the filtered group to the ECOTOX Explore page, click the "Open in ECOTOX" button,
which opens a separate browser tab that incorporates the user-customized group within the ECOTOX
webpage. The selected species are added to an ECOTOX Custom Species Group, and the selected chemicals
are used as a filter in Explore so that the user can view and download records from ECOTOX.
[Figure: "Select Chemicals (Optional)" panel of the ECOTOX widget, with the "Chemical Search" box, the
"Add Selected Chemical" button, links to the CompTox Chemical Dashboard and ECOTOX Chemicals, the
selected chemicals Imidacloprid (CASRN:138261413), Flupyradifurone (CASRN:951659408), and Thiacloprid
(CASRN:111988499), the counter "(3/5) CAS Numbers Selected", and the "Remove Selected Chemical",
"Remove All Chemicals", "Back to Tax IDs", and "Open in ECOTOX" buttons.]
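The limits described for chemical selection (a 3-letter search returning at most 100 matches, and no more than 5 chemicals per filter) can be sketched as follows. This Python sketch is hypothetical: the local catalogue stands in for the ECOTOX chemical list, and case-insensitive substring matching on the name is an assumption about how matches are found. The `format_casrn` helper adds the standard hyphens and verifies the CAS check digit, which can help when cross-referencing the unhyphenated CASRNs shown in the widget against other databases.

```python
from typing import List, Tuple

# Hypothetical stand-in for ECOTOX chemical records: (name, CASRN digits as displayed).
Chemical = Tuple[str, str]

MIN_LETTERS = 3
MAX_HITS = 100
MAX_CHEMICALS = 5


def search_chemicals(query: str, catalogue: List[Chemical]) -> List[Chemical]:
    """Return up to MAX_HITS catalogue entries whose name contains the query."""
    if len(query) < MIN_LETTERS:
        return []
    needle = query.lower()
    return [c for c in catalogue if needle in c[0].lower()][:MAX_HITS]


def add_chemical(selected: List[Chemical], chemical: Chemical) -> List[Chemical]:
    """Return a new selection with `chemical` added, enforcing the 5-chemical limit."""
    if chemical in selected:
        return list(selected)
    if len(selected) >= MAX_CHEMICALS:
        raise ValueError(f"at most {MAX_CHEMICALS} chemicals can be included in the filter")
    return selected + [chemical]


def format_casrn(digits: str) -> str:
    """Hyphenate a CASRN (e.g. "138261413" -> "138261-41-3") after verifying its check digit."""
    body, check = digits[:-1], int(digits[-1])
    # CAS check digit: weight the body digits 1, 2, 3, ... from the right, sum, take modulo 10.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(body), start=1))
    if total % 10 != check:
        raise ValueError(f"invalid CAS check digit in {digits}")
    return f"{body[:-2]}-{body[-2:]}-{digits[-1]}"


print(format_casrn("951659408"))  # 951659-40-8
```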
Level 2: Functional Domain(s) Alignment
In the "View SeqAPASS Reports" tab, the "Level 1 Query Protein Information" page contains a "Level 2"
box for comparing hit domains to the query domain. The "Level 2" box includes a link out to the "NCBI
Conserved Domain Database" for the query protein of interest. Below this link, the user will find a
drop-down containing the functional domains associated with the query sequence for comparison across
species.
[Figure: "Level 1 Query Protein Information" page for SeqAPASS ID 1290 (Query Accession NP_000116.2,
Homo sapiens estrogen receptor isoform 1; Ortholog Count 348; Protein and Taxonomy Data 02/28/2019;
BLAST Version 2.8.1; Software Version 3.2), showing the "Level 2 Query Domain" box with the "NCBI
Conserved Domain Database" link, the "Functional Domains" drop-down, the "Choose Domain to View"
drop-down, and the "View Level 2 Data" and "Refresh Level 2 and 3 runs" buttons.]
The drop-down box below the words "Functional Domains" lists all domains associated with the query
protein in the NCBI Conserved Domain Database. To compare a domain from the query protein to the
domains of the hit proteins, the user selects a domain in the drop-down and clicks the "Request Domain
Run" button.
Note: Domains in the drop-down are listed with the first amino acid residue position that aligns with the
NCBI curated domain in parentheses, followed by the NCBI domain Accession, domain name, and
description.
[Figure: "Level 2 Query Domain" box with the "NCBI Conserved Domain Database" link and the open
"Functional Domains" drop-down.]
Drop-down entries visible in the screenshot (truncated):
(243) cd06157, NR_LBD, The ligand binding domain of nuclear rec...
(105) cd06916, NR_DBD_like, DNA-binding domain of nuclear rec...
(245) cd06929, NR_LBD_F1, Ligand-binding domain of nuclear rec...
(242) cd06930, NR_LBD_F2, Ligand-binding domain of nuclear rec...
(215) cd06931, NR_LBD_HNF4_like, The ligand binding domain of...
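Because each drop-down entry follows the pattern given in the note above (starting residue in parentheses, then the domain Accession, domain name, and description), a label can be split into its parts programmatically, for example when recording which domains were run. The Python sketch below is an illustration that assumes exactly that comma-separated layout; the `DomainOption` type and `parse_domain_label` function are hypothetical names.

```python
import re
from typing import NamedTuple


class DomainOption(NamedTuple):
    start_residue: int   # first query residue aligning with the NCBI curated domain
    accession: str       # NCBI domain Accession, e.g. "cd06949"
    name: str            # domain name, e.g. "NR_LBD_ER"
    description: str     # free text; may itself contain commas


# Assumed layout: "(310) cd06949, NR_LBD_ER, Ligand binding domain of ..."
# Only the first two commas separate fields; the rest belong to the description.
_LABEL = re.compile(r"^\((\d+)\)\s*([^,]+),\s*([^,]+),\s*(.*)$")


def parse_domain_label(label: str) -> DomainOption:
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised domain label: {label!r}")
    start, accession, name, description = match.groups()
    return DomainOption(int(start), accession.strip(), name.strip(), description.strip())
```

For the label "(310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen receptor, which are activated by the hormone 17beta-estradiol (estrogen)", the parser returns start residue 310, accession "cd06949", and name "NR_LBD_ER", and keeps the comma inside the description.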
Note: The user can also use the text box at the top of the drop-down to search the "Functional Domains"
list.
It is recommended that the user click the "NCBI Conserved Domain Database" link
(http://www.ncbi.nlm.nih.gov/cdd/) to identify which domains are "Specific hits" in the NCBI Conserved
Domain Database. On the NCBI page, the user can hover over the graphical representation of the domains
associated with the query sequence to highlight and identify the Accession of each domain "Specific hit."
The example below shows the user hovering the mouse pointer over the NR_LBD_ER domain.
[Figure: NCBI Conserved Domains page for NP_000116.2 (estrogen receptor isoform 1 [Homo sapiens]) with
the pointer over the NR_LBD_ER domain (cd06949; Specific hit, E-value = 1.46e-146; Pssm-ID 132747;
Cd Length 235; Bit Score 426.07), described as the "Ligand binding domain of Estrogen receptor, which are
activated by the hormone 17beta-estradiol (estrogen)."]
After identifying the domain(s) of interest and the corresponding starting residue and domain Accession,
the user can return to SeqAPASS and select the domain of interest in the drop-down. If the user has not
previously run that domain, the "Request Domain Run" button becomes active, and the user can click it to
submit the domain query.
[Figure: "Level 2" box with "(243) cd06157, NR_LBD, The ligand..." selected in the "Functional Domains"
drop-down and the "Request Domain Run" button active.]
When the user clicks the "Request Domain Run" button, the following message appears if the run has been
submitted successfully.
[Figure: SeqAPASS page banner displaying the message "Level 2 Run Requested".]
When sequence comparisons have completed for the selected functional domain, the domain becomes
available in the "Choose Domain to View" drop-down. The drop-down is not populated automatically with
the completed domain run; the user must click the "Refresh Level 2 and 3 runs" button to update the page
so that the newly completed domain appears in the "Choose Domain to View" drop-down.
To view a completed Level 2 domain, select the domain of interest in the drop-down box and click the
"View Level 2 Data" button. This brings the user to the "Level 2" data page for the selected query
protein/domain.
Note: The user can also use the text box at the top of the drop-down to search the "Completed Domain"
list.
[Figure: "Level 2" box with the open "Choose Domain to View" drop-down listing the completed domains
(truncated in the screenshot) "(316) cd06931, NR_LBD_HNF4_like, The ligand binding domain of h..." and
"(310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen rec...", and the same box with the
NR_LBD_ER domain selected before clicking "View Level 2 Data".]
View Level 2 Data Page
The "Level 2 Query Domain Information" box contains the SeqAPASS Run ID, Query Accession, Ortholog
Count (the number of hits identified as ortholog candidates to the query species protein sequence), NCBI
data updates ("Protein and Taxonomy Data:" and "CDD Data:" display the dates the NCBI databases were
downloaded and incorporated into the SeqAPASS database; "BLAST Version:" and "Software Version:"
display the versions used by the SeqAPASS tool for the selected data), Query Species, Query Domain (with
a link out to the NCBI domain page), and Query Protein name.
[Figure: "Level 2 Query Domain Information" page (SeqAPASS Reports Version 6.0) for SeqAPASS ID 2295:
Query Accession NP_000116.2; Ortholog Count 656; Query Species Homo sapiens; Query Protein estrogen
receptor isoform 1; Query Domain (310) cd06949, NR_LBD_ER, ligand binding domain of Estrogen receptor,
which are activated by the hormone 17beta-estradiol (estrogen); Protein and Taxonomy Data 04/28/2021;
CDD Data 04/29/2020; BLAST Version 2.11.0; Software Version 5.1. Below are the "Susceptibility
Cut-off" box ("View Cutoff" button), the "Primary Report Settings" box (E-value 10.0, Sorted by Taxonomic
Group: class, Species Read-Across), and the "Visualization" box ("Visualize Data" button); the cut-off and
visualization pages open in separate tabs.]
The default "Level 2" table is the "Primary Report," which includes the query domain information in the
first row below the column titles, followed by the hit domains whose sequences aligned with the selected
query domain. Hit domains are ordered from highest to lowest percent similarity (maximum percent
similarity = 100%). For each hit domain, the Data Version, NCBI Accession, and species information are
provided, including the "Protein Count" (the number of protein records per species in the NCBI protein
database), taxonomic information, and species names. Also included are the NCBI accession for the query
protein, the query protein name, Domain Type, BLASTp bitscore (which describes the overall quality of the
alignment; see the NCBI BLASTp tutorials), and Domain percent similarity ([hit bitscore / query bitscore]
* 100). If the hit protein has been identified as an ortholog candidate (using the reciprocal best hit BLAST
method), it is marked "Y" for yes; otherwise it is marked "N" for no.
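The Domain percent similarity defined above is simple arithmetic on two BLASTp bitscores. A minimal Python sketch (the function name is illustrative):

```python
def domain_percent_similarity(hit_bitscore: float, query_bitscore: float) -> float:
    """([hit bitscore / query bitscore] * 100), as defined for the Level 2 report.

    Values above 100 occur when a hit aligns with a higher bitscore than the
    query domain itself (commonly synthetic constructs).
    """
    if query_bitscore <= 0:
        raise ValueError("query bitscore must be positive")
    return hit_bitscore / query_bitscore * 100.0
```

For example, the Full Report example in this guide lists a query bitscore of 487.26 and hit bitscores of 485.34. For that pair the function gives about 99.6, in line with the 99.60 shown there.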
A prediction of susceptibility based on the susceptibility cut-off is displayed as "Y" for yes or "N" for no.
The date/time the analysis was completed is also shown. (See the Search, View, and Download Data Tables
section of the user guide for more information.) Another column identifies whether the species is a
eukaryote ("Y") or a prokaryote ("N"). Additionally, a column with a link to the U.S. EPA ECOTOX
Knowledgebase (https://cfpub.epa.gov/ecotox/help.cfm) is available when empirical toxicity data have been
curated for the species in that row. This link allows the user to view available single-chemical toxicity
data from the literature for that species.
Default highlights identify partial protein sequences, sequences with a bitscore higher than that of the
query domain (and therefore a percent similarity greater than 100%; commonly synthetic constructs), and
cases where zero ortholog candidates are identified (in which case the user should consider a different
query sequence). Additionally, the report shows only eukaryote data by default: the "Show Only
Eukaryotes" checkbox is checked, which excludes prokaryote data from the table. To view prokaryote data,
deselect this checkbox.
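The default highlighting and the "Show Only Eukaryotes" filter described above amount to a few row-level checks. The Python sketch below is hypothetical: the `HitRow` fields are assumed names for information the report already carries (partial-sequence status, percent similarity, eukaryote flag), and the reason strings are illustrative rather than SeqAPASS labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HitRow:
    accession: str
    percent_similarity: float
    partial: bool     # assumption: partial-sequence status is known for each hit
    eukaryote: bool   # "Y" in the Eukaryote column


def highlight_reasons(row: HitRow, ortholog_count: int) -> List[str]:
    """Reasons a row would receive one of the default highlights."""
    reasons = []
    if row.partial:
        reasons.append("partial protein sequence")
    if row.percent_similarity > 100.0:
        reasons.append("bitscore higher than the query domain")
    if ortholog_count == 0:
        reasons.append("zero ortholog candidates")
    return reasons


def visible_rows(rows: List[HitRow], show_only_eukaryotes: bool = True) -> List[HitRow]:
    """Apply the "Show Only Eukaryotes" checkbox (checked by default)."""
    return [r for r in rows if r.eukaryote or not show_only_eukaryotes]
```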
[Figure: "Level 2 Data - Primary" view with the Partial Hit Protein Sequence legend, the "Primary
Report"/"Full Report" options, the "Show Only Eukaryotes" checkbox (checked), "View Level 2 Summary
Report", "Download Current Level 2 Report Settings", a keyword search box, and "Download Table". The first
10 rows of page 1 of 95 are:]

Data Version | NCBI Accession | Protein Count | Species Tax ID | Taxonomic Group | Filtered Taxonomic Group | Scientific Name | Common Name | Protein Name
4 | NP_000116.2 | 1265506 | 9606 | Mammalia | Mammalia | Homo sapiens | Human | estrogen receptor isoform 1
4 | ABY64717.1 | 2023 | 9593 | Mammalia | Mammalia | Gorilla gorilla | Western gorilla | estrogen receptor alpha
4 | XP_002817538.1 | 145798 | 9601 | Mammalia | Mammalia | Pongo abelii | Sumatran orangutan | estrogen receptor isoform X2
4 | XP_011852190.1 | 38580 | 9568 | Mammalia | Mammalia | Mandrillus leucophaeus | Drill | PREDICTED: estrogen receptor isoform X2
4 | XP_023061905.1 | 54518 | 591936 | Mammalia | Mammalia | Piliocolobus tephrosceles | Ugandan red Colobus | estrogen receptor isoform X2
4 | XP_018884801.1 | 47068 | 9595 | Mammalia | Mammalia | Gorilla gorilla gorilla | Western lowland gorilla | PREDICTED: estrogen receptor isoform X2
4 | XP_008005788.1 | 62315 | 60711 | Mammalia | Mammalia | Chlorocebus sabaeus | Green monkey | PREDICTED: estrogen receptor isoform X2
4 | XP_011751932.1 | 69122 | 9545 | Mammalia | Mammalia | Macaca nemestrina | Pig-tailed macaque | estrogen receptor isoform X2
4 | ABY64719.1 | 712 | 9580 | Mammalia | Mammalia | Hylobates lar | Common gibbon | estrogen receptor alpha
4 | NP_001158059.1 | 68224 | 9555 | Mammalia | Mammalia | Papio anubis | Olive baboon | estrogen receptor
Level Two Summary Report
The user can view a summary of the data for each taxonomic group by clicking "View Level 2 Summary
Report." The summary includes the number of species, mean percent similarity, median percent similarity,
and susceptibility prediction. This data table can also be downloaded.
Level Two Summary Report (page 1 of 6):

Taxonomic Group | Filtered Taxonomic Group | Number of Species | Mean Percent Similarity | Median Percent Similarity | Susceptibility Prediction
Mammalia | Mammalia | 176 | 80.60 | 97.63 | Y
Aves | Aves | 96 | 83.78 | 95.73 | Y
Crocodylia | Crocodylia | 7 | 84.98 | 95.97 | Y
Testudines | Testudines | 9 | 86.30 | 94.55 | Y
Lepidosauria | Lepidosauria | 22 | 71.14 | 92.21 | Y
Amphibia | Amphibia | 22 | 60.74 | 81.03 | Y
Chondrichthyes | Chondrichthyes | 7 | 55.68 | 67.59 | Y
Coelacanthiformes | Coelacanthiformes | 2 | 70.43 | 70.43 | Y
Actinopteri | Actinopteri | 179 | 51.66 | 62.13 | Y
Ceratodontimorpha | Ceratodontimorpha | 3 | 53.96 | 71.15 | Y
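The per-group statistics in the Level Two Summary Report can be approximated from a downloaded data table with a simple group-by. This Python sketch rests on assumptions: the input tuples are a hypothetical shape, "Number of Species" is taken as the count of distinct Species Tax IDs, the mean and median are computed over all hit rows in the group, and the Susceptibility Prediction column is not reproduced.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical row shape: (species_tax_id, filtered_taxonomic_group, percent_similarity)
Row = Tuple[int, str, float]


def summarize(rows: Iterable[Row]) -> Dict[str, Tuple[int, float, float]]:
    """Return {group: (number_of_species, mean_pct, median_pct)}, rounded to 2 decimals."""
    species: Dict[str, Set[int]] = defaultdict(set)
    similarities: Dict[str, List[float]] = defaultdict(list)
    for tax_id, group, pct in rows:
        species[group].add(tax_id)          # distinct species per group
        similarities[group].append(pct)     # every hit row contributes to mean/median
    return {
        group: (len(species[group]), round(mean(values), 2), round(median(values), 2))
        for group, values in similarities.items()
    }
```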

Level 2: Primary Report Settings
Default settings
The "Primary Report Settings" box lets the user view the default settings applied to the table below and
change certain settings. The box is only available on the "Primary Report" display. By default, data are
shown for hits with an E-value < 10. The default setting for "Sorted by Taxonomic Group" is "class," so the
"Filtered Taxonomic Group" column in the table reports the "class" taxonomic lineage from the NCBI
Taxonomy Database. However, if class is not identified in the NCBI taxonomic hierarchy associated with
the hit accession, the algorithm reports the next available taxonomic group, moving from class to subclass,
to superorder, to order, to suborder, to superfamily, to family, to subfamily, to genus. Finally, the
susceptibility predictions are set using Species Read-Across. (See the SeqAPASS Documentation section of
this user guide for details on read-across settings.) Briefly, with "Species Read-Across," all ortholog
candidates are Susceptible = Y; all species above the susceptibility cut-off are Susceptible = Y; all species
below the cut-off that belong to the same taxonomic group as one or more species above the cut-off are
Susceptible = Y; and species below the cut-off that are not ortholog candidates and do not belong to a
taxonomic group with species above the cut-off are Susceptible = N.
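The four read-across rules above translate directly into code. The following Python sketch is a hedged illustration, not the SeqAPASS source: the dictionary keys are hypothetical, "above the cut-off" is interpreted as greater than or equal to the cut-off value, and the `read_across` flag mirrors the "Species Read-Across" setting (with "No", only ortholog candidates and hits above the cut-off are predicted susceptible).

```python
from typing import Any, Dict, List


def predict_susceptibility(hits: List[Dict[str, Any]], cutoff: float, read_across: bool = True) -> List[str]:
    """Return "Y"/"N" for each hit.

    Each hit is a dict with hypothetical keys:
      "group"    - value of the Filtered Taxonomic Group column
      "pct"      - percent similarity
      "ortholog" - True if the hit is an ortholog candidate
    """
    # Taxonomic groups with at least one species at or above the cut-off (assumed >=).
    groups_above = {h["group"] for h in hits if h["pct"] >= cutoff}
    predictions = []
    for h in hits:
        if h["ortholog"] or h["pct"] >= cutoff:
            predictions.append("Y")   # ortholog candidate, or above the cut-off
        elif read_across and h["group"] in groups_above:
            predictions.append("Y")   # below the cut-off, but read across from its group
        else:
            predictions.append("N")
    return predictions
```

With a cut-off of 41.5, a non-ortholog Mammalia hit at 30% is predicted "Y" through read-across when another Mammalia hit lies above the cut-off, and "N" when `read_across=False`.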
[Figure: "Primary Report Settings" box with the default settings (E-value: 10.0; Sorted by Taxonomic
Group: class; Species Read-Across: Yes) and the "Update Report" and "Use Default Settings" buttons.]
Changing Default Settings
The user may choose to change the level of the taxonomic hierarchy that is used for the susceptibility
prediction. From the "Sorted by Taxonomic Group" dropdown the user may choose to display a different
taxonomic group in the "Filtered Taxonomic Group" column of the data table.
[Figure: "Primary Report Settings" box with the "Sorted by Taxonomic Group" drop-down open, listing class,
subclass, superorder, order, suborder, superfamily, family, subfamily, and genus.]
If the user chooses "order," for example, the "Filtered Taxonomic Group" column in the data table reports
the "order" taxonomic lineage from the NCBI Taxonomy Database, and all species read-across for the
susceptibility prediction is based on order instead of class. As described above, if order is not identified in
the NCBI taxonomic hierarchy associated with the hit accession, the algorithm reports the next available
taxonomic group, moving from order to suborder, to superfamily, to family, to subfamily, to genus. After
the user selects the taxonomic group from the drop-down and clicks "Update Report," the "Level 2" data for
the Primary Report update to the selected taxonomic level. The user can also download the currently
applied report settings by clicking "Download Current Level 2 Report Settings." This csv file lets the user
track which settings were used or changed when downloading a data table.
[Figure: "Level 2 Data - Primary" table after selecting "order" in "Sorted by Taxonomic Group". The first
10 rows of page 1 of 95 are:]

Data Version | NCBI Accession | Protein Count | Species Tax ID | Taxonomic Group | Filtered Taxonomic Group | Scientific Name | Common Name
4 | NP_000116.2 | 1265506 | 9606 | Mammalia | Primates | Homo sapiens | Human
4 | XP_014992596.1 | 88400 | 9544 | Mammalia | Primates | Macaca mulatta | Rhesus monkey
4 | ABY64721.1 | 931 | 9534 | Mammalia | Primates | Chlorocebus aethiops | Grivet
4 | XP_003255939.1 | 38964 | 61853 | Mammalia | Primates | Nomascus leucogenys | Northern white-cheeked gibbon
4 | XP_025240309.1 | 52618 | 9565 | Mammalia | Primates | Theropithecus gelada | Gelada
4 | XP_003811544.1 | 51891 | 9597 | Mammalia | Primates | Pan paniscus | Pygmy chimpanzee
4 | XP_011922091.1 | 66748 | 9531 | Mammalia | Primates | Cercocebus atys | Sooty mangabey
4 | ABY64717.1 | 2023 | 9593 | Mammalia | Primates | Gorilla gorilla | Western gorilla
4 | XP_002817538.1 | 145798 | 9601 | Mammalia | Primates | Pongo abelii | Sumatran orangutan
4 | XP_011852190.1 | 38580 | 9568 | Mammalia | Primates | Mandrillus leucophaeus | Drill
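The fallback from the selected rank to the next available rank can be sketched as a walk down an ordered list of ranks. In this hypothetical Python sketch, each hit's lineage is assumed to be a mapping from NCBI rank name to taxon name; returning `None` when no rank down to genus is available is also an assumption, since the guide does not say what happens in that case.

```python
from typing import Dict, Optional

# Rank order used for the "Sorted by Taxonomic Group" fallback, as listed in this guide.
RANKS = ["class", "subclass", "superorder", "order", "suborder",
         "superfamily", "family", "subfamily", "genus"]


def filtered_taxonomic_group(lineage: Dict[str, str], selected_rank: str = "class") -> Optional[str]:
    """Return the taxon at `selected_rank`, or at the next available lower rank."""
    for rank in RANKS[RANKS.index(selected_rank):]:
        taxon = lineage.get(rank)
        if taxon:
            return taxon
    return None  # assumption: no rank between the selected one and genus is present
```

For a lineage such as `{"class": "Mammalia", "order": "Primates"}`, selecting "class" yields "Mammalia" and selecting "order" yields "Primates", matching the two tables above. A lineage that lacks a class rank falls through to the next rank that is present.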
The user may also turn species read-across off by selecting "No" in the "Species Read-Across" drop-down
and clicking "Update Report." When "No" is selected, the susceptibility prediction in the table is "Y" only
if the percent similarity is above the cut-off or if the hit is identified as an ortholog candidate ("Y"). Any
other hit below the cut-off yields a susceptibility prediction of no ("N").
[Figure: "Primary Report Settings" box showing the E-value, Sorted by Taxonomic Group, and Species
Read-Across settings and the "Update Report" button.]
The user can select the "Full Report" on the "Level 2" data page, which includes the same information as
the "Primary Report" plus additional information on the BLASTp alignment of the protein sequence and on
the domain. The additional information includes the NCBI PSSM ID, NCBI Domain ID, Domain Name, the
number of amino acid residues in the sequence (Hit Length), the number of exactly matching amino acids
between the hit and query sequences (Identity), the number of exact and similar (similar side-chain
substitution) amino acid matches between the hit and query sequences (Positives), and the expect value
(E-value), which describes the number of alignments with an equal or better score expected to occur by
chance in a database search of this size. (See the Search, View, and Download Data Tables section of the
user guide for more information.)
[Figure: "Level 2 Data - Full" table with the keyword search box, "Download Current Level 2 Report
Settings", and "Download Table" (the EcoTox link column is not reproduced). The first rows of page 1 of 95
are:]

Domain Name | Hit Length | Identity | Positive | E-value | BLASTp Bitscore | Ortholog Candidate | Ortholog Count | Cut-off | Percent Similarity | Susceptibility Prediction | Analysis Completed | Eukaryote
NR_LBD_ER | 238 | 238 | 238 | 1.621E-179 | 487.26 | Y | 348 | 41.50 | 100.00 | Y | 2019 08 23 09:47:27 | Y
NR_LBD_ER | 238 | 237 | 238 | 9.910E-179 | 485.34 | Y | 348 | 41.50 | 99.60 | Y | 2019 08 23 09:47:27 | Y
(Rows 3 to 10 on this page show the same values as row 2.)
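For readers relating the E-value and BLASTp Bitscore columns: in standard BLAST (Karlin-Altschul) statistics, as described in the NCBI BLAST documentation, the E-value of an alignment with bit score S' is given by the expression below, where m is the query length and n is the total length of the database searched (BLAST uses effective lengths corrected for edge effects). This is background on BLASTp in general, not a SeqAPASS-specific formula. It shows why higher bitscores correspond to exponentially smaller E-values.

```latex
E = m \, n \, 2^{-S'}
```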


Note: SeqAPASS v2.0 and newer parse the BLASTp query and hit accessions to identify all the
species/accessions from identical proteins. Therefore, if a hit sequence represents multiple species, all
species with the identical sequence will be found in the Level 2 data tables. To determine which
sequence/species BLASTp identified as a hit and which sequence/species was parsed from the identical
sequence, view the "Identical Protein" column of the Level 2 "Full Report," where "N" indicates the
original hit sequence and "Y" indicates a parsed sequence.

When downloading the "Current Level 2 Report Settings," the following information is included in
the csv. If the user changes the default settings, the csv provides a quick record of the settings used
when the SeqAPASS page is no longer open.
Level 2 Report Settings (example csv contents):
Analysis TimeStamp: 2019 05 16 11:04:08
SeqAPASS version: 3.2
Query Species: Homo sapiens
Query Protein: estrogen receptor isoform 1
Query Domain: (310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen receptor, which are activated by the hormone 17beta-estradiol (estrogen)
Query Accession: NP_000116.2
Ortholog Count: 348
L2 Cutoff: Default
L2 Cutoff Value: 41.5003807
E-value: 10
Sorted by Taxonomic Group: CLASS
Species Read Across: Y
Show Only Eukaryotes: Checked
Report: Primary
Susceptibility Cutoff Box for Level 2
The susceptibility prediction identifies a species as susceptible if its sequence is an ortholog candidate,
if its sequence falls above a defined susceptibility cut-off, or if it falls below the cut-off but belongs to an
organism class that contains species above the cut-off. The default susceptibility cut-off is set by plotting
the distribution of percent similarities calculated for each hit protein. From this plot, the critical points are
identified and the local minimums and maximums are reported. Using the ortholog candidate data, the
susceptibility cut-off is then determined automatically by identifying the first ortholog candidate at an
equal or higher percent similarity than the first local minimum. The user can view this graph by clicking
the "View Cutoff" button in the "Susceptibility Cut-off" box. Radio buttons to the right of the graphical
display indicate which cut-off has been applied to evaluate susceptibility in the report. These radio
buttons can be used to change the cut-off in the table to the 2nd local minimum, in which case the 2nd
local minimum is identified in the density plot and the first ortholog candidate at an equal or higher
percent similarity than the second local minimum is used to set the cut-off. Alternatively, the user can
define the cut-off by clicking the "User Defined" radio button, or examine the density plot more closely
and manipulate the cut-off by clicking the "View Cutoff" button.
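The cut-off logic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SeqAPASS implementation: the guide does not say how the density is estimated, so a Gaussian kernel density with a hypothetical bandwidth of 3 percentage points, evaluated on a 0.5-step grid, stands in for it, and local minimums are assumed to be ordered by increasing percent similarity. The function names (`gaussian_density`, `local_minima`, `candidate_cutoffs`) are hypothetical.

```python
import math

def gaussian_density(values, grid, bandwidth):
    """Gaussian kernel density estimate of `values`, evaluated at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
            for x in grid]

def local_minima(grid, density):
    """Grid positions where the density dips below both neighbours, low to high."""
    return [grid[i] for i in range(1, len(grid) - 1)
            if density[i] < density[i - 1] and density[i] < density[i + 1]]

def candidate_cutoffs(similarities, is_candidate, bandwidth=3.0, step=0.5):
    """Return (local_minimum, cut_off) pairs, one per local minimum.

    Each cut-off is the lowest ortholog-candidate percent similarity at or above
    that local minimum; if no candidate qualifies, the local minimum itself is used.
    """
    grid = [i * step for i in range(int(100 / step) + 1)]  # 0 to 100 % similarity
    density = gaussian_density(similarities, grid, bandwidth)
    pairs = []
    for minimum in local_minima(grid, density):
        above = [s for s, c in zip(similarities, is_candidate) if c and s >= minimum]
        pairs.append((minimum, min(above) if above else minimum))
    return pairs
```

Under these assumptions the first pair gives the default cut-off and the second pair the 2nd local minimum option. When no ortholog candidate lies at or above a local minimum (for example, when no orthologs are detected), the sketch falls back to the local minimum itself.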
[Screenshot: "Level 2 Query Domain Information" box for SeqAPASS ID 1290 (query accession NP_000116.2; query species Homo sapiens; query protein estrogen receptor isoform 1; query domain (310) cd06949, NR_LBD_ER, ligand binding domain of estrogen receptor; ortholog count 348; BLAST version 2.8.1; software version 3.2), with the "View Cutoff" button in the "Susceptibility Cut-off" box (opens in a separate tab) and the "Visualization" box.]
Clicking the "View Cutoff" button displays a new page with a drop-down that allows the user to set
the susceptibility cut-off using the first local minimum and the identified ortholog candidate, the second
local minimum and the identified ortholog candidate, or a "User defined cut-off" (where the user
selects the cut-off). To update the cut-off in the Level 2 data report and/or return to the Level 2 page, click
the "Update Cut-off" button.
Note: The user should have direct empirical evidence that species above the user defined cutoff are
susceptible via the protein of interest, or that the species below the user defined cutoff are not susceptible.
Upon selecting "User defined cut-off" from the drop-down, the "Enter Cut-off" text box becomes active
and the user can enter a number from 1 to 100.
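Once a cut-off has been chosen (default, 2nd local minimum, or a user-defined value from 1 to 100), the prediction rule stated at the start of this section can be sketched as below. It is an illustration only: the `Hit` record and `predict_susceptibility` function are hypothetical, species at exactly the cut-off are assumed to count as above it, and "organism class above the cut-off" is interpreted as a taxonomic group containing at least one species predicted susceptible in its own right.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    species: str
    taxonomic_group: str        # e.g. the class shown under "Filtered Taxonomic Group"
    percent_similarity: float
    ortholog_candidate: bool

def predict_susceptibility(hits, cutoff, read_across=True):
    """Return {species: "Y" or "N"} for a chosen susceptibility cut-off."""
    if not 1 <= cutoff <= 100:
        raise ValueError("a user-defined cut-off must be between 1 and 100")

    def direct(hit):
        # Susceptible on its own merits: ortholog candidate or at/above the cut-off.
        return hit.ortholog_candidate or hit.percent_similarity >= cutoff

    # Groups that contain at least one directly susceptible species.
    groups_with_direct_hits = {h.taxonomic_group for h in hits if direct(h)}
    return {
        h.species: "Y" if direct(h) or (read_across and h.taxonomic_group in groups_with_direct_hits) else "N"
        for h in hits
    }
```

Setting `read_across=False` limits the "Y" predictions to ortholog candidates and species at or above the cut-off.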
[Screenshot: "Level 2 Susceptibility Cut-off: Primary Report" page for SeqAPASS ID 1290, showing the "Select Cut-Off" drop-down set to "Default: Identify 1st local minimum and find next ortholog candidate," the "Enter Cut-Off" text box, the "Update Cut-off" button, and the density plot "Cut-off Based on Ortholog Candidates" (x-axis: Percent Similarity) marking the density, local maximums, local minimums, inflection points, and the susceptibility cut-off.]
All potential susceptibility cut-offs generated by the data distribution and ortholog candidate
identification are reported in the table under the columns "Cut-off #" and "Susceptibility Cut-off." The
user can use these values to define a cut-off if empirical evidence suggests that the "Default" or "2nd
local minimum" cut-offs are not supported.
No Orthologs Detected
[Screenshot: "Level 2 Query Domain Information" box for SeqAPASS ID 1326 (query accession NP_001317544.1; Homo sapiens; peroxisome proliferator-activated receptor gamma isoform 3; query domain (110) cd06965, NR_DBD_Ppar, DNA-binding domain of peroxisome proliferator-activated receptors; ortholog count 0) above the "Level 2 Data - Primary" table, whose first rows include Homo sapiens (Human), Eptesicus fuscus (Big brown bat), and Panthera pardus (Leopard).]
If no orthologs are detected from the reciprocal best hit BLAST analysis, the "Ortholog Count" will be "0"
at the top of the "Level 2 Query Domain Information" page. The cut-off will then be set by the local
minimums only; therefore, the susceptibility prediction will NOT take ortholog candidates into account. It
is recommended that the user check the full report for ortholog candidates or identify a different query
sequence for the susceptibility predictions. In this case, the susceptibility predictions will be highlighted in
dark pink in the Level 2 data table to indicate that 0 orthologs were detected and that the susceptibility
cut-off was determined by plotting the distribution of percent similarities and identifying the local
minimums.
[Screenshot: "Level 2 Query Domain Information" box for SeqAPASS ID 1321 (query accession BAF57671.1; Mus caroli; query protein cytochrome b, partial (mitochondrion); query domain (24) CHL00070, petB, cytochrome b6; ortholog count 0; BLAST version 2.8.1; software version 4.0).]
When no orthologs are detected, clicking the "View Cutoff" button shows a table whose "Cut-off #" and
"Susceptibility Cut-off" columns report only the local minimum values.
[Screenshot: "Level 2 Susceptibility Cut-off: Primary Report" page for SeqAPASS ID 1326 (ortholog count 0), with the "Select Cut-Off" drop-down, the "Enter Cut-Off" text box showing 100.0, the "Update Cut-off" button, and the density plot "Cut-off Based on Ortholog Candidates" (x-axis: Percent Similarity).]
The user can return to the "Level 2" data page by clicking the "Update Cut-off" button or by closing the tab.
Level 1 and Level 2: Data Visualization
From the Level 1 or Level 2 results page, SeqAPASS users can access an interactive data visualization of
either the "Primary Report" or the "Full Report" by clicking the "Visualize Data" button.
Example of Level 1 page:
[Screenshot: Level 1 results page (SeqAPASS version 6.0; SeqAPASS ID 2293; Homo sapiens estrogen receptor isoform 1; ortholog count 656) showing the "Susceptibility Cut-off," "Primary Report Settings," and "Visualization" boxes; the "Visualize Data" button is in the "Visualization" box.]
Example of Level 2 page:
[Screenshot: Level 2 results page (SeqAPASS version 6.0; SeqAPASS ID 2295; Homo sapiens; ortholog count 656; Protein and Taxonomy Data 04/28/2021) above the "Level 2 Data - Primary" table.]
The data visualization will then open in a new web browser tab, with separate tabs for Level 1 and
Level 2. The visualization displays the report selected by the user on the Level 1 or Level 2 results
page and is labeled "Level One Visualization - Primary Report" or "Level One Visualization - Full
Report" and "Level Two Visualization - Primary Report" or "Level Two Visualization - Full Report."
Note: One report type at a time, either "Primary Report" or "Full Report," can be displayed in the
visualization tab for Level 1 and Level 2. Therefore, if the user is viewing the "Level One Visualization -
Primary Report" page and returns to the Level 1 results page and clicks the radio button for "Full Report,"
the data visualization tab will update to "Level One Visualization - Full Report."
Level 1 and 2 Information Page
The initial page that opens upon clicking the "Visualize Data" button provides the query protein
information for the respective level, including SeqAPASS ID, query protein, query species, ortholog count,
and query accession. Clicking the query accession opens the corresponding NCBI protein database page.
Information on the visualization is provided in the "Visualization Info" text box. To view the data
visualization boxplots, click the BoxPlot icon.
[Screenshot: "Level One Visualization - Primary Report" and "Level Two Visualization - Primary Report" information pages, each with "Info" and "BoxPlot" icons under "Select to Open Information or Data Visualization."]
The "Visualization Info" text box reads:
The following data visualization is available for Level 1 and Level 2 data:
• BoxPlot - Boxplots depicting SeqAPASS data illustrate the percent similarity across species compared
to the query species, examining the primary amino acid sequences (Level 1 Visualization) or the
functional domain (Level 2 Visualization).
o The open circle (○) represents the query species, and closed circles (●) represent the species with the
highest percent similarity within the specified taxonomic group.
o The top and bottom of each box represent the 75th and 25th percentiles, respectively. The top and
bottom whiskers extend to 1.5 times the interquartile range.
o The mean and median values for each taxonomic group are represented by horizontal thick and thin
black lines on the box, respectively.
o The dashed line indicates the cut-off for susceptibility predictions (based on ortholog analysis).
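The box elements described in the "Visualization Info" text can be computed for a single taxonomic group as sketched below. This is an assumption-laden illustration: the quartile interpolation method is not documented in this guide, and the whiskers follow the common Tukey convention of ending at the most extreme values within 1.5 times the interquartile range of the box. `box_stats` is a hypothetical name, and `statistics.quantiles` needs at least two values.

```python
import statistics

def box_stats(similarities):
    """Summary values for one taxonomic group's box in the boxplot."""
    # Quartile method is an assumption; tools differ in how they interpolate.
    q1, median, q3 = statistics.quantiles(similarities, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [s for s in similarities if low_fence <= s <= high_fence]
    return {
        "q1": q1,                                # bottom of the box (25th percentile)
        "median": median,                        # thin black line
        "q3": q3,                                # top of the box (75th percentile)
        "mean": statistics.fmean(similarities),  # thick black line
        "whisker_low": min(inside),              # most extreme values within 1.5 x IQR
        "whisker_high": max(inside),
        "outliers": [s for s in similarities if s < low_fence or s > high_fence],
        "highest": max(similarities),            # closed circle: most similar species
    }
```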
Level 3 Visualization Information Text
• Heat Map - Heat maps depicting SeqAPASS data illustrate the comparison between the
template species and the user-selected species, providing a summary of the species' protein sequence
comparisons.
o The similarity of each species to the template species at the user-selected
amino acids is denoted with either (Y) yes or (N) no. Green is associated
with "yes" and red is associated with "no."
o Similarities between amino acids are determined by comparing the species-specific amino
acids against the template species. Each amino acid can be a Total Match, Partial
Match, or Not a Match.
o The user can add or remove five settings (Susceptibility Prediction,
Susceptibility Prediction Text, Alignment Prediction Heat Map, Amino Acid, and Amino
Acid Position) to customize the heat map.
o Selecting one of the Optional Selections highlights the species names that are
associated with that selection.
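The per-residue comparison behind the heat map can be sketched as follows. The side-chain similarity groups used to decide a Partial Match are hypothetical placeholders (the criteria SeqAPASS applies are not given here), and counting a Partial Match as "Y" is likewise an assumption, exposed through the `partial_counts_as_yes` parameter of the hypothetical `heat_map_row` function.

```python
# Hypothetical side-chain similarity groups, for illustration only.
SIMILAR_GROUPS = [set("DE"), set("KRH"), set("ST"), set("NQ"), set("ILVM"), set("FYW")]

def residue_match(template_aa, species_aa):
    """Classify one aligned position against the template species."""
    if template_aa == species_aa:
        return "Total Match"
    if any(template_aa in g and species_aa in g for g in SIMILAR_GROUPS):
        return "Partial Match"
    return "Not a Match"

def heat_map_row(template_residues, species_residues, partial_counts_as_yes=True):
    """One heat-map row: (template aa, species aa, match type, Y/N, colour) per position."""
    row = []
    for t, s in zip(template_residues, species_residues):
        match = residue_match(t, s)
        yes = match == "Total Match" or (partial_counts_as_yes and match == "Partial Match")
        row.append((t, s, match, "Y" if yes else "N", "green" if yes else "red"))
    return row
```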
Level 1 and 2 BoxPlot Page - Controls
Upon clicking the "BoxPlot" icon on either the Level 1 or Level 2 Visualization Information page, a
"Controls" box and an interactive boxplot will open.
[Screenshot: "Level Two Visualization - Primary Report" BoxPlot page for SeqAPASS ID 1290 (Homo sapiens; ortholog count 348; query domain (310) cd06949, NR_LBD_ER), showing the "Controls" box (Taxonomic Groups; Select Species for Legend; Species Legend Options with Common Name, Scientific Name, and Group by Common Name; and Optional Selections for Ortholog Candidates, Threatened Species, Endangered Species, and Common Model Organisms), the "Download BoxPlot..." and "Open Size Controls..." buttons, and the boxplot of Percent Similarity by Taxon.]
Manipulating Taxonomic Groups on x-axis
The boxplot controls allow the user to edit the taxonomic groups displayed on the x-axis by clicking the
"X" next to a taxonomic group name (e.g., Aves), which removes that group from the x-axis. To the right
of the "Taxonomic Groups" controls box is a drop-down that allows the user to remove taxonomic
groups from, or add them back to, the x-axis of the boxplot by deselecting or selecting check-boxes.
Unwanted taxonomic groups may also be removed directly from the boxplot by hovering the cursor
over the taxonomic group names along the x-axis: the cursor changes to a black arrow with a red icon
next to it, and clicking the taxonomic group removes it from the boxplot and from the "Taxonomic
Groups" controls box. The user can remove multiple taxonomic groups by pressing CTRL and either
clicking individual groups or slowly dragging across several groups. The removed taxonomic group will
also have its checkbox deselected in the "Taxonomic Groups" controls box drop-down list.
[Screenshot: "Level One Visualization - Primary Report" BoxPlot page showing the "Taxonomic Groups" controls and the corresponding taxonomic groups along the boxplot x-axis.]
Customize Boxplot Legend
The user may customize the boxplot by adding a legend that pinpoints species of interest on the
boxplot. After clicking the "Select Species for Legend" drop-down in the controls box, the user may
search in the text box for specific species to display in the boxplot legend. Once a species is selected
from the drop-down menu by checking its checkbox, the species name is placed in the boxplot legend
and a corresponding data point is produced on the graph. By default, the species common name is
displayed both in the "Select Species for Legend" drop-down and on the boxplot. If the species
scientific name is preferred, the user can select the "Scientific Name" radio button in the "Species
Legend Options" controls box. The drop-down menu and the species in the legend will then display
the species scientific name.
Note: The database will take a brief moment to update the list upon changing between "Common Name"
and "Scientific Name."
[Screenshot: "Controls" box with the "Select Species for Legend" drop-down open (e.g., Aardvark, Abalones, Acorn worms, Adelie penguin, African clawed frog) and a boxplot legend listing Abalones, American beaver, Anna's hummingbird, Bactrian camel, Chimpanzee, and Chum salmon, each with its own point on the Percent Similarity by Taxon plot.]
Change Species Display on Plot
Multiple scientific names can be represented by a single common name (e.g., common name: Teleost
fishes; corresponding scientific names: Spinibarbus denticulatus, Sinocyclocheilus rhinocerous,
Sinocyclocheilus grahami, Sinocyclocheilus anshuiensis, Gobiocypris rarus, Thamnaconus
septentrionalis). Therefore, if a common name that represents multiple species was used to create
the legend and the user then selects "Scientific Name," the boxplot legend will by default change to
display all of the scientific names represented by that common name, and each scientific name will be
represented by a unique color/shape point on the plot. However, if the user selects the "Group by
Common Name" checkbox in the "Species Legend Options" controls box, the scientific names
represented by one common name will all share the same color/shape point on the plot.
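The legend behavior described above amounts to choosing which name keys a point style. A minimal sketch, with a hypothetical palette and a hypothetical `legend_styles` function: each legend label maps to a (color, shape) pair, and scientific names share a style only when "Group by Common Name" is checked.

```python
from itertools import cycle

# Hypothetical colour/shape palette; the actual SeqAPASS palette is not documented here.
PALETTE = [("black", "circle"), ("blue", "square"), ("green", "triangle"),
           ("orange", "diamond"), ("purple", "cross")]

def legend_styles(species, show_scientific=False, group_by_common=False):
    """Map each legend label to a (colour, shape) pair.

    species: list of (common_name, scientific_name) tuples chosen for the legend.
    """
    styles = cycle(PALETTE)
    by_key = {}
    legend = {}
    for common, scientific in species:
        label = scientific if show_scientific else common
        # Scientific names share one point style only when grouped by common name.
        key = common if (group_by_common or not show_scientific) else scientific
        if key not in by_key:
            by_key[key] = next(styles)
        legend[label] = by_key[key]
    return legend
```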
The user can remove selected species from the legend either directly from the "Select Species for
Legend" drop-down box or by hovering the mouse over the species name in the legend. The cursor
changes to a black arrow with a red icon next to it; clicking the name while this arrow is displayed
removes the species from the legend and from the controls box.
[Screenshot: BoxPlot page with "Scientific Name" selected, showing the legend entries Haliotis diversicolor, Castor canadensis, Calypte anna, Camelus bactrianus, Pan troglodytes, Oncorhynchus keta, Gymnogyps californianus, Aplysia californica, Sinocyclocheilus anshuiensis, Sinocyclocheilus rhinocerous, Sinocyclocheilus grahami, Spinibarbus denticulatus, and Gobiocypris rarus.]
Customize the Legend to Display Species Groups of Interest
In the "Optional Selections" controls box, the user has the option of displaying "Ortholog Candidates,"
"Threatened Species," "Endangered Species," or "Common Model Organisms." Upon selecting one of
the checkboxes, red data points corresponding to those species are displayed on the boxplot. Hovering
the mouse over a single red point opens a pop-up box with the corresponding species name,
taxonomic ID, query protein, and percent similarity.
Note: The user can choose to display either the species common name or the scientific name in the
hover-over information box by selecting from the "Species Legend Options."
If the user selects either "Threatened Species" or "Endangered Species," clicking an individual red dot
opens a new web browser tab linking to the corresponding species page on the U.S. Fish and Wildlife
Service's Environmental Conservation Online System (USFWS ECOS; e.g.,
https://ecos.fws.gov/ecp0/profile/speciesProfile?sId=1506).
[Screenshot: "Optional Selections" controls (Ortholog Candidates, Threatened Species, Endangered Species, Common Model Organisms) and the boxplot with "Endangered Species" selected; hovering over a red point shows "Rainbow trout (taxid: 8022), Estrogen receptor isoform X3, 64.51% similarity."]
BoxPlot Controls Widget for Bar Width, Zoom and Pan
Clicking the "Open Size Controls" button opens a "BoxPlot Controls" widget that allows the user to
adjust the size of the bars on the boxplot by increasing or decreasing the "Bar Width" with the up and
down arrows. The minimum and maximum bar widths are 6 and 60, respectively. To reset the bar width
to the default size, click the "Reset" button to the right of the "Bar Width" adjustment box in the
"BoxPlot Controls" widget. The user can also zoom and pan the boxplot by toggling the on/off button
under the "Zoom" heading, and can then zoom in or out by clicking the up or down arrows or by
entering a number in the text box and pressing Enter. To reset the zoom to the default size, click the
"Reset" button to the right of the "Zoom" adjustment box in the "BoxPlot Controls" widget.
The pan option is available when "Zoom and Pan" is toggled to the "on" position; it allows the user to
click on the boxplot and drag it around the screen to reposition it. To reset all BoxPlot Controls to
their default settings, click the "Reset All" button.
Note: Upon exiting the BoxPlot Controls widget, the Zoom and Pan options are automatically
turned off.
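The widget's state handling can be sketched as a small class. Only the bar-width limits (6 and 60) come from this guide; the default bar width and zoom values are hypothetical, as is the assumption that out-of-range values are clamped rather than rejected. `BoxPlotControls` is a hypothetical name.

```python
from dataclasses import dataclass

BAR_WIDTH_MIN, BAR_WIDTH_MAX = 6, 60   # limits stated in this guide
DEFAULT_BAR_WIDTH = 20                 # hypothetical default, not taken from SeqAPASS
DEFAULT_ZOOM = 100                     # hypothetical default zoom level

@dataclass
class BoxPlotControls:
    bar_width: int = DEFAULT_BAR_WIDTH
    zoom: int = DEFAULT_ZOOM
    zoom_and_pan: bool = False

    def set_bar_width(self, value):
        # Assumed behaviour: values outside 6-60 are clamped to the nearest limit.
        self.bar_width = max(BAR_WIDTH_MIN, min(BAR_WIDTH_MAX, value))

    def reset_bar_width(self):
        self.bar_width = DEFAULT_BAR_WIDTH

    def reset_zoom(self):
        self.zoom = DEFAULT_ZOOM

    def reset_all(self):
        self.reset_bar_width()
        self.reset_zoom()
        self.zoom_and_pan = False

    def close(self):
        # Closing the widget turns Zoom and Pan off automatically.
        self.zoom_and_pan = False
```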
[Screenshot: "BoxPlot Controls" widget with "Bar Width" and "Zoom" adjustment boxes, each with a "Reset" button, a "Zoom & Pan" on/off toggle, and a "Reset All" button.]
Download BoxPlot Widget
To download the boxplot, click the "Download BoxPlot" button in the controls box. A "Download Boxplot"
widget will pop up. Specify the file type to download (SVG, PNG, or JPG) by clicking the desired radio
button for "Image Type." The user may customize the resolution of PNG and JPG files prior to
download by altering the "Width" and "Height" of the boxplot; to do so, enter the desired numbers in the
text boxes. Click the "Download Image" button to download the file. To close the "Download Boxplot"
widget, click the "x" at the top right of the widget.
[Screenshot: "Download Boxplot" widget with "Image Type" radio buttons (SVG, PNG, JPG), "Width" (1,236) and "Height" (755) text boxes, and the "Download Image" button.]
Hover-over Features in the BoxPlot
Hovering over a taxonomic group name on the x-axis of the boxplot opens an information box listing the
top three species in order of highest percent similarity. If only one or two species are represented in
the taxonomic group, then only those species are displayed. Hovering the mouse over any species in the
boxplot that is present in the legend generates a pop-up box with the corresponding species name,
taxonomic ID, query protein, and percent similarity. The susceptibility cut-off is displayed in a pop-up
text box upon hovering over the dashed horizontal cut-off line.
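Selecting the species for the x-axis hover box is a top-three sort by percent similarity. A minimal sketch with a hypothetical `taxon_hover_text` function and an assumed text layout; `heapq.nlargest` naturally returns fewer than three entries for groups with one or two species.

```python
import heapq

def taxon_hover_text(group_name, rows):
    """Build the pop-up text for one x-axis taxonomic group.

    rows: (species_name, percent_similarity) pairs for that group. At most three
    species are listed, highest percent similarity first.
    """
    top = heapq.nlargest(3, rows, key=lambda row: row[1])
    lines = [group_name] + [f"{name}: {sim:.2f}%" for name, sim in top]
    return "\n".join(lines)
```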
Summary Table for Species in a Specific Taxonomic Group
Clicking a box representing a taxonomic group in the boxplot opens a pop-up table summarizing that
group. The table header provides the Taxonomic Group name, the number of species represented in the
box, summary statistics (i.e., mean and median percent similarity), and the overall susceptibility
prediction for the selected taxonomic group. The data table includes protein and species information
along with metrics for evaluating protein similarity and predicting susceptibility. The table also includes
columns indicating whether a species belongs to a group of interest (e.g., Threatened Species,
Endangered Species, Model Organism). The table can be downloaded as an Excel or CSV file by
clicking the corresponding icon.
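The summary statistics and CSV export for one taxonomic group can be sketched with the Python standard library. The row fields (`species`, `percent_similarity`, `prediction`) and the functions `group_summary` and `rows_to_csv` are hypothetical, and the overall group-level susceptibility prediction is left out because this guide does not state how it is derived.

```python
import csv
import io
import statistics

def group_summary(rows):
    """rows: dicts with "species", "percent_similarity", and "prediction" ("Y"/"N")
    for the species in one taxonomic group."""
    sims = [r["percent_similarity"] for r in rows]
    return {
        "n_species": len(rows),
        "mean": statistics.fmean(sims),
        "median": statistics.median(sims),
        "n_predicted_susceptible": sum(r["prediction"] == "Y" for r in rows),
    }

def rows_to_csv(rows):
    """Serialise the group's rows as CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```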
Interactive Visualization with Level 1 Data Page and Level 2 Data Page
The data visualization is programmed to update with changes made to the Level 1 Data page and the
Level 2 Data page, respectively. Therefore, if the user updates the Susceptibility Cut-off (see the user
guide sections Susceptibility Cutoff Box for Level 1 and Susceptibility Cutoff Box for Level 2) to the
"Second Local Minimum" or "User Defined Cut-off," the previously opened data visualization boxplot
tab will update the cut-off accordingly. Similarly, if the user modifies the Primary Report Settings (see the
user guide sections Level 1: Primary Report Settings and Level 2: Primary Report Settings), the data
visualization will update accordingly.
Note: If the user updates the "Primary Report Settings" for "Sorted by Taxonomic Group" the boxplot
will update to display the new taxonomic group selection that is present in the "Filtered Taxonomic
Group" column in the data table. The user should be aware that manipulating the "Sorted by Taxonomic
Group" to a different level in the taxonomic lineage (e.g., from class to order; from class to genus) adds a
larger number of taxonomic groups to the x-axis. Therefore, the plot may require greater user
manipulation using the "BoxPlot Controls" to view the data.
Level 3: Individual Amino Acid Residue Alignment
In the "View SeqAPASS Reports" tab, on the "Level 1 Query Protein Information" page, there is a
"Level 3" dropdown for setting up the query for comparing individual amino acid residues to a template
sequence. It is anticipated that the choice of template sequence and residues that are selected to align will
be derived from the published literature in most cases. Publications evaluating homology models, protein
crystal structures, pesticide field resistance, or utilizing site-directed mutagenesis are a few examples of
the types of studies that may contain such information to guide a Level 3 SeqAPASS evaluation.
[Figure: the "Level 3" dropdown, showing the Reference Explorer; the Level 3 Query Amino Acid Residues fields ("Select Template Sequence," "Additional Comparisons (optional)," "Enter Level 3 Run Name," and "Choose Taxonomic Group(s)") with links to the NCBI Protein Database, NCBI COBALT, and NCBI Taxonomy Database; the "Request Residue Run" button; and the "View Single Report" and "View Combined Report" boxes]
Relevant literature containing these data can be identified using the SeqAPASS "Reference Explorer."
The Reference Explorer auto-populates a search term for the protein(s) of interest, integrates it into a
predefined Boolean string, and generates a Google Scholar link that takes the user to scientific articles
mentioning the protein(s).
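[Figure: Reference Explorer with the "Additional Names" text box, the "Add Protein Name" button, the default protein name "estrogen receptor isoform 1," the "Remove Selected Protein" and "Restore Default Proteins" buttons, and the "Generate Google Scholar Link" button]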
The user can modify the Boolean search string by adding text to the "Additional Names" text box and
clicking the "Add Protein Name" button. By selecting a name that is currently in the text box and clicking
the "Remove Selected Protein" button, the user can delete names from the text box and therefore these
names will not be included in the Boolean string for the Google Scholar search.
[Figure: Reference Explorer after "oestrogen" has been added below "estrogen receptor isoform 1"]
When satisfied with the protein names to be included in the Boolean search string, the user selects the
"Generate Google Scholar Link" button. A pop-up will appear displaying the Boolean string to be
searched in Google Scholar. The user can continue to modify the Boolean string by clicking in the text
and adding additional information. The Boolean string can be copied and pasted elsewhere by clicking the
"Copy to Clipboard" button. The user can also use the generated Boolean string to search Google
Scholar by selecting the "Search Google Scholar" button.
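The Boolean string assembly described above can be sketched in Python. This is a minimal illustration, not SeqAPASS code: the helper names `build_boolean_string` and `build_scholar_url` are hypothetical, joining several protein names with OR is an assumption (the example in this guide shows a single name), and the predefined terms are copied from the example string shown in the Google Scholar pop-up.

```python
from urllib.parse import quote, urlencode

# Predefined search terms copied from the example Boolean string in the
# "Google Scholar" pop-up (spelling kept exactly as shown there).
PREDEFINED_TERMS = [
    "site-directed mutagenesis", "molecular docking", "docking analysis",
    "docking simulations", "x-ray crystallography", "crystal structure",
    "homology modeling", "protein structure", "protein binding",
    "molecular model", "binding", "field resistance", "amino acid",
    "amino acid residues", "mutation", "mutations", "molecular dynamics",
    "transcriptional activation", "3D-pharmacophore", "pharmacophore",
    "structure-based", "chemo-bioinformatics", "3D-stuctures", "3D-QSAR",
]


def build_boolean_string(protein_names):
    """Hypothetical helper: OR the protein names together, then AND with the terms."""
    # Assumption: multiple names are combined with OR inside one group.
    names = " OR ".join(name.strip() for name in protein_names)
    terms = " OR ".join(f'"{term}"' for term in PREDEFINED_TERMS)
    return f"({names})AND({terms})"


def build_scholar_url(boolean_string):
    """Hypothetical helper: percent-encode the query into a Google Scholar link."""
    # hl and as_sdt mirror the example link; quote() encodes spaces as %20.
    params = {"hl": "en", "as_sdt": "0,34", "q": boolean_string}
    return "https://scholar.google.com/scholar?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    query = build_boolean_string(["estrogen receptor isoform 1", "oestrogen"])
    print(query)
    print(build_scholar_url(query))
```

In this sketch, adding or removing a name in the Reference Explorer corresponds to changing the list passed to `build_boolean_string`.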
[Figure: "Google Scholar" pop-up showing the generated link, with the "Search Google Scholar" and "Copy to Clipboard" buttons. The link reads:]
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C34&q=(estrogen receptor isoform 1 )AND("site-directed mutagenesis" OR "molecular docking" OR "docking analysis" OR "docking simulations" OR "x-ray crystallography" OR "crystal structure" OR "homology modeling" OR "protein structure" OR "protein binding" OR "molecular model" OR "binding" OR "field resistance" OR "amino acid" OR "amino acid residues" OR "mutation" OR "mutations" OR "molecular dynamics" OR "transcriptional activation" OR "3D-pharmacophore" OR "pharmacophore" OR "structure-based" OR "chemo-bioinformatics" OR "3D-stuctures" OR "3D-QSAR")
Upon selecting the "Search Google Scholar" button, a new browser tab opens to Google Scholar with the
Boolean string in the search box, listing publications and articles that match the SeqAPASS generated
Boolean string. The user should evaluate the literature displayed by Google Scholar to identify
appropriate articles for determining Level 3 template sequences and critical individual amino acids for
comparisons across species.
[Figure: Google Scholar results page opened in a new browser tab for the generated Boolean string (about 18,500 results)]
In the "Level 3" box, there is a link out to the "NCBI Protein Database" for identifying the template
sequence of interest. Below this link is a text box where the user can enter an NCBI Protein Accession
with the version number (e.g., NP_000116.2) or a FASTA formatted sequence, for example:
>gi|62821794|ref|NP_000116.2| estrogen receptor isoform 1 [Homo sapiens]
MTMTLHTKASGMALLHQIQGNELEPLNRPQLKIPLERPLGEVYLDSSKPAVYNYPEGAAYEFNAAAAANA
QVYGQTGLPYGPGSEAAAFGSNGLGGFPPLNSVSPSPLMLLHPPPQLSPFLQPHGQQVPYYLENEPSGYT
VREAGPPAFYRPNSDNRRQGGRERLASTNDKGSMAMESAKETRYCAVCNDYASGYHYGVWSCEGCKAFFK
RSIQGHNDYMCPATNQCTIDKNRRKSCQACRLRKCYEVGMMKGGIRKDRRGGRMLKHKRQRDDGEGRGEV
GSAGDMRAANLWPSPLMIKRSKKNSLALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMGLLTNLA
DRELVHMINWAKRVPGFVDLTLHDQV
Upon clicking in the "Select Template Sequence" text box, a pop-up message will appear with examples
of the proper format for Accessions or FASTA sequences.
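The two accepted template formats can be told apart by their first character. The sketch below is a hypothetical client-side check, not the SeqAPASS validator; the function name is illustrative, and the accession pattern (letters, digits, or underscores followed by a period and a version number) is an assumption based on the NP_000116.2 example.

```python
import re

# Assumed pattern for an accession with a version suffix, e.g., NP_000116.2.
ACCESSION_WITH_VERSION = re.compile(r"^[A-Za-z0-9_]+\.\d+$")


def classify_template_input(text):
    """Return ("accession", id, None) or ("fasta", description, sequence)."""
    stripped = text.strip()
    if stripped.startswith(">"):
        lines = stripped.splitlines()
        description = lines[0][1:].strip()
        # FASTA sequences may wrap across lines; join them into one string.
        sequence = "".join(line.strip() for line in lines[1:])
        if not sequence.isalpha():
            raise ValueError("FASTA sequence lines must contain residue letters only")
        return ("fasta", description, sequence.upper())
    if ACCESSION_WITH_VERSION.match(stripped):
        return ("accession", stripped, None)
    raise ValueError("expected an accession with version (e.g., NP_000116.2) or FASTA text")


print(classify_template_input("NP_000116.2"))
print(classify_template_input(">Sequence description in first line\nMTMTLHTKAS\nGMALLHQIQG"))
```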
[Figure: Level 3 Query Amino Acid Residues fields with the "Select Template Sequence" pop-up, which reads "Enter NCBI Protein Accession OR FASTA Sequence" and gives the examples NP_000116.2 OR a FASTA record beginning ">Sequence description in first line" followed by MTMTLHTKASGMALLHQIQGNELEPLNRPQLKIPLERPLGEVYLDSSKPAVY]
Additional sequences can also be incorporated into the Level 3 alignment using the "Additional
Comparisons (optional)" text box; filling in this field is optional. Upon clicking in the "Additional
Comparisons (optional)" text box, a pop-up message will appear with examples of the proper format for
Accessions or FASTA sequences.
Note: In the "Additional Comparisons (optional)" text box, any NCBI Protein Accession(s) (zero or more)
must be entered before any FASTA sequence(s) to be included in the Level 3 alignment.
[Figure: "Additional Comparisons (optional)" pop-up, which reads "Enter 0 or more NCBI Protein Accession(s) followed by 0 or more FASTA Sequence(s)" and gives the examples NP_000116.2, 1JLY_A, ">Sequence description of first FASTA" (MTMTLHTKASGMALLHQIQGNELEPLNRPQLKIPLERPLGEVYLDSSKPAVY), and ">Sequence description of second FASTA" (XAGLPVIMCLKSNNHQKYLRYQSDNIQQYGLLQFSADKILDPLAQFEVEPSKTYDGLV)]
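The ordering rule for the "Additional Comparisons (optional)" box (accessions first, then FASTA records) can be illustrated with a small parser. This is a hypothetical sketch, not SeqAPASS's parser; it assumes accessions are separated by whitespace or commas, and it reports an error when an accession appears after a FASTA record, because the stray accession ends up inside that record's sequence.

```python
import re


def parse_additional_comparisons(text):
    """Split the box contents into accessions (first) and FASTA records (after).

    Returns (accessions, [(description, sequence), ...]).
    """
    accession_part, marker, fasta_part = text.partition(">")
    # Assumption: accessions are separated by whitespace and/or commas.
    accessions = [token for token in re.split(r"[\s,]+", accession_part) if token]
    records = []
    if marker:
        for block in fasta_part.split(">"):
            lines = block.strip().splitlines()
            if not lines:
                raise ValueError("empty FASTA record")
            description = lines[0].strip()
            sequence = "".join(line.strip() for line in lines[1:])
            if not sequence.isalpha():
                # An accession typed after a FASTA record lands here, which is
                # how this sketch enforces "accessions first, then FASTA".
                raise ValueError(f"record {description!r}: sequence must be letters only")
            records.append((description, sequence.upper()))
    return accessions, records


example = """NP_000116.2
1JLY_A
>Sequence description of first FASTA
MTMTLHTKASGMALLHQIQGNELEPLNRPQLKIPLERPLGEVYLDSSKPAVY
>Sequence description of second FASTA
XAGLPVIMCLKSNNHQKYLRYQSDNIQQYGLLQFSADKILDPLAQFEVEPSKTYDGLV"""
accessions, records = parse_additional_comparisons(example)
print(accessions)
print([description for description, _ in records])
```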
Below the text box where the user can choose to add additional sequences for comparison is a link to
NCBI COBALT (Constraint-based Multiple Protein Alignment Tool). NCBI COBALT allows the user to
align multiple sequences and is the alignment tool that SeqAPASS algorithms use to set up the query of
individual amino acid residues across species.
Note: The user does not need to use the COBALT link to run a Level 3 evaluation; however, the link is
available in case the user chooses to further evaluate or compare multiple potential template sequences.
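Because Level 3 comparisons are read off a multiple sequence alignment, a residue position in the template does not have to carry the same number in another species' sequence; insertions and deletions shift the numbering. The sketch below shows, assuming each aligned row uses "-" for gaps, how template positions can be mapped onto a hit sequence. The function name is hypothetical, and this illustrates the general idea only; it is not the COBALT output format or SeqAPASS's internal code.

```python
def map_template_positions(aligned_template, aligned_hit, gap="-"):
    """Map 1-based template positions to (hit position, hit residue).

    Both arguments are rows of the same alignment. A template residue that is
    aligned to a gap in the hit maps to None.
    """
    if len(aligned_template) != len(aligned_hit):
        raise ValueError("aligned rows must have the same length")
    mapping = {}
    template_pos = hit_pos = 0
    for t_char, h_char in zip(aligned_template, aligned_hit):
        if h_char != gap:
            hit_pos += 1  # count every residue in the hit row
        if t_char == gap:
            continue  # insertion in the hit relative to the template
        template_pos += 1
        mapping[template_pos] = None if h_char == gap else (hit_pos, h_char)
    return mapping


# Toy rows: the hit lacks template residue 2 and has an extra residue (K).
print(map_template_positions("MTM-TL", "M-MKTL"))
```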
Under the text "Enter Level 3 Run Name," there is a text box where the user can enter a user defined
name for the run. The user may only enter letters or numbers in the name. The user defined name will
appear in the "Choose Query to View" drop-down upon completion of the Level 3 sequence alignment.
[Figure: the "Level 3" dropdown before sequences are selected, with "Choose Taxonomic Group(s)" set to "All Groups" and "0 species selected" shown above the "Request Residue Run" button]
To complete the set-up for a Level 3 query, the user must select which sequences to compare to the
identified template sequence. The "Choose Taxonomic Group(s)" drop-down lists all Taxonomic Groups
that were identified as hits in the Level 1 primary amino acid sequence alignment data. Because COBALT
aligns all of the selected sequences together, it is recommended that the user be selective when choosing
sequences to align from the hit table below. For example, selecting sequences with low similarity to the
template sequence along with sequences sharing high similarity to the template sequence can skew the
alignment. It is recommended that the user select sequences by first selecting a taxonomic group from the
"Choose Taxonomic Group(s)" drop-down. The user can also use the "NCBI Taxonomy Database" link to
look up which species fall within a "Taxonomic Group" listed in the drop-down.
[Figure: "Choose Taxonomic Group(s)" drop-down listing class-level groups (e.g., Actinopteri, Amphibia, Anthozoa, Appendicularia, Arachnida)]
Note: The "Choose Taxonomic Group(s)" drop-down displays groups at the level of the taxonomic hierarchy
shown in the "Filtered Taxonomic Group" column of the "Level 1 Data" table. For example, if the user
changes the default option from "class" to "order," then orders will be displayed in the drop-down.
[Figure: "Choose Taxonomic Group(s)" drop-down listing order-level groups (e.g., Acipenseriformes, Actiniaria, Amphipoda, Anabantiformes, Anguilliformes)]
By choosing a group from the drop-down menu, the user filters the "Level 1 Data" table below by the
selected Taxonomic Group (see column "Taxonomic Group" in the "Level 1 Data" table). When a
"Taxonomic Group" is selected from the drop-down, it can take a few seconds for the "Level 1 Data"
table to filter completely, depending on the size of the table. The user can then examine each hit protein
in the "Level 1 Data" table and select those to compare to the template sequence. To select
sequences/species from the filtered "Level 1 Data" table, the user selects the check boxes in the first
column of the table. Although it is not typically recommended, the user may also select the header check
box in the first column to select all sequences/species in the filtered table.
Note: The user can also type the "Taxonomic Group" of interest in the text search box at the top of the
drop-down for quick filtering.
Below is an example where the user selected the "Taxonomic Group" Actinopteri from the drop-down
and then selected individual sequences/species to align with the template sequence. The number of
selected species will be shown in the text above the "Request Residue Run" button.
[Figure: "Actinopteri" selected in the "Choose Taxonomic Group(s)" drop-down, with the "Level 1 Data - Primary" table filtered to Actinopteri and individual sequences checked]
(See the Search, View, and Download Data Tables section of the user guide for more information.)
The user can align sequences/species from multiple taxonomic groups with the template sequence by
returning to the "Choose Taxonomic Group(s)" drop-down and selecting another group. This filters the
Level 1 table by the newly selected group, and the user can then select additional species from the
filtered table. As before, the number of selected species is tracked in the text above the "Request Residue
Run" button, which reads "X species selected".
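The filter-then-select workflow above, in which selections made under one taxonomic group are kept when the user switches to another, can be modeled as follows. This is a hypothetical sketch: the `Hit` record, its fields, and counting unique accessions for the "X species selected" text are assumptions, and the example rows are taken from the tables in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    accession: str
    scientific_name: str
    taxonomic_group: str


def filter_by_group(hits, group):
    """Return only the hits in the chosen taxonomic group (like the drop-down filter)."""
    return [hit for hit in hits if hit.taxonomic_group == group]


class Selection:
    """Accumulates checked rows across several filtered views of the table."""

    def __init__(self):
        self._selected = {}

    def add(self, hits):
        for hit in hits:
            self._selected[hit.accession] = hit  # re-selecting a row is harmless

    def status_text(self):
        return f"{len(self._selected)} species selected"


level1_hits = [
    Hit("AAU87498.1", "Pimephales promelas", "Actinopteri"),
    Hit("AAI62466.1", "Danio rerio", "Actinopteri"),
    Hit("OCT77903.1", "Xenopus laevis", "Amphibia"),
]
selection = Selection()
selection.add(filter_by_group(level1_hits, "Actinopteri"))
selection.add(filter_by_group(level1_hits, "Amphibia"))
print(selection.status_text())  # 3 species selected
```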
When all sequences to be aligned have been selected, the user clicks the "Request Residue Run" button.
Upon successful submission of a Level 3 query, the user will see the following pop-up message. If
submission is unsuccessful, a message will appear describing the reason.
[Figure: SeqAPASS Reports page (Version 6.0) with the pop-up message "Level 3 Run Requested" and "Status queued"]
To update the "Choose Query to View" drop-down menu with the completed Level 3 alignments, the user
can click on the "Refresh Level 2 and 3 runs" button.
[Figure: "Level 1 Query Protein Information" box for SeqAPASS ID 2295 (Query Accession NP_000116.2; Query Species Homo sapiens; Query Protein estrogen receptor isoform 1; Ortholog Count 656), showing the Level 2, Level 3, Primary Report Settings, and Visualization controls and the "Refresh Level 2 and 3 runs" button]
Additionally, the user can check the status of the Level 3 run by clicking the "SeqAPASS Run Status" tab
and selecting the "Level 3 Status" radio button. Level 3 alignments typically complete in a few seconds.
When the Level 3 query completes and the Level 1 page has been updated, the user defined Level 3 Run
Name will be available in the "Choose Query to View" drop-down menu. After selecting the desired Run
Name from the drop-down, click the "View Level 3 Data" button to view the aligned sequences and set up
the individual amino acid residue alignments with the selected sequences/species.
[Figure: "Choose Query to View" drop-down listing user defined Level 3 Run Names (e.g., Actinopteri, Amphibia, Chondrichthyes), with "Actinopteri" selected above the "View Level 3 Data" button]
Upon a successful Level 3 query submission, a pop-up message is displayed in the upper right-hand side of
the screen.
[Figure: "View Single Report" box ("Choose Query to View" drop-down and "View Level 3 Data" button) and "View Combined Report" box ("Combine Level 3 Data" button)]
Once the Level 3 run has completed, the user can select a run from the "Select Level 3 Run Name"
drop-down in the "View Single Report" box to view an individual user defined Level 3 run. If the user has
completed multiple Level 3 alignments between a template sequence and more than one taxonomic group,
the user can combine the Level 3 reports by selecting the "Combine Level 3 Data" button. A "Combine
Level 3 Reports" pop-up will appear. Combining Level 3 reports involves three steps. First, the user
chooses a template from the "Choose a Level 3 Template" drop-down, which lists all templates the user has
used to generate Level 3 alignments. The template sequence must be common to all of the Level 3 runs
that will be combined.
[Figure: "Combine Level 3 Reports" pop-up at the "Level 3 Templates" step, with the "Choose a Level 3 Template" drop-down listing NP_000116.2 (user defined) estrogen receptor isoform 1 [Homo sapiens]]
After selecting the template, the user clicks the "Next" button. Second, the user selects all Level 3 Jobs
to be combined by checking the boxes next to their user defined names in the "Level 3 Jobs" drop-down,
and then clicks the "Next" button. As the user moves through each step of the Combine Level 3 Reports
feature, the current step is indicated by a blue highlight on its button (e.g., the "Level 3 Jobs" button is
highlighted while the user is selecting jobs to combine).
[Figure: "Combine Level 3 Reports" pop-up at the "Level 3 Jobs" step, with the "Choose level 3 Job(s)" drop-down listing Amphibia, Aves, and Actinopteri]
The final step in the "Combine Level 3 Reports" feature is to order the jobs as they should be displayed in
the output. Typically, sequences from an individual taxonomic group are aligned to a template sequence
and named accordingly (e.g., Actinopteri, Amphibia, Aves). It may be useful to order the combined report
the same way the taxonomic groups are displayed on the x-axis of the Level 1 or Level 2 data
visualization. To do so, the user selects a user defined name in the "Order Level 3 Jobs:" text box and
drags and drops it into the desired position, from top to bottom. To move on to selecting individual
amino acids for sequence comparisons, the user selects the "View Level 3 Data" button.
[Figure: "Combine Level 3 Reports" pop-up at the "Order Level 3 Jobs" step, with Amphibia listed above Aves and the "View Level 3 Data" button]
The order selected will translate to the top to bottom order displayed in the data table, with the template
sequence only displayed once in the first row and all selected jobs below.
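The combination rules described above (one shared template, a user-chosen job order, and the template row shown only once at the top) can be expressed in a short function. This is a hypothetical sketch rather than SeqAPASS code; representing each job as a template accession plus a list of hit accessions is an assumption made for illustration, and labeling the template row with the first job's name follows the example combined table.

```python
def combine_level3_jobs(jobs, order):
    """Return (job name, accession) rows for a combined Level 3 report.

    jobs:  {job_name: (template_accession, [hit accessions])}
    order: job names from top to bottom, as chosen in "Order Level 3 Jobs".
    """
    templates = {jobs[name][0] for name in order}
    if len(templates) != 1:
        raise ValueError("combined jobs must share exactly one template sequence")
    template = templates.pop()
    rows = [(order[0], template)]  # template displayed once, in the first row
    for name in order:
        rows.extend((name, accession) for accession in jobs[name][1]
                    if accession != template)
    return rows


jobs = {
    "Amphibia": ("NP_000116.2", ["OCT77903.1", "BAF30926.1"]),
    "Aves": ("NP_000116.2", ["KFQ02396.1"]),
}
for row in combine_level3_jobs(jobs, ["Amphibia", "Aves"]):
    print(row)
```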
[Figure: combined "Level 3 Data - Primary" table for the Amphibia and Aves jobs (first page of 2)]
Level 3 Data - Primary
Data Version	Job Name	NCBI Accession	Protein Count	Species Tax ID	Taxonomic Group	Scientific Name
4	Amphibia	NP_000116.2	1265506	9606	Mammalia	Homo sapiens
4	Amphibia	OCT77903.1	130454	8355	Amphibia	Xenopus laevis
4	Amphibia	BAF30926.1	83	166789	Amphibia	Andrias japonicus
4	Amphibia	AUW64608.1	1591	141262	Amphibia	Andrias davidianus
4	Amphibia	BAE81788.1	94392	8364	Amphibia	Xenopus tropicalis
4	Amphibia	BAJ05031.1	18	2040589	Amphibia	Sclerophrys capensis
4	Aves	XP_019468458.1	34219	9103	Aves	Meleagris gallopavo
4	Aves	XP_025978017.1	31563	8790	Aves	Dromaius novaehollandiae
4	Aves	KFQ02396.1	30590	8969	Aves	Haliaeetus albicilla
4	Aves	XP_010580195.1	25311	52644	Aves	Haliaeetus leucocephalus
View Level 3 Individual Amino Acid Query and Data Page
Clicking the "View Level 3 Data" button opens the Level 3 data page. The "Level 3 Template Protein
Information" box contains the SeqAPASS Run ID, Query Accession (with link out to NCBI), Ortholog
Count (number of hits identified as ortholog candidates to the query species protein sequence), NCBI
Data (the date that NCBI databases and executables were downloaded and incorporated into SeqAPASS),
Level 3 Run Name (defined by the user), Template Species (entered by the user in the Level 3 query),
Template Protein, and Query Residues (populated with residues upon selection and a successful table
update).
[Figure: "Level 3 Template Protein Information" box for SeqAPASS ID 1290 (Level 3 Run Name: Actinopteri; Template Species: Homo sapiens; Template Protein: [NP_000116.2] estrogen receptor isoform 1; Query Residues: No Residues Selected; Ortholog Count: 348; Protein and Taxonomy Data: 02/28/2019; BLAST Version: 2.8.1; Cobalt Data: 07/09/2010; Cobalt Version: 2.1.0; Software Version: 3.2), with the "Show Amino Acid Info..." button, the "Select Amino Acid Residues" shuttle, the "Enter Amino Acid Residue Positions" text box, the "Copy to Residue List" and "Update Report" buttons, the "Primary Report" and "Full Report" radio buttons, and the "View Level 3 Summary Report" button]
[Figure: "Level 3 Data - Primary" table for the Actinopteri run before any residues are selected]
Level 3 Data - Primary
Data Version	NCBI Accession	Protein Count	Species Tax ID	Taxonomic Group	Scientific Name	Common Name	Protein Name	Analysis Completed	Similar Susceptibility as Template
4	NP_000116.2	1265506	9606	Mammalia	Homo sapiens	Human	estrogen receptor isoform 1	2019 08 29 14:55:59	TBD
4	AAU87498.1	495	90988	Actinopteri	Pimephales promelas	Fathead minnow	estrogen receptor alpha	2019 08 29 14:55:59	TBD
4	XP_014061037.1	112166	8030	Actinopteri	Salmo salar	Atlantic salmon	PREDICTED: estrogen receptor isoform X2	2019 08 29 14:55:59	TBD
4	XP_020570152.1	47555	8090	Actinopteri	Oryzias latipes	Japanese medaka	estrogen receptor	2019 08 29 14:55:59	TBD
4	XP_021454037.1	124397	8022	Actinopteri	Oncorhynchus mykiss	Rainbow trout	estrogen receptor isoform X3	2019 08 29 14:55:59	TBD
4	AAI62466.1	87698	7955	Actinopteri	Danio rerio	Zebrafish	Estrogen receptor 1	2019 08 29 14:55:59	TBD

The "Level 3" data page includes the Data Version, NCBI Accession, Protein Count, taxonomic
information, Protein Name, and the date/time the Level 3 run completed. The data table remains ordered by
percent similarity, from the sequences with the highest percent similarity to the template sequence at the
top to those with the lowest percent similarity at the bottom. (See the Search, View, and Download Data
Tables section of the user guide for more information.)
For additional information on amino acid residues, including the single-letter abbreviation, the residue
name, the side-chain classification, and the size of the residue based on molecular weight, the user can
click the "Show Amino Acid Info..." button. A pop-up table, "Amino Acid info," will display this
information.
[Figure: "Level 3 Template Protein Information" box with the "Amino Acid info" pop-up table, opened by the "Show Amino Acid Info..." button]
Amino Acid info
ID	Name	Side Chain	Size (molecular weight, g/mol)
A	Alanine	Aliphatic	89.094
C	Cysteine	Sulfur-Containing	121.154
D	Aspartic Acid	Acidic	133.104
E	Glutamic Acid	Acidic	147.131
F	Phenylalanine	Aromatic	165.192
G	Glycine	Aliphatic	75.067
H	Histidine	Basic	155.156
I	Isoleucine	Aliphatic	131.175
K	Lysine	Basic	146.189
L	Leucine	Aliphatic	131.175
M	Methionine	Sulfur-Containing	149.208
N	Asparagine	Amidic	132.119
P	Proline	Aliphatic	115.132
Q	Glutamine	Amidic	146.146
R	Arginine	Basic	174.203
S	Serine	Hydroxylic	105.093
T	Threonine	Hydroxylic	119.119
U	Seleno-cysteine	Sulfur-Containing	168.064
V	Valine	Aliphatic	117.148
W	Tryptophan	Aromatic	204.228
X	Unknown	Unknown
Y	Tyrosine	Aromatic	181.191
To obtain individual amino acid residue alignment data in the Level 3 data table, the user must use the
shuttle in the "Level 3 Template Protein Information" box to select positions and amino acid residues
from the chosen template sequence to align with the sequences/species that were selected by taxonomic
group. Single-letter abbreviations are used for the amino acid residues:
G: Glycine	A: Alanine	S: Serine	T: Threonine
L: Leucine	I: Isoleucine	M: Methionine	P: Proline
Y: Tyrosine	W: Tryptophan	F: Phenylalanine	D: Aspartic Acid
N: Asparagine	Q: Glutamine	E: Glutamic Acid	H: Histidine
K: Lysine	R: Arginine	C: Cysteine	V: Valine
U: Seleno-cysteine
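[Figure: "Select Amino Acid Residues" shuttle with template residues (1M, 2T, 3M, ...) in the left-hand box, selected residues in the right-hand box, the arrow buttons between them, and the "Update Report" button]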
The user can select one residue at a time by clicking to highlight the residue of interest and then clicking
the top right-arrow shuttle button to move the residue to the right-hand box for inclusion in the alignment.
Each time a residue is added to the right-hand box, the left-hand box resets to the first residue. To select
multiple residues at once, the user can hold the Ctrl key while clicking on residues and then click the top
right-arrow shuttle button to move them to the right-hand box. The user can remove selected residues by
using the left-arrow button to clear one at a time or the double left-arrow button to remove all selected
residues at once. When the residues of interest (likely defined from the literature as described above) have
been selected, click the "Update Report" button, which updates the "Level 3 Data" table with the
individual residue alignment data.
Alternatively, the user can enter amino acid positions in the "Enter Amino Acid Residue Positions" text
box (e.g., 351,353,362) and click the "Copy to Residue List" button. This populates the "Select Amino
Acid Residues" shuttle box with the typed positions and their residues. The user can then click the
"Update Report" button to produce Level 3 results in the table below.
[Figure: positions 351,353,362,364,394,524 entered in the "Enter Amino Acid Residue Positions" text box; after "Copy to Residue List" is clicked, the right-hand box of the shuttle lists 351D, 353E, 362K, 364V, 394R, and 524H]
The individual amino acid residue alignment data will then be displayed in the right-most columns of the
Level 3 Data table. The user can submit a maximum of 50 individual amino acid residues from the
template sequence to compare to the other selected sequences. The individual amino acid residues are
listed in numerical order, from the first position in the template sequence to the last.
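The "Enter Amino Acid Residue Positions" shortcut and the rules above (at most 50 residues, listed in numerical order) can be sketched as a small parser that turns comma-separated positions into the position-plus-residue labels shown in the shuttle (e.g., 351D). The function name and its error handling are hypothetical; a toy template made of the first ten residues of NP_000116.2 keeps the example short.

```python
MAX_RESIDUES = 50  # maximum number of template residues per Level 3 query


def residues_from_positions(text, template_sequence):
    """Turn "351,353,362" into sorted labels such as ["351D", "353E", "362K"]."""
    tokens = [token.strip() for token in text.split(",") if token.strip()]
    positions = sorted({int(token) for token in tokens})  # dedupe, then number order
    if len(positions) > MAX_RESIDUES:
        raise ValueError(f"at most {MAX_RESIDUES} residues can be submitted")
    labels = []
    for position in positions:
        if not 1 <= position <= len(template_sequence):
            raise ValueError(f"position {position} is outside the template sequence")
        # Positions are 1-based, so residue N is at string index N - 1.
        labels.append(f"{position}{template_sequence[position - 1]}")
    return labels


template = "MTMTLHTKAS"  # first 10 residues of NP_000116.2
print(residues_from_positions("5, 1,3,3", template))  # ['1M', '3M', '5L']
```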
Level 3 Data - Primary Report
The default report is the "Primary Report" and can be recognized as such because the radio button for
"Primary Report" above the "Level 3 Data" table is selected.
The "Primary Report" columns for the alignment are titled "Similar Susceptibility as Template" ("Y" or
"N" for yes or no, respectively), followed by Position 1, Amino Acid 1, Total Match 1, Position 2, Amino
Acid 2, Total Match 2, Position 3, Amino Acid 3, Total Match 3, and so on. The template sequence is
always in the top row of the "Level 3 Data" table, followed by the previously selected sequences. The
residues selected in the shuttle are also displayed in the top row, corresponding to the template sequence.
Each Position and Amino Acid in the following rows corresponds to the Protein Accession identified in
that row and aligns with the template sequence. The Total Match X column describes whether the amino
acid residue matches the template based on side-chain classification and molecular weight ("Y" for yes,
or "N" for not a match to the template). The user can evaluate these data to understand how well
conserved an amino acid residue is across species, or in a species of interest, to add an additional line of
evidence to support (or question) susceptibility predictions. The user can also download the current
report settings by selecting "Download Current Level 3 Report Settings." This CSV file allows the user to
track which settings were used or changed when downloading a data table.
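A residue comparison of the kind summarized in the Total Match column can be sketched with the side-chain classes and molecular weights from the "Amino Acid info" table. This guide does not state the exact rule or molecular-weight threshold SeqAPASS applies, so treating an identical residue as an automatic match and the hypothetical `MW_TOLERANCE = 30.0` g/mol are assumptions for illustration only.

```python
# (side-chain class, molecular weight in g/mol), from the "Amino Acid info" table
AMINO_ACIDS = {
    "A": ("Aliphatic", 89.094), "C": ("Sulfur-Containing", 121.154),
    "D": ("Acidic", 133.104), "E": ("Acidic", 147.131),
    "F": ("Aromatic", 165.192), "G": ("Aliphatic", 75.067),
    "H": ("Basic", 155.156), "I": ("Aliphatic", 131.175),
    "K": ("Basic", 146.189), "L": ("Aliphatic", 131.175),
    "M": ("Sulfur-Containing", 149.208), "N": ("Amidic", 132.119),
    "P": ("Aliphatic", 115.132), "Q": ("Amidic", 146.146),
    "R": ("Basic", 174.203), "S": ("Hydroxylic", 105.093),
    "T": ("Hydroxylic", 119.119), "U": ("Sulfur-Containing", 168.064),
    "V": ("Aliphatic", 117.148), "W": ("Aromatic", 204.228),
    "Y": ("Aromatic", 181.191),
}
MW_TOLERANCE = 30.0  # hypothetical threshold; not taken from SeqAPASS


def total_match(template_residue, hit_residue, tolerance=MW_TOLERANCE):
    """Return "Y" or "N" for one aligned position (illustrative rule only)."""
    if template_residue == hit_residue and template_residue in AMINO_ACIDS:
        return "Y"  # assumed: an identical residue always counts as a match
    template_info = AMINO_ACIDS.get(template_residue)
    hit_info = AMINO_ACIDS.get(hit_residue)
    if template_info is None or hit_info is None:
        return "N"  # gaps ("-") and unknown residues ("X") never match here
    same_side_chain = template_info[0] == hit_info[0]
    similar_weight = abs(template_info[1] - hit_info[1]) <= tolerance
    return "Y" if same_side_chain and similar_weight else "N"


print(total_match("D", "D"), total_match("D", "E"), total_match("G", "L"))
```

Under these assumptions, aspartic acid versus glutamic acid counts as a match (same Acidic class, about 14 g/mol apart), while glycine versus leucine does not (same class, but about 56 g/mol apart).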
[Figure: "Level 3 Data - Primary" table with the "Primary Report" radio button selected; only the first three position groups (template residues 351, 353, and 362) are visible in the screenshot]
Protein Name	Analysis Completed	Similar Susceptibility as Template	Position 1	Amino Acid 1	Total Match 1	Position 2	Amino Acid 2	Total Match 2	Position 3	Amino Acid 3	Total Match 3
estrogen receptor isoform 1	2019 08 29 14:55:59	Y	351	D	Y	353	E	Y	362	K	Y
estrogen receptor alpha	2019 08 29 14:55:59	Y	320	D	Y	322	E	Y	331	K	Y
PREDICTED: estrogen receptor isoform X2	2019 08 29 14:55:59	Y	316	D	Y	318	E	Y	327	K	Y
estrogen receptor	2019 08 29 14:55:59	Y	355	D	Y	357	E	Y	366	K	Y
estrogen receptor isoform X3	2019 08 29 14:55:59	Y	319	D	Y	321	E	Y	330	K	Y
Estrogen receptor 1	2019 08 29 14:55:59	Y	319	D	Y	321	E	Y	330	K	Y
When the user downloads the current "Level 3 Report Settings," the csv contains the information shown below. If the user changes the default settings, the csv provides a quick record of them even after the SeqAPASS page is closed.
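Because the settings csv is a two-column list of setting names and values, it can be read back programmatically, for example to confirm which residues a saved table was generated with. The Python sketch below uses only the standard `csv` module. It assumes the layout shown in the example (a title row, then one "setting, value" row per line, with comma-containing values such as "Query Residues" quoted as standard csv requires); the function name `read_level3_settings` is hypothetical.

```python
import csv
import io


def read_level3_settings(csv_text: str) -> dict[str, str]:
    """Return the Level 3 Report Settings csv as a {setting: value} dict."""
    settings = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Keep only rows with both a setting name and a value; this skips the
        # "Level 3 Report Settings" title row and any blank rows.
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            settings[row[0].strip()] = row[1].strip()
    return settings


sample = '''Level 3 Report Settings,
Analysis TimeStamp,2019 05 16 11:04:08
SeqAPASS version,3.2
Level 3 Run Name,Actinopteri
Template Species,Homo sapiens
Template Protein,[NP_000116.2] estrogen receptor isoform 1
Query Residues,"1M, 2T, 3M, 4T, 5L, 6H, 7T, 8K, 9A, 10S"
Query Accession,NP_000116.2
'''
settings = read_level3_settings(sample)
residues = [r.strip() for r in settings["Query Residues"].split(",")]
print(settings["Level 3 Run Name"], residues[:3])  # Actinopteri ['1M', '2T', '3M']
```

To read a downloaded file instead of the inline sample, pass the file's text, e.g. `open(path, newline="", encoding="utf-8").read()` (the file encoding is an assumption).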

Example "Level 3 Report Settings" csv:
Analysis TimeStamp | 2019 05 16 11:04:08
SeqAPASS version | 3.2
Level 3 Run Name | Actinopteri
Template Species | Homo sapiens
Template Protein | [NP_000116.2] estrogen receptor isoform 1
Query Residues | 1M, 2T, 3M, 4T, 5L, 6H, 7T, 8K, 9A, 10S
Query Accession | NP_000116.2
Level 3 Data - Full Report
The user may view the Full Report for Level 3 data by selecting the "Full Report" radio button above the "Level 3 Data" table. The table automatically updates to display all of the alignment details.
The "Full Report" columns for the alignment are titled "Similar Susceptibility as Template" ("Y" or "N" for yes or no, respectively), followed by Position 1, Amino Acid 1, Direct Match 1, Side Chain 1, Side Chain Match 1, MW 1, MW Match 1, Total Match 1, Position 2, Amino Acid 2, Direct Match 2, Side Chain 2, Side Chain Match 2, MW 2, MW Match 2, Total Match 2, and so on. The template sequence is always in the top row of the "Level 3 Data" table, followed by the previously selected sequences. The residues selected in the shuttle are also displayed in that top row, corresponding to the template sequence. In each following row, the Position and Amino Acid values belong to the Protein Accession identified in that row and align with the template sequence. Total Match X describes whether the amino acid residue matches the template based on side-chain classification and molecular weight ("Y" for a match, "N" for not a match to the template). The user can evaluate these data to understand how well an amino acid residue is conserved across species, or in a species of interest, as an additional line of evidence to support (or question) susceptibility predictions.
Example "Level 3 Data - Full" table (Full Report selected; the Protein Name column and the columns after Amino Acid 2 are outside the visible area):

Analysis Completed | Similar Susceptibility as Template | Position 1 | Amino Acid 1 | Direct Match 1 | Side Chain 1 | Side Chain Match 1 | MW 1 | MW Match 1 | Total Match 1 | Position 2 | Amino Acid 2
2019 08 29 14:55:59 | Y | 351 | D | Y | Acidic | Y | 133.104 | Y | Y | 353 | E
2019 08 29 14:55:59 | Y | 320 | D | Y | Acidic | Y | 133.104 | Y | Y | 322 | E
2019 08 29 14:55:59 | Y | 316 | D | Y | Acidic | Y | 133.104 | Y | Y | 318 | E
2019 08 29 14:55:59 | Y | 355 | D | Y | Acidic | Y | 133.104 | Y | Y | 357 | E
2019 08 29 14:55:59 | Y | 319 | D | Y | Acidic | Y | 133.104 | Y | Y | 321 | E
2019 08 29 14:55:59 | Y | 319 | D | Y | Acidic | Y | 133.104 | Y | Y | 321 | E

The "Direct Match X" column indicates whether the hit amino acid is an exact match to the template amino acid ("Y" or "N" for yes or no, respectively). The "Side Chain X" column gives the side-chain classification of the amino acid residue (click "Show Amino Acid Info..." for more information on classifications). The "Side Chain Match X" column indicates whether the hit side chain has the same classification as the template amino acid ("Y" or "N" for yes or no, respectively). The "MW X" column gives the molecular weight (g/mol) of the amino acid residue, and the "MW Match X" column indicates whether the hit molecular weight is similar to that of the template amino acid: "N" when the difference is greater than or equal to 30 g/mol, and "Y" otherwise. "Total Match X" is "Y" when "Side Chain Match X" and "MW Match X" are both "Y," or when one is "Y" and the other is "N." Only when both "Side Chain Match X" and "MW Match X" are "N" is "Total Match X" "N" for no. Ultimately, Total Match 1, 2, 3, 4, and so on inform the "Similar Susceptibility as Template" column. If there are one or more "N" values for Total Match across a row for a given species, "Similar Susceptibility as Template" is "N" for no, indicating that the hit species is predicted NOT to have the same susceptibility as the template species. If all "Total Match X" values are "Y" for yes, "Similar Susceptibility as Template" is "Y," indicating that the hit species is predicted to have the same susceptibility as the template species.
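The comparison rules above fit in a few lines of code, which can help when checking a downloaded Full Report by hand. The Python sketch below is written from the description in this guide and is not SeqAPASS source code; the names `Residue`, `compare_residue`, and `similar_susceptibility` are hypothetical. It treats a molecular-weight difference of 30 g/mol or more as a mismatch, following the wording above; how the tool handles a difference of exactly 30 g/mol is an assumption. The side-chain classes and weights in the example are taken from tables in this guide.

```python
from typing import NamedTuple

MW_CUTOFF = 30.0  # g/mol; a difference >= this counts as a mismatch (assumed boundary)


class Residue(NamedTuple):
    code: str         # one-letter amino acid code
    side_chain: str   # side-chain classification, e.g. "Acidic"
    mw: float         # molecular weight in g/mol


def yn(flag: bool) -> str:
    return "Y" if flag else "N"


def compare_residue(template: Residue, hit: Residue) -> dict:
    """Return the Full Report match columns for one aligned residue."""
    side_chain_match = template.side_chain == hit.side_chain
    mw_match = abs(template.mw - hit.mw) < MW_CUTOFF
    return {
        # Direct Match is reported but does not feed into Total Match.
        "Direct Match": yn(template.code == hit.code),
        "Side Chain Match": yn(side_chain_match),
        "MW Match": yn(mw_match),
        # Total Match is "N" only when both comparisons fail.
        "Total Match": yn(side_chain_match or mw_match),
    }


def similar_susceptibility(total_matches) -> str:
    """Return "Y" only if every Total Match value in a row is "Y"."""
    return yn(all(t == "Y" for t in total_matches))


glycine = Residue("G", "Aliphatic", 75.067)
proline = Residue("P", "Aliphatic", 115.132)
print(compare_residue(glycine, proline))
# {'Direct Match': 'N', 'Side Chain Match': 'Y', 'MW Match': 'N', 'Total Match': 'Y'}
```

The glycine/proline pair shows why Direct Match alone is not decisive: the residues differ, the weights differ by about 40 g/mol, yet the shared aliphatic class keeps Total Match at "Y."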
Multiple Level 3 Runs Requiring the Same Amino Acid Residue Comparisons
Level 3 individual amino acid residue alignments are typically submitted repeatedly, with each run comparing species from one taxonomic group at a time to the template amino acid residue(s).
[Screenshot: "Choose Query to View" drop-down ("-Select Level 3 Run Name-") listing Level 3 runs named by taxonomic group: Actinopteri, Amphibia, Aves, Crocodyliadae, Dipnoi, Lepidosauria, Mammalia, Testudines.]
To avoid entering the same alignments over and over again across these Level 3 runs, the user can use the "Copy to Residue List" button. For the first alignment of amino acid residues, the user selects the amino acid residues to align and clicks the "Update Report" button.
[Screenshot: "Select Amino Acid Residues" shuttle showing residues such as 1M, 3M, 4T, 9A, 351D, 355V, 356H, 375Q, and 400G; the "Enter Amino Acid Residue Positions" text box containing 351,355,356,375,400; and the "Copy to Residue List" and "Update Report" buttons.]

Clicking "Update Report" copies the selected residues into the "Enter Amino Acid Residue Positions" text box. When the user then selects a different "Level 3 Run Name" (from the same Level 1 query accession) in the "View Level 3 Data" drop-down and clicks the "View Level 3 Data" button on the "Level 1 Query Protein Information" page, the "Enter Amino Acid Residue Positions" text box is populated with the amino acid residues selected in the previous run.
[Screenshot: "Enter Amino Acid Residue Positions" text box containing 351,353,362,364,394,524, with the hint "Enter residue positions as a comma separated list" and the "Copy to Residue List" button.]
The user can keep, add, or delete residue positions in this box and click the "Copy to Residue List" button. The amino acid residues are then moved to the "Select Amino Acid Residues" shuttle, and the user can click "Update Report" to view the data in the table below.
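The text box expects residue positions as a comma-separated list. If you prepare position lists outside SeqAPASS (for example, from a literature table of key binding residues), a small helper can normalize them and pair each position with its residue in the template sequence, matching the "351D"-style labels used in the shuttle. This Python sketch is hypothetical: SeqAPASS's own input validation is not described in this guide, and the function names are illustrative. The example sequence is the first ten residues of the human estrogen receptor template (NP_000116.2), MTMTLHTKAS.

```python
def parse_residue_positions(text: str) -> list[int]:
    """Turn "351, 353,362" into a sorted list of unique 1-based positions."""
    positions = set()
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray or trailing commas
        if not token.isdigit() or int(token) < 1:
            raise ValueError(f"not a residue position: {token!r}")
        positions.add(int(token))
    return sorted(positions)


def label_residues(positions: list[int], sequence: str) -> list[str]:
    """Pair each 1-based position with its residue, e.g. 5 -> "5L"."""
    return [f"{p}{sequence[p - 1]}" for p in positions]


print(parse_residue_positions("351,353,362,364,394,524"))
# [351, 353, 362, 364, 394, 524]
print(label_residues(parse_residue_positions("5, 1,3,"), "MTMTLHTKAS"))
# ['1M', '3M', '5L']
```

Sorting and de-duplicating keeps the pasted list tidy; whether SeqAPASS itself reorders positions is not stated in this guide.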
Heat Map
The Heat Map provides a visual representation of the chosen amino acid(s) for a single Level 3 run. It uses color to show which amino acids are a total match, a partial match, or not a match to the template sequence. The Heat Map is accessed from the "Level 3" page under the "Visualization" drop-down and opens in a separate tab. It shares many features with the Level 1 and 2 boxplot and adds some customizable features. Many settings can be changed within the Heat Map, and informational buttons provide more detail about the different options.
To get to the Heat Map, open a completed Level 3 run, click the "Visualization" drop-down, and select the "Visualize Data" button. This opens a page with information about the features of the Heat Map; select the "Heat Map" icon to open the Heat Map itself.
[Screenshot: "Visualization" drop-down with the "Visualize Data" button ("This will open in a separate tab."), shown beside the "Select Amino Acid Residues" shuttle, the "Enter Amino Acid Residue Positions" box, and the "Copy to Residue List" and "Update Report" buttons.]
The default order of the taxonomic groups is based on how the species were selected during Level 3 setup. The user can include all taxonomic groups or only a chosen few. To move taxonomic groups into the ordering box, click a group (or CTRL+click to select several) and select the right-pointing arrow. Once the taxonomic groups are moved over, the user can reorder them by dragging them up or down.
[Screenshot: "Level 3 Taxonomic Groups" and "Order Level 3 Taxonomic Groups" boxes, before (Mammalia, Testudines, Aves, Crocodylia, Lepidosauria, Amphibia, and Dipnoi all unordered) and after moving Mammalia, Lepidosauria, Testudines, and Dipnoi into the ordering box (Aves, Crocodylia, and Amphibia remain unordered).]

Report Options
Several Heat Map options can be changed depending on the information the user wants displayed. The Heat Map can be switched between the "Simple" report, which shows each amino acid and its position, and the "Full" report, which adds information about each amino acid. The user can also switch between displaying common names and scientific names on the Heat Map.
Optional Selections
The "Optional Selections" for the Heat Map highlight each species name according to what is selected: Ortholog Candidates, Threatened Species, Endangered Species, or Common Model Organisms. Only one optional selection can be highlighted at a time.
[Screenshot: Simple Heat Map with "Threatened Species" highlighted (legend: Total Match, Partial Match, Not a Match). Rows show the human template (Amino Acids 1 to 4: 32K, 46S, 55P, 64A) and turtle species, including Diamondback terrapin, Western painted turtle, Chinese soft-shelled turtle, Terrapins, Goode's thornscrub tortoise, Pacific ridley, Painted turtle, Green sea turtle, and Three-toed box turtle, most with 32K, 46S, 55P, 64T.]
Heat Map Settings
The "Heat Map Settings" control which information is displayed in the Heat Map. The user can select or deselect any of these settings to customize the Heat Map.
[Screenshot: "Report Options" panel (Report Type: Simple or Full; Species Name Type: Common Name or Scientific Name; Optional Selections: Ortholog Candidates, Threatened Species, Endangered Species, Common Model Organisms) and "Heat Map Settings" panel (Susceptibility Prediction Heat Map, Susceptibility Prediction Text, Alignment Prediction Heat Map, Amino Acid, Amino Acid Position).]
[Screenshot: Simple report Heat Map with "Similar Susceptibility" and Amino Acid 1 to 4 columns for Human, Western painted turtle, Lappet-faced vulture, Nile crocodile, Japanese giant salamander, and West African lungfish (legend: Total Match, Partial Match, Not a Match).]
Above is an example of a simple report, which shows each amino acid and its position. Each amino acid is compared to the template species and colored dark blue (Total Match), light blue (Partial Match), or yellow (Not a Match). To see more information about an amino acid, hover over its box to display a pop-up with additional data.
How amino acids are compared to the template: the comparison uses side-chain classification (acidic, basic, aromatic, etc.) and molecular weight as a surrogate for size (a difference of more than 30 g/mol counts as different). If both are the same, the result is a total match; if one is the same, a partial match; if both differ, not a match.
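Because each criterion is simply "same" or "different," the three Heat Map colors correspond to how many of the two criteria agree. The Python sketch below illustrates that mapping with values from the Full report example in this section; the function and dictionary names are hypothetical, and the colors are the ones named above. This section describes the size cutoff as "> 30 g/mol," while the Level 3 Full Report description says "greater than or equal to 30 g/mol"; the sketch treats 30 g/mol or more as different, so the exact boundary is an assumption.

```python
MW_CUTOFF = 30.0  # g/mol; assumed: a difference of 30 or more counts as "different"

# Number of criteria that agree -> (category, Heat Map color)
CATEGORY_BY_AGREEMENT = {
    2: ("Total Match", "dark blue"),
    1: ("Partial Match", "light blue"),
    0: ("Not a Match", "yellow"),
}


def heat_map_cell(template, hit):
    """template and hit are (side_chain_class, molecular_weight) pairs."""
    same_side_chain = template[0] == hit[0]
    same_size = abs(template[1] - hit[1]) < MW_CUTOFF
    return CATEGORY_BY_AGREEMENT[int(same_side_chain) + int(same_size)]


glycine = ("Aliphatic", 75.067)                        # human template residue, e.g. 274G
print(heat_map_cell(glycine, ("Amidic", 146.146)))     # glutamine, e.g. 268Q
print(heat_map_cell(glycine, ("Aliphatic", 115.132)))  # proline, e.g. 267P
print(heat_map_cell(glycine, ("Aliphatic", 89.094)))   # alanine, e.g. 270A
```

In the Level 3 data tables, a partial match still yields a "Total Match" of "Y," because only a failure on both criteria produces "N."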
Below is an example of a full report, which shows each amino acid and its position along with the amino acid's side chain, molecular weight, and whether it is a Total Match (dark blue) or Not a Match (yellow) to the template species.
Common Name | Similar Susceptibility | Amino Acid 1 | Side Chain 1 | MW 1 | Total Match 1 | Amino Acid 2 | Side Chain 2 | MW 2 | Total Match 2 | Amino Acid 3 | Side Chain 3 | MW 3 | Total Match 3
Human | Y | 274G | Aliphatic | 75.067 | Y | 275E | Acidic | 147.131 | Y | 276G | Aliphatic | 75.067 | Y
Western painted turtle | N | 268Q | Amidic | 146.146 | N | 269D | Acidic | 133.104 | Y | 270A | Aliphatic | 89.094 | Y
Nile crocodile | N | 268Q | Amidic | 146.146 | N | 269D | Acidic | 133.104 | Y | 270A | Aliphatic | 89.094 | Y
Split-tongued squamates | N | 268Q | Amidic | 146.146 | N | 269D | Acidic | 133.104 | Y | 270S | Hydroxylic | 105.093 | N
Japanese giant salamander | N | 267P | Aliphatic | 115.132 | Y | 268D | Acidic | 133.104 | Y | 269Q | Amidic | 146.146 | N
The example below shows only the "Alignment Prediction" (the amino acid match against the template) for each amino acid, in order of position.
[Screenshot: Heat Map showing only the Alignment Prediction colors (Total Match, Partial Match, Not a Match) for Amino Acids 1 to 4 of Human, Western painted turtle, Lappet-faced vulture, Nile crocodile, Split-tongued squamates, Japanese giant salamander, and West African lungfish.]
Additional information is available for each species (NCBI Accession, Protein Name, Scientific Name, and Taxonomic Group) and for each amino acid (Amino Acid Name, Abbreviation, Side Chain, and Molecular Weight). Hover over the species name or the amino acid to display it.
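The amino acid pop-up combines a residue label such as "274G" with fixed properties of that amino acid. To show the same details next to a downloaded table, a small lookup table is enough. The Python sketch below is hypothetical (the names `AminoAcid`, `AMINO_ACIDS`, and `describe` are illustrative). It covers only the residues that appear in this guide's examples and uses the side-chain classes and molecular weights shown in its tables; other residues would need to be added from the "Show Amino Acid Info..." panel.

```python
from typing import NamedTuple


class AminoAcid(NamedTuple):
    name: str
    abbreviation: str
    side_chain: str
    mw: float  # g/mol, as displayed in SeqAPASS tables


# Only residues that appear in this guide's examples; classes and weights copied from them.
AMINO_ACIDS = {
    "G": AminoAcid("Glycine", "G", "Aliphatic", 75.067),
    "A": AminoAcid("Alanine", "A", "Aliphatic", 89.094),
    "S": AminoAcid("Serine", "S", "Hydroxylic", 105.093),
    "P": AminoAcid("Proline", "P", "Aliphatic", 115.132),
    "D": AminoAcid("Aspartic acid", "D", "Acidic", 133.104),
    "Q": AminoAcid("Glutamine", "Q", "Amidic", 146.146),
    "E": AminoAcid("Glutamic acid", "E", "Acidic", 147.131),
}


def describe(label: str) -> tuple[int, AminoAcid]:
    """Split a label such as "274G" into its position and amino acid details."""
    position, code = int(label[:-1]), label[-1].upper()
    return position, AMINO_ACIDS[code]


position, aa = describe("274G")
print(position, aa.name, aa.side_chain, aa.mw)  # 274 Glycine Aliphatic 75.067
```

A label whose residue is not in the table raises `KeyError`, which flags residues that still need to be added.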
[Screenshot: hover pop-ups. Species pop-up for Human: NCBI Accession: NP_000116.2; Protein Name: estrogen receptor isoform 1; Scientific Name: Homo sapiens; Taxonomic Group: Mammalia; Ortholog Candidate. Amino acid pop-up for 274G: Name: Glycine; Abv: G; Side Chain: Aliphatic; MW: 75.067.]
To add the designed Heat Map to the Decision Summary Report as a visualization, press the "Push Level 3 Heatmap To DS Report" button; the Heat Map will then be available in the Level 3 section of the DS Report. To download the Heat Map, press the "Download Heatmap..." button. The Heat Map can be downloaded as an SVG, JPG, or PNG file.
[Screenshot: "Download Heatmap..." and "Push Level 3 Heatmap To DS Report" buttons.]
Decision Summary Report
The "Decision Summary (DS) Report" lets the user design a single output page that concisely presents results from all Levels of the SeqAPASS evaluation for completed jobs. The output is customizable to include visualizations and susceptibility predictions and can be downloaded in PDF format. The "DS Report" page becomes active when the user pushes tables or visualizations to the DS Report from a result page. The "DS Report" page contains at most one Level 1 output (and visualization) and one Level 3 output (and visualization), but can contain multiple Level 2 domain outputs (and their respective visualizations).
[Screenshot: navigation buttons "Main," "Level 1," "Level 2," "Level 3," and "DS Report."]
To push results from any Level to the DS Report, the user must press the "Push Level # To DS Report" button. The "DS Report" button then becomes active so the user can view the report settings. The DS Report can be updated as the user changes settings in Level 1, Level 2, and Level 3 (e.g., adding or removing amino acids), but the user must push the updated report to the DS Report again using the "Push Level # To DS Report" button (a notification appears next to the button when settings have been updated, as a reminder to push the report). If the user switches to a different SeqAPASS job (e.g., a different protein accession), the "DS Report" button becomes inactive and the user must push the data from the new job to the DS Report as described above.
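The rules above (at most one Level 1 output, at most one Level 3 output, any number of Level 2 domains, and a fresh report when the job changes) amount to a small state model. The Python sketch below is a hypothetical illustration for reasoning about what a DS Report will contain; it is not SeqAPASS code, and two behaviors are assumptions noted in the comments: that a repeated push replaces the earlier output for the same Level (or the same Level 2 domain), and that a job is identified by its query accession.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DSReport:
    accession: Optional[str] = None              # query accession of the current job
    level1: Optional[dict] = None                # at most one Level 1 output
    level2: dict = field(default_factory=dict)   # one entry per Level 2 domain
    level3: Optional[dict] = None                # at most one Level 3 output

    def push(self, accession: str, level: int, payload: dict,
             domain: Optional[str] = None) -> None:
        if self.accession is not None and accession != self.accession:
            # Assumption: a different job (accession) starts an empty report.
            self.level1, self.level2, self.level3 = None, {}, None
        self.accession = accession
        if level == 1:
            self.level1 = payload  # assumption: a new push replaces the old output
        elif level == 2:
            if domain is None:
                raise ValueError("Level 2 pushes need a domain")
            self.level2[domain] = payload  # assumption: same domain is replaced
        elif level == 3:
            self.level3 = payload
        else:
            raise ValueError("level must be 1, 2, or 3")


report = DSReport()
report.push("NP_000116.2", 1, {"cutoff": 34.43})
report.push("NP_000116.2", 2, {"cutoff": 55.00}, domain="cd06157")
report.push("NP_000116.2", 2, {"cutoff": 42.03}, domain="cd06929")
print(len(report.level2))  # 2
```

Keying Level 2 outputs by domain is what allows several domains to coexist while Level 1 and Level 3 stay single-valued.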
Level 1 of the Decision Summary Report
Clicking the "DS Report" button opens a new page containing the "Level 1 Report" section of the DS Report, which shows all the pertinent information for the query protein and the report settings that were pushed to the report. The user can also include the Level 1 visualization in the DS Report by going to the "Level 1 Visualization" page and clicking "Push Level 1 Boxplot To DS Report."
The default visualization or a user-customized visualization is then inserted in the downloadable DS Report PDF once the radio button is selected.
[Screenshot: "Level 1 Info" panel with the "Add Level 1 Info to Report" option.
Level 1 Query Protein Information: SeqAPASS ID: 1631; Query Species: Homo sapiens; Query Protein: estrogen receptor isoform 1; Query Accession: NP_000116.2; Ortholog Count: 410; Protein and Taxonomy Data: 06/08/2020; BLAST Version: 2.10.0; Software Version: 4.1.
Report Settings: Report Type: Primary; E-Value: 0.01; Sorted By Taxonomic Group: CLASS; Common Domains: 1; Species Read-Across: Y; Cut-off %: 34.43; Show Only Eukaryotes: Y.
Optional Components: Level 1 Visualization (Add to Report).]
Once the user is satisfied with the data pushed to the DS Report, the "DS Report" button brings the user to the "Level 1 Report" section, which offers customizable options. In the "Level 1 Report" section, the "Select Taxonomic Groups (CLASS)" box contains a series of checkboxes for choosing which taxonomic group(s) to display in the DS Report. After selecting the taxonomic group(s), the user can customize the report in the "Select Species" box by checking the species whose Level 1 data should appear in the "Final Decision Summary Report" table at the bottom of the page. The template species is always selected and cannot be deselected. Species become active only when at least one taxonomic group is selected in the "Select Taxonomic Groups (CLASS)" box. Level 1 results for the species selected in the "Select Species" box are integrated into the "Final Decision Summary Report" table at the bottom of the page. (Note: if the user does not push a Level 1 job to the "DS Report" page, this section contains no information.)
Level 1 Info
The Level 1 Info section appears when either a Level 1 report or a Level 1 visualization is pushed to the DS Report. It includes the "Level 1 Query Protein Information" (SeqAPASS ID, Query Species, Query Protein, Query Accession, Ortholog Count, Protein and Taxonomy Data, BLAST Version, and Software Version), the "Report Settings" (Report Type, E-Value, Sorted By Taxonomic Group, Common Domains, Species Read-Across, Cut-off %, and Show Only Eukaryotes), and the "Optional Components" section, which contains the option to add the "Level 1 Visualization" to the report.
Including Visualizations in DS Reports
The user can also include the "Level 1 Visualization" by going to the visualization page and pushing either the default visualization or a user-modified visualization, which is then attached to the downloaded PDF once the radio button is selected. In the scroll-down lists, the template species is always selected and cannot be deselected. Species are not active until a taxonomic group box is selected; once one is, the species in that group become active and can be deselected individually or with the "Select All" function. The selected species appear in the "Final Decision Summary Report" table at the bottom of the page. (Note: if a user pushed only a boxplot to the DS Report, using the "Push Level 1 Boxplot To DS Report" button, then only the "Level 1 Info" and the "Optional Components" will be active.)
[Screenshot: "Level 1 Report" section. The "Select Taxonomic Groups (CLASS)" box has "Select All" and checkboxes for Mammalia, Testudines, Aves, Crocodylia, Lepidosauria, Amphibia, Chondrichthyes, Ceratodontimorpha, Coelacanthiformes, Actinopteri, Cladistia, and Petromyzontiformes. The "Select Species" box (Common Name or Scientific Name) lists Western gorilla, Chimpanzee, Western lowland gorilla, Pygmy chimpanzee, Sumatran orangutan, Bornean orangutan, Rhesus monkey, Sooty mangabey, Crab-eating macaque, Pig-tailed macaque, and Ugandan red colobus. The "Level 1 Info" panel ("Add Level 1 Info to Report") shows SeqAPASS ID: 1306; Query Species: Homo sapiens; Query Protein: estrogen receptor isoform 1; Query Accession: NP_000116.2; Ortholog Count: 348; Protein and Taxonomy Data: 02/28/2019; BLAST Version: 2.8.1; Software Version: 3.2; Report Type: Primary; E-Value: 0.01; Sorted By Taxonomic Group: CLASS; Common Domains: 1; Species Read-Across: Y; Cut-off %: 33.93; Show Only Eukaryotes: Y; Optional Components: Level 1 Visualization.]
Level 2 of DS Report
The Level 2 section of the DS Report contains all the domains that have been pushed to the report; multiple domains can be present once they have been run and pushed individually. The user can also include each domain's "Level 2 Visualization" by going to the visualization page and pushing either the default visualization or a user-modified visualization, which can then be attached to the downloaded PDF. Once a domain is selected, it appears in the "Final Decision Summary Report" table at the bottom of the page. (Notes: if the user does not push a Level 2 run to the DS Report page, this section contains no information. If a visualization is pushed to the DS Report before a Level 2 report, the domain is listed and the "Add Visualization to Report" button is active.)
[Screenshot: "Level 2 Report" section with the "Select Level 2 Domains" table (columns "Add to Final Decision Report," "Domain," and "Optional Components": "Select All," "Add Info to Report," "Add Visualization to Report"). Listed domains: (316) cd06931, NR_LBD_HNF4_like, The ligand binding domain of hepatocyte nuclear factor 4, which is explosively expanded in nematodes; (310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen receptor, which are activated by the hormone 17beta-estradiol (estrogen ...).]

Level 3 of DS Report
The Level 3 section of the DS Report contains all the information for the template protein and the report settings that were pushed to the report, including the amino acids that were selected in the report and pushed over. Newly selected amino acids must be pushed to the DS Report again. The Yes (Y) or No (N) susceptibility predictions are displayed in the "Final Decision Summary Report" table. The user can also include the "Level 3 Visualization" by going to the visualization page and pushing a user-modified visualization, which can then be attached to the downloaded PDF. (Notes: if the user does not push a Level 3 run to the DS Report page, this section contains no information. Also, if a "Level 3 Visualization" (Heat Map) is pushed before a Level 3 report, the "Level 3 Info" is populated with that run's information.)
[Screenshot: "Level 3 Report" section with the "Add to Final Risk Report" and "Add Level 3 Info to Report" options; selected residues 351D, 353E, 362K, 364V, 394R, 524H; template protein information (SeqAPASS ID: 1306; Template Species: Homo sapiens; Template Protein: [NP_000116.2] estrogen receptor isoform 1; Protein and Taxonomy Data: 02/28/2019; BLAST Version: 2.8.1; Software Version: 3.2); and Optional Components: Level 3 Visualization (Add to Report).]

Final Decision Summary Report Table
The "Final Decision Summary Report" table contains the key data and susceptibility predictions from each Level run for all the species selected in the Level 1 section. The table collects the susceptibility prediction from each run so the results can be interpreted quickly. The complete table can be saved as an Excel spreadsheet or a .csv file, and it is also added to the PDF when downloaded. Each selected species has its own row containing the information pushed to the "Final Decision Summary Report" table. The columns show the Data Version, NCBI Accession, Filtered Taxonomic Group, Species, Protein Name, Level 1 Susceptibility Prediction as Yes (Y) or No (N), Level 2 Common Domain(s) Name and respective Susceptibility Prediction as Yes (Y) or No (N), Level 3 Template Species, and Level 3 Amino Acid Susceptibility Prediction as Yes (Y) or No (N). (A few things to note: if multiple domains are pushed to the "Final Decision Summary Report" table, each domain has its own column. Also, for a species to have a Yes (Y) or No (N) Level 3 susceptibility prediction in the table, it must be included in the Level 3 run pushed to the report as well as selected in the Level 1 taxonomic groups/species selection. If a species was not included in the pushed Level 3 report but is included in the "Final Decision Summary Report" table, it receives NA for its Level 3 susceptibility prediction.)
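The way rows and the NA value are assembled can be sketched as a join keyed on species: the Level 1 selection defines the rows, and Level 2 and Level 3 results fill in columns where available. The Python below is a hypothetical illustration, not SeqAPASS code. The column names follow the example tables in this guide, the input structures are assumptions, and filling missing Level 2 values with NA is also an assumption (the guide only describes NA for Level 3).

```python
def final_ds_rows(level1_selected, level2_by_domain, level3_calls):
    """Build Final Decision Summary Report rows.

    level1_selected: list of (species, protein, level1_yn) for species ticked in Level 1.
    level2_by_domain: {domain_name: {species: "Y" or "N"}} for each pushed domain.
    level3_calls: {species: "Y" or "N"} from the pushed Level 3 run.
    """
    rows = []
    for species, protein, level1_yn in level1_selected:
        row = {"Species": species, "Protein": protein,
               "Level 1 Susceptible (Y/N)": level1_yn}
        for domain, calls in level2_by_domain.items():
            row[domain] = calls.get(species, "NA")  # assumption: NA when missing
        # Species absent from the pushed Level 3 run receive NA.
        row["Level 3 Amino Acids (Y/N)"] = level3_calls.get(species, "NA")
        rows.append(row)
    return rows


rows = final_ds_rows(
    [("Human", "estrogen receptor isoform 1", "Y"),
     ("Western gorilla", "estrogen receptor alpha", "Y")],
    {"(345) cd06157": {"Human": "Y", "Western gorilla": "Y"}},
    {"Human": "Y", "Nile crocodile": "N"},  # Nile crocodile was not selected in Level 1
)
for row in rows:
    print(row)
```

Species present only in the Level 3 results (here, Nile crocodile) do not get a row, because rows come from the Level 1 selection.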
Example "Final Decision Summary Report" table:

Data Version | NCBI Accession | Filtered Taxonomic Group | Species | Protein | Susceptible (Y/N) | (345) cd06157, NR_LBD, The ligand binding domain of nuclear receptors, a family of ligand-activated transcription regulators | Level 3 Template | Level 3 Amino Acids (Y/N)
5 | NP_000116.2 | Mammalia | Human | estrogen receptor isoform 1 | Y | Y | Homo sapiens | Y

Download DS Report as PDF
To capture all the data pushed to the DS Report as a PDF, press the "Download DS Report" button. The DS Report PDF matches the data on the DS Report page and includes the visualizations if selected by the user. For each Level pushed to the DS Report, the PDF includes all the Query Protein Information for the respective protein, domain(s), and template protein. (Note: if the DS Report page is updated after the PDF is created, the user must download the PDF again to get the most up-to-date version.)
Level 1 of DS Report PDF
The Level 1 section of the DS Report PDF contains all the "Level 1 Query Protein Information" along with the Level 1 "Report Settings" for that protein's run. This information is not present if no Level 1 run information or Level 1 visualization was pushed to the DS Report.
[Screenshot: Level 1 section of the DS Report PDF.
Level 1 Query Protein Information: SeqAPASS ID: 1679; Query Species: Homo sapiens; Query Protein: estrogen receptor isoform 1; Query Accession: NP_000116.2; Ortholog Count: 410; Protein and Taxonomy Data: 06/08/2020; BLAST Version: 2.10.0; Software Version: 4.1.
Report Settings: Report Type: Primary; E-value: 0.01; Sorted By Taxonomic Group: CLASS; Common Domains: 1; Species Read-Across: Y; Cut-off %: 34.43; Show Only Eukaryotes: Y.
Followed by the Level 1 Visualization (boxplot).]

[Screenshot: two Level 2 sections of the DS Report PDF, one per pushed domain. Both list SeqAPASS ID: 1653; Query Species: Homo sapiens; Query Accession: NP_000116.2; Protein and Taxonomy Data: 06/08/2020; BLAST Version: 2.10.0; Software Version: 4.1; Report Type: Primary; E-value: 10.0; Sorted By Taxonomic Group: CLASS; Species Read-Across: Y; Show Only Eukaryotes: Y.
Domain 1: (345) cd06157, NR_LBD, The ligand binding domain of nuclear receptors, a family of ligand-activated transcription regulators; Ortholog Count: 410; Cut-off %: 55.00.
Domain 2: (341) cd06929, NR_LBD_F1, Ligand-binding domain of nuclear receptor family 1; Ortholog Count: 409; Cut-off %: 42.03.]

Level 3 of DS Report PDF
The Level 3 section of the DS Report PDF contains all the "Level 3 Template Protein Information" along with the Level 3 "Selected Amino Acids" for that run. This information is not present if no Level 3 run information or Level 3 visualization was pushed to the DS Report. The run's "Heat Map" visualization can be added to the DS Report PDF by selecting the "Add Visualization to Report" radio button.
[Screenshot: Level 3 section of the DS Report PDF. Selected Amino Acids: 5L, 57G, 120F, 177E. Level 3 Template Protein Information: SeqAPASS ID: 1653; Template Species: Homo sapiens; Template Protein: [NP_000116.2] estrogen receptor isoform 1; Protein and Taxonomy Data: 06/08/2020; BLAST Version: 2.10.0; Software Version: 4.1.]

Final DS Report Table in DS Report PDF
The Final Decision Summary Report table displays the species that were selected in the Level 1 section of the DS Report. Depending on what is selected during DS Report setup, it can display each species' "Protein," "Level 1 Susceptibility (Y/N)," common domain(s), "Level 3 Template," and "Level 3 Susceptibility."
Example "Final Decision Summary Report" table in the DS Report PDF:

Species | Protein | Level 1 Susceptible (Y/N) | (345) cd06157, NR_LBD, The ligand binding domain of nuclear receptors, a family of ligand-activated transcription regulators
Human | estrogen receptor isoform 1 | Y | Y
Western gorilla | estrogen receptor alpha | Y | Y
Chimpanzee | estrogen receptor isoform X2 | Y | Y
Western lowland gorilla | estrogen receptor isoform X2 | Y | Y
Pygmy chimpanzee | estrogen receptor isoform X2 | Y | Y
Bornean orangutan | estrogen receptor alpha | Y | Y
Sumatran orangutan | estrogen receptor isoform X2 | Y | Y
Sooty mangabey | PREDICTED: estrogen receptor isoform X2 | Y | Y
Rhesus monkey | estrogen receptor isoform X2 | Y | Y

Moving Between Level 1, Level 2, Level 3, and Decision Report Data Pages
When a user views Level 1, Level 2, or Level 3 data in the "View SeqAPASS Reports" tab, new buttons become available that let the user move between Levels of an analysis. The DS Report page becomes active once the user pushes a finished run using the "Push Level # To DS Report" button. Please see the snapshot below.
[Screenshot: SeqAPASS header with the "Home," "Request SeqAPASS Run," "SeqAPASS Run Status," "View SeqAPASS Reports," and "Settings" tabs, the "Log out" link, "SeqAPASS Reports Version 6.0," and the "Main," "Level 1," "Level 2," "Level 3," and "DS Report" buttons.]
The user can use the "Main" button to return to the list of completed Level 1 runs and select a different query accession to view. The "Level 1" button brings the user to the Level 1 data page, where the user can set up queries for Level 2 and Level 3 and select the buttons to view the Level 2 and Level 3 data pages. Opened Level 1, Level 2, and Level 3 pages remain open until the user selects a different run to view on the "Main" page. Moving between tabs, such as "Home," "Request SeqAPASS Run," and "SeqAPASS Run Status," does not close the Level 1, Level 2, or Level 3 pages that have been opened.
Note: If the user logs out of the SeqAPASS tool, upon logging back in, the data will reset to default
settings. Therefore, the View SeqAPASS Reports tab will not display the "Main," "Level 1," "Level 2,"
or "Level 3" buttons until a query is chosen and the Level 2 and Level 3 pages are opened.
Search, View, and Download Data Tables
The user can use the "Search" box to enter text to search the table. Further, the user can use the arrow
buttons and page numbers at the bottom of the screen to view all data, and the drop-down to expand the
table to 10, 20, or 50 rows. There are also horizontal scroll bars at the bottom of the tables that allow the
user to view all columns of the table.
Search using the text box at the top of tables:
[Snapshot: Search: Enter keyword]
Options for viewing data:
[Snapshot: table footer showing the page position (1 of 95), page numbers 1-10 with arrow buttons, the rows-per-page drop-down (10), and the Download Table icons]
All data tables in the SeqAPASS tool can be downloaded as Excel or .csv files. The icons for downloading
the files are present on the bottom right-hand side of all tables. Click the icon to download data.
[Snapshot: Download Table icons]
Upon selecting the .csv icon, the user can choose to save or open the file. Each file is named according to
the Level of the SeqAPASS evaluation and the report type.
[Snapshot: Windows "Save As" dialog for SeqAPASS_Level1_Primary_Report.csv (Save as type: Microsoft Excel Comma Separated Values File (*.csv)), shown over the Level 1 data table]
Upon selecting the .xls icon, the user can save the report to the desired location. Each file is named
according to the Level of the SeqAPASS evaluation and the report type.

[Snapshot: Windows "Save As" dialog for SeqAPASS_Level2_Primary_Report.xls (Save as type: Microsoft Excel 97-2003 Worksheet (*.xls)), shown over the "Level 2 Data - Primary" table]


Log out
The user can log out from any page in SeqAPASS by clicking the "Log out" link on the upper right-hand
side of the page. If a user logs out and then logs back in, all settings are reset to default. Any successfully
submitted queries that were requested prior to logging out will continue running and, when completed,
will be available to the user in the "View SeqAPASS Reports" tab.
[Snapshot: SeqAPASS page header (Version 6.0) showing the "Log out" link at the upper right-hand side of the page]
Pop-up Messages
The Spinning Wheel pop-up is an indicator that alerts the user that an action is taking place, in which
the interface of the SeqAPASS tool is contacting the backend database. For example, upon clicking the
"SeqAPASS Run Status" tab, the "Refresh Data" button, the "View Level 2 Data" button, or the "View Level 3
Data" button, the Spinning Wheel will pop up and then disappear from the screen. There are multiple other
instances where the Spinning Wheel is used to indicate to the user that an action is occurring.
[Pop-up: Querying database ... Please wait]
Pop-up messages are meant to guide the user to submit the correct information for a query, inform the
user of a successful or failed query submission, or otherwise inform the user of an error. All pop-up
messages appear for 10 seconds on the upper right-hand side of the screen and then disappear. If the
user would like to close the message before the 10 seconds are up, click on the message and an "x" will
appear in the upper right-hand corner of the message box. Click the "x" to close the message.
In the "Request SeqAPASS Run" tab, Compare Primary Amino Acid Sequences "By Species" page, a
successful Level 1 query submission will display a pop-up message indicating that the query has been
submitted to the run queue; alternatively, an "existing" message will appear indicating that the accession
has been run previously by a user and is available to view.
[Pop-up: Success. Submitted NP_064393.2: submitted]
OR
[Pop-up: Success. NP_000116.2: existing]
If the user clicks the "Request Run" button without selecting or entering any query proteins in the "Request
SeqAPASS Run" tab, Compare Primary Amino Acid Sequences "By Species" or "By Accession" page, one
of the messages below will pop up.
[Pop-up: Error. Must select query proteins]
OR
[Pop-up: Error. Must enter NCBI accession]
If the user enters nonsense text (or any text that is not an NCBI accession) into the "NCBI Protein
Accession" text box for submitting a Level 1 query in the "Request SeqAPASS Run" tab, Compare
Primary Amino Acid Sequences "By Accession" page, and clicks the "Request Run" button, the message
below will pop up indicating that the accession entered is not in the SeqAPASS database.
[Pop-up: Success. fgafgaf: not in database]
In the "View SeqAPASS Reports" tab, Level 1 page, if a user clicks "View Level 2 Data," a successful
Level 2 query submission will display a pop-up message indicating that the query has entered the run
queue.
[Pop-up: Level 2 Run Requested. Status: queued]
In the "View SeqAPASS Reports" tab, Level 1 page, if a user selects a domain that has already been
submitted (but not completed) and clicks "Request Domain Run," a pop-up message will appear
indicating that the domain has already been run or could not be submitted.
[Pop-up: Level 2 Run Requested. Status: Already run or could not submit]
In the "View SeqAPASS Reports" tab, Level 1 page, if a user clicks "View Level 2 Data" without
selecting a domain to view from the drop-down, the message below will pop-up to indicate that the user
must select a domain.
[Pop-up: Error. Must select domain from drop-down]
In the "View SeqAPASS Reports" tab, Level 1 page, a successful Level 3 query submission will display a
pop-up message indicating that the query has entered the run queue.
[Pop-up: Level 3 Run Requested. Status: queued]
In the "View SeqAPASS Reports" tab, Level 1 page, if a user fails to specify a template sequence or to type
a user-defined Level 3 Run Name, the message below will pop up to indicate that the user must do so.
[Pop-up: Error. You must specify a Template Sequence and Level 3 Run Name]
In the "View SeqAPASS Reports" tab, Level 1 page, if a user fails to select species from the Level 1 Data
table to be compared with the template sequence, the message below will pop-up.
[Pop-up: Error. You must select sequences from the Level 1 Data table to request a Level 3 Run]
In the "View SeqAPASS Reports" tab, Level 1 page, if a user fails to select a Level 3 Run Name from the
Choose Query to View drop-down and clicks the "View Level 3 Data" button, the message below will
pop up.
[Pop-up: Error. Must select level 3 run from drop-down]
In the "View SeqAPASS Reports" tab, "Level 3 Template Protein Information" data page, if a user fails
to select amino acid residues using the "Select Amino Acid Residues" shuttle and clicks the "View Level
3 Data" button, the message below will pop up.
[Pop-up: No Residues Selected. User must select residues]
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) Documentation
Query Species: The selection of the query species for a SeqAPASS analysis depends on the
question the user is addressing. For example, the query species can be the target species (e.g., human or
companion animal in the case of drugs; or insect, plant, fungus, or pest in the case of pesticides) or,
depending on the application of the susceptibility prediction, a species known or hypothesized to be
sensitive to a chemical acting on the protein molecular target of interest. There may be instances where a
protein for the species of interest has not been sequenced; in this case, it may serve the user's purpose to
identify another taxonomically related species from the same Class, Order, Family, or Genus as a
surrogate query species. In cases where there is interest in the susceptibility of a particular species (e.g.,
honey bee) and there are numerous potential target species (e.g., neonicotinoids are intended to cause
mortality in a number of pest insects), the species of particular concern may serve as the query species.
Query Protein: SeqAPASS can be queried with any protein sequence available in the NCBI protein
GenBank database, by protein name or NCBI accession. It is suggested that the user of SeqAPASS
examine the query protein and species in the NCBI protein database prior to submitting a run to
SeqAPASS (use the NCBI link on the query page). It is not uncommon for a protein of a specific species to be
represented by more than one sequence. In such cases, there are some guiding principles for identifying
the best sequence available for the SeqAPASS run.
General guidelines: These guidelines describe best practices for identifying the most useful sequence for a
species susceptibility prediction in SeqAPASS; however, in some cases, limited sequence information is
available and less desirable sequences may therefore be used. It is up to the user of SeqAPASS to
recognize the quality and limitations of the sequence chosen for the SeqAPASS query. Information
about a particular protein can be found on the Protein page in the NCBI database
(http://www.ncbi.nlm.nih.gov/protein/).
[Snapshot: NCBI Protein database home page (http://www.ncbi.nlm.nih.gov/protein/) with the search term "androgen receptor, homo sapiens" entered in the Protein search box]
Search for a protein of interest using the protein name and/or species of interest. For the example above,
multiple hit proteins were identified.
[Snapshot: NCBI Protein search results for "androgen receptor, homo sapiens" (540 results, 531 from Homo sapiens), including P10275.2 (919 aa), AAA51772.1 (917 aa), AAA51780.1 (906 aa), and AAA51771.1 (917 aa)]
Select one of the proteins by clicking on its link, as shown above, to see detailed information about the
protein.
[Snapshot: NCBI GenPept page for androgen receptor [Homo sapiens], GenBank: AAA51771.1 (917 aa; LOCUS date 31-OCT-1994), showing the LOCUS, DEFINITION, ACCESSION, VERSION, DBSOURCE, KEYWORDS, SOURCE, REFERENCE, COMMENT, and FEATURES rows]
Guiding principles: On the NCBI protein page, rows to examine include "DEFINITION," "REFERENCE,"
"COMMENT," and "FEATURES." The information provided in these rows can help a SeqAPASS user
identify an ideal query sequence for SeqAPASS.
It is desirable to:
a.	Use accessions with the prefix NP_.
b.	Avoid protein sequences labeled "partial," "PREDICTED," "PROVISIONAL," "INFERRED,"
or "hypothetical."
c.	Avoid sequences labeled "TPA" (Third Party Annotation); however, if TPA is all that is available,
"TPA: experimental" is preferred over "TPA: inferential."
d.	Look at the date associated with the protein in the "LOCUS" row of the detailed protein page. A more
recent date may indicate more up-to-date annotation of the protein. Other accessions associated with past
protein sequences can be viewed under the "DBSOURCE" row of the detailed protein page. Often, if the
"xrefs" row is heavily populated and has the most recent annotation update date, the sequence is likely to
be the best one to use as a query sequence in SeqAPASS.
e.	Avoid short sequences as query sequences when possible. Often, if one selects a short sequence from
the results of the NCBI protein database query, the "DEFINITION" row of its Protein page shows that it
is actually a partial sequence.
f.	Unless there is reason for doing so (based on the question the user is trying to address), splice variants
labeled as "alternatively spliced" in the "FEATURES" rows of the Protein page are less desirable.
g.	Check the references associated with the selected query protein. In some cases, certain sequences are
associated with sensitivity to a given chemical. This can be particularly useful when predicting
susceptibility to pesticides, where certain strains of insects are produced to be readily sensitive or
insensitive to a chemical.
h.	As a secondary check of the sequence used in the SeqAPASS run, look at the output and see whether
ortholog candidates were detected. Ideally, a preferred sequence would have more ortholog candidates
identified.
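When comparing many candidate records, the label-based screens in guidelines a to c can be partly automated. The sketch below is a hypothetical helper, not part of SeqAPASS or any NCBI tool: the `Candidate` fields and the `screen` function are illustrative assumptions, and it only checks the accession prefix and the DEFINITION text, so the remaining guidelines still require reading the NCBI Protein page.

```python
from dataclasses import dataclass

# Labels that guideline b suggests avoiding (matched case-insensitively).
AVOID_LABELS = ("partial", "predicted", "provisional", "inferred", "hypothetical")


@dataclass
class Candidate:
    """Hypothetical record: an accession plus the text of its DEFINITION row."""
    accession: str
    definition: str


def screen(candidate: Candidate) -> list:
    """Return reasons a candidate is less desirable; an empty list means no flags."""
    reasons = []
    # Guideline a: prefer NP_ (curated RefSeq protein) accessions.
    if not candidate.accession.startswith("NP_"):
        reasons.append("not an NP_ accession")
    text = candidate.definition.lower()
    # Guideline b: flag labels such as "partial" or "PREDICTED".
    for label in AVOID_LABELS:
        if label in text:
            reasons.append(f"labeled {label!r}")
    # Guideline c: flag Third Party Annotation records.
    if text.startswith("tpa"):
        reasons.append("TPA record")
    return reasons


if __name__ == "__main__":
    for cand in [
        Candidate("NP_000116.2", "estrogen receptor isoform 1 [Homo sapiens]"),
        Candidate("AAA51771.1", "androgen receptor [Homo sapiens]"),
    ]:
        print(cand.accession, screen(cand))
```

A flag is a prompt to look more closely, not an automatic rejection; the guidelines note that less desirable sequences may be the only ones available.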
Important Note: To identify which query protein has the greatest number of ortholog candidates, the user
can choose to submit multiple sequences for the same species and protein. Once the Level 1 runs for those
similar proteins are complete, the user can select the "View SeqAPASS Reports" tab and look at the
"Ortholog Count" column of the table; the protein with the highest number is likely to be the most
appropriate query sequence for a SeqAPASS evaluation.
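If the runs table is downloaded as a .csv file, the comparison can be scripted. This is a minimal sketch that assumes the export has "NCBI Accession" and "Ortholog Count" columns (the column names follow the text above; the exact export layout is an assumption). The accessions and counts in the example are placeholders, not real SeqAPASS results.

```python
import csv
import io


def best_query(report_csv: str) -> str:
    """Return the "NCBI Accession" of the row with the largest "Ortholog Count"."""
    rows = csv.DictReader(io.StringIO(report_csv))
    best = max(rows, key=lambda row: int(row["Ortholog Count"]))
    return best["NCBI Accession"]


# Hypothetical table: placeholder accessions and made-up counts.
example = """NCBI Accession,Ortholog Count
ACCESSION_A,180
ACCESSION_B,250
ACCESSION_C,175
"""

print(best_query(example))  # ACCESSION_B
```

For a real file, pass the file's text (for example, from `open(path, newline="").read()`). On a tie, `max` keeps the first row it encounters.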
Example: Androgen receptor, Homo sapiens
[Snapshot: GenPept record for androgen receptor [Homo sapiens], GenBank: AAA51771.1 (917 aa), showing the LOCUS, DEFINITION, ACCESSION, VERSION, DBSOURCE, SOURCE, REFERENCE, COMMENT, and FEATURES rows. The FEATURES annotate the androgen receptor region, the DNA-binding domain (NR_DBD_AR, cd07173) with zinc binding, DNA binding, and dimer interface sites, and the ligand binding domain (NR_LBD_AR, cd07073) with ligand binding and coactivator recognition sites, followed by the protein sequence.]
An alignment page will be generated.
[Snapshot: NCBI COBALT (Constraint-based Multiple Alignment Tool) results for seven Homo sapiens androgen receptor sequences (P10275.2, AAA51772.1, AAA51780.1, AAA51771.1, AAA51729.1, AAD45921.1, and AAA51886.1), displayed with the Conservation Setting at "2 Bits." In the first 80 aligned positions, AAA51780.1 and AAD45921.1 contain 75 residues (a gap within the polyglutamine stretch), AAA51772.1 and AAA51771.1 contain 79, and the remaining sequences contain 80.]
To evaluate sequences, change the "Conservation Setting" from "2 Bits" to "Identity."
[Snapshot: the same COBALT alignment with the "Conservation Setting" drop-down expanded, showing settings such as 1 Bit, 2 Bits, 3 Bits, and 4 Bits]
Look for differences among the sequences (e.g., non-conserved residues, gaps), and start by eliminating
sequences that have gaps.
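To find the gapped sequences without scanning the alignment by eye, the alignment can be saved and the gap characters counted. This sketch assumes the alignment was exported in aligned FASTA format with "-" as the gap character; the sequences below are toy examples, not real androgen receptor data.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {identifier: sequence}; the identifier is the first word after '>'."""
    sequences, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line
    return sequences


def gap_counts(aligned: dict) -> dict:
    """Count gap characters ('-') in each aligned sequence."""
    return {name: seq.count("-") for name, seq in aligned.items()}


# Toy alignment (not real androgen receptor data).
toy = """>seqA example
MEVQLGQQQQQET
>seqB example
MEVQLGQQQ--ET
"""

counts = gap_counts(parse_fasta(toy))
print(counts)  # {'seqA': 0, 'seqB': 2}
print([name for name, n in counts.items() if n == 0])  # ['seqA']
```

Gap counts only address the first step; residue-level differences at positions important for chemical binding still need to be inspected (Level 3).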
i. If, after the suggested evaluations of the proteins are performed, questions remain as to which sequence
would be best to run in SeqAPASS, run all relevant sequences in SeqAPASS for the evaluation. The
individual residue differences between commonly named sequences will become most important when
evaluating residues known to be important for binding the chemical or activating the protein (Level 3
SeqAPASS analysis). After completing the SeqAPASS run, select the data that has the greatest number of
ortholog candidates for your evaluation of conservation and further predictions of cross species
susceptibility. Depending on the protein of interest, multiple subunits may be associated with a protein. In
this case, all relevant subunits can be queried using SeqAPASS.
Level 1 Calculated Percent Similarity
The SeqAPASS algorithms submit the query to NCBI's standalone BLASTp (using default settings,
including the BLOSUM62 matrix), which aligns the query protein with all proteins available in the NCBI
protein database and provides a variety of metrics associated with each pairwise alignment between the
query and hit sequences. SeqAPASS selectively captures output from BLASTp, including the one sequence
per species with the highest bit score. Detailed descriptions of metrics derived from BLASTp (e.g.,
BLASTp Bitscore, E-Value, Positives, Identity, Hit length) can be found in:
The NCBI Handbook: (http://www.ncbi.nlm.nih.gov/books/NBK21106/);
BLAST® Help: (http://www.ncbi.nlm.nih.gov/books/NBK62051/) and the
NCBI Glossary Field Guide: (http://www.ncbi.nlm.nih.gov/Class/FieldGuide/glossary.html)
The top row of the Level 1 data corresponds to the queried protein selected by the user. For each sequence
queried, the Level 1, top row query sequence is used to determine the maximum bitscore for the analysis,
which is derived from aligning the query sequence to itself using BLASTp. To calculate percent
similarity, the bitscore for each hit sequence is normalized to the maximum bit score and then multiplied
by 100.
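The percent similarity calculation above amounts to a single normalization. The sketch below restates it in Python; the function name is illustrative and the bitscores in the example are hypothetical values, not SeqAPASS output.

```python
def percent_similarity(hit_bitscore: float, max_bitscore: float) -> float:
    """Normalize a hit's bitscore to the query self-alignment bitscore, as a percent."""
    if max_bitscore <= 0:
        raise ValueError("max_bitscore must be positive")
    return hit_bitscore / max_bitscore * 100


# Hypothetical bitscores: the query aligned to itself scores 1000.0,
# and a hit sequence scores 500.0 against the query.
print(percent_similarity(500.0, 1000.0))  # 50.0
```

By construction, the top-row query sequence scores 100 percent against itself.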
Note: SeqAPASS v2.0 and newer parse the BLASTp query and hit accessions to identify all the
species/accessions from the identical proteins. Therefore, if a hit sequence represents multiple species, all
species with the identical sequence will be found in the data table for Level 1 and Level 2. To determine
which sequence/species was identified from BLASTp as a hit and which sequence/species was parsed
from the identical sequence, view the "Full Report" for Level 1 or Level 2, column "Identical Protein,"
where "N" indicates the original hit sequence and "Y" indicates the parsed sequence.
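A downloaded Full Report can be split on the "Identical Protein" column to separate original BLASTp hits from parsed identical proteins. This is a minimal sketch; apart from "Identical Protein" and "NCBI Accession," the column names (such as "Species") and all values are assumptions for illustration.

```python
import csv
import io


def split_by_identical(report_csv: str):
    """Split Full Report rows into original BLASTp hits ("N") and parsed identical proteins ("Y")."""
    original, parsed = [], []
    for row in csv.DictReader(io.StringIO(report_csv)):
        flag = row["Identical Protein"].strip().upper()
        if flag == "N":
            original.append(row)
        elif flag == "Y":
            parsed.append(row)
    return original, parsed


# Hypothetical excerpt; real Full Report exports contain many more columns.
example = """NCBI Accession,Species,Identical Protein
ACCESSION_1,Species A,N
ACCESSION_2,Species B,Y
ACCESSION_3,Species C,N
"""

original, parsed = split_by_identical(example)
print([r["Species"] for r in original])  # ['Species A', 'Species C']
print([r["Species"] for r in parsed])  # ['Species B']
```

Rows with any other value in the column are ignored rather than guessed at.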
Common Domain Count
Reverse Position-Specific BLAST (RPS-BLAST) is used to compare each query and hit sequence to
conserved domains defined in NCBI's Conserved Domain Database. A hit domain is considered in
common with the query domain if it has the same domain accession as the query domain and it aligns with
the NCBI-curated domain with the same or greater amino acid residue coverage as the query sequence.
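The common-domain rule can be written as a simple predicate. This sketch assumes each RPS-BLAST result has already been reduced to a domain accession and the number of curated-domain residues covered by the alignment; the `DomainAlignment` type, the counting function, and the coverage values are hypothetical and are not SeqAPASS's implementation.

```python
from dataclasses import dataclass


@dataclass
class DomainAlignment:
    """Hypothetical RPS-BLAST summary for one domain in one sequence."""
    accession: str  # Conserved Domain Database accession, e.g., "cd06157"
    coverage: int   # residues of the curated domain covered by the alignment


def in_common(query: DomainAlignment, hit: DomainAlignment) -> bool:
    """Same domain accession, and hit coverage equal to or greater than the query's."""
    return hit.accession == query.accession and hit.coverage >= query.coverage


def common_domain_count(query_domains, hit_domains) -> int:
    """Count query domains matched by at least one qualifying hit domain."""
    return sum(
        any(in_common(q, h) for h in hit_domains) for q in query_domains
    )


query = [DomainAlignment("cd06157", 240)]  # made-up coverage values
print(common_domain_count(query, [DomainAlignment("cd06157", 245)]))  # 1
print(common_domain_count(query, [DomainAlignment("cd06157", 230)]))  # 0
```

Counting per query domain is an assumption about how the count is tallied; the text above defines only when a single hit domain is in common.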
Ortholog Candidate Identification
Ortholog sequences are those that have diverged after a speciation event and therefore are more likely to
maintain similar function. SeqAPASS uses reciprocal best hit (RBH) BLAST for ortholog detection by
automatically comparing each hit protein to all protein sequences available for the query species. If the
original query protein, or one of its identical protein matches, is identified as the best match to the hit, or
has the same bitscore as the best match, then the hit sequence is considered an ortholog candidate. Whether
each sequence is an ortholog candidate is indicated with a yes (Y) or no (N) in the "Ortholog Candidate" column.
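The reciprocal best hit test described above can be sketched as a single check on the reverse search results. The sketch assumes the reverse BLASTp results (hit protein searched against the query species' proteins) are already available as `(accession, bitscore)` pairs, and that `query_ids` holds the original query accession plus the accessions of its identical proteins; both inputs and the function name are hypothetical.

```python
def is_ortholog_candidate(reverse_hits, query_ids):
    """Return True if a query accession (or one of its identical proteins) is the
    best reverse hit, or ties the best bitscore."""
    if not reverse_hits:
        return False
    best = max(score for _, score in reverse_hits)
    # A tie with the top bitscore also counts ("has the same bitscore").
    return any(acc in query_ids and score == best for acc, score in reverse_hits)


query_ids = {"query_acc", "query_identical_acc"}
print(is_ortholog_candidate([("query_acc", 500.0), ("paralog_acc", 480.0)], query_ids))  # True
print(is_ortholog_candidate([("paralog_acc", 510.0), ("query_acc", 500.0)], query_ids))  # False
```

Comparing bitscores with `==` assumes both values are read verbatim from the same BLAST report rather than recomputed.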
Note: Many NCBI protein accessions represent multiple identical protein sequences in the BLASTp
output. This is due to BLASTp querying and presenting data from the non-redundant protein database.
Sometimes the identical sequences are from different species. This can be checked by following the link
for the top row "NCBI Accession" in the table to the NCBI protein page. Below the title (protein name
[species]) is a link to "Identical Proteins."
Click the "Identical Proteins" link and look for a sequence in the list from the user defined query species.
[Figure: NCBI Protein page for "estrogen receptor isoform 1 [Homo sapiens]" (NCBI Reference Sequence: NP_000116.2), with the "Identical Proteins" link shown below the title.]
Note: If the top hit from RBH BLAST is a Protein Data Bank (PDB) code (e.g., 1AHRA), no ortholog
candidates will be identified. When BLASTp is run against all accessions for a given species, it does not
return PDB codes. It is recommended that the user identify a sequence similar or identical to the PDB code
and use that sequence as the query sequence.
Susceptibility cut-off
The susceptibility cut-off values listed on the "Level 1 (and Level 2) Susceptibility Cut-off" page are
determined by plotting the % similarity data from the "Primary Report" or "Full Report" and identifying
the local minima in the data. The default cut-off is determined by starting at the 1st local minimum and
moving up in percent similarity until the next ortholog candidate is found. The susceptibility cut-off
displayed in the list is the percent similarity of that ortholog candidate.
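The guide does not say how the local minima are located on the plot, so the sketch below takes them as an input list (ordered so that the first entry is the "1st local minimum") and shows only the step of moving up to the next ortholog candidate. The `Hit` type, the function name, and the example values are hypothetical, and treating a candidate exactly at the minimum as "found" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str
    percent_similarity: float
    ortholog_candidate: bool


def default_cutoff(hits, local_minima):
    """Start at the first local minimum and move up in % similarity to the
    next ortholog candidate; return that candidate's % similarity."""
    if not local_minima:
        return None
    start = local_minima[0]
    # Assumption: a candidate exactly at the minimum counts as "found".
    above = sorted(
        h.percent_similarity
        for h in hits
        if h.ortholog_candidate and h.percent_similarity >= start
    )
    return above[0] if above else None


hits = [
    Hit("hit_A", 100.0, True),
    Hit("hit_B", 80.0, False),
    Hit("hit_C", 62.0, True),
    Hit("hit_D", 55.0, True),
    Hit("hit_E", 30.0, False),
]
print(default_cutoff(hits, [58.0, 35.0]))  # 62.0
```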
Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-Across: Yes")
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off but is identified as an ortholog candidate (Ortholog
Candidate = Y, for "yes"), the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off but belongs to any organism class found above the
susceptibility cut-off, the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for
"yes". This criterion allows susceptibility predictions to be made across taxonomic groups based on the
likelihood that the sequences above the cut-off are better matches to the query.
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate = N, for "no,") and does not belong to any organism class found above the susceptibility cut-
off, the hit is predicted to not be susceptible; therefore, Susceptibility Prediction = N for "no"
Note that the "Primary Report" may yield different Susceptibility Predictions than the "Full Report," as
the predictions are based on the data in the different reports. The Primary Report is filtered to only display
E-value < 0.01 and Common Domain Count > 1.
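Because predictions are computed on whichever report is displayed, the Primary Report filter changes which rows take part. A minimal sketch of that filter, using the thresholds exactly as stated in this guide, is shown below; the dictionary keys and function name are hypothetical stand-ins for the report columns.

```python
def primary_report(rows, evalue_max=0.01, domain_count_floor=1):
    """Keep rows with E-value < evalue_max and Common Domain Count > domain_count_floor
    (the Primary Report thresholds as stated in this guide)."""
    return [
        r
        for r in rows
        if r["evalue"] < evalue_max and r["common_domain_count"] > domain_count_floor
    ]


rows = [
    {"accession": "hit_A", "evalue": 1e-50, "common_domain_count": 2},
    {"accession": "hit_B", "evalue": 0.05, "common_domain_count": 3},
    {"accession": "hit_C", "evalue": 1e-10, "common_domain_count": 1},
]
print([r["accession"] for r in primary_report(rows)])  # ['hit_A']
```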
Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-Across: No")
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off but is identified as an ortholog candidate (Ortholog
Candidate = Y, for "yes"), the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate = N, for "no"), the hit is predicted to not be susceptible; therefore, Susceptibility Prediction =
N for "no"
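The two sets of criteria above differ only in the taxonomic read-across rule, so they can be sketched as one function with a `read_across` switch. The `Hit` type, its `taxon_class` field, and the function name are hypothetical; treating "above the cut-off" as "at or above" is an assumption, since the cut-off value is itself the percent similarity of an ortholog candidate.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str
    percent_similarity: float
    ortholog_candidate: bool
    taxon_class: str


def predict_susceptibility(hits, cutoff, read_across=True):
    """Return {accession: 'Y' or 'N'} following the criteria listed above."""
    # Assumption: "above the cut-off" is treated as >= cutoff.
    classes_above = {h.taxon_class for h in hits if h.percent_similarity >= cutoff}
    preds = {}
    for h in hits:
        if h.percent_similarity >= cutoff or h.ortholog_candidate:
            preds[h.accession] = "Y"
        elif read_across and h.taxon_class in classes_above:
            preds[h.accession] = "Y"  # class represented above the cut-off
        else:
            preds[h.accession] = "N"
    return preds


hits = [
    Hit("hit_A", 100.0, True, "Mammalia"),
    Hit("hit_B", 70.0, False, "Aves"),
    Hit("hit_C", 30.0, False, "Mammalia"),
    Hit("hit_D", 25.0, True, "Actinopteri"),
    Hit("hit_E", 20.0, False, "Insecta"),
]
# hit_C is "Y" only when species read-across is on.
print(predict_susceptibility(hits, 60.0, read_across=True)["hit_C"])   # Y
print(predict_susceptibility(hits, 60.0, read_across=False)["hit_C"])  # N
```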
Level 2 Calculated Percent Similarity
Data obtained from the Level 1 RPS BLAST evaluation are used to assign, to each accession in the Level 1
Full Report, the sequence range that aligned with a user-selected domain (from the NCBI Conserved Domain
Database, CDD). BLASTp is then used to align the query domain range to each hit domain range. The percent
similarity is calculated from the bitscores of the BLASTp alignments of the domain regions. For each
sequence queried, the Level 2 top-row query sequence is used to determine the maximum bitscore for the
analysis, which is derived from aligning the query sequence to itself using BLASTp. To calculate percent
similarity, the bitscore for each hit sequence is divided by the maximum bitscore and then multiplied by 100.
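Assigning a domain range to an accession amounts to cutting the aligned residues out of the full sequence. BLAST reports alignment coordinates as 1-based, inclusive positions, so a sketch of that extraction step, under the assumption that the range is taken directly from those coordinates, could look like the following; the function name, sequence, and coordinates are hypothetical. The extracted regions would then be aligned with BLASTp and scored with the same percent-similarity formula as Level 1.

```python
def domain_region(sequence: str, start: int, end: int) -> str:
    """Return the residues from start to end (1-based, inclusive coordinates)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("domain range falls outside the sequence")
    # Convert 1-based inclusive coordinates to a 0-based half-open slice.
    return sequence[start - 1 : end]


# Hypothetical sequence and domain coordinates
print(domain_region("MKTAYIAKQRQISFVKSHFSRQ", 5, 12))  # YIAKQRQI
```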
Susceptibility cut-off (same method as used in Level 1)
The susceptibility cut-offs listed on the "Level 2 Susceptibility Cut-off" page are determined by plotting
the % similarity data from the "Primary Report" or "Full Report" and identifying the local minima in
the data. The default cut-off is determined by starting at the 1st local minimum and moving up in percent
similarity until the next ortholog candidate is found. The susceptibility cut-off displayed in the list is the
percent similarity of that ortholog candidate.
Level 2 Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-Across: Yes")
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off but is identified as an ortholog candidate (Ortholog
Candidate = Y, for "yes"), the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off but belongs to any organism class found above the
susceptibility cut-off, the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for
"yes". This criterion allows susceptibility predictions to be made across taxonomic groups based on the
likelihood that the sequences above the cut-off are better matches to the query.
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate = N, for "no,") and does not belong to any organism class found above the susceptibility cut-
off, the hit is predicted to not be susceptible; therefore, Susceptibility Prediction = N for "no"
Note that the "Primary Report" may yield different Susceptibility Predictions than the "Full Report," as
the predictions are based on the data in the different reports. The Primary Report is filtered to only display
E-value < 0.01 and Common Domain Count > 1.
Level 2 Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-Across: No")
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off but is identified as an ortholog candidate (Ortholog
Candidate = Y, for "yes"), the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate = N, for "no"), the hit is predicted to not be susceptible; therefore, Susceptibility Prediction =
N for "no"
Level 3 Sequence Alignments
COBALT is used to align all user-selected sequences (from Level 1 hits) with a user-defined template
sequence. Because COBALT aligns all of the selected sequences together, it is recommended that the user
align the template sequence with sequences that are most similar to one another. To capture the most
similar sequences from the SeqAPASS data, it is recommended that the user filter the Level 1 data by
taxonomic group and step through the Level 1 data pages one by one while selecting sequences. It is also
recommended that the user look at the name of each sequence and exclude "partial" sequences when
possible. Requesting a query for one taxonomic group at a time breaks the data down into manageable
alignments.
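The selection advice above (work one taxonomic group at a time and skip "partial" sequences) can be sketched as a simple grouping step applied to exported Level 1 rows. The tuple layout, function name, and example records are hypothetical; each resulting group would then be selected as a separate Level 3 query.

```python
from collections import defaultdict


def group_for_level3(records):
    """Group accessions by taxonomic group, skipping names that contain 'partial'.

    records: iterable of (accession, protein_name, taxonomic_group) tuples.
    """
    groups = defaultdict(list)
    for accession, name, group in records:
        if "partial" in name.lower():
            continue  # follow the guide's advice to exclude partial sequences
        groups[group].append(accession)
    return dict(groups)


records = [
    ("acc_1", "estrogen receptor", "Mammalia"),
    ("acc_2", "estrogen receptor, partial", "Mammalia"),
    ("acc_3", "estrogen receptor", "Aves"),
]
print(group_for_level3(records))  # {'Mammalia': ['acc_1'], 'Aves': ['acc_3']}
```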
Selecting Amino Acid Residues to Align
The user may select up to 50 amino acid residues to compare across selected species in Level 3.