EPA/600/R-20/408
Sequence Alignment to Predict Across
Species Susceptibility
(SeqAPASS)
VERSION 5.0
User Guide

-------
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS): User Guide
Updated 11/5/2020; Contact Carlie LaLone with Questions: LaLone.Carlie@epa.gov
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) User Guide
Quick Notes: Use Chrome for optimal performance and PLEASE DO NOT submit more than 10 Level 1
queries at a time. Wait until they run to completion prior to submitting more.
Table of Contents
Background	page 2
Accessing SeqAPASS	page 3-4
Returning Users (page 3)
First Time Users (page 4)
Messages from the SeqAPASS Development Team	page 4
SeqAPASS Home Tab	page 5
Request SeqAPASS Run Tab	page 5-11
Identify a Protein Target (page 6)
Query "By Species" (page 7)
Query "By Accession" (page 10)
SeqAPASS Run Status	page 12-13
View SeqAPASS Reports	page 14-19
View Report (page 15)
Save Report(s) (page 15)
Level 1: Primary Amino Acid Sequence Alignment	page 20-26
Primary Report Settings (page 22)
Susceptibility Cutoff Box for Level 1	page 26-29
No Orthologs Detected (page 28)
Level 2: Functional Domain(s) Alignment	page 30-32
View Level 2 Data Page	page 32-37
Primary Report Settings (page 35)
Susceptibility Cutoff Box for Level 2	page 38-41
No Orthologs Detected (page 40)
Level 1 and Level 2 Data Visualization	page 41-50
Level 1 and 2 Information Page (page 43)
Level 1 and 2 BoxPlot Page - Controls (page 44)
Level 3: Individual Amino Acid Residue Alignment	page 51-60
View Level 3 Individual Amino Acid Query and Data Page	page 61-66
Level 3 Data - Primary Report (page 64)
Level 3 Data - Full Report (page 65)
Level 3: Data Visualization
Information Page (page 67)
Level 3 Heat Map	page 67-71
Decision Summary Report	page 71-74
Download DS Report as PDF	page 74-77
Moving Between Level 1, Level 2, and Level 3 Data Pages	page 77-78
Search, View, and Download Data Tables	page 78-79
Log out	page 79
Pop-up Messages	page 79-82
SeqAPASS Documentation	page 82-91

-------
Background
The SeqAPASS tool has been developed to predict across species relative intrinsic susceptibility
to chemicals with known molecular targets (e.g., pharmaceuticals, pesticides) as well as evaluate
conservation of molecular targets from high-throughput screening assays (i.e., U.S. Environmental
Protection Agency ToxCast Program) and molecular initiating events (MIEs) and early key events in the
adverse outcome pathway framework, as a means to extrapolate such knowledge across species. The term
"relative" is used because it is recognized that molecular target similarity is one consideration, though an
important one, for making predictions of susceptibility to a chemical. Other important considerations for
susceptibility that are not evaluated using the SeqAPASS methodology include how well a chemical is
absorbed, distributed, metabolized, and eliminated, life stage, and other life history traits. Also, "relative"
indicates that the determination of sequence similarity between proteins is based on comparison to a
single protein sequence for a specific species. Additionally, we describe "intrinsic susceptibility" as the
vulnerability (or lack thereof) of an organism to chemical perturbation due to its inherent biological
composition.
Cross-species comparisons of proteins can be conducted through examination of sequence and
structural information, depending on how well the protein has been characterized and what is known
about a chemical-protein interaction. SeqAPASS allows the user to assess various levels of protein
sequence detail across species including comparisons of primary amino acid sequence (including ortholog
detection), functional domain(s), and individual amino acid residue positions. Each level requires a
greater understanding of the protein and its interaction with a chemical of interest (or similar ligand).
Because human and veterinary drugs, as well as pesticides, are designed to act specifically on well
characterized molecular targets, these chemical classes have proven useful for demonstrating the utility of
the SeqAPASS tool and its application to various hazard assessment/research scenarios.
The pertinent information necessary to begin a SeqAPASS query includes: the identification of a single
(or multiple) query species and a query protein, which would be the molecular target(s) of interest (e.g.,
receptor or enzyme).
The SeqAPASS algorithms mine, collect, and collate information from the National Center for
Biotechnology Information (NCBI) protein database (http://www.ncbi.nlm.nih.gov/protein/), conserved
domains database (http://www.ncbi.nlm.nih.gov/cdd/), and taxonomy database
(http://www.ncbi.nlm.nih.gov/taxonomy/), and strategically utilize the Stand-Alone Basic Local Alignment
Search Tool for proteins (BLASTp)
(http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download)
and the Constraint-based Multiple Alignment Tool (COBALT)
(http://www.st-va.ncbi.nlm.nih.gov/tools/cobalt/re_cobalt.cgi).

-------
Accessing SeqAPASS
For optimal SeqAPASS performance use Chrome
Access SeqAPASS using the following URL: https://seqapass.epa.gov/seqapass/
Returning Users
Click "Login"
New to SeqAPASS Version 4.1 (See the About page for more details)
•	Updated protein, taxonomy, and conserved domain data
•	Updated BLAST executables
New to SeqAPASS Version 4 (See user guide for more details)
•	New EPA compliant login through the Web Application Access
•	Integrated information and help buttons
•	Links to guide user to an appropriate query protein
•	Level 1, Level 2, and Level 3 data summary reports
•	Interoperability with the ECOTOX Knowledgebase to compare sequence-based susceptibility predictions to existing empirical toxicity data
•	Expedited identification of literature to support Level 3, critical individual amino acid residue, comparisons using Reference Explorer
•	Ability to create Level 3 Data reports with combined taxonomic groups
•	Seleno-cysteine (U) added to Level 3, critical individual amino acid residue comparisons
[Screenshot: SeqAPASS login page (Version 4.1) with the "Login" button, the "Want an account? Click here for instructions." link, and the "About SeqAPASS" link.]
Select either "Login with EPA LAN User ID & Password" or "Login with Windows Kerberos SSO".
[Screenshot: EPA Enterprise Authentication page with the "Login with User ID & Password" and "Login with Windows Kerberos SSO" options.]

-------
First time users
To request a username and password to access the SeqAPASS tool, select "here" below the login and
follow the directions on the next page. The directions differ for internal EPA users and external
non-EPA users; however, the user type does not limit access to the tool. Everyone who requests an
account will be given one in a timely manner. An individual account allows a user to store all previous
SeqAPASS runs. Once they have obtained a username, external users will select "Login with EPA
LAN User ID and Password."
EPA Users
1.	Go to https://waa.epa.gov and log in with your existing EPA LAN ID and password.
2.	Under the "Community Access" menu, select "Request Web Community Access"
3.	Select the "SeqAPASS Users" community and click submit.
4.	Return to the SeqAPASS login page to access SeqAPASS
External Users
1.	Go to https://waa.epa.gov and click on the "Self Register" link.
2.	Fill out the form using the following EPA Contact information:
o EPA Contact Name - Carlie LaLone
o EPA Contact's Email Address - lalone.carlie@epa.gov
o EPA Contact's Phone Number - 218-529-5038
3.	Select the "SeqAPASS Users" community from the dropdown menu at the bottom of the page.
4.	Once you submit the form you will receive an email confirming your request and a follow-up email with your username once
your account has been activated.
On the Log in screen the user will provide the necessary Login information:
EPA User: EPA LAN User ID & Password or Login with Windows Kerberos SSO
External User: Username and Password
Upon creating your password, log in to SeqAPASS as described above for Returning Users. To change a
password at any time, go to waa.epa.gov and select "User Profile" to reset it. The user will then use the new
password to log in.
Messages from the SeqAPASS development team
Look for messages about planned version releases, data updates, and/or fixes to the SeqAPASS tool.
These will occasionally be displayed below the SeqAPASS banner when the development team has
information to share with SeqAPASS users.
[Screenshot: SeqAPASS banner with a development team message listing the Version 4.1 and Version 4 updates.]

-------
SeqAPASS Home Tab
The "Home" tab indicates who is logged in to the tool (right-hand side of the screen) and contains links to
information about the SeqAPASS tool (About SeqAPASS), including contact information for
support and references to published articles describing the SeqAPASS tool and its applications.
References to other relevant databases and tools are also provided. A link to the SeqAPASS User Guide can
also be found on this page. To submit a comment or question, click the "Submit Comment/Question"
link to email the developer. The "Log out" icon in the upper right-hand corner of the screen can be clicked
at any time to log out. "Information" buttons are present throughout SeqAPASS to give the user additional
information or instruction regarding features and functionality of the tool. "Exit" buttons are also present
by each external (non-EPA) link that takes the user to a page NOT maintained by the EPA.
[Screenshot: SeqAPASS "Home" tab showing the navigation tabs (Home, Request SeqAPASS Run, SeqAPASS Run Status, View SeqAPASS Reports, Settings), the logged-in user, and links to "About SeqAPASS", "SeqAPASS User Guide", and "Submit Comment/Question or Report a Problem".]

Request SeqAPASS Run Tab
Clicking the "Request SeqAPASS Run" tab opens a page to enter the query information necessary for a
SeqAPASS run. Each section of the "Request SeqAPASS Run" will be described below:
[Screenshot: "Request Level 1 SeqAPASS Run" page with the navigation tabs and the logged-in user.]

-------
Identify a Protein Target
SeqAPASS is designed to predict cross-species chemical susceptibility. Protein targets are often chosen
based on the chemical, adverse outcome pathway (AOP), or high-throughput screening (HTS) assay target
of interest. Resources have been provided, as links, to aid the user in searching for appropriate protein
targets and can be accessed by selecting the drop-downs found in the "Identify a Protein Target" box.
Identify a Protein Target
SeqAPASS is designed to predict cross species chemical susceptibility based on a protein molecular target. The following resources have been identified to guide the user to an
appropriate protein target based on the chemical, adverse outcome pathway (AOP), or high-throughput screening (HTS) assay target of interest. Click the help buttons below for
descriptions of how to find relevant protein target information from these resources.
All links will open in a new tab.
The following links exit the site:
•	Pharmaceutical protein targets:
https://www.drugbank.ca
http://sitem.herts.ac.uk/aeru/vsdb/index.htm
http://bidd.nus.edu.sg/group/cjttd/TTD_HOME.asp
•	Pesticides and other chemical protein targets:
http://www.t3db.ca
•	AOP chemical initiators:
https://aopwiki.org
•	ToxCast HTS results by chemical:
https://comptox.epa.gov/dashboard
Select Search
There are two options for entering query information: "By Species" or "By Accession" (see the radio buttons
to the right of "Select Search"). Selecting "By Species" allows the user to enter text, select from a
dropdown list of species, and then select a protein from any sequence available for that species in the
NCBI protein database. Selecting "By Accession" allows the user to enter an NCBI protein accession.
[Screenshot: "Request Level 1 SeqAPASS Run" page showing the "Identify a Protein Target" box and the "Compare Primary Amino Acid Sequences" box, with the "By Species" and "By Accession" radio buttons next to "Select Search".]

-------
Query "By Species"
Type the name of the query species of interest in the "Query Species Search" text box. The species
common name, scientific name, or Taxid (ID number derived from the NCBI taxonomy database) may be
typed into the search bar. This is the species you would like to compare all other species to. The search
bar has an auto-complete function and will generate a list of species with corresponding Taxid. When text
is typed into the search bar, the auto-complete function queries the database in the order of "starts with"
then "contains." If an integer is typed in the search bar the auto-complete function queries the database in
the order of "Taxid", "starts with", then "contains."
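The lookup order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual implementation: the `Species` record and the `autocomplete` function are hypothetical names, text matching is assumed to be case-insensitive and applied to the species name, and an integer query is assumed to match a Taxid exactly.

```python
from typing import NamedTuple


class Species(NamedTuple):
    """Hypothetical record for one entry in the species list."""
    name: str
    taxid: int


def autocomplete(query: str, species: list[Species], limit: int = 10) -> list[Species]:
    """Rank matches: Taxid (integer queries only), then "starts with", then "contains"."""
    text = query.strip().lower()
    if not text:
        return []
    # Assumption: an integer query matches a Taxid exactly.
    taxid_hits = [s for s in species if text.isdigit() and s.taxid == int(text)]
    starts = [s for s in species if s.name.lower().startswith(text)]
    contains = [s for s in species if text in s.name.lower()]
    ranked: list[Species] = []
    for candidate in taxid_hits + starts + contains:
        if candidate not in ranked:  # keep the first (highest-priority) occurrence
            ranked.append(candidate)
    return ranked[:limit]


db = [
    Species("Bos taurus", 9913),
    Species("Homo sapiens", 9606),
    Species("Homo sapiens neanderthalensis", 63221),
]
for hit in autocomplete("Homo sap", db):
    print(f"{hit.name} (Taxid:{hit.taxid})")
```

Typing "Homo sap" returns the two names that start with that text, while typing "taurus" falls through to the "contains" step and returns Bos taurus.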
[Screenshot: "Query Species Search" box with "Homo sap" typed and the auto-complete list showing matches such as Homo sapiens (Taxid:9606), Homo sapiens neanderthalensis (Taxid:63221), Homo sapiens ssp. Denisova (Taxid:741158), and Homo sapiens x Mus musculus hybrid cell line (Taxid:1131344).]

Note: The user can also use the NCBI taxonomy database to identify query species using the NCBI link
on the right-hand side of the "Add Query Species" button.
Select the species of interest by clicking on its name in the drop-down box. Once the species is selected,
click the "Add Query Species" button. This advances the species of interest to the "Query Species" box and
fills the "Query Proteins" box with all available protein sequences for that species from the NCBI protein
database (although the box displays only the initial 200 proteins per species, based on lowest numerical
accession number). The protein list includes the protein NCBI accession, protein name, and species
scientific name.
[Screenshot: "Query Species Selection" box with Homo sapiens (Taxid:9606) added, and the "Query Protein Selection" box listing proteins such as [NP_000005.2] alpha-2-macroglobulin isoform a precursor and [NP_000006.2] arylamine N-acetyltransferase 2, above the "Add Selected Protein(s)" button.]

-------
To filter the query protein list, type the query protein name or partial name in the "Query Protein Search"
box and click the "Filter Protein" button. This action will filter the protein list in the "Query Proteins" box
to display only proteins that contain the user-defined text (this search box has no autofill
feature because of the filter feature). Proteins will be listed in alphabetical order based on NCBI accession.
Example: typing "estrogen" retrieves all proteins that contain the word "estrogen" in the protein name
(the user can scroll to identify proteins of interest).
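The filter behavior can be pictured with a short Python sketch. It is a hedged illustration rather than the tool's code: `filter_proteins` is a hypothetical helper, case-insensitive matching is an assumption, and the small protein dictionary reuses accessions and names that appear in this guide.

```python
def filter_proteins(proteins: dict[str, str], text: str) -> list[tuple[str, str]]:
    """Return (accession, name) pairs whose name contains `text`, ordered by accession."""
    needle = text.strip().lower()  # assumption: matching ignores case
    hits = [(acc, name) for acc, name in proteins.items() if needle in name.lower()]
    return sorted(hits)  # tuples sort by accession first


proteins = {
    "NP_001035365.1": "estrogen receptor beta isoform 2",
    "NP_000005.2": "alpha-2-macroglobulin isoform a precursor",
    "NP_001035055.1": "G-protein coupled estrogen receptor 1",
    "NP_000116.2": "estrogen receptor isoform 1",
}
for accession, name in filter_proteins(proteins, "estrogen"):
    print(f"[{accession}] {name}")
```

Only the three names containing "estrogen" are kept, and they are printed in accession order starting with NP_000116.2.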
[Screenshot: "Query Protein Search" box containing "estrogen"; the filtered "Query Proteins" list shows entries such as [NP_000116.2] estrogen receptor isoform 1, [NP_001035055.1] G-protein coupled estrogen receptor 1, and [NP_001035365.1] estrogen receptor beta isoform 2.]

Note: To explore details associated with a protein of interest, click the "NCBI Protein Database" link to
the right of the "Filter Protein" button to open the NCBI protein database (see the SeqAPASS Documentation
section of the user guide for details about searching for query proteins using the NCBI database).
Highlight the protein or proteins of interest (Ctrl + left-click to select multiple proteins) in the "Query
Proteins" box and click the "Add Selected Protein(s)" button. This moves the protein(s) of interest to the
"Final Query Protein(s)" box. To remove proteins from the "Final Query Protein(s)" box, highlight those
to be removed and click the "Remove Selected Protein(s)" button. Select "Remove All Proteins" to
discard all proteins from the "Final Query Protein(s)" box. The "Clear" button removes all information
previously entered on the "Request SeqAPASS Run" page.
[Screenshot: "Query Protein Selection" and "SeqAPASS Submission" boxes; the "Final Query Protein(s)" box lists the selected estrogen receptor isoforms above the "Remove Selected Protein(s)", "Remove All Proteins", "Request Run", and "Clear" buttons.]

-------
Once the user identifies the protein(s) to be queried, select "Request Run." A message will briefly appear
in upper right-hand corner of the screen for 10 seconds to alert the user of the request status.
[Screenshot: "Request Level 1 SeqAPASS Run" page with "Success" pop-up messages in the upper right-hand corner confirming that each requested accession (for example, NP_001230448.1) was submitted.]
Multiple proteins can be added to the final list for multiple SeqAPASS runs. If another query species is
desired, return to "Query Species Search" to select the next species. Follow the process described above
for selecting the proteins associated with this species. The proteins populated in the "Query Proteins" box
will always be associated with the species highlighted in the "Query Species" box.
Note: In the current version of SeqAPASS, PLEASE do not request more than 10 query proteins at a
time to avoid longer wait times for the completion of a run.
[Screenshot: "Query Species" box listing Homo sapiens (Taxid:9606) and Bos taurus (Taxid:9913), with the "Query Proteins" box showing the proteins available for the highlighted species.]
Note: A user may check the progress of the run by clicking on the "SeqAPASS Run Status" tab (see the
SeqAPASS Run Status section of the user guide for more information).

-------
Query "By Accession"
Users familiar with the NCBI database can utilize NCBI protein accessions (e.g., NP_000116.2) to query
the SeqAPASS tool. This is done by selecting the "By Accession" radio button to the right of the "Select
Search" text on the "Request SeqAPASS Run" page.
[Screenshot: "Request Level 1 SeqAPASS Run" page showing the "By Species" and "By Accession" radio buttons next to "Select Search" in the "Compare Primary Amino Acid Sequences" box.]
Upon selecting the "By Accession" radio button, a new query page will be displayed. Type the NCBI
protein accession (e.g., NP_000116.2) for the protein of interest (this accession comes from the NCBI
protein database; see "SeqAPASS Documentation" for details) in the "NCBI Protein Accession" box. If
desired, more than one NCBI accession may be entered into the "NCBI Protein Accession" box by
pressing the Enter key after each additional NCBI accession entry.
Upon clicking the "NCBI Protein Accession" text box, a pop-up message will appear in the middle of the
text box to provide an example of the proper format for accessions to be entered.
[Screenshot: "SeqAPASS Submission" box with the "NCBI Protein Accession" text box, the "NCBI Protein Database" link, and the "Request Run" and "Clear" buttons.]
Note: To avoid longer wait times for the completion of a run, in the current version of SeqAPASS, please
do not request more than 10 NCBI Accessions at a time.
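When a long list of accessions needs to be queried, it can help to split the list into groups of at most 10 before entering them by hand. The sketch below is a planning aid only: this guide does not describe any programmatic submission interface, `batches` is a hypothetical helper, and the accession strings are placeholders rather than real NCBI identifiers.

```python
from typing import Iterator

MAX_PER_SUBMISSION = 10  # limit requested in this guide


def batches(accessions: list[str], size: int = MAX_PER_SUBMISSION) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` accessions."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(accessions), size):
        yield accessions[start:start + size]


# Placeholder identifiers for illustration only.
todo = [f"ACCESSION_{i:03d}" for i in range(1, 24)]
for number, group in enumerate(batches(todo), start=1):
    print(f"Batch {number}: {len(group)} accession(s)")
```

Each batch would be entered and allowed to run to completion before the next one is submitted.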

-------
[Screenshot: "By Accession" query page with "NP_000116" entered in the "NCBI Protein Accession" box above the "Request Run" button.]
After the NCBI accession(s) of interest have been typed in the "NCBI Protein Accession" box, click the
"Request Run" button. To remove proteins from the "NCBI Protein Accession" box, click the "Clear"
button. A message will briefly appear in the upper right-hand corner of the screen to alert the user of their
run request status.
[Screenshot: "By Accession" query page with a "Success" pop-up message in the upper right-hand corner confirming that NP_001315029.1 was submitted.]
Note: All NCBI accessions can include the version number (one digit after the decimal place, e.g.,
NP_000116.2). If the version is not included, the most recent version of the accession will be
queried automatically.
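The accession-plus-version format can be checked before typing a long list into the box. The following Python sketch is an assumption-laden illustration: the regular expression is a simplification that covers the accession styles shown in this guide (such as NP_000116.2 and CAC38767.1), not NCBI's full accession grammar, and `split_accession` is a hypothetical helper.

```python
import re
from typing import Optional

# Simplified pattern: letters, optional underscore, digits, optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{1,6}_?\d+)(?:\.(\d+))?$")


def split_accession(text: str) -> tuple[str, Optional[int]]:
    """Split an accession into (base, version); version is None when omitted."""
    match = ACCESSION_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a recognizable accession: {text!r}")
    base, version = match.groups()
    return base, int(version) if version else None


for entry in ["NP_000116.2", "NP_000116", "CAC38767.1"]:
    base, version = split_accession(entry)
    label = f"version {version}" if version is not None else "most recent version"
    print(f"{base}: {label}")
```

An entry without a version, such as NP_000116, yields `None` for the version, which corresponds to the case where the most recent version is queried.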

-------
SeqAPASS Run Status
Level 1 SeqAPASS (primary amino acid sequence comparisons) status is displayed as the default. The
Accession in the column "Level 1 Query Accession" is that selected and queried by the user. For a query
to finish it must display "complete" in the BLASTp column, 100% in the "Common Domains" column,
and 100% in the "Ortholog Candidate" column. The "Common Domains" column displays the %
completion for running Reverse Position Specific (RPS)-BLAST (Default E-value of <0.01) on the
Accessions from the Level 1 Full Report. RPS-BLAST, and therefore "Common Domains" status, will
take the longest to complete. The "Ortholog Candidate" column displays the % completion for running a
reciprocal best hit BLAST evaluation for each hit sequence. The status for the "BLASTp" column is
described as "started," "analyzing," or "complete." If the user's successfully submitted query has entered
the run queue, the position of the submitted query in the queue will be indicated in the column (e.g., 2nd in
queue). The "Common Domains" and "Ortholog Candidate" columns will also describe the position of
the user's submitted query in the run queue. Once the run has begun processing, the % completed for
RPS-BLAST or reciprocal best hit BLAST, respectively, will be displayed. Please see example below:
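The completion rule for a Level 1 query can be written as a small check. This sketch assumes the three status values have been copied from the table as text; the `Level1Status` class and `is_finished` function are hypothetical names, not part of SeqAPASS.

```python
from dataclasses import dataclass


@dataclass
class Level1Status:
    """Status text as displayed in the three Level 1 status columns."""
    blastp: str               # "started", "analyzing", "complete", or a queue position
    common_domains: str       # percent complete or a queue position
    ortholog_candidate: str   # percent complete or a queue position


def is_finished(status: Level1Status) -> bool:
    """A query is finished only when all three columns report completion."""
    return (
        status.blastp.strip().lower() == "complete"
        and status.common_domains.strip() == "100%"
        and status.ortholog_candidate.strip() == "100%"
    )


print(is_finished(Level1Status("complete", "100%", "100%")))
print(is_finished(Level1Status("complete", "100%", "0%")))
```

The first call reports a finished query; the second does not, because the ortholog evaluation is still at 0%.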
[Screenshot: "SeqAPASS Run Status" page with the "Level 1 Status" radio button selected and the "Refresh Data" button. The "SeqAPASS Level 1 Run Status" table is reproduced below.]
SeqAPASS Run Id	Data Version	User	Level 1 Query Accession	BLASTp	Common Domains	Ortholog Candidate	Start Date	Date Completed	SeqAPASS Run Duration
1310	4	Blatz.Donovan@epa.gov	NP_001315029.1	complete	100%	100%	2019 09 04 10:24:21	2019 09 04 10:27:04	2 minute(s) 43 second(s)
1309	4	Blatz.Donovan@epa.gov	NP_001230447.1	complete	100%	100%	2019 09 04 10:14:04	2019 09 04 10:24:35	10 minute(s) 31 second(s)
1306	4	Blatz.Donovan@epa.gov	NP_001230448.1	complete	100%	100%	2019 09 04 10:14:04	2019 09 04 10:24:37	10 minute(s) 33 second(s)
1300	4	Blatz.Donovan@epa.gov	NP_001248338.1	complete	100%	0%	2019 09 04 10:14:04	Not Finished	
1308	3	Blatz.Donovan@epa.gov	NP_001258805.1	complete	100%	100%	2019 09 04 10:12:07	2019 09 04 10:12:07	1 seconds
1308	3	Blatz.Donovan@epa.gov	NP_001278159.1	complete	100%	100%	2019 09 04 10:12:07	2019 09 04 10:12:07	1 seconds
1308	4	Blatz.Donovan@epa.gov	NP_001258806.1	complete	100%	100%	2019 09 04 10:12:07	2019 09 04 10:19:24	7 minute(s) 17 second(s)
1306	3	Blatz.Donovan@epa.gov	NP_000116.2	complete	100%	100%	2019 08 29 14:53:03	2019 08 29 14:53:03	1 seconds
1303	3	Blatz.Donovan@epa.gov	CAC38767.1	complete	100%	100%	2019 08 27 12:31:18	2019 08 27 12:39:25	8 minute(s) 7 second(s)
1302	3	Blatz.Donovan@epa.gov	NP_571229.3	complete	100%	100%	2019 08 27 12:24:34	2019 08 27 12:50:34	26 minute(s) 0 second(s)
The user can view the status of requested SeqAPASS runs. Each run is assigned a unique "SeqAPASS
Run Id." A run is a query that was requested either individually or as a batch in the "Request
SeqAPASS Run" tab. The user can view run start and end dates/times and the duration of the run (see the
Search, View, and Download Data Tables section of the user guide for more information). The "Data
Version" column indicates which version of NCBI data is being used (see the "About" page for details on
Data Versions).
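The relationship between the start time, completion time, and run duration can be reproduced with Python's standard library. This is a minimal sketch under assumptions: the timestamp layout ("YYYY MM DD HH:MM:SS") is inferred from how dates appear in the status table, `run_duration` is a hypothetical helper, and the tool's own duration calculation is not documented here.

```python
from datetime import datetime

STAMP = "%Y %m %d %H:%M:%S"  # assumed layout, e.g. "2019 09 04 10:24:21"


def run_duration(start: str, completed: str) -> str:
    """Format the elapsed time between two status-table timestamps."""
    delta = datetime.strptime(completed, STAMP) - datetime.strptime(start, STAMP)
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes} minute(s) {seconds} second(s)"


print(run_duration("2019 09 04 10:24:21", "2019 09 04 10:27:04"))
```

For the run that started at 10:24:21 and completed at 10:27:04, the elapsed time is 163 seconds, or 2 minutes 43 seconds.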
The user is also able to view the status of Level 2 (Functional domain(s)) and Level 3 (individual amino
acid residue alignments).

-------
View Level 2 Status by selecting the radio button. While viewing the page, the user can click the
"Refresh Data" button to refresh the data. The "Level 1 Query Accession" column displays the NCBI
accession selected and queried by the user. Please see below:
[Screenshot: "SeqAPASS Run Status" page with the "Level 2 Status" radio button selected. The "SeqAPASS Level 2 Run Status" table lists, for each run, the SeqAPASS Run Id, Data Version, User, Level 1 Query Accession, NCBI Accession, Domain Type (for example, p450, CypX, or V-set), BLASTp status, Start Date, Date Completed, and SeqAPASS Run Duration.]
View Level 3 Status by selecting the radio button. "Level 1 Query Accession" column displays the NCBI
accession selected and queried by the user. The "Job Name" is the user defined name chosen to describe
the Level 3 alignment. Also, while viewing the page, the user can click the "Refresh Data" button to
refresh the data. Please see below:
[Screenshot: "SeqAPASS Run Status" page with the "Level 3 Status" radio button selected. The "SeqAPASS Level 3 Run Status" table lists, for each run, the SeqAPASS Run Id, Data Version, User, Job Name, Level 1 Query Accession, Template Accession, COBALT status, Start Date, Date Completed, and SeqAPASS Run Duration.]
-------
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS): User Guide
Updated 11/5/2020; Contact Carlie LaLone with Questions: LaLone.Carlie@epa.gov
View SeqAPASS Reports Tab
The "View SeqAPASS Reports" tab provides a table of completed SeqAPASS runs. From this page the
user can choose to either "View Report" or "Save Report(s)."
[Screenshot: "SeqAPASS Reports" page showing the "View Report" and "Save Report(s)" radio buttons and the "Request Selected Report" and "Refresh Available Reports" buttons.]
The completed runs, by default, are listed in the order in which they were completed, with the most recent
runs at the top. The table includes information for each run, such as SeqAPASS Run ID (unique for every
run, even when the same protein/species combination is run twice), Data Version, Ortholog Count
(number of orthologs detected from the aligned hit sequences in Level 1; see Detailed Documentation
page 79), NCBI Accession, Query Protein Name, taxonomy information for the query species, and the
date/time of run completion.
While viewing the page, the user can click the "Refresh Available Reports" button to refresh the table
with additional completed runs. Partial protein sequences are highlighted in yellow as illustrated in the
example below. (See Search, View, and Download Data Tables section of user guide for more
information).
[Screenshot: "Available Reports" table on the "SeqAPASS Reports" page. Columns: SeqAPASS Run Id, Data Version, Ortholog Count, Level 1 Query Accession, Query Protein Name, NCBI Taxonomy ID, Query Species Name. Example rows include human estrogen receptor isoforms, fathead minnow and zebrafish aromatase, and a partial PsbA sequence from Poa annua.]
View Report
To select a completed run and view Level 1 data, select the corresponding radio button in the first column
of the table and click "Request Selected Report." This will open the Level 1 page to view the Level 1 data
and to set up queries for Level 2 and Level 3.
Note: The user MUST select a radio button PRIOR to clicking "Request Selected Report." If the user
clicks "Request Selected Report" without selecting a radio button, a spinning wheel will appear and
disappear, and no completed run will be opened. There is no pop-up message indicating that the
user did not select a radio button.
[Screenshot: "Available Reports" table with the "View Report" radio button selected and one run selected by the radio button in the first column of the table, ready for "Request Selected Report."]
Save Report(s)
To download completed Level 1, 2, and/or 3 data, select the "Save Report(s)" radio button. The user
can then choose which accession(s) to download by clicking the checkbox in the first column of the
table for each desired accession and then clicking "Save Selected Report(s)."
[Screenshot: "Available Reports" table with the "Save Report(s)" radio button selected, checkboxes in the first column of the table, and the "Save Selected Report(s)" button.]
The user can also exclude data that are not wanted in the download by scrolling to the far right of the table
and deselecting the checkboxes for the different levels of the SeqAPASS analysis. By default, all
available data for the selected accession will be downloaded in a zip file.
[Screenshot: far-right side of the "Available Reports" table in "Save Report(s)" mode, showing the Query Protein Name, NCBI Taxonomy ID, Query Species Name, Query Common Name, and Taxonomy columns followed by the "Level 1," "Level 2," and "Level 3" checkbox columns.]
A WinZip file will be created for all the selected reports.
[Screenshot: browser "Save As" dialog opened from the "SeqAPASS Reports" page, saving to the Downloads folder with "Save as type" set to WinZip File (*.zip).]
A pop-up seqapass.zip file should appear with data files for each selected report. The naming convention
is the NCBI Protein Accession and the Data Version (e.g., AAG31441.2_v2).
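When post-processing a download, it can help to split each folder name back into its accession and data version. The following is a minimal Python sketch that assumes the pattern implied by the example above (`<accession>_v<data version>`); the function name `parse_report_folder` is hypothetical and is not part of SeqAPASS.

```python
import re

# Pattern assumed from the guide's example "AAG31441.2_v2":
# everything before the final "_v<digits>" is treated as the accession.
_FOLDER_PATTERN = re.compile(r"^(?P<accession>.+)_v(?P<version>\d+)$")


def parse_report_folder(name):
    """Split a report folder name into (accession, data_version)."""
    match = _FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected report folder name: {name!r}")
    return match.group("accession"), int(match.group("version"))


if __name__ == "__main__":
    for folder in ("AAG31441.2_v2", "NP_001267576.1_v2"):
        print(folder, "->", parse_report_folder(folder))
```

Anchoring on the final `_v<digits>` suffix keeps underscores inside RefSeq-style accessions (for example `NP_...`) intact.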
[Screenshot: seqapass.zip opened in WinZip, showing one folder per selected report (e.g., AAG31441.2_v2, AAK85198.1_v2, AAQ03208.1_v2, ACD44939.1_v2, CAA10110.1_v2, NP_001267576.1_v2, P68279.2_v2).]
Clicking on one of the report folders (named by Protein Accession and Data Version) shows all available
files for each Level of the SeqAPASS evaluation.
Note: This download includes default settings only. If the susceptibility cut-off or any defaults were
manipulated on the Level 1 or Level 2 pages, those changes will NOT be downloaded here and can ONLY be
downloaded directly from the Level 1 or Level 2 page where the setting was manipulated by the user. Also,
data visualizations can ONLY be downloaded from the Level 1 and 2 pages. They DO NOT populate in the zip
file folders.
[Screenshot: report folder AAB53939.1_v2 opened in WinZip, containing the Level1Reports, Level2Reports, and Level3Reports folders.]
By selecting "Level 1 Reports", both full and primary reports are available as csv files, as well as a
graphic of the density plot for determining the susceptibility cut-off.
[Screenshot: Level1Reports folder in WinZip, containing AAB53939.1_Full_v2.csv, AAB53939.1_Full_v2_cutoff.png, AAB53939.1_Primary_v2.csv, and AAB53939.1_Primary_v2_cutoff.png.]
By selecting "Level2Reports", all completed domain comparisons will be available and named by NCBI
domain accession with the starting amino acid residue position for the domain (e.g., pfam00001(54)).
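The Level 2 folder names can be split into the domain accession and the starting residue position with a short helper. This is a sketch only, assuming the `accession(start)` pattern shown in the example above; `parse_domain_folder` is a hypothetical name.

```python
import re

# Pattern assumed from the guide's example "pfam00001(54)":
# <NCBI domain accession>(<starting amino acid residue position>)
_DOMAIN_PATTERN = re.compile(r"^(?P<domain>[^()]+)\((?P<start>\d+)\)$")


def parse_domain_folder(name):
    """Split a Level 2 folder name into (domain_accession, start_position)."""
    match = _DOMAIN_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected domain folder name: {name!r}")
    return match.group("domain"), int(match.group("start"))


if __name__ == "__main__":
    print(parse_domain_folder("pfam00001(54)"))
```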
[Screenshot: Level2Reports folder in WinZip, containing one folder per completed domain comparison (pfam00001(54), pfam10320(54), pfam13853(54)).]
Upon selecting a domain folder to view, both full and primary reports are available as csv files, as well as a
graphic of the density plot for determining the susceptibility cut-off.
[Screenshot: pfam00001(54) folder in WinZip, containing pfam00001(54)_Full_v2.csv, pfam00001(54)_Full_v2_cutoff.png, pfam00001(54)_Primary_v2.csv, and pfam00001(54)_Primary_v2_cutoff.png.]
By selecting "Level3Reports", all user-defined Level 3 alignments are available as csv files.
Note: These csv files show the alignments across the entire sequence, not just those amino acid residues
selected by the user.
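For users who prefer to inspect a download programmatically rather than in WinZip, the folder layout described in this section can be walked with Python's standard `zipfile` module. The sketch below is illustrative only: the folder names (`Level1Reports`, `Level2Reports`, `Level3Reports`) and the nesting under an `<accession>_v<version>` folder are assumptions taken from the screenshots in this guide, and the demo archive is built in memory with empty placeholder files rather than real SeqAPASS output.

```python
import io
import zipfile
from collections import defaultdict


def csv_files_by_level(zip_bytes):
    """Map each level folder (e.g. 'Level1Reports') to its CSV file names."""
    grouped = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for path in archive.namelist():
            parts = path.split("/")
            # Assumed shape: <accession>_v<version>/<LevelNReports>/.../<file>.csv
            if len(parts) >= 3 and parts[-1].lower().endswith(".csv"):
                grouped[parts[1]].append(parts[-1])
    return {level: sorted(names) for level, names in grouped.items()}


def build_demo_zip():
    """Build a tiny in-memory archive with empty files, for illustration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        root = "AAB53939.1_v2"
        archive.writestr(f"{root}/Level1Reports/AAB53939.1_Full_v2.csv", "")
        archive.writestr(f"{root}/Level1Reports/AAB53939.1_Primary_v2.csv", "")
        archive.writestr(f"{root}/Level1Reports/AAB53939.1_Full_v2_cutoff.png", "")
        archive.writestr(
            f"{root}/Level2Reports/pfam00001(54)/pfam00001(54)_Full_v2.csv", ""
        )
        archive.writestr(f"{root}/Level3Reports/four(316)_v2.csv", "")
    return buffer.getvalue()


if __name__ == "__main__":
    for level, names in csv_files_by_level(build_demo_zip()).items():
        print(level, names)
```

To use it on a real download, read the file's bytes (for example with `open("seqapass.zip", "rb").read()`) and pass them to `csv_files_by_level`.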
[Screenshot: Level3Reports folder in WinZip, containing one csv file per user-defined Level 3 alignment, each named by job name and run number (e.g., four(316)_v2.csv).]
Level 1: Primary Amino Acid Sequence Alignment
From the "View SeqAPASS Reports" tab, upon selecting a radio button and clicking "Request Selected
Report," the Level 1 data will be displayed.
The "Level 1 Query Protein Information" box contains the SeqAPASS Run ID, Query Accession,
Ortholog Count (number of hits identified as ortholog candidates to the query species protein sequence),
NCBI data updates ("Protein and Taxonomy Data:" displays the date that NCBI databases were downloaded
and incorporated into the SeqAPASS database; "BLAST Version:" and "Software Version:" display the
versions being used by the SeqAPASS tool for the selected data), Query Species, and Query Protein. Other
information in this box will be described below.
[Screenshot: "Level 1 Query Protein Information" box for SeqAPASS ID 1631 (Query Accession NP_000116.2; Query Species Homo sapiens; Query Protein estrogen receptor isoform 1; Ortholog Count 410; Protein and Taxonomy Data 06/08/2020; BLAST Version 2.10.0; Software Version 4.1), with the "Susceptibility Cut-off," "Level 2," "Level 3," "Primary Report Settings," and "Visualization" drop-downs and the "Refresh Level 2 and 3 runs" button.]
The default table displayed at the bottom of the page is the "Primary Report", which includes query
protein information in the first row below the column titles, followed by hit proteins whose sequences
aligned with the query protein. The hit proteins are ordered from the highest to lowest percent similarity
(maximum percent similarity = 100%). For each hit protein, the Data Version, NCBI Accession, and species
information are provided, including the "Protein Count," which indicates the number of protein records per
species in the NCBI protein database, taxonomic information (see Primary Report Settings section
below in user guide for more detail on "Taxonomic Group" versus "Filtered Taxonomic Group"
columns), and species names. Also included are the NCBI protein accession, protein name, BLASTp
bitscore (describes overall quality of the alignment; see NCBI BLASTp tutorials), and percent similarity
([hit bitscore/query bitscore] × 100). If the hit protein has been identified as an ortholog candidate (using
the reciprocal best hit BLAST method), it is noted with a "Y" for yes; if it is not an ortholog candidate, it
is noted with an "N" for no. If the hit protein is predicted to be susceptible according to the susceptibility
cut-off criteria, that is also noted with a "Y" for yes or an "N" for no. The date the analysis was completed
is also identified. The data also include a column giving the number of ortholog candidates identified
using the reciprocal best hit BLAST method. The susceptibility cut-off is also listed in a column. The
cut-off is determined by identifying local minima in the density plot of the percent similarity values
for the primary report data set and by evaluating ortholog candidates. Additionally, there is a column that
identifies whether the species is a eukaryote, noted with a "Y" for yes or an "N" for no. Links out
to the NCBI Protein Database, NCBI Taxonomy Database, and ECOTOX Knowledgebase (specific to the
data row) are embedded in the Level 1 data table for the "NCBI Accession," "Species Tax ID," "Scientific
Name," "Protein Name", and "ECOTOX" columns. (See Search, View, and Download Data Tables
section of user guide for more information).
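The percent similarity calculation described above ([hit bitscore/query bitscore] × 100) can be reproduced with a few lines of Python when checking downloaded data. This is a minimal sketch; the function name is hypothetical, and the bitscores are example values used purely for illustration.

```python
def percent_similarity(hit_bitscore, query_bitscore):
    """Percent similarity = (hit bitscore / query bitscore) * 100."""
    if query_bitscore <= 0:
        raise ValueError("query bitscore must be positive")
    return hit_bitscore / query_bitscore * 100.0


if __name__ == "__main__":
    query_bitscore = 1241.87  # bitscore of the query aligned to itself (example value)
    for hit_bitscore in (1241.87, 1229.54, 1228.00):
        similarity = percent_similarity(hit_bitscore, query_bitscore)
        print(f"{hit_bitscore:8.2f} -> {similarity:6.2f}%")
```

Because the query aligned to itself defines 100%, a hit whose bitscore exceeds the query bitscore yields a value above 100%.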
Default highlights identify partial protein sequences, sequences with a bitscore higher than the query
sequence and therefore percent similarity greater than 100% (commonly synthetic constructs), and cases where
zero ortholog candidates are identified (in this case a user should consider a different query sequence or
check the full report). Please see Susceptibility Cutoff Box for Level 1 section of user guide for details
when no orthologs are detected. Additionally, if a eukaryote is selected as the query protein, the default
setting for the report shows only eukaryote data, excluding prokaryote data from the table, with the "Show
Only Eukaryotes" checkbox checked. To view prokaryote data, deselect this checkbox. If a prokaryote is
selected as the query protein, the default setting will include both eukaryote and prokaryote data and the
"Show Only Eukaryotes" checkbox will not be selected. To limit the data to eukaryotes only, the user
would check the "Show Only Eukaryotes" checkbox.
Columns in left side of table:
[Screenshot: left side of the "Level 1 Data - Primary" table, with the "Primary Report" and "Full Report" radio buttons, the "Show Only Eukaryotes" checkbox, and the "View Level 1 Summary Report" and "Download Current Level 1 Report Settings" buttons. Columns: Data Version, NCBI Accession, Protein Count, Species Tax ID, Taxonomic Group, Filtered Taxonomic Group, Scientific Name, Common Name, Protein Name.]
Columns in right side of table:
[Screenshot: right side of the "Level 1 Data - Primary" table. Columns: Protein Name, BLASTp Bitscore, Ortholog Candidate, Ortholog Count, Cut-off, Percent Similarity, Susceptibility Prediction, Analysis Completed, Eukaryote, ECOTOX.]
Level 1: Primary Report Settings
Default settings
The "Primary Report Settings" drop down allows the user to view the default settings applied to the table
below and to manipulate certain settings. "Primary Report Settings" are only available on the "Primary
Report" display, not the "Full Report." The default settings show data for hits whose E-values are < 0.01
and that have been identified to have ≥ 1 domain in common with the query sequence. The default setting
for "Sorted by Taxonomic Group" is "class," therefore the "Filtered Taxonomic Group" column in the table
is set to identify and report the taxonomic lineage of "class" from the NCBI Taxonomy Database.
However, if class is not identified in the NCBI Taxonomic Hierarchy associated with the hit accession,
then the algorithm will report the next available taxonomic group, moving from class to subclass, to
superorder, to order, to suborder, to superfamily, to family, to subfamily, to genus. Finally, the
susceptibility predictions are set by using species read-across. (Please view Documentation Section of
the User Guide for details on Read-Across settings). Briefly, Species Read-Across is used to set the
susceptibility prediction, where all ortholog candidates are Susceptible = Y; all species listed above the
susceptibility cut-off are Susceptible = Y; all species below the cut-off from the same taxonomic group as
one or more species above the cut-off are Susceptible = Y; and those below the cut-off that are not
ortholog candidates and do not belong to a taxonomic group above the cut-off are Susceptible = N.
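The read-across rules summarized above can be expressed as a short Python sketch. This is an illustration of the rules as stated in this guide, not the SeqAPASS implementation: the `Hit` class, the function name, and the `read_across` flag (mirroring the "Species Read-Across" setting) are hypothetical, and treating "above the cut-off" as strictly greater than the cut-off is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    taxonomic_group: str       # value of the "Filtered Taxonomic Group" column
    percent_similarity: float
    ortholog_candidate: bool


def predict_susceptibility(hits, cutoff, read_across=True):
    """Return a list of "Y"/"N" predictions, one per hit, in input order."""
    # Taxonomic groups with at least one hit above the cut-off
    # (assumption: "above" means strictly greater than).
    groups_above = {
        hit.taxonomic_group for hit in hits if hit.percent_similarity > cutoff
    }
    predictions = []
    for hit in hits:
        susceptible = (
            hit.ortholog_candidate
            or hit.percent_similarity > cutoff
            or (read_across and hit.taxonomic_group in groups_above)
        )
        predictions.append("Y" if susceptible else "N")
    return predictions


if __name__ == "__main__":
    example_hits = [
        Hit("Mammalia", 99.01, True),
        Hit("Mammalia", 20.00, False),   # below cut-off, but group is above it
        Hit("Insecta", 10.00, False),    # below cut-off, no read-across support
        Hit("Insecta", 12.00, True),     # ortholog candidate
    ]
    print(predict_susceptibility(example_hits, cutoff=34.43))
    print(predict_susceptibility(example_hits, cutoff=34.43, read_across=False))
```

With read-across turned off, only the ortholog-candidate and above-cut-off rules apply, so the second Mammalia hit in the example changes from "Y" to "N".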
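[Screenshot: "Primary Report Settings" box with the "E-value" (default 0.01), "Sorted by Taxonomic Group," "Common Domains," and "Species Read-Across" fields and the "Update Report" and "Use Default Settings" buttons.]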
Changing Default Settings
The "E-value" and "Common Domains" settings can be changed by entering the desired
E-value or number of Common Domains in the respective text boxes and clicking "Update Report." The
table and data visualization will update automatically after a few seconds. The user may also choose to
change the level of the taxonomic hierarchy that is used for the susceptibility prediction. From the "Sorted
by Taxonomic Group" dropdown the user may choose to display a different taxonomic group in the
"Filtered Taxonomic Group" column of the data table.
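The effect of the "E-value" and "Common Domains" settings on the Primary Report can be pictured as a simple row filter. The sketch below is an assumption-laden illustration, not SeqAPASS code: the dictionary keys and function name are hypothetical, and whether a hit exactly at the E-value threshold is kept is an assumption (here it is excluded, while a hit with exactly the entered number of common domains is kept).

```python
def filter_hits(hits, max_e_value=0.01, min_common_domains=1):
    """Keep hits below the E-value threshold with enough common domains."""
    return [
        hit
        for hit in hits
        # Assumption: strict "<" for E-value, inclusive ">=" for common domains.
        if hit["e_value"] < max_e_value
        and hit["common_domains"] >= min_common_domains
    ]


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    example_hits = [
        {"accession": "hit_A", "e_value": 1e-50, "common_domains": 3},
        {"accession": "hit_B", "e_value": 0.5, "common_domains": 2},
        {"accession": "hit_C", "e_value": 1e-10, "common_domains": 0},
    ]
    kept = filter_hits(example_hits)
    print([hit["accession"] for hit in kept])
```

Loosening either value (for example `max_e_value=1.0, min_common_domains=0`) brings more hits into the report, which is why the table and visualization change after "Update Report" is clicked.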
[Screenshot: "Primary Report Settings" box with the "Sorted by Taxonomic Group" dropdown open, listing class, subclass, superorder, order, suborder, superfamily, family, subfamily, and genus.]
If the user chooses "order," for example, the "Filtered Taxonomic Group" column in the data table will
report the taxonomic lineage of "order" from the NCBI Taxonomy Database and all species read-across
for the susceptibility prediction will be based on order instead of class. The data visualization will also
update. As described previously, if order is not identified in the NCBI Taxonomic Hierarchy associated
with the hit accession, then the algorithm will report the next available taxonomic group, moving from
order to suborder, to superfamily, to family, to subfamily, to genus. Upon selecting the taxonomic group
from the dropdown and clicking "Update Report," the Level 1 Data for the Primary Report will update to the
selected taxonomic level.
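The fallback from the selected rank to the next available rank can be sketched as follows. This is a minimal illustration of the behavior described above, assuming a lineage is available as a mapping from rank name to taxon name; the function name and data structure are hypothetical and do not reflect how SeqAPASS stores NCBI taxonomy data.

```python
# Rank order as listed in this guide, from broadest to narrowest.
RANK_ORDER = [
    "class", "subclass", "superorder", "order", "suborder",
    "superfamily", "family", "subfamily", "genus",
]


def filtered_taxonomic_group(lineage, selected_rank="class"):
    """Return the taxon at selected_rank, or at the next available lower rank."""
    start = RANK_ORDER.index(selected_rank)  # raises ValueError for unknown ranks
    for rank in RANK_ORDER[start:]:
        name = lineage.get(rank)
        if name:
            return name
    return None  # no rank from selected_rank down to genus is present


if __name__ == "__main__":
    human = {"class": "Mammalia", "order": "Primates",
             "family": "Hominidae", "genus": "Homo"}
    print(filtered_taxonomic_group(human))            # uses "class"
    print(filtered_taxonomic_group(human, "order"))   # uses "order"
    # Hypothetical lineage with no "order" entry: falls through to "family".
    print(filtered_taxonomic_group({"family": "ExampleFamily"}, "order"))
```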

[Screenshot: "Level 1 Data - Primary" table after clicking "Update Report," showing the Data Version, NCBI Accession, Protein Count, Species Tax ID, Taxonomic Group, Filtered Taxonomic Group, Scientific Name, Common Name, and Protein Name columns.]
Level One Summary Report
The user can view a summary of the data for each taxonomic group by clicking on the "View Level 1
Summary Report" button. The data include the number of species, mean percent similarity, median percent
similarity, and susceptibility prediction. These data can also be downloaded.
Level One Summary Report (first of 6 pages shown):

Taxonomic Group     Filtered Taxonomic Group   Number of   Mean Percent   Median Percent   Susceptibility
                                               Species     Similarity     Similarity       Prediction
Mammalia            Mammalia                   195         73.47          87.25            Y
Testudines          Testudines                 13          67.66          79.16            Y
Aves                Aves                       122         67.00          78.40            Y
Crocodylia          Crocodylia                 7           69.23          78.29            Y
Lepidosauria        Lepidosauria               25          63.76          74.50            Y
Amphibia            Amphibia                   25          48.39          64.98            Y
Chondrichthyes      Chondrichthyes             8           41.11          39.30            Y
Dipnoi              Dipnoi                     3           43.11          57.01            Y
Coelacanthimorpha   Coelacanthimorpha          2           46.56          46.56            Y
Actinopteri         Actinopteri                204         36.19          40.90            Y
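A comparable per-group summary can be computed from a downloaded Primary Report with Python's standard library. The sketch below assumes one row per species and takes simple (taxonomic group, percent similarity) pairs as input; the function name and the input values are hypothetical, and SeqAPASS may compute its summary differently.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_group(rows):
    """Summarize (taxonomic_group, percent_similarity) pairs per group."""
    grouped = defaultdict(list)
    for group, similarity in rows:
        grouped[group].append(similarity)
    return {
        group: {
            "species": len(values),           # assumes one row per species
            "mean": round(mean(values), 2),
            "median": round(median(values), 2),
        }
        for group, values in grouped.items()
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    example_rows = [
        ("GroupA", 40.0), ("GroupA", 50.0), ("GroupA", 90.0),
        ("GroupB", 10.0),
    ]
    for group, stats in summarize_by_group(example_rows).items():
        print(group, stats)
```

Reporting both mean and median is useful because a few low-similarity hits can pull the mean well below the median, as in several rows of the summary table above.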
The user may also choose to turn species read-across off by using the "Species Read-Across" drop-down,
selecting "No," and clicking "Update Report." When "No" is selected, the susceptibility predictions
will only be "Y" in the table below if Percent Similarity is above the Cut-off or if the hit is identified as
an Ortholog Candidate ("Y"). Any other hit below the cut-off will yield a susceptibility prediction of no,
or "N."
[Screenshot: "Primary Report Settings" box with the "Species Read-Across" drop-down open, showing the "No" and "Yes" options, above the "Primary Report" and "Full Report" radio buttons.]
The user can select the "Full Report" on the "Level 1" page, which includes the same information as the
"Primary Report" plus additional information pertaining to the BLASTp alignment of the protein sequences.
The additional information includes the number of amino acid residues in the sequence (Hit Length), the
number of exactly matching amino acids between the hit and query sequence (Identity), the number of
exact and similar amino acid matches between the hit and the query sequence (Positives), the expect
value (E-value) describing the number of different alignments expected to occur in the database search
by chance, and the conserved domain count. The conserved domain count is based on all domains
associated with the query protein in the NCBI Conserved Domains Database (Specific hits, Non-specific
hits, Superfamilies, and Multi-domains; see the NCBI Conserved Domains Database for details).
SeqAPASS algorithms record the query sequence coverage of each curated domain and compare that
coverage to that of the hit sequence. If the hit sequence covers the curated domain to an extent greater
than or equal to the query sequence, then the domain is considered a common domain between the hit and
query. The common domains for each hit sequence compared to the query sequence are summed and
reported. This column displays "0" when the hit protein and query protein do not have any common
domains. (See Search, View, and Download Data Tables section of user guide for more information).
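The common domain rule described above can be sketched in a few lines. This is an illustrative sketch only: the function name and the representation of coverage as a fraction between 0 and 1 per domain accession are assumptions, and the accessions and coverage values in the usage note are made up.

```python
def common_domain_count(query_coverage, hit_coverage):
    """Count domains that the hit covers at least as well as the query.

    Both arguments map a domain accession to the fraction (0 to 1) of the
    curated domain covered by the sequence (hypothetical representation).
    """
    count = 0
    for accession, query_fraction in query_coverage.items():
        # A domain absent from the hit cannot be a common domain.
        if accession not in hit_coverage:
            continue
        if hit_coverage[accession] >= query_fraction:
            count += 1
    return count
```

For example, with a query covering three domains and a hit that matches or exceeds the query coverage on only one of them, the function returns 1; a hit sharing no domains returns 0, matching the "0" displayed in the column.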
The user can also download the currently applied report settings by selecting "Download Current
Level 1 Report Settings." This csv file allows the user to track which settings were used or changed
when a data table was downloaded.

[Screenshot: "Level 1 Data - Full" table with the "Full Report" option selected and the "Download Current Level 1 Report Settings" link above the table (page 1 of 104). Visible columns include Hit Length, Identity, Positives, BLASTp E-value, Bitscore, Ortholog Candidate, Common Domain Count, Percent Similarity, Susceptibility Prediction, Analysis Completed, Eukaryote, and ECOTOX.]
Note: SeqAPASS v2.0 and newer parse the BLASTp query and hit accessions to identify all the
species/accessions from identical proteins. Therefore, if a hit sequence represents multiple species, all
species with the identical sequence will be found in the data tables for Level 1. To determine which
sequence/species was identified from BLASTp as a hit and which sequence/species was parsed from the
identical sequence, view the "Identical Protein" column of the Level 1 "Full Report," where "N" indicates
the original hit sequence and "Y" indicates a parsed sequence.
When the current Level 1 report settings are downloaded, the csv file contains the information shown
below. If the user changes the default settings, the csv file provides a quick record of the settings that
were applied, even if the SeqAPASS page is no longer open.
[Screenshot: example Level 1 report settings csv file opened in a spreadsheet]
Row   Column A                     Column B
1     Level 1 Report Settings
2
3
4     Analysis TimeStamp           2019 05 16 11:04:08
5     SeqAPASS version             3.2
6     Query Species                Homo sapiens
7     Query Protein                estrogen receptor isoform 1
8     Query Accession              NP_000116.2
9     Ortholog Count               (value not legible in the screenshot)
10    L1 Cutoff                    Default
11    L1 Cutoff Value              33.93221513
12    E-value                      0.01
13    Sorted by Taxonomic Group    CLASS
14    Common Domains               1
15    Species Read Across          Y
16    Show Only Eukaryotes         Checked
17    Report                       Primary
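Because the settings file is a simple two-column csv, it can be read back programmatically when documenting an analysis. The sketch below assumes each setting occupies one row with the setting name in the first column and its value in the second, as the example file suggests; the function name is hypothetical and the exact layout of a real download should be checked before relying on it.

```python
import csv
import io


def read_report_settings(csv_text):
    """Return a dict of setting name -> value from a settings csv string.

    Assumes a two-column layout (name, value). Title rows and blank rows,
    which have fewer than two columns, are skipped.
    """
    settings = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2 and row[0].strip():
            settings[row[0].strip()] = row[1].strip()
    return settings
```

A file on disk can be passed in with `read_report_settings(open(path, newline="").read())`, where `path` is wherever the user saved the download.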
Susceptibility Cutoff Box for Level 1
The susceptibility prediction is determined by identifying ortholog candidates, sequences above a defined
susceptibility cut-off, or species below the susceptibility cut-off that belong to the same organism
class as a species above the cut-off. The default susceptibility cut-off is set by plotting the distribution of
percent similarities calculated for each hit protein. From this plot, the critical points are identified and the
local minima and maxima are reported. Using the ortholog candidate data, a susceptibility cut-off is
automatically determined by identifying the first ortholog candidate at an equal or higher percent
similarity than the first local minimum. The user can view this graph by clicking the "Cutoff Settings"
button in the "Susceptibility Cut-off" box, which will open a new tab in the web browser. The "Select
Cut-off" drop-down allows the user to select between the default cut-off, the 2nd local minimum, or a
user-defined cut-off. The 2nd susceptibility cut-off is identified in the density plot by finding the 1st
ortholog candidate at an equal or higher percent similarity than the 2nd local minimum. Upon
selecting the user-defined cut-off from the drop-down, the user can closely examine the density
plot and manipulate the cut-off: the "Enter Cut-off" text box becomes active and the user can enter a
number from 1 to 100. To update the cut-off in the Level 1 data report and/or close the cut-off tab and
return to the Level 1 page, click the "Update Cut-off" button.
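The cut-off rule described above (take a local minimum of the density plot, then move up to the first ortholog candidate at an equal or higher percent similarity) can be sketched as follows. This is a simplified illustration, not the SeqAPASS code: the function names are hypothetical, the local minima are assumed to have been found already, and falling back to the local minimum when no ortholog candidate qualifies is an assumption modeled on the "No Orthologs Detected" behavior.

```python
def cutoff_from_minimum(local_minimum, ortholog_similarities):
    """Return the lowest ortholog-candidate percent similarity that is
    greater than or equal to the given local minimum.

    If no ortholog candidate qualifies, fall back to the local minimum
    itself (assumption).
    """
    eligible = [s for s in ortholog_similarities if s >= local_minimum]
    return min(eligible) if eligible else local_minimum


def candidate_cutoffs(local_minima, ortholog_similarities):
    """One potential cut-off per local minimum, in ascending order.

    The first entry corresponds to the default cut-off and the second to
    the "2nd local minimum" option.
    """
    return [
        cutoff_from_minimum(minimum, ortholog_similarities)
        for minimum in sorted(local_minima)
    ]
```

With made-up local minima of 30.0 and 50.0 and ortholog candidates at 28.5, 33.93, 45.0, 51.64, and 99.0 percent similarity, the sketch yields potential cut-offs of 33.93 and 51.64.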
[Screenshot: "Susceptibility Cut-off" box showing a small density plot of Percent Similarity and the "Cutoff Settings" button, which opens in a separate tab]
Note: The user should have a justification for changing the susceptibility cut-off, either based on
evaluation of Ortholog cutoffs in the data visualization or from empirical evidence.
[Screenshot: "Level 1 Susceptibility Cut-off: Primary Report" page for SeqAPASS ID 1290 (Query Accession: NP_000116.2; Query Species: Homo sapiens; Query Protein: estrogen receptor isoform 1; Ortholog Count: 348; Protein and Taxonomy Data: 02/28/2019; BLAST Version: 2.8.1; Software Version: 3.2), with the "Select Cut-off" drop-down set to "Default: Identify 1st local minimum and find next ortholog candidate," the "Enter Cut-off" text box, and the following table of potential cut-offs]
Cut-off #   Susceptibility Cut-off
1           33.93
2           51.64
3           61.97
4           71.68
5           85.11
6           96.53
All potential susceptibility cut-offs generated by the data distribution and ortholog candidate
identification are reported in the table with the columns "Cut-off #" and "Susceptibility Cut-off." The user
can use these numbers to define a cut-off if empirical evidence suggests that the "Default" or "2nd
minimum" cut-offs are not supported.
[Figure: density plot titled "Cut-off Based on Ortholog Candidates"; legend: Density, Local Max, Local Min, Inflection Point; x-axis: Percent Similarity]
No Orthologs Detected
[Screenshot: "Level 1 Query Protein Information" page for SeqAPASS ID 1292 (Query Accession: NP_001317544.1; Query Species: Homo sapiens; Query Protein: peroxisome proliferator-activated receptor gamma isoform 3; Ortholog Count: 0; Protein and Taxonomy Data: 02/28/2019; BLAST Version: 2.8.1; Software Version: 3.2), with the "Level 1 Data - Primary" table below it (page 1 of 82)]
If no orthologs are detected from the reciprocal best hit BLAST analysis, the "Ortholog Count" will be "0"
at the top of the "Level 1 Query Protein Information" page. The cut-off will be set by the local minima
only; therefore, the susceptibility prediction will NOT consider ortholog candidates. It is recommended
that the user check the full report for ortholog candidates or identify a different query sequence for the
susceptibility predictions. In this case, the susceptibility predictions are highlighted in dark pink in the
Level 1 data table to indicate that 0 orthologs were detected and that the susceptibility cut-off was
determined by plotting the distribution of percent similarities and identifying the local minima.
[Screenshot: "Level 1 Query Protein Information" header for SeqAPASS ID 1299 (Query Accession: APQ40848.1; Query Species: Poa annua; Query Protein: PsbA, partial (plastid); Ortholog Count: 0; Protein and Taxonomy Data: 02/28/2019; BLAST Version: 2.8.1; Software Version: 3.2)]
Note: De-select the "Show Only Eukaryotes" checkbox to see if prokaryotes were identified as orthologs.
When the user clicks the "Cutoff Settings" button and no orthologs were detected, the "Cut-off #" and
"Susceptibility Cut-off" columns report only the local minimum values.
[Screenshot: "Level 1 Susceptibility Cut-off" page for SeqAPASS ID 19 (Query Accession: CAA74340.1; Query Species: Bubalus bubalis; Query Protein: insulin receptor; Ortholog Count: 0; NCBI Data: 02/01/2015), showing the density plot "Cut-off Based on Ortholog Candidates" (legend: Density, Local Max, Inflection Point; x-axis: Percent Similarity) and the table of cut-offs]
From the "Level 1" page the user can return to the list of completed SeqAPASS runs by clicking the
"Main" button on the upper left-hand side of the "Level 1 Query Protein Information" page.
[Screenshot: "Level 1 Query Protein Information" page for SeqAPASS ID 1300 (Query Species: Homo sapiens; Query Protein: estrogen-related receptor gamma isoform 2; Ortholog Count: 57; BLAST Version: 2.8.1; Software Version: 4.0), with the "Main" button at the upper left and the "Susceptibility Cut-off," "Primary Report Settings," "Visualization," "Level 2 Query Domain," and "Level 3 Query Amino Acid Residues" boxes]
Level 2: Functional Domain(s) Alignment
In the "View SeqAPASS Reports" tab, on the "Level 1 Query Protein Information" page, there is a
"Level 2" box for comparing hit domains to the query domain. In the "Level 2" drop-down box, there is a
link out to the "NCBI Conserved Domain Database" for the query protein of interest. Below this link, the
user will find a drop-down containing the functional domains associated with the query sequence for
comparison across species.
[Screenshot: "Level 1 Query Protein Information" page for SeqAPASS ID 1290 (Query Accession: NP_000116.2; Query Species: Homo sapiens; Query Protein: estrogen receptor isoform 1; Ortholog Count: 348), highlighting the "Level 2 Query Domain" box with the "NCBI Conserved Domain Database" link, the "Functional Domains" drop-down, the "Choose Domain to View" drop-down, the "View Level 2 Data" button, and the "Refresh Level 2 and 3 runs" button]
In the drop-down box (below the words "Functional Domains") the user will find all domains associated
with the query protein that are listed in the "NCBI Conserved Domains Database." To compare a domain
from the query protein to domains of the hit proteins, the user highlights a domain in the drop-down and
clicks the "Request Domain Run" button.
Note: Each domain in the drop-down is listed with the first amino acid residue position that aligns with
the NCBI curated domain in parentheses, followed by the NCBI domain Accession, domain name, and
description.
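Users who copy drop-down entries into their own notes or scripts may want to split an entry into its four parts. The sketch below parses an entry of the form "(start) accession, name, description." The class and function names are hypothetical, and the pattern is inferred from the entries shown in this guide rather than from a published format.

```python
import re
from typing import NamedTuple


class DomainEntry(NamedTuple):
    start: int          # first query residue aligning with the curated domain
    accession: str      # NCBI domain Accession, e.g. cd06949
    name: str           # short domain name, e.g. NR_LBD_ER
    description: str    # free-text description (may itself contain commas)


_ENTRY = re.compile(r"^\((\d+)\)\s*([^,\s]+),\s*([^,]+?),\s*(.*)$")


def parse_domain_entry(label):
    """Split one "Functional Domains" drop-down entry into its fields."""
    match = _ENTRY.match(label.strip())
    if match is None:
        raise ValueError(f"Unrecognized domain entry: {label!r}")
    start, accession, name, description = match.groups()
    return DomainEntry(int(start), accession, name.strip(), description.strip())
```

For the entry "(310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen receptor, which are activated by the hormone 17beta-estradiol (estrogen)", the parsed start position is 310, the accession is cd06949, and the name is NR_LBD_ER.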
[Screenshot: "Level 2 Query Domain" box with the "Functional Domains" drop-down open. The entries are truncated in the screenshot:]
-Select Domain -
(243) cd06157, NR_LBD, The ligand binding domain of nuclear rec
(105) cd06916, NR_DBD_like, DNA-binding domain of nuclear rece
(245) cd06929, NR_LBD_F1, Ligand-binding domain of nuclear rec
(242) cd06930, NR_LBD_F2, Ligand-binding domain of nuclear rec
(215) cd06931, NR_LBD_HNF4_like, The ligand binding domain of
Note: The user can also use the text box at the top of the drop-down to search the "Functional Domains"
list.
It is recommended that the user click the "NCBI Conserved Domains Database" link
(http://www.ncbi.nlm.nih.gov/cdd/) to identify which domains are "Specific hits" in the NCBI
Conserved Domains Database. On the NCBI page, the user can hover over the graphical representation of
the domains associated with the query sequence to highlight and identify the Accession associated with
domain "Specific hits." The example below shows the user hovering over the NR_LBD_ER domain with
the computer mouse.


[Screenshot: NCBI Conserved Domains page for estrogen receptor isoform 1 [Homo sapiens] (NP_000116.2), showing the graphical summary with "Specific hits" and "Superfamilies" rows, the list of domain hits with NR_LBD_ER highlighted, and the alignment of the query sequence to Cdd:cd06949]
After identifying the domain(s) of interest and the corresponding starting residue and domain Accession,
the user can return to the SeqAPASS tool and scroll to the domain of interest in the drop-down. If that
domain has not been previously run by the user, the "Request Domain Run" button will become active
and the user can click it to submit the domain query.
[Screenshot: "Level 2 Query Domain" box with the domain "(243) cd06157, NR_LBD" selected in the "Functional Domains" drop-down and the "Request Domain Run" button active; the "Choose Domain to View" drop-down and "View Level 2 Data" button appear below]
When the user clicks the "Request Domain Run" button, the following message will appear if the run has
been submitted successfully.
[Screenshot: SeqAPASS page with the message "Level 2 Run Requested. Status queued" displayed above the Home, Request SeqAPASS Run, SeqAPASS Run Status, View SeqAPASS Reports, and Settings tabs]



When sequence comparisons have completed for the selected functional domain, the domain will be
available in the "Choose Domain to View" drop-down. The drop-down is not automatically populated with
the completed domain run. The user must click the "Refresh Level 2 and 3 runs" button to update the
page so that the newly completed domain appears in the "Choose Domain to View" drop-down.
To view a completed Level 2 domain, highlight the domain of interest in the drop-down box and click the
"View Level 2 Data" button. This will bring the user to the "Level 2" data page for the selected query
protein/domain.
Note: The user can also use the text box at the top of the drop-down to search the "Completed Domain"
list.
[Screenshot: "Level 2 Query Domain" box with the "Choose Domain to View" drop-down open. The entries are truncated in the screenshot:]
-Select Completed Domain -
(316) cd06931, NR_LBD_HNF4_like, The ligand binding domain of h
(310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen rec
View Level 2 Data Page
The "Level 2 Query Domain Information" box contains the SeqAPASS Run ID, Query Accession,
Ortholog Count (number of hits identified as ortholog candidates to the query species protein sequence),
NCBI data updates ("Protein and Taxonomy Data:" and "CDD Data:" display the dates that the NCBI
databases were downloaded and incorporated into the SeqAPASS database; "BLAST Version:" and
"Software Version:" display the versions used by the SeqAPASS tool for the selected data), Query Species,
Query Domain (with a link out to the NCBI domain page), and Query Protein name.
[Screenshot: "Level 2 Query Domain Information" page for SeqAPASS ID 1290 (Query Accession: NP_000116.2; Query Species: Homo sapiens; Query Domain: (310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen receptor, which are activated by the hormone 17beta-estradiol (estrogen); Query Protein: estrogen receptor isoform 1; Ortholog Count: 348; Protein and Taxonomy Data: 02/28/2019; CDD Data: 12/08/2016; BLAST Version: 2.8.1; Software Version: 3.2), with the "Susceptibility Cut-off," "Visualization," and "Primary Report Settings" boxes (E-value: 10.0; Sorted by Taxonomic Group: class; Species Read-Across)]
The default "Level 2" table is the "Primary Report," which includes query domain information in the first
row below the column titles, followed by hit domains whose sequences aligned with the selected query
domain. The hit domains are ordered from the highest to lowest percent similarity (maximum percent
similarity = 100%). For each hit domain, the Data Version, NCBI Accession, and species information are
provided, including the "Protein Count," which indicates the number of protein records per species in the
NCBI protein database, taxonomic information, and species names. Also included are the NCBI accession
for the query protein, query protein name, Domain Type, BLASTp bitscore (describes the overall quality of
the alignment; see NCBI BLASTp tutorials), and Domain percent similarity ([hit bitscore/query
bitscore] * 100). If the hit protein has been identified as an ortholog candidate (using the reciprocal best
hit BLAST method), it will be noted with a "Y" for yes; if it is not an ortholog candidate, an "N" for no.
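The percent similarity formula given above, [hit bitscore/query bitscore] * 100, can be written as a one-line function. The sketch below is illustrative; the function name is an assumption and the bitscores in the usage note are example values only.

```python
def percent_similarity(hit_bitscore, query_bitscore):
    """Percent similarity = (hit bitscore / query bitscore) * 100.

    The query bitscore is the score of the query aligned against itself,
    so the query row evaluates to 100. Values above 100 are possible when
    a hit scores higher than the query.
    """
    if query_bitscore <= 0:
        raise ValueError("query_bitscore must be positive")
    return hit_bitscore / query_bitscore * 100.0
```

For example, a hit bitscore of 1229.54 against a query bitscore of 1241.87 gives `round(percent_similarity(1229.54, 1241.87), 2)`, which is 99.01. Because the bitscores displayed in the data tables are themselves rounded, a value recomputed this way may differ from the displayed percent similarity in the last decimal place.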
A prediction of susceptibility is displayed based on the susceptibility cut-off, identified with a "Y" for yes
or an "N" for no. The date/time the analysis was completed is also identified. (See Search, View, and
Download Data Tables section of user guide for more information). There is a column that identifies
whether the species is a eukaryote, noted with a "Y" for yes or an "N" for no if the hit is a prokaryote.
Additionally, a column with a link to the U.S. EPA ECOTOX Knowledgebase
(https://cfpub.epa.gov/ecotox/help.cfm) is available when there are empirical toxicity data curated for
the species identified in the row. This link allows the user to view available single chemical toxicity data
from the literature for specific species.
Default highlights identify partial protein sequences, sequences with a bitscore higher than the query
domain and therefore percent similarity greater than 100% (commonly synthetic constructs), and when
zero ortholog candidates are identified (in this case a user should consider a different query sequence).
Additionally, the default setting for the report shows only eukaryote data, excluding prokaryote data from
the table with the "Show Only Eukaryotes" checkbox checked. To view prokaryote data, deselect this
checkbox.
[Screenshot: "Level 2 Data - Primary" table with the "Primary Report" option selected, the "Show Only Eukaryotes" checkbox checked, the "View Level 2 Summary Report" button, and the "Download Current Level 2 Report Settings" link (page 1 of 95)]
Data Version   NCBI Accession   Protein Count   Species Tax ID   Taxonomic Group   Filtered Taxonomic Group   Scientific Name             Common Name               Protein Name
4              NP_000116.2      1265506         9606             Mammalia          Mammalia                   Homo sapiens                Human                     estrogen receptor isoform 1
4              ABY64717.1       2023            9593             Mammalia          Mammalia                   Gorilla gorilla             Western gorilla           estrogen receptor alpha
4              XP_002817538.1   145798          9601             Mammalia          Mammalia                   Pongo abelii                Sumatran orangutan        estrogen receptor isoform X2
4              XP_011852190.1   38580           9568             Mammalia          Mammalia                   Mandrillus leucophaeus      Drill                     PREDICTED: estrogen receptor isoform X2
4              XP_023061905.1   54518           591936           Mammalia          Mammalia                   Piliocolobus tephrosceles   Ugandan red Colobus       estrogen receptor isoform X2
4              XP_018884801.1   47068           9595             Mammalia          Mammalia                   Gorilla gorilla gorilla     Western lowland gorilla   PREDICTED: estrogen receptor isoform X2
4              XP_008005788.1   62315           60711            Mammalia          Mammalia                   Chlorocebus sabaeus         Green monkey              PREDICTED: estrogen receptor isoform X2
4              XP_011751932.1   69122           9545             Mammalia          Mammalia                   Macaca nemestrina           Pig-tailed macaque        estrogen receptor isoform X2
4              ABY64719.1       712             9580             Mammalia          Mammalia                   Hylobates lar               Common gibbon             estrogen receptor alpha
4              NP_001158059.1   68224           9555             Mammalia          Mammalia                   Papio anubis                Olive baboon              estrogen receptor
Level Two Summary Report
The user can view a summary of the data for each taxonomic group by clicking the "View Level 2
Summary Report" button. The summary includes the number of species, mean percent similarity, median
percent similarity, and susceptibility prediction for each group. This data table can also be downloaded.
Level Two Summary Report (page 1 of 6)
Taxonomic Group     Filtered Taxonomic Group   Number of Species   Mean Percent Similarity   Median Percent Similarity   Susceptibility Prediction
Mammalia            Mammalia                   176                 80.60                     97.63                       Y
Aves                Aves                       96                  83.78                     95.73                       Y
Crocodylia          Crocodylia                 7                   84.98                     95.97                       Y
Testudines          Testudines                 9                   86.30                     94.55                       Y
Lepidosauria        Lepidosauria               22                  71.14                     92.21                       Y
Amphibia            Amphibia                   22                  60.74                     81.03                       Y
Chondrichthyes      Chondrichthyes             7                   55.68                     67.59                       Y
Coelacanthiformes   Coelacanthiformes          2                   70.43                     70.43                       Y
Actinopteri         Actinopteri                179                 51.66                     62.13                       Y
Ceratodontimorpha   Ceratodontimorpha          3                   53.96                     71.15                       Y
Level 2: Primary Report Settings
Default settings
The "Primary Report Settings" box allows the user to view the default settings applied to the table below
and to manipulate certain settings. The "Primary Report Settings" box is only available on the "Primary
Report" display. The default settings show data for hits whose E-values are less than 10. The default
setting for "Sorted by Taxonomic Group" is "class"; therefore, the "Filtered Taxonomic Group" column in
the table is set to identify and report the taxonomic lineage of "class" from the NCBI Taxonomy Database.
However, if class is not identified in the NCBI Taxonomic Hierarchy associated with the hit accession, then
the algorithm will report the next available Taxonomic Group, moving from class to subclass, to superorder,
to order, to suborder, to superfamily, to family, to subfamily, to genus. Finally, the susceptibility
predictions are set by using Species Read-Across. (Please view SeqAPASS Documentation Section of
the User Guide for details on Read-Across settings). Briefly, "Species Read-Across" is used to set the
susceptibility prediction as follows: all ortholog candidates are Susceptible = Y; all species listed above the
susceptibility cut-off are Susceptible = Y; all species below the cut-off that belong to the same taxonomic
group as one or more species above the cut-off are Susceptible = Y; and those below the cut-off that are
not ortholog candidates and do not belong to a taxonomic group above the cut-off are Susceptible = N.
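The read-across rules summarized above can be expressed as a short decision function. The sketch below is an illustration under stated assumptions, not the SeqAPASS implementation: the `Hit` class and function name are hypothetical, and a hit exactly at the cut-off is treated as above it.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    species: str
    group: str                 # value of the "Filtered Taxonomic Group" column
    percent_similarity: float
    ortholog_candidate: bool


def predict_susceptibility(hits, cutoff, read_across=True):
    """Return a list of "Y"/"N" predictions, one per hit, in input order."""
    # Taxonomic groups with at least one hit at or above the cut-off.
    # Assumption: "above the cut-off" includes hits exactly at the cut-off.
    groups_above = {h.group for h in hits if h.percent_similarity >= cutoff}

    predictions = []
    for hit in hits:
        susceptible = (
            hit.ortholog_candidate
            or hit.percent_similarity >= cutoff
            # Read-across: a hit below the cut-off inherits "Y" from its group.
            or (read_across and hit.group in groups_above)
        )
        predictions.append("Y" if susceptible else "N")
    return predictions
```

Calling the function with `read_across=False` mirrors selecting "No" in the "Species Read-Across" drop-down: only ortholog candidates and hits at or above the cut-off are predicted "Y."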
[Screenshot: "Primary Report Settings" box with the default settings (E-value: 10.0; Sorted by Taxonomic Group: class; Species Read-Across: Yes) and the "Update Report" and "Use Default Settings" buttons]

Changing Default Settings
The user may choose to change the level of the taxonomic hierarchy that is used for the susceptibility
prediction. From the "Sorted by Taxonomic Group" dropdown the user may choose to display a different
taxonomic group in the "Filtered Taxonomic Group" column of the data table.
[Screenshot: "Primary Report Settings" box (E-value: 10.0) with the "Sorted by Taxonomic Group" drop-down open, listing the options class, subclass, superorder, order, suborder, superfamily, family, subfamily, and genus]
If the user chooses "order," for example, the "Filtered Taxonomic Group" column in the data table will
report the taxonomic lineage of "order" from the NCBI Taxonomy Database, and all species read-across
for the susceptibility prediction will be based on order instead of class. As described previously, if order
is not identified in the NCBI Taxonomic Hierarchy associated with the hit accession, then the algorithm
will report the next available Taxonomic Group, moving from order to suborder, to superfamily, to family,
to subfamily, to genus. Upon selecting the Taxonomic Group from the drop-down and clicking "Update
Report," the "Level 2" data for the Primary Report will update to the selected taxonomic level. The user
can also download the currently applied report settings by selecting "Download Current Level 2
Report Settings." This csv file allows the user to track which settings were used or changed
when a data table was downloaded.
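The fallback through the taxonomic hierarchy described above can be sketched as a simple search over an ordered list of ranks. The function name and the dictionary representation of a lineage are assumptions made for illustration; the rank order is the one given in this guide.

```python
# Rank order as listed in the "Sorted by Taxonomic Group" drop-down.
RANKS = [
    "class", "subclass", "superorder", "order", "suborder",
    "superfamily", "family", "subfamily", "genus",
]


def filtered_taxonomic_group(lineage, selected_rank="class"):
    """Return the name at the selected rank, or at the next available
    lower rank if the selected rank is missing from the lineage.

    lineage: dict mapping rank name -> taxon name (hypothetical layout).
    Returns None if no rank from the selected one downward is present.
    """
    start = RANKS.index(selected_rank)
    for rank in RANKS[start:]:
        name = lineage.get(rank)
        if name:
            return name
    return None
```

For a lineage such as `{"class": "Mammalia", "order": "Primates", "family": "Hominidae", "genus": "Homo"}`, selecting "class" returns Mammalia and selecting "order" returns Primates. A lineage with no class entry, for example `{"order": "Testudines"}`, falls through to Testudines when "class" is selected.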
[Screenshot: "Level 2 Data - Primary" table after selecting "order" in the "Sorted by Taxonomic Group" drop-down (page 1 of 95)]
Data Version   NCBI Accession   Protein Count   Species Tax ID   Taxonomic Group   Filtered Taxonomic Group   Scientific Name          Common Name
4              NP_000116.2      1265506         9606             Mammalia          Primates                   Homo sapiens             Human
4              XP_014992596.1   88400           9544             Mammalia          Primates                   Macaca mulatta           Rhesus monkey
4              ABY64721.1       931             9534             Mammalia          Primates                   Chlorocebus aethiops     Grivet
4              XP_003255939.1   38964           61853            Mammalia          Primates                   Nomascus leucogenys      Northern white-cheeked gibbon
4              XP_025240309.1   52618           9565             Mammalia          Primates                   Theropithecus gelada     Gelada
4              XP_003811544.1   51891           9597             Mammalia          Primates                   Pan paniscus             Pygmy chimpanzee
4              XP_011922091.1   66748           9531             Mammalia          Primates                   Cercocebus atys          Sooty mangabey
4              ABY64717.1       2023            9593             Mammalia          Primates                   Gorilla gorilla          Western gorilla
4              XP_002817538.1   145798          9601             Mammalia          Primates                   Pongo abelii             Sumatran orangutan
4              XP_011852190.1   38580           9568             Mammalia          Primates                   Mandrillus leucophaeus   Drill
The user may also choose to turn species read-across off by selecting "No" in the "Species Read-Across"
drop-down and clicking "Update Report." When "No" is selected, the susceptibility prediction in the table
below will be "Y" only if the Percent Similarity is above the cut-off or if the hit is identified as an
Ortholog Candidate ("Y"). Any hit below the cut-off that is not an ortholog candidate will yield a
susceptibility prediction of "N."
[Screenshot: "Primary Report Settings" box with "Sorted by Taxonomic Group" set to "order" and the "Species Read-Across" drop-down open to the options "No" and "Yes," above the "Update Report" and "Use Default Settings" buttons]




The user can select the "Full Report" on the "Level 2" data page, which includes the same information as
the "Primary Report" plus additional information pertaining to the BLASTp alignment of the protein
sequences and domain information. The additional information includes the NCBI PSSM ID, NCBI Domain
ID, Domain Name, the number of amino acid residues in the sequence (Hit Length), the number of exactly
matching amino acids between the hit and query sequence (Identity), the number of exact and similar
(similar side-chain substitutions) amino acid matches between the hit and the query sequence
(Positives), and the expect value (E-value) describing the number of different alignments expected to
occur in the database search by chance. (See Search, View, and Download Data Tables section of user
guide for more information).
[Screenshot: "Level 2 Data - Full" table (page 1 of 95) with a keyword search box and the "Download Current
Level 2 Report Settings" and "Download Table" options. The rightmost columns shown are reproduced below; the
"EcoTox" column is not reproduced.]
Domain Name  Hit Length  Identity  Positive  Evalue      BLASTp Bitscore  Ortholog Candidate  Ortholog Count  Cut-off  Percent Similarity  Susceptibility Prediction  Analysis Completed   Eukaryote
NR_LBD_ER    238         238       238       1.621E-179  487.26                               348             41.50    100.00              Y                          2019-08-23 09:47:27  Y
NR_LBD_ER    238         237       238       9.910E-179  485.34           Y                   348             41.50    99.60               Y                          2019-08-23 09:47:27  Y
NR_LBD_ER    238         237       238       9.910E-179  485.34           Y                   348             41.50    99.60               Y                          2019-08-23 09:47:27  Y
NR_LBD_ER    238         237       238       9.910E-179  485.34           Y                   348             41.50    99.60               Y                          2019-08-23 09:47:27  Y
NR_LBD_ER    238         237       238       9.910E-179  485.34           Y                   348             41.50    99.60               Y                          2019-08-23 09:47:27  Y
NR_LBD_ER    238         237       238       9.910E-179  485.34           Y                   348             41.50    99.60               Y                          2019-08-23 09:47:27  Y
NR_LBD_ER    238         237       238       9.910E-179  485.34           Y                   348             41.50    99.60               Y                          2019-08-23 09:47:27  Y
NR_LBD_ER    238         237       238       9.910E-179  485.34           Y                   348             41.50    99.60               Y                          2019-08-23 09:47:27  Y
NR_LBD_ER    238         237       238       9.910E-179  485.34           Y                   348             41.50    99.60               Y                          2019-08-23 09:47:27  Y
NR_LBD_ER    238         237       238       9.910E-179  485.34           Y                   348             41.50    99.60               Y                          2019-08-23 09:47:27  Y
Note: SeqAPASS v2.0 and newer parse the BLASTp query and hit accessions to identify all the
species/accessions from identical proteins. Therefore, if a hit sequence represents multiple species, all
species with the identical sequence will be found in the data tables for Level 2. To determine which
sequence/species was identified by BLASTp as a hit and which sequence/species was parsed from the
identical sequence, view the "Identical Protein" column of the "Full Report" for Level 2, where "N"
indicates the original hit sequence and "Y" indicates a parsed sequence.
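As a small illustration of reading the "Identical Protein" column from a downloaded table, the sketch below separates original BLASTp hits ("N") from sequences parsed from identical proteins ("Y"). The CSV excerpt, the accessions, and the `split_by_identical_protein` helper are hypothetical; only the "Identical Protein" column name and its "N"/"Y" meaning come from the note above, and the real download may contain different headers.

```python
import csv
import io

# Hypothetical excerpt of a downloaded Level 2 "Full Report" table.
sample = """NCBI Accession,Scientific Name,Identical Protein
ACC_0001,Species one,N
ACC_0002,Species two,Y
ACC_0003,Species three,Y
"""


def split_by_identical_protein(csv_text):
    """Return (original_hits, parsed_hits) as lists of accessions."""
    original_hits, parsed_hits = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        is_parsed = row["Identical Protein"].strip().upper() == "Y"
        (parsed_hits if is_parsed else original_hits).append(row["NCBI Accession"])
    return original_hits, parsed_hits


original_hits, parsed_hits = split_by_identical_protein(sample)
print("Original BLASTp hits:", original_hits)
print("Parsed from identical proteins:", parsed_hits)
```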

[Screenshot: downloaded "Level 2 Report Settings" CSV file opened in a spreadsheet (columns A and B).]
Level 2 Report Settings
Analysis TimeStamp         2019-05-16 11:04:08
SeqAPASS version           3.2
Query Species              Homo sapiens
Query Protein              estrogen receptor isoform 1
Query Domain               (310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen receptor, which are activated by the hormone 17beta-estradiol (estrogen)
Query Accession            NP_000116.2
Ortholog Count             348
L2 Cutoff                  Default
L2 Cutoff Value            41.5003807
E-value                    10
Sorted by Taxonomic Group  CLASS
Species Read Across        Y
Show Only Eukaryotes       Checked
Report                     Primary
When the user downloads the "Current Level 2 Report Settings", the CSV file contains the report settings
shown in the example above. If the user changes the default settings, the CSV file provides a quick
record of the settings that were used once the SeqAPASS page is no longer open.
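A downloaded settings file of this kind can be read back with a few lines of Python. The sketch below is an assumption-laden illustration: the file contents are a shortened, hand-typed imitation of the example above (the real download may quote, order, or format fields differently), and `read_settings` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical file contents modeled on the example settings above.
settings_csv = """Level 2 Report Settings,


Analysis TimeStamp,2019-05-16 11:04:08
SeqAPASS version,3.2
Query Species,Homo sapiens
Query Accession,NP_000116.2
Ortholog Count,348
L2 Cutoff,Default
L2 Cutoff Value,41.5003807
E-value,10
Sorted by Taxonomic Group,CLASS
Species Read Across,Y
Show Only Eukaryotes,Checked
Report,Primary
"""


def read_settings(text):
    """Return the setting name/value pairs from columns A and B as a dict."""
    settings = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip blank rows and the title row, which has no value in column B.
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            settings[row[0].strip()] = row[1].strip()
    return settings


settings = read_settings(settings_csv)
cutoff_value = float(settings["L2 Cutoff Value"])
print(settings["Query Accession"], settings["Report"], cutoff_value)
```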
Susceptibility Cutoff Box for Level 2
The susceptibility prediction is set by identifying ortholog candidates, sequences above a defined
susceptibility cut-off, or species below the susceptibility cut-off that belong to an organism
class above the susceptibility cut-off. The default susceptibility cut-off is set by plotting the distribution of
percent similarities calculated for each hit protein. From this density plot, the critical points are
identified and the local minimums and maximums are reported. Using the ortholog candidate data, a
susceptibility cut-off is automatically determined by identifying the first ortholog candidate at an equal
or higher percent similarity than the first local minimum. The user can view this graph by clicking the
"View Cutoff" button in the "Susceptibility Cut-off" box. Radio buttons located to the right of the
graphical display indicate which cut-off has been applied for the evaluation of susceptibility in the report.
These radio buttons can be selected to change the cut-off in the table to the 2nd local minimum, in which
case the first ortholog candidate at an equal or higher percent similarity than the second local minimum
in the density plot is used to set the cut-off. The user can also define the cut-off by clicking the
"User Defined" radio button. Alternatively, the user can closely examine the density plot and manipulate
the cut-off by clicking the "View Cutoff" button.
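The cut-off selection described above can be sketched as two small steps: find the local minimums of the sampled density curve, then move up to the next ortholog candidate. This is a minimal sketch under stated assumptions, not the SeqAPASS implementation: the function names and example numbers are made up, the density values are taken as given rather than estimated, local minimums are assumed to be ordered by increasing percent similarity, and "first ortholog candidate" is assumed to mean the candidate with the lowest percent similarity at or above the minimum.

```python
def local_minima(xs, density):
    """Return the x positions where the sampled density is lower than both neighbours."""
    return [
        xs[i]
        for i in range(1, len(density) - 1)
        if density[i] < density[i - 1] and density[i] < density[i + 1]
    ]


def cutoff_from_minimum(minimum, hits):
    """hits: (percent_similarity, is_ortholog_candidate) pairs.

    Returns the lowest ortholog-candidate percent similarity that is equal to
    or higher than the local minimum. If no such candidate exists, the local
    minimum itself is returned (assumed fallback).
    """
    candidates = [ps for ps, is_ortholog in hits if is_ortholog and ps >= minimum]
    return min(candidates) if candidates else minimum


# Made-up density curve sampled every 10 percent similarity units.
xs = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
density = [0.1, 0.5, 0.9, 0.4, 0.2, 0.6, 1.0, 0.7, 0.3, 0.5, 0.8]

# Made-up hits: (percent similarity, ortholog candidate?)
hits = [(35.0, False), (41.5, True), (62.0, True), (83.2, True), (99.6, True)]

minima = local_minima(xs, density)
default_cutoff = cutoff_from_minimum(minima[0], hits)  # 1st local minimum
second_cutoff = cutoff_from_minimum(minima[1], hits)   # 2nd local minimum
print(minima, default_cutoff, second_cutoff)
```

A user-defined cut-off would simply replace the value returned by `cutoff_from_minimum` with the number the user enters.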
[Screenshot: "Level 2 Query Domain Information" page for SeqAPASS ID 1290 (Query Accession: NP_000116.2;
Query Species: Homo sapiens; Query Protein: estrogen receptor isoform 1; Query Domain: (310) cd06949,
NR_LBD_ER, Ligand binding domain of Estrogen receptor, which are activated by the hormone 17beta-estradiol
(estrogen); Ortholog Count: 348; Protein and Taxonomy Data: 02/28/2019; BLAST Version: 2.8.1; CDD Data:
12/08/2016; Software Version: 3.2), showing the "Susceptibility Cut-off" box with the "View Cutoff" button
(opens in a separate tab) and the "Visualization" box.]
Upon clicking the "View Cutoff" button, a new page is displayed with a drop-down that allows the user to set
the susceptibility cut-off using the first local minimum and the identified ortholog candidate, the second
local minimum and the identified ortholog candidate, or the "User defined cut-off" (where the user
selects the cut-off). To update the cut-off in the Level 2 data report and/or return to the Level 2 page,
click the "Update Cut-off" button.
Note: The user should have direct empirical evidence that species above the user-defined cut-off are
susceptible via the protein of interest, or that species below the user-defined cut-off are not susceptible.
Upon selecting "User defined cut-off" from the drop-down, the "Enter Cut-off" text box becomes active
and the user can enter a number from 1 to 100.
[Screenshot: "Level 2 Susceptibility Cut-off: Primary Report" page for SeqAPASS ID 1290 (Query Accession:
NP_000116.2; Ortholog Count: 348). Page text: "Local minimums are identified and susceptibility cut-off is
set based on % similarity of next ortholog candidate. Use update cut-off button to go back to Level 2 data."
The page shows the "Select Cut-off" drop-down (default: "Identify 1st local minimum and find next ortholog
candidate"), the "Enter Cut-off" text box, the "Update Cut-off" button, the density plot "Cut-off Based on
Ortholog Candidates" (x-axis: Percent Similarity; legend: Density, Local Max, Local Min, Inflection Point),
and a table with the columns "Cut-off #" and "Susceptibility Cut-off".]
All potential susceptibility cut-offs generated by the data distribution and ortholog candidate
identification are reported in the table with the columns "Cut-off #" and "Susceptibility Cut-off". The user
can use these numbers to define a cut-off if empirical evidence suggests that the "Default" or "2nd local
minimum" cut-offs are not supported.
No Orthologs Detected
[Screenshot: "Level 2 Query Domain Information" page for SeqAPASS ID 1326 (Query Accession: NP_001317544.1;
Query Species: Homo sapiens; Query Protein: peroxisome proliferator-activated receptor gamma isoform 3;
Query Domain: (110) cd06965, NR_DBD_Ppar, DNA-binding domain of peroxisome proliferator-activated receptors
(PPAR) is composed of two C4-type zinc fingers) with "Ortholog Count: 0". Below the "Susceptibility Cut-off"
and "Visualization" boxes and the report options ("Primary Report", "Full Report", "Partial Hit Protein
Sequence", "Show Only Eukaryotes", "View Level 2 Summary Report"), the "Level 2 Data - Primary" table begins
with the following rows.]
Data Version  NCBI Accession  Protein Count  Species Tax ID  Taxonomic Group  Filtered Taxonomic Group  Scientific Name   Common Name
4             NP_001317544.1  1265506        9606            Mammalia         Mammalia                  Homo sapiens      Human
4             XP_008150376.1  50340          29078           Mammalia         Mammalia                  Eptesicus fuscus  Big brown bat
4             XP_019283665.1  58782          9691            Mammalia         Mammalia                  Panthera pardus   Leopard
4             XP_021047523.1  36257          10093           Mammalia         Mammalia                  Mus pahari        Shrew mouse
If no orthologs are detected from the reciprocal best hit BLAST analysis, the "Ortholog Count" will be "0" at the
top of the "Level 2 Query Domain Information" page. The cut-off will be set by the local minimums only;
therefore, the susceptibility prediction will NOT consider ortholog candidates. It is recommended that the
user check the full report for ortholog candidates or identify a different query sequence for the
susceptibility predictions. In this case, the susceptibility predictions are highlighted in dark pink in the
Level 2 data table to indicate that 0 orthologs were detected and that the susceptibility cut-off was
determined by plotting the distribution of percent similarities and identifying the local minimums.
[Screenshot: "Level 2 Query Domain Information" page for SeqAPASS ID 1321 (Query Accession: BAF57671.1;
Query Species: Mus caroli; Query Protein: cytochrome b, partial (mitochondrion); Query Domain: (24) CHL00070,
petB, cytochrome b6; Protein and Taxonomy Data: 02/28/2019; BLAST Version: 2.8.1; CDD Data: 12/08/2016;
Software Version: 4.0) with "Ortholog Count: 0", showing the "Susceptibility Cut-off", "Primary Report
Settings", and "Visualization" boxes.]
When the "View Cutoff" button is clicked and no orthologs are detected, the "Cut-off #" and
"Susceptibility Cut-off" columns will report only the local minimum values.
[Screenshot: "Level 2 Susceptibility Cut-off: Primary Report" page for SeqAPASS ID 1326 (Query Accession:
NP_001317544.1; Query Species: Homo sapiens; Query Protein: peroxisome proliferator-activated receptor gamma
isoform 3; Ortholog Count: 0; Protein and Taxonomy Data: 02/28/2019; BLAST Version: 2.8.1; CDD Data:
12/08/2016; Software Version: 4.0), showing the "Select Cut-off" drop-down, the "Enter Cut-off" text box,
the "Update Cut-off" button, the "Cut-off #" and "Susceptibility Cut-off" table, and the density plot
"Cut-off Based on Ortholog Candidates" (x-axis: Percent Similarity; legend: Density, Local Max, Local Min,
Inflection Point).]
The user can return to the "Level 2" data page by clicking the "Update Cut-off" button or by exiting the tab.
Level 1 and Level 2: Data Visualization
From the Level 1 or Level 2 results page, SeqAPASS users can access an interactive data visualization for
either the "Primary Report" or the "Full Report" by clicking the "Visualize Data" button.
Example of Level 1 page:
[Screenshot: Level 1 page under the "View SeqAPASS Reports" tab, showing the "Level 1 Query Protein
Information", "Susceptibility Cut-off", "Primary Report Settings", and "Visualization" boxes.]
Example of Level 2 page:
[Screenshot: Level 2 page under the "View SeqAPASS Reports" tab for SeqAPASS ID 1290 (Query Accession:
NP_000116.2; Query Species: Homo sapiens; Query Protein: estrogen receptor isoform 1; Query Domain: (310)
cd06949, NR_LBD_ER; Ortholog Count: 348), showing the "Level 2 Query Domain Information" box, the
"Susceptibility Cut-off" box with the "View Cutoff" button, the "Primary Report Settings" box ("E-value",
"Sorted by Taxonomic Group", "Species Read-Across", "Update Report", "Use Default Settings"), and the
"Visualize Data" button, which opens in a separate tab.]
The data visualization will then open in a new web browser tab, one for Level 1 and a different one for
Level 2. The visualization will display for the report selected by the user on the Level 1 or Level 2 report
page and be identified as "Level One Visualization - Primary Report" or "Level One Visualization - Full
Report" and "Level Two Visualization - Primary Report" or "Level Two Visualization - Full Report."
Note: One report type at a time, either "Primary Report" or "Full Report," can be displayed in the
visualization tab for Level 1 and Level 2. Therefore, if the user is viewing the "Level One Visualization -
Primary Report" page and returns to the Level 1 results page and clicks the radio button for "Full Report,"
the data visualization tab will update to "Level One Visualization - Full Report."
Level 1 and 2 Information Page
The initial page that opens upon clicking the "Visualize Data" button provides the respective level query
protein information, including SeqAPASS ID, query protein, query species, ortholog count, and query
accession information. A link out to the NCBI protein database page corresponding to the queried
accession is available by clicking the query accession. Information on the visualization is provided in the
"Visualization Info" text box. To view the data visualization boxplots click the BoxPlot icon.
[Screenshot: "Level One Visualization - Primary Report" information page, showing the "Level 1 Query Protein
Information" box and the "Select to Open Information or Data Visualization" icons. The "Level Two
Visualization - Primary Report" information page shows the same BoxPlot description.]
Visualization Info
The following data visualization is available for Level 1 and Level 2 data:
• BoxPlot - Boxplots depicting SeqAPASS data illustrating the percent similarity across species compared to
the query species, examining the primary amino acid sequences (Level 1 Visualization) or functional domain
(Level 2 Visualization).
o The open circle, o, represents the query species and closed circles, •, represent the species with the
highest percent similarity within the specified taxonomic group.
o The top and bottom of each box represent the 75th and 25th percentiles, respectively. The top and bottom
whiskers extend to 1.5 times the interquartile range.
o The mean and median values for each taxonomic group are represented by horizontal thick and thin black
lines on the box, respectively.
o The dashed line indicates the cut-off for susceptibility predictions (based on ortholog analysis).
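The box and whisker quantities named in the BoxPlot description can be computed for one taxonomic group as in the sketch below. It is an illustration only: `box_stats` is a hypothetical helper, the percent similarities are made up, and the quartile method used by SeqAPASS is not stated in this guide, so the "inclusive" method of Python's `statistics.quantiles` is an assumption. The whisker values are the 1.5 x IQR limits; a plotting library may clip them to the most extreme data points inside those limits.

```python
import statistics


def box_stats(percent_similarities):
    """Return quartiles, mean, and 1.5 x IQR whisker limits for one taxonomic group."""
    q1, median, q3 = statistics.quantiles(percent_similarities, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range (75th minus 25th percentile)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.mean(percent_similarities),
        "lower_whisker_limit": q1 - 1.5 * iqr,
        "upper_whisker_limit": q3 + 1.5 * iqr,
    }


# Made-up percent similarities for the species in one taxonomic group.
stats = box_stats([40.0, 50.0, 60.0, 70.0, 80.0])
for name, value in stats.items():
    print(f"{name}: {value}")
```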
Level 3 Visualization Information Text
• Heat Map - Heat Maps depicting SeqAPASS data illustrating the comparison between the
template species and the user-selected species, allowing for a summary of species' protein sequence
comparisons.
o The predicted susceptibility of each species compared to the template species for the
user-selected amino acids is denoted with either (Y) for yes or (N) for no. The color green
is associated with "yes" (similar susceptibility to the template) and red is associated with
"no" (not similar susceptibility to the template).
o Similarities between amino acids are determined by comparing the species-specific
amino acids against the template species. The amino acids can be either a Total Match,
Partial Match, or Not a Match.
o The user can add or remove five settings (Susceptibility Prediction, Susceptibility
Prediction Text, Alignment Prediction Heat Map, Amino Acid, and Amino Acid
Position) to allow for a customizable Heat Map.
o Selecting one of the Optional Selections will highlight the species names that are
associated with that selection.
Level 1 and 2 BoxPlot Page - Controls
Upon clicking the "BoxPlot" icon on either the Level 1 or Level 2 Visualization Information page, a box for
the boxplot "Controls" and a box for the interactive boxplot will open.
[Screenshot: "Level Two Visualization - Primary Report" BoxPlot page for SeqAPASS ID 1290 (Query Accession:
NP_000116.2; Query Species: Homo sapiens; Ortholog Count: 348; Query Domain: (310) cd06949, NR_LBD_ER). The
"Controls" box contains "Taxonomic Groups", "Select Species for Legend", the species legend options
("Common Name", "Scientific Name", "Group by Common Name"), "Optional Selections" ("Ortholog Candidates",
"Threatened Species", "Endangered Species", "Common Model Organisms"), and the "Download BoxPlot..." and
"Open Size Controls..." buttons. The boxplot below shows percent similarity by taxon.]
Manipulating Taxonomic Groups on x-axis
The boxplot controls allow the user to edit the taxonomic groups that are displayed on the x-axis by
clicking the "X" for the taxonomic group name (e.g., Aves). This action removes the selected group
from the x-axis. To the right of the "Taxonomic Groups" controls box is a drop-down that allows the user
to remove taxonomic groups from, or add them back to, the x-axis of the boxplot graphic by deselecting or
selecting checkboxes in the drop-down. Similarly, unwanted taxonomic groups may be removed directly from the
boxplot by hovering the cursor over the taxonomic groups listed along the x-axis. The user will notice
that the selection arrow changes to a black arrow with a red "x" next to it; clicking the taxonomic group
will then remove it from the boxplot and the "Taxonomic Groups" controls box. Additionally, that taxonomic
group will have its checkbox deselected in the "Taxonomic Groups" controls box drop-down list. The user can
delete multiple species by pressing CTRL and either clicking individual species or slowly dragging across
multiple species.
[Screenshot: BoxPlot "Controls" box listing the taxonomic groups used as x-axis labels (e.g., Mammalia,
Testudines, Aves, Crocodylia, Lepidosauria, Amphibia, Chondrichthyes, Actinopteri), with "Select Species for
Legend", "Species Legend Options", the optional selections, and the "Download BoxPlot..." and "Open Size
Controls..." buttons. In the boxplot below, a pop-up on the x-axis lists Arctic lamprey, Sea lamprey, and
Southern lamprey.]
Customize Boxplot Legend
The user may customize the "Boxplot" by adding a legend that pinpoints species of interest on the
boxplot. Upon clicking the drop-down for "Select Species for Legend" in the controls box, the user may
search in the text box for specific species to display in the boxplot legend. Upon identifying a species
in the drop-down menu and selecting its checkbox, the species name will be placed in the boxplot
legend and a corresponding data point will be produced on the graph. The default settings display the
species' common name both in the "Select Species for Legend" drop-down and on the boxplot. However, if
the species' scientific name is desired, the user can select the radio button for "Scientific Name" in the
controls box for "Species Legend Options." This action will change the drop-down menu and the species in
the legend to display the species' scientific name.
Note: The database will take a moment to update the list upon changing between "Common Name" and
"Scientific Name."
[Screenshot: BoxPlot "Controls" box with the "Select Species for Legend" drop-down open, listing common names
alphabetically (Aardvark, Abalones, Acorn worms, Adelie penguin, African clawed frog, African cotton
leafworm, ...). The boxplot legend shows the selected species: Abalones, American beaver, Anna's
hummingbird, Bactrian camel, Chimpanzee, and Chum salmon.]
Change Species Display on Plot
Multiple scientific names can be represented by only one common name (e.g., Common name: Teleost
fishes; corresponding scientific names: Spinibarbus denticulatus, Sinocyclocheilus rhinocerous,
Sinocyclocheilus grahami, Sinocyclocheilus anshuiensis, Gobiocypris rarus, Thamnaconus
septentrionalis). Therefore, if a species common name that represents multiple species was used to create
the legend, and the user decides to instead select "Scientific Name," by default the boxplot legend will
change to display the multiple scientific names that represent the individual common name, and each
scientific name will be represented by a unique color/shape point on the plot. However, if the user selects
the checkbox "Group by Common Name" in the "Species Legend Options" control box, then the
scientific names that are represented by one common name will all display the same color/shape point on
the plot.
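The effect of "Group by Common Name" on legend markers can be pictured with the sketch below. It is a hypothetical illustration, not the SeqAPASS plotting code: the marker names, the `assign_markers` function, and the order in which markers are handed out are all assumptions.

```python
from itertools import cycle

MARKERS = ["circle", "square", "triangle", "diamond"]  # hypothetical marker names


def assign_markers(species, group_by_common_name=False):
    """species: list of (scientific_name, common_name) pairs.

    Returns {scientific_name: marker}. With grouping on, every scientific
    name that shares a common name receives the same marker.
    """
    markers = cycle(MARKERS)
    marker_for_key = {}
    result = {}
    for scientific, common in species:
        key = common if group_by_common_name else scientific
        if key not in marker_for_key:
            marker_for_key[key] = next(markers)
        result[scientific] = marker_for_key[key]
    return result


species = [
    ("Spinibarbus denticulatus", "Teleost fishes"),
    ("Gobiocypris rarus", "Teleost fishes"),
    ("Pan troglodytes", "Chimpanzee"),
]
ungrouped = assign_markers(species)
grouped = assign_markers(species, group_by_common_name=True)
print(ungrouped)
print(grouped)
```

In the ungrouped case each scientific name gets its own marker; in the grouped case the two teleost fishes share one.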
The user has the option of removing selected species from the legend either by removing them directly
from the "Select Species for Legend" drop-down box or by hovering the mouse directly over the species
name in the legend. The mouse will change to a black arrow with a red 'x' next to it. Clicking the name
while this arrow is displayed will remove the species from the legend and from the control box.
[Screenshot: BoxPlot "Controls" box with "Scientific Name" selected in the species legend options. The
"Select Species for Legend" box and the boxplot legend list Haliotis diversicolor, Castor canadensis,
Calypte anna, Camelus bactrianus, Pan troglodytes, Oncorhynchus keta, Gymnogyps californianus, Aplysia
californica, Sinocyclocheilus anshuiensis, Sinocyclocheilus rhinocerous, Sinocyclocheilus grahami,
Spinibarbus denticulatus, and Gobiocypris rarus.]
Customize the Legend to Display Species Groups of Interest
In the "Optional Selections" controls box, the user has the option of displaying "Ortholog Candidates,"
"Threatened Species," "Endangered Species," or "Common Model Organisms." Upon selecting one of
the checkboxes, red data points corresponding to species will be displayed on the boxplot. By hovering
the mouse over a single red point, a pop-up box will appear with the corresponding species name,
taxonomic ID, query protein, and percent similarity.
Note: The user can select to display either species common name or scientific name in the hover over
information box by selecting from the "Species Legend Options."
If the user selects either "Threatened Species" or "Endangered Species," clicking on an individual red dot
will open a new web browser tab and link to the corresponding species page on the US Fish and Wildlife
Service's Environmental Conservation Online System (USFWS ECOS; e.g.,
https://ecos.fws.gov/ecp0/profile/speciesProfile?sId=1506).
[Screenshot: "Optional Selections" controls ("Ortholog Candidates", "Threatened Species", "Endangered
Species", "Common Model Organisms") with "Endangered Species" displayed on the boxplot of percent similarity
by taxon. A hover-over pop-up reads: "Rainbow trout (taxid: 8022), Estrogen receptor isoform X3, 64.51%
similarity".]
BoxPlot Controls Widget for Bar Width, Zoom and Pan
Clicking the "Open Size Controls" button opens a "BoxPlot Controls" widget that allows the user to
adjust the size of the bars on the boxplot by increasing or decreasing the "Bar Width" using the up and
down arrows. The minimum and maximum bar sizes are 6 and 60, respectively. To reset the bar width
on the boxplot to the default size, click the "Reset" button to the right of the "Bar Width" adjustment box in
the "BoxPlot Controls" widget. The user can also zoom and pan the boxplot by toggling the on/off button
under the "Zoom" heading. The user can then zoom in or out by clicking the up or down arrows or by
entering a number in the text box and pressing Enter. To reset the zoom on the boxplot to the default size,
click the "Reset" button to the right of the "Zoom" adjustment box in the "BoxPlot Controls" widget.
The pan option is available when the "Zoom and Pan" option is toggled to the "on" position, which
allows the user to click on the boxplot and drag the plot around the screen to reposition it. To reset all
BoxPlot Controls to their default settings, click the "Reset All" button.
Note: Upon exiting out of the BoxPlot Controls widget, the Zoom and Pan options are automatically
turned off.
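[Screenshot: "BoxPlot Controls" widget with the "Bar Width" box (shown at 18) and its "Reset" button, the
"Zoom" box (shown at 125) and its "Reset" button, the "Zoom & Pan" on/off toggle, and the "Reset All"
button.]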
Download BoxPlot Widget
To download the boxplot, click the "Download BoxPlot" button in the controls box. A "Download Boxplot"
widget will pop up. Specify the type of file to download (SVG, PNG, or JPG) by clicking the desired
radio button for "Image Type." The user may customize the resolution of the boxplot for PNG and JPG
files prior to download by altering the "Width" and "Height" of the boxplot. To change the "Width" or
"Height," enter the desired number in the corresponding text box. Click the "Download Image" button to
download the file. To close the "Download Boxplot" widget, click the "x" on the top right of the widget.
[Screenshot: "Download Boxplot" widget with the "Image Type" radio buttons (SVG, PNG, JPG), the "Width"
(1,236) and "Height" (755) text boxes, and the "Download Image" button.]
Hover-over Features in the BoxPlot
Hovering over a taxonomic group name on the x-axis of the boxplot opens an information box
listing the top three species in order of highest percent similarity. If only one or two species are
represented in the taxonomic group, then only those species will be displayed. Hovering the mouse over
any species in the boxplot that is present in the legend will generate a pop-up box with the
corresponding species name, taxonomic ID, query protein, and percent similarity. The susceptibility
cut-off is displayed in a pop-up text box upon hovering over the dashed horizontal cut-off line.
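The "top three species" hover box amounts to a sort and a slice, as the sketch below illustrates. The `top_species` function, the tuple layout, and the example values are hypothetical and are not taken from SeqAPASS.

```python
def top_species(hits, group, n=3):
    """hits: (species_name, taxonomic_group, percent_similarity) tuples.

    Returns up to n (species_name, percent_similarity) pairs for the group,
    ordered from highest to lowest percent similarity.
    """
    in_group = [h for h in hits if h[1] == group]
    in_group.sort(key=lambda h: h[2], reverse=True)
    return [(name, similarity) for name, _, similarity in in_group[:n]]


# Made-up example data.
hits = [
    ("Species A", "Aves", 71.2),
    ("Species B", "Aves", 88.4),
    ("Species C", "Aves", 64.0),
    ("Species D", "Aves", 79.9),
    ("Species E", "Amphibia", 55.3),
]
top_aves = top_species(hits, "Aves")
top_amphibia = top_species(hits, "Amphibia")
print(top_aves)
print(top_amphibia)
```

A group with fewer than three species, such as "Amphibia" in this example, simply returns the species it has.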
Summary Table for Species in a Specific Taxonomic Group
Clicking on a box representing a taxonomic group in the boxplot opens a table providing summary
information for that particular group. The table header provides the Taxonomic Group name, the number
of species represented in the box, summary statistics (i.e., mean and median percent similarity), and
the overall susceptibility prediction for the selected taxonomic group. The data table includes protein
and species information along with metrics for evaluating protein similarity and predicting
susceptibility. Also included in the table are columns indicating whether a species belongs to a certain
group of interest (e.g., Threatened Species, Endangered Species, Model Organism). The table can be
downloaded by clicking on the icon for an Excel or CSV file.
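The summary statistics in the table header can be reproduced from a downloaded table with a few lines of Python. This is a sketch under stated assumptions: the function name, the dictionary keys, and the sample values are hypothetical, and the susceptibility prediction is left out because its rule is not described in this section.

```python
from statistics import mean, median

def group_summary(group_name, percent_similarities):
    """Summarize one taxonomic group from its percent-similarity values."""
    return {
        "taxonomic_group": group_name,
        "species_count": len(percent_similarities),
        "mean_percent_similarity": mean(percent_similarities),
        "median_percent_similarity": median(percent_similarities),
    }

# Hypothetical percent-similarity values for one group.
summary = group_summary("Actinopteri", [70.0, 60.0, 50.0, 40.0])
```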
Interactive Visualization with Level 1 Data Page and Level 2 Data Page
The data visualization is programmed to update with changes made to the Level 1 Data page and Level 2
Data page, respectively. Therefore, if the user updates the Susceptibility Cut-off (see user guide
sections Susceptibility Cutoff Box for Level 1 and Susceptibility Cutoff Box for Level 2) to the
"Second Local Minimum" or "User Defined Cut-off," the previously opened data visualization boxplot tab
will update the cut-off accordingly. Similarly, if the user modifies the Primary Report Settings (see
user guide sections Level 1: Primary Report Settings and Level 2: Primary Report Settings), the data
visualization will update accordingly.
Note: If the user updates the "Primary Report Settings" for "Sorted by Taxonomic Group" the boxplot
will update to display the new taxonomic group selection that is present in the "Filtered Taxonomic
Group" column in the data table. The user should be aware that manipulating the "Sorted by Taxonomic
Group" to a different level in the taxonomic lineage (e.g., from class to order; from class to genus) adds a
larger number of taxonomic groups to the x-axis. Therefore, the plot may require greater user
manipulation using the "BoxPlot Controls" to view the data.
Level 3: Individual Amino Acid Residue Alignment
In the "View SeqAPASS Reports" tab, on the "Level 1 Query Protein Information" page, there is a
"Level 3" dropdown for setting up the query for comparing individual amino acid residues to a template
sequence. It is anticipated that the choice of template sequence and residues that are selected to align will
be derived from the published literature in most cases. Publications evaluating homology models, protein
crystal structures, pesticide field resistance, or utilizing site-directed mutagenesis are a few examples of
the types of studies that may contain such information to guide a Level 3 SeqAPASS evaluation.
Relevant literature containing these data can be identified using the SeqAPASS "Reference Explorer."
The user can search for literature with the protein(s) of interest with an auto-populated search term that is
integrated into a predefined Boolean string and generate a Google Scholar link that will take them to
scientific articles containing their protein(s).
The user can modify the Boolean search string by adding text to the "Additional Names" text box and
clicking the "Add Protein Name" button. By selecting a name that is currently in the text box and
clicking the "Remove Selected Protein" button, the user can delete names from the text box so that
these names are not included in the Boolean string for the Google Scholar search.
When satisfied with the protein names to be included in the Boolean search string, the user will select
the "Generate Google Scholar Link" button. A pop-up will appear displaying the Boolean string to be
searched in Google Scholar. The user can continue to modify the Boolean string by clicking in the text
and adding additional information. The Boolean string can be copied and pasted elsewhere by clicking
the "Copy to Clipboard" button. The user can also choose to use the generated Boolean string to search
Google Scholar. To do so, the user will select the "Search Google Scholar" button.
Google Scholar
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C34&q=(estrogen receptor isoform 1)AND("site-directed mutagenesis"
OR "molecular docking" OR "docking analysis" OR "docking simulations" OR "x-ray crystallography" OR "crystal structure"
OR "homology modeling" OR "protein structure" OR "protein binding" OR "molecular model" OR "binding" OR "field
resistance" OR "amino acid" OR "amino acid residues" OR "mutation" OR "mutations" OR "molecular dynamics" OR
"transcriptional activation" OR "3D-pharmacophore" OR "pharmacophore" OR "structure-based" OR "chemo-bioinformatics"
OR "3D-stuctures" OR "3D-QSAR")
Upon selecting the "Search Google Scholar" button, a new browser tab opens in Google Scholar with the
Boolean string entered in the search box and a list of publications and articles that match the
SeqAPASS-generated Boolean string. The user should evaluate the literature displayed by Google Scholar
to identify appropriate articles for determining Level 3 template sequences and critical individual
amino acids for comparisons across species.
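For users who want to rebuild or adapt the search outside SeqAPASS, the following Python sketch assembles a Boolean string of the same shape and turns it into a Google Scholar URL. The keyword list and the `hl` and `as_sdt` values are copied from the example in this guide; the function names are hypothetical, and joining multiple protein names with OR is an assumption, since the guide does not show how added names are combined.

```python
from urllib.parse import quote

# Keywords copied verbatim from the example Boolean string in this guide.
KEYWORDS = [
    "site-directed mutagenesis", "molecular docking", "docking analysis",
    "docking simulations", "x-ray crystallography", "crystal structure",
    "homology modeling", "protein structure", "protein binding",
    "molecular model", "binding", "field resistance", "amino acid",
    "amino acid residues", "mutation", "mutations", "molecular dynamics",
    "transcriptional activation", "3D-pharmacophore", "pharmacophore",
    "structure-based", "chemo-bioinformatics", "3D-stuctures", "3D-QSAR",
]

def build_boolean_string(protein_names, keywords=KEYWORDS):
    # Assumption: several protein names are joined with OR.
    names = " OR ".join(protein_names)
    terms = " OR ".join(f'"{keyword}"' for keyword in keywords)
    return f"({names})AND({terms})"

def scholar_url(boolean_string):
    # hl and as_sdt are taken from the example URL shown in this guide.
    base = "https://scholar.google.com/scholar?hl=en&as_sdt=0%2C34&q="
    return base + quote(boolean_string)

query = build_boolean_string(["estrogen receptor isoform 1"])
url = scholar_url(query)
```

Percent-encoding the query with `quote` keeps spaces and quotation marks from breaking the link when it is pasted into a document or script.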
In the "Level 3" box, there is a link out to the "NCBI Protein Database" for identifying the template
sequence of interest. Below this link is a text box where the user can enter either an NCBI Protein
Accession with the version number (e.g., NP_000116.2) or a FASTA-formatted sequence, for example:
>gi|62821794|ref|NP_000116.2| estrogen receptor isoform 1 [Homo sapiens]
MTMTLHTKASGMALLHQIQGNELEPLNRPQLKIPLERPLGEVYLDSSKPAVYNYPEGAAYEFNAAAAANA
QVYGQTGLPYGPGSEAAAFGSNGLGGFPPLNSVSPSPLMLLHPPPQLSPFLQPHGQQVPYYLENEPSGYT
VREAGPPAFYRPNSDNRRQGGRERLASTNDKGSMAMESAKETRYCAVCNDYASGYHYGVWSCEGCKAFFK
RSIQGHNDYMCPATNQCTIDKNRRKSCQACRLRKCYEVGMMKGGIRKDRRGGRMLKHKRQRDDGEGRGEV
GSAGDMRAANLWPSPLMIKRSKKNSLALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMGLLTNLA
DRELVHMINWAKRVPGFVDLTLHDQV
Upon clicking in the "Select Template Sequence" text box, a pop-up message will appear with examples of
the proper format for Accessions or FASTA sequences. The link out to the NCBI Protein Database is found
above the template entry text box.
Additional sequences can also be incorporated into the Level 3 alignment using the "Additional
Comparisons (optional)" text box; filling in this field is optional. Upon clicking in the "Additional
Comparisons (optional)" text box, a pop-up message will appear with examples of the proper format for
Accessions or FASTA sequences.
Note: In the "Additional Comparisons (optional)" text box, any NCBI Protein Accessions must be entered
before any FASTA sequences if both are to be included in the Level 3 alignment.
The pop-up message reads "Enter 0 or more NCBI Protein Accession(s) followed by 0 or more FASTA
Sequence(s)" and gives the following examples:
NP_000116.2
1JLYA
>Sequence description of first FASTA
MTMTLHTKASGMALLHQIQGNELEPLNRPQLKIPLERPLGEVYLDSSKPAVY
>Sequence description of second FASTA
XAGLPVIMCLKSNNHQKYLRYQSDNIQQYGLLQFSADKILDPLAQFEVEPSKTYDGLV
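The "accessions first, then FASTA" rule for the "Additional Comparisons (optional)" text box can be illustrated with a small Python parser. This is only a sketch of one plausible reading of the rule, not the SeqAPASS implementation: once a FASTA header line (starting with ">") has been seen, every following line is read as sequence data, so an accession typed after a FASTA record could not be told apart from sequence text. The function name and the shortened sample sequences are hypothetical.

```python
def parse_additional_comparisons(text):
    """Split the text box content into accessions and FASTA records.

    Lines before the first '>' header are treated as accessions; lines
    after a header are appended to that record's sequence.
    """
    accessions, records = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append({"description": line[1:].strip(), "sequence": ""})
        elif records:
            # After the first FASTA header, every line is sequence data.
            records[-1]["sequence"] += line.replace(" ", "")
        else:
            accessions.append(line)
    return accessions, records

example = """NP_000116.2
>Sequence description of first FASTA
MTMTLHTKASGMALLHQIQG
>Sequence description of second FASTA
XAGLPVIMCLKSNNHQKYLR
"""
accessions, records = parse_additional_comparisons(example)
```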
Below the text box for additional comparison sequences is a link to NCBI COBALT (Constraint-based
Multiple Alignment Tool). COBALT allows the user to align multiple sequences and is the alignment tool
that SeqAPASS algorithms utilize to set up the query of individual amino acid residues across species.
Note: The user does not need to use the COBALT link to run a Level 3 evaluation; the link is available
in case the user chooses to further evaluate or compare multiple potential template sequences.
Under the text "Enter Level 3 Run Name," there is a text box where the user can enter a user defined
name for the run. The user may only enter letters or integers as text for the name. The user defined name
will appear in the "View Level 3 Data" dropdown upon completion of the Level 3 sequence alignment.
To complete the set-up for a Level 3 query the user must select which sequences to compare to the
identified template sequence. Listed in the "Choose Taxonomic Group(s)" drop-down are all Taxonomic
Groups that were identified as hits in the "Level 1" primary amino acid sequence alignment data. Because
COBALT is used to align all sequences that are selected, it is recommended that the user selectively
identify sequences from the hit table below to align. For example, selecting sequences with low similarity
to the template sequence along with sequences sharing high similarity to the template sequence can skew
the alignment because COBALT is trying to align all the sequences together. It is recommended that the
user select sequences by first selecting a taxonomic group from the "Choose Taxonomic Group(s)" drop-
down. The user can also use the NCBI taxonomy link to type in the name of the "Taxonomic Groups"
found in the drop-down to look up which species fall in that group.
Note: The "Choose Taxonomic Group(s)" drop-down displays the level of the taxonomic hierarchy shown in
the "Filtered Taxonomic Group" column of the "Level 1 Data" table. For example, if the user changes the
default option from "class" to "order," then orders will be displayed in the drop-down.
By choosing a group from the drop-down menu, the "Level 1 Data" table below will be filtered by the
selected Taxonomic Group (see column "Taxonomic Group" in the "Level 1 Data" table). When a
"Taxonomic Group" is selected from the drop-down, it can take up to a few seconds for the "Level 1
Data" table to filter completely, depending on the size of the table. The user can then examine each hit
protein in the "Level 1 Data" table and select those that they would like to compare to the template
sequence. To select sequences/species from the filtered "Level 1 Data" table, the user will select the
check boxes in the first column of the table. Although it is not typically recommended, the user may
also select the header check box in the first column to select all sequences/species in the filtered
table.
Note: The user can also type the "Taxonomic Group" of interest in the text search box at the top of the
drop-down for quick filtering.
Below is an example where the user selected the "Taxonomic Group" Actinopteri from the drop-down
and then selected individual sequences/species to align with the template sequence. The number of
selected species will be shown in the text above the "Request Residue Run" button.
(See Search, View, and Download Data Tables section of user guide for more information)
The user can choose to align sequences/species from multiple taxonomic groups with the template
sequence, by going back to the "Choose Taxonomic Group" drop-down and selecting another group,
which filters the Level 1 table based on the group selected, and then the user can select additional species
from the newly filtered table. As before, the number of selected species can be tracked in the text above
the "Request Residue Run" button that reads "X species selected".
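The filter-then-select workflow across several taxonomic groups can be pictured with the following Python sketch. It is not how SeqAPASS is implemented; the row layout, the function names, and the accession strings (which are placeholders, not real NCBI accessions) are all hypothetical. The sketch shows that only rows visible under the current group filter can be checked, and that the running count accumulates across groups.

```python
def filter_by_group(rows, group):
    """Rows whose taxonomic group matches the drop-down choice."""
    return [row for row in rows if row["taxonomic_group"] == group]

def add_to_selection(selection, filtered_rows, checked_accessions):
    """Add checked rows from the currently filtered table to the selection."""
    visible = {row["accession"] for row in filtered_rows}
    for accession in checked_accessions:
        if accession in visible and accession not in selection:
            selection.append(accession)
    return selection

# Hypothetical Level 1 rows with placeholder accessions.
level1_rows = [
    {"accession": "HYP_0001.1", "taxonomic_group": "Actinopteri"},
    {"accession": "HYP_0002.1", "taxonomic_group": "Actinopteri"},
    {"accession": "HYP_0003.1", "taxonomic_group": "Amphibia"},
    {"accession": "HYP_0004.1", "taxonomic_group": "Aves"},
]

selection = []
add_to_selection(selection, filter_by_group(level1_rows, "Actinopteri"),
                 ["HYP_0001.1", "HYP_0002.1"])
# HYP_0004.1 belongs to Aves, so it is not visible under the Amphibia filter.
add_to_selection(selection, filter_by_group(level1_rows, "Amphibia"),
                 ["HYP_0003.1", "HYP_0004.1"])
status = f"{len(selection)} species selected"
```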
When the user has selected all sequences they want to align, they click the "Request Residue Run"
button. Upon successful submission of a Level 3 query, the user will see a pop-up message reading
"Level 3 Run Requested" with the status "queued." If submission is unsuccessful, a message will appear
describing the reason for the unsuccessful submission.
To update the "Choose Query to View" drop-down menu with the completed Level 3 alignments, the user
can click on the "Refresh Level 2 and 3 runs" button.
Additionally, the user can check the status of the Level 3 run by clicking the "SeqAPASS Run Status"
tab and the radio button for "Level 3 Status." Typically, Level 3 alignments complete in a few seconds.
When the Level 3 query completes and the Level 1 page has been updated, the user-defined Level 3 Run
Name will be available in the "Choose Query to View" drop-down menu. After selecting the desired Run
Name from the drop-down, click the "View Level 3 Data" button to view the aligned sequences and set up
the individual amino acid residue alignments with the selected sequences/species.
Upon a successful Level 3 query submission, a pop-up message is displayed in the upper right-hand side
of the screen:
Level 3 Run Requested
Status queued
Once the Level 3 run has completed, the user can select a run from the "Select Level 3 Run Name"
drop-down in the "View Single Report" box to view an individual user-defined Level 3 run. If the user
has completed multiple Level 3 alignments between a template sequence and more than one taxonomic
group, the user can combine Level 3 reports by selecting the "Combine Level 3 Data" button. A "Combine
Level 3 Reports" pop-up will appear. Combining Level 3 reports takes three steps. First, the user will
"Choose a Level 3 Template" from the drop-down that lists all templates the user has used to generate
Level 3 alignments. The template sequence must be common to all Level 3 runs that will be combined.
After selecting the template, the user will click the "Next" button. At this point the user will select
all Level 3 Jobs that are to be combined by selecting the check boxes next to the user-defined names in
the "Level 3 Jobs" drop-down. After all jobs that are to be combined are selected, the user will click
the "Next" button. Note that as the user moves through each step of the Combine Level 3 Reports
feature, the current step is indicated by highlighting its button in blue (for example, the "Level 3
Jobs" button is highlighted while the user is selecting jobs to combine).
The next step in the "Combine Level 3 Reports" feature is to put the jobs in the order in which they
should be displayed in the output. Typically, sequences from an individual taxonomic group are aligned
to a template sequence and named accordingly (e.g., Actinopteri, Amphibia, Aves, etc.). It may be
useful to order the combined report similarly to how the taxonomic groups are displayed on the x-axis of
the Level 1 or Level 2 data visualization. To do so, the user can select a user-defined name in the
"Order Level 3 Jobs" text box and drag and drop it to the desired position from top to bottom. To move
on to selecting individual amino acids for sequence comparisons, the user will select the "View Level 3
Data" button.
The order selected will translate to the top to bottom order displayed in the data table, with the template
sequence only displayed once in the first row and all selected jobs below.
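The three combining steps (shared template, job selection, job order) and the resulting row order can be summarized in a short Python sketch. The function name and data layout are hypothetical and do not describe SeqAPASS internals; the sketch only shows the template appearing once in the first row, followed by each job's rows in the user-chosen order.

```python
def combine_level3_reports(template_row, jobs, job_order):
    """Combine Level 3 jobs that share one template into a single row list."""
    for name in job_order:
        if jobs[name]["template"] != template_row["accession"]:
            raise ValueError(f"Job {name!r} was not aligned to the chosen template")
    combined = [dict(template_row)]  # template appears once, in the first row
    for name in job_order:
        for row in jobs[name]["rows"]:
            combined.append({"job_name": name, **row})
    return combined

template_row = {"accession": "NP_000116.2", "scientific_name": "Homo sapiens"}
jobs = {
    "Aves": {
        "template": "NP_000116.2",
        "rows": [{"scientific_name": "Meleagris gallopavo"}],
    },
    "Amphibia": {
        "template": "NP_000116.2",
        "rows": [
            {"scientific_name": "Xenopus laevis"},
            {"scientific_name": "Xenopus tropicalis"},
        ],
    },
}
combined = combine_level3_reports(template_row, jobs, ["Amphibia", "Aves"])
names = [row["scientific_name"] for row in combined]
```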
Level 3 Data - Primary (combined report, page 1 of 2)

Data     Job       NCBI             Protein   Species   Taxonomic   Scientific Name
Version  Name      Accession        Count     Tax ID    Group
4        Amphibia  NP_000116.2      1265506   9606      Mammalia    Homo sapiens
4        Amphibia  OCT77903.1       130454    8355      Amphibia    Xenopus laevis
4        Amphibia  BAF30926.1       83        166789    Amphibia    Andrias japonicus
4        Amphibia  AUW64608.1       1591      141262    Amphibia    Andrias davidianus
4        Amphibia  BAE81788.1       94392     8364      Amphibia    Xenopus tropicalis
4        Amphibia  BAJ05031.1       18        2040589   Amphibia    Sclerophrys capensis
4        Aves      XP_019468458.1   34219     9103      Aves        Meleagris gallopavo
4        Aves      XP_025978017.1   31563     8790      Aves        Dromaius novaehollandiae
4        Aves      KFQ02396.1       30590     8969      Aves        Haliaeetus albicilla
4        Aves      XP_010580195.1   25311     52644     Aves        Haliaeetus leucocephalus
View Level 3 Individual Amino Acid Query and Data Page
After the user clicks the "View Level 3 Data" button, the Level 3 data page opens. The "Level 3
Template Protein Information" box contains the SeqAPASS Run ID, Query Accession (with link out to
NCBI), Ortholog Count (number of hits identified as ortholog candidates to the query species protein
sequence), NCBI Data (the date that NCBI databases and executables were downloaded and incorporated
into SeqAPASS), Level 3 Run Name (defined by the user), Template Species (entered by the user in the
Level 3 query), Template Protein, and Query Residues (populated with residues upon selection and
successful table update).
[Screenshot: Level 3 data page for the run "Actinopteri," showing the "Level 3 Template Protein
Information" box (Template Species: Homo sapiens; Template Protein: [NP_000116.2] estrogen receptor
isoform 1; Query Residues: No Residues Selected), the "Select Amino Acid Residues" controls, and the
"Level 3 Data - Primary" table with the template in the first row followed by the selected Actinopteri
sequences.]
The "Level 3" data page includes the Data Version, NCBI Accession, Protein Count, taxonomic
information, Protein Name, and the date and time the Level 3 run completed. The data table remains
ordered by percent similarity, from the sequences with the highest percent similarity to the template
sequence at the top to those with the lowest percent similarity at the bottom. (See the Search, View,
and Download Data Tables section of the user guide for more information.)
For additional information on Amino Acid Residues, including definition of the acronym, the amino acid
residue name, the classification for the amino acid side chain and the size of the amino acid residue based
on molecular weight, the user can click the "Show Amino Acid Info..." button. A pop-up table, "Amino
Acid info," will be displayed providing this information.
Amino Acid info

ID  Name             Side Chain          Size
A   Alanine          Aliphatic           89.094
C   Cysteine         Sulfur-Containing   121.154
D   Aspartic Acid    Acidic              133.104
E   Glutamic Acid    Acidic              147.131
F   Phenylalanine    Aromatic            165.192
G   Glycine          Aliphatic           75.067
H   Histidine        Basic               155.156
I   Isoleucine       Aliphatic           131.175
K   Lysine           Basic               146.189
L   Leucine          Aliphatic           131.175
M   Methionine       Sulfur-Containing   149.208
N   Asparagine       Amidic              132.119
P   Proline          Aliphatic           115.132
Q   Glutamine        Amidic              146.146
R   Arginine         Basic               174.203
S   Serine           Hydroxylic          105.093
T   Threonine        Hydroxylic          119.119
U   Seleno-cysteine  Sulfur-Containing   168.064
V   Valine           Aliphatic           117.148
W   Tryptophan       Aromatic            204.228
X   Unknown          Unknown
Y   Tyrosine         Aromatic            181.191
To obtain individual amino acid residue alignment data in the Level 3 data table, the user must use the
shuttle in the "Level 3 Template Protein Information" box to select positions and amino acid residues
from the chosen template sequence to align with the sequences/species that were selected by taxonomic
group. Single letter abbreviations are used for the amino acid sequences.
G: Glycine       A: Alanine      S: Serine          T: Threonine
L: Leucine       I: Isoleucine   M: Methionine      P: Proline
Y: Tyrosine      W: Tryptophan   D: Aspartic Acid   E: Glutamic Acid
N: Asparagine    Q: Glutamine    H: Histidine       K: Lysine
R: Arginine      C: Cysteine     V: Valine          F: Phenylalanine
U: Seleno-cysteine
The user can select one residue at a time by clicking and highlighting the residue of interest and then
clicking the top right arrow shuttle button to move the residue to the right-hand box for inclusion in the
alignment. Each time a residue is added to the right-hand box, the left-hand box resets itself to the 1st
residue. Or the user can select multiple residues at the same time by holding the Ctrl button, clicking on
residues, and then clicking the top right arrow shuttle button to move the residues to the right-hand box.
The user can choose to remove selected residues by using the left arrow button to clear one at a time or
the double left arrow button to remove all selected residues at once. When residues of interest (likely
defined from the literature as described above) have been selected, click the "Update Report" button,
which then updates the "Level 3 Data" table with the individual residue alignment data.
Alternatively, the user can enter the amino acid positions in the "Enter Amino Acid Residue Positions"
text box (e.g., 351,353,362) and click the "Copy to Residue List" button.
Upon clicking "Copy to Residue List," the "Select Amino Acid Residues" shuttle box is populated with
the positions and residues typed. The user can then click the "Update Report" button to produce Level 3
results in the table below.
The individual amino acid residue alignment data will then be updated in the rightmost columns of the
Level 3 Data table. The user can submit a maximum of 50 individual amino acid residues from the
template sequence to compare to the other selected sequences. The individual amino acid residues will
be listed in numerical order, from the first position in the template sequence to the last.
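The mapping from typed positions to residue labels such as "351D" can be sketched in Python as follows. The function name is hypothetical and the sketch is not the SeqAPASS implementation; it uses 1-based positions, sorts them numerically, and applies the 50-residue limit stated above. The sample string is the first nine residues of the example template shown in this guide.

```python
MAX_RESIDUES = 50  # limit stated in this guide

def residues_from_positions(position_text, template_sequence):
    """Turn '8,1,3' into labels like ['1M', '3M', '8K'] (1-based positions)."""
    positions = sorted({int(p) for p in position_text.split(",") if p.strip()})
    if len(positions) > MAX_RESIDUES:
        raise ValueError(f"At most {MAX_RESIDUES} residues can be submitted")
    labels = []
    for position in positions:
        if not 1 <= position <= len(template_sequence):
            raise ValueError(f"Position {position} is outside the template")
        labels.append(f"{position}{template_sequence[position - 1]}")
    return labels

# First nine residues of the example template (NP_000116.2) in this guide.
template_start = "MTMTLHTKA"
labels = residues_from_positions("8,1,3", template_start)
```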
Level 3 Data - Primary Report
The default report is the "Primary Report" and can be recognized as such because the radio button for
"Primary Report" above the "Level 3 Data" table is selected.
The "Primary Report" columns for the alignment are titled "Similar Susceptibility as Template" ("Y"
or "N" for yes or no, respectively), followed by Position 1, Amino Acid 1, Total Match 1, Position 2,
Amino Acid 2, Total Match 2, Position 3, Amino Acid 3, Total Match 3, and so on. The template sequence is
always in the top row of the "Level 3 Data" table, followed by the previously selected sequences.
The residues selected in the shuttle are displayed in that top row, corresponding to the
template sequence. In each following row, the Position and Amino Acid values are those of the
Protein Accession identified in that row that align with the template sequence. The "Total Match X" column
describes whether the amino acid residue matches the template based on side-chain classification and
molecular weight: "Y" for yes, or "N" for not a match to the template. The user can evaluate these data to
understand how well conserved an amino acid residue is across species, or in a species of interest, to add
a line of evidence to support (or question) susceptibility predictions. The user can also
download the current report settings by selecting "Download Current Level 3 Report Settings." This
csv lets the user track which settings were used or changed when downloading a data
table.
[Screenshot: "Level 3 Data - Primary" table with the "Primary Report" radio button selected, the "View Level 3 Summary Report" and "Download Current Level 3 Report Settings" links, a keyword search box, and a "Download Table" control. Columns to the right of Total Match 3 are cut off. Visible columns and rows:]
Protein Name                              Analysis Completed    Similar         Position / Amino Acid / Total Match
                                                                Susceptibility  1             2             3
estrogen receptor isoform 1               2019 08 29 14:55:59   Y               351 / D / Y   353 / E / Y   362 / K / Y
estrogen receptor alpha                   2019 08 29 14:55:59   Y               320 / D / Y   322 / E / Y   331 / K / Y
PREDICTED: estrogen receptor isoform X2   2019 08 29 14:55:59   Y               316 / D / Y   318 / E / Y   327 / K / Y
estrogen receptor                         2019 08 29 14:55:59   Y               355 / D / Y   357 / E / Y   366 / K / Y
estrogen receptor isoform X3              2019 08 29 14:55:59   Y               319 / D / Y   321 / E / Y   330 / K / Y
Estrogen receptor 1                       2019 08 29 14:55:59   Y               319 / D / Y   321 / E / Y   330 / K / Y
When the user downloads the current "Level 3 Report Settings", the csv contains the following information.
If the user changes the default settings, the csv provides a quick record of them when the
SeqAPASS page is no longer open.

Row  Column A                  Column B
1    Level 3 Report Settings
2
3
4    Analysis TimeStamp        2019 05 16 11:04:08
5    SeqAPASS version          3.2
6    Level 3 Run Name          Actinopteri
7    Template Species          Homo sapiens
8    Template Protein          [NP_000116.2] estrogen receptor isoform 1
9    Query Residues            1M, 2T, 3M, 4T, 5L, 6H, 7T, 8K, 9A, 10S
10   Query Accession           NP_000116.2
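A downloaded settings file of this shape can be read back into a small lookup table for record keeping. This is a sketch under assumptions: the guide shows only the spreadsheet view, so the raw csv text below (including the quoting of the "Query Residues" value) is a guess at the file layout, and `read_report_settings` is a hypothetical helper name.

```python
import csv
import io

# Assumed raw form of the csv shown above; the real file may quote fields differently.
EXAMPLE_CSV = """Level 3 Report Settings,
,
,
Analysis TimeStamp,2019 05 16 11:04:08
SeqAPASS version,3.2
Level 3 Run Name,Actinopteri
Template Species,Homo sapiens
Template Protein,[NP_000116.2] estrogen receptor isoform 1
Query Residues,"1M, 2T, 3M, 4T, 5L, 6H, 7T, 8K, 9A, 10S"
Query Accession,NP_000116.2
"""


def read_report_settings(csv_text):
    """Return {setting name: value} for every row that has both columns filled."""
    settings = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            settings[row[0].strip()] = row[1].strip()
    return settings


settings = read_report_settings(EXAMPLE_CSV)
print(settings["Level 3 Run Name"])             # Actinopteri
print(settings["Query Residues"].split(", "))   # list of residue labels
```

The title row and the blank rows are skipped because their second column is empty, so only the setting rows end up in the dictionary.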
Level 3 Data - Full Report
The user may choose to view the Full Report for Level 3 data by selecting the "Full Report" radio button
above the "Level 3 Data" table. The table below will automatically update to display all the
alignment details.
The "Full Report" columns for the alignment are titled "Similar Susceptibility as Template" ("Y" or
"N" for yes or no, respectively), followed by Position 1, Amino Acid 1, Direct Match 1, Side Chain 1,
Side Chain Match 1, MW 1, MW Match 1, Total Match 1, Position 2, Amino Acid 2, Direct Match 2, Side Chain
2, Side Chain Match 2, MW 2, MW Match 2, Total Match 2, and so on. The template sequence is always in the
top row of the "Level 3 Data" table, followed by the previously selected sequences. The residues
selected in the shuttle are displayed in that top row, corresponding to the template sequence. In each
following row, the Position and Amino Acid values are those of the Protein Accession
identified in that row that align with the template sequence. The "Total Match X" column describes whether the amino
acid residue matches the template based on side-chain classification and molecular weight: "Y" for yes,
or "N" for not a match to the template. The user can evaluate these data to understand how well conserved
an amino acid residue is across species, or in a species of interest, to add a line of evidence to
support (or question) susceptibility predictions.
[Screenshot: "Level 3 Data - Full" table with the "Full Report" radio button selected, the "View Level 3 Summary Report" and "Download Current Level 3 Report Settings" links, a keyword search box, and a "Download Table" control. The Protein Name column and the columns to the right of Amino Acid 2 are cut off. Visible columns and rows:]
Analysis Completed    Similar         Position  Amino   Direct   Side     Side Chain  MW 1     MW       Total    Position  Amino
                      Susceptibility  1         Acid 1  Match 1  Chain 1  Match 1              Match 1  Match 1  2         Acid 2
2019 08 29 14:55:59   Y               351       D       Y        Acidic   Y           133.104  Y        Y        353       E
2019 08 29 14:55:59   Y               320       D       Y        Acidic   Y           133.104  Y        Y        322       E
2019 08 29 14:55:59   Y               316       D       Y        Acidic   Y           133.104  Y        Y        318       E
2019 08 29 14:55:59   Y               355       D       Y        Acidic   Y           133.104  Y        Y        357       E
2019 08 29 14:55:59   Y               319       D       Y        Acidic   Y           133.104  Y        Y        321       E
2019 08 29 14:55:59   Y               319       D       Y        Acidic   Y           133.104  Y        Y        321       E
The "Direct Match X" column describes whether the hit amino acid is an exact match to the template
amino acid, providing a "Y" or "N" for yes or no, respectively. The "Side Chain X" column indicates the
side chain classification for the amino acid residue (click on "Show Amino Acid Info..." for more
information on classifications). The "Side Chain Match X" column indicates whether the hit side chain
has the same classification as the template amino acid, providing a "Y" or "N" for yes or no, respectively.
The "MW X" column indicates the molecular weight (g/mol) of the amino acid residue. The "MW
Match X" column indicates whether the hit molecular weight is similar to that of the template amino acid:
"Y" when the difference is less than 30 g/mol, and "N" when the difference is greater
than or equal to 30 g/mol. "Total Match X" is "Y" when "Side Chain Match X" and "MW Match X" are
both "Y", or when one is "Y" and the other is "N". "Total Match X" is "N" only when both "Side Chain Match X" and
"MW Match X" are "N". Ultimately, Total Match 1,
2, 3, 4, and so on are used to inform the "Similar Susceptibility as Template" column. If there is one or more "N"
for Total Match comparing any amino acid residue to the template across a row for a given species, then
the "Similar Susceptibility as Template" is "N" for no, indicating that the hit species is predicted NOT to
have the same susceptibility prediction as the template sequence. However, if all "Total Match X" are "Y"
for yes, then the "Similar Susceptibility as Template" is "Y", indicating that the hit species is predicted to
have the same susceptibility prediction as the template sequence.
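The decision rules in this paragraph can be sketched in a few lines. This is an illustration of the logic as described in this guide, not the SeqAPASS implementation: the `Residue` class and function names are hypothetical, the side chain classes and molecular weights are the ones printed in this guide's example tables, and treating a difference of exactly 30 g/mol as a mismatch follows the wording above.

```python
from dataclasses import dataclass

MW_THRESHOLD = 30.0  # g/mol, from this guide


@dataclass(frozen=True)
class Residue:
    abbreviation: str
    side_chain: str
    mw: float


def total_match(template, hit, threshold=MW_THRESHOLD):
    """'Total Match X': True unless BOTH side chain class and MW differ."""
    side_chain_match = template.side_chain == hit.side_chain
    mw_match = abs(template.mw - hit.mw) < threshold
    return side_chain_match or mw_match


def similar_susceptibility(template_residues, hit_residues):
    """'Similar Susceptibility as Template': True only if every residue is a Total Match."""
    if len(template_residues) != len(hit_residues):
        raise ValueError("template and hit must list the same number of residues")
    return all(total_match(t, h) for t, h in zip(template_residues, hit_residues))


# Values as printed in this guide's example tables.
G = Residue("G", "Aliphatic", 75.067)
E = Residue("E", "Acidic", 147.131)
D = Residue("D", "Acidic", 133.104)
A = Residue("A", "Aliphatic", 89.094)
Q = Residue("Q", "Amidic", 146.146)
S = Residue("S", "Hydroxylic", 105.093)

print(total_match(E, D))  # True: same class, and MW differs by about 14 g/mol
print(total_match(G, S))  # False: different class, and MW differs by about 30.03 g/mol
print("Y" if similar_susceptibility([G, E, G], [Q, D, A]) else "N")  # N
```

The last line prints "N" because the first residue pair (G against Q) fails both criteria, and a single "N" in a row is enough to set "Similar Susceptibility as Template" to "N".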
Multiple Level 3 Runs Requiring the Same Amino Acid Residue Comparisons
Typically, Level 3 individual amino acid residue alignments are submitted repetitively, comparing species
from one taxonomic group at a time to the template amino acid residue(s).
[Screenshot: the "View Level 3 Data" box with the "Choose Query to View" dropdown ("Select Level 3 Run Name") listing the runs Actinopteri, Amphibia, Aves, Crocodyliadae, Dipnoi, Lepidosauria, mammalia, and Testudines.]
Therefore, to increase efficiency in submitting the same alignments in Level 3 repeatedly, the user can
take advantage of the "Copy to Residue List" button. For the first alignment of amino acid residues, the
user would select the amino acid residues to align and click the "Update Report" button.
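[Screenshot: the "Select Amino Acid Residues" shuttle box with residues 351D, 355V, 356H, 375Q, and 400G in the right-hand box, the "Enter Amino Acid Residue Positions" text box containing 351,355,356,375,400, and the "Copy to Residue List" and "Update Report" buttons.]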
By clicking "Update Report" the residues that were selected will be copied into the "Enter Amino Acid
Residue Positions" text box. When the user selects a new "Level 3 Run Name" (from the same Level 1
query accession) to view by using the "View Level 3 Data" dropdown and clicking the "View Level 3
Data" button on the "Level 1 Query Protein Information" page, the "Enter Amino Acid Residue
Positions" text box will be populated with the amino acid residues selected from the previous run.
[Screenshot: the "Enter Amino Acid Residue Positions" text box pre-populated with 351,353,362,364,394,524, the hint "Enter residue positions as a comma separated list", and the "Copy to Residue List" button.]
The user can keep, add, or delete residue positions in this box and then click the "Copy to Residue List" button.
The amino acid residues will then be moved to the "Select Amino Acid Residues" shuttle, and the user
can click "Update Report" to view the data in the table below.
Level 3 Data Visualization
Information Page: Heat Map
The Heat Map gives the user a visual representation of the chosen amino acid(s)
for a single Level 3 run. It uses color to denote which amino acids are a total match, a
partial match, or not a match to the template sequence. The Heat Map is accessed from the "Level 3"
page under the "Visualization" drop down and opens in a separate tab. It shares many
features with the Level 1 and 2 boxplots and adds some customizable features of its own. Many
settings can be changed within the Heat Map, and informational buttons
provide added information about the different options.
To get to the Heat Map, open a completed Level 3 run, click the "Visualization" drop down, and then
select the "Visualize Data" button. This opens the Heat Map information page, which describes
the features of the map. Then select the "Heat Map" icon to access the Heat Map itself.
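[Screenshot: the "Select Amino Acid Residues" shuttle box with residues 1M, 5L, 143E, and 202C in the right-hand box, the "Enter Amino Acid Residue Positions" text box with the "Copy to Residue List" and "Update Report" buttons, and the "Visualization" drop down with the "Visualize Data" button and the note "This will open in a separate tab."]
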
Heat Map Customization Page
Upon opening the Heat Map, the user has options to customize the visualization. The first is
the selection of taxonomic groups to add to the Heat Map. The default order of the taxonomic
groups is based on how the species were selected during the Level 3 setup process. The user can
include all taxonomic groups or only a chosen few. To move taxonomic groups over for ordering,
click a group (or Ctrl-click to choose several) and select the arrow pointing to the right. Once the
taxonomic groups are moved over, the user can order the groups by dragging them up or down.
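[Screenshot: the "Level 3 Taxonomic Groups" and "Order Level 3 Taxonomic Groups" boxes, shown before and after moving groups. Before: Mammalia, Testudines, Aves, Crocodylia, Lepidosauria, Amphibia, and Dipnoi are all in the left-hand box. After: Aves, Crocodylia, and Amphibia remain on the left, and Mammalia, Lepidosauria, Testudines, and Dipnoi are in the ordering box.]
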
Report Options
There are multiple options within the Heat Map that can be changed based on what information the user
desires to have present. The Heat Map itself can be changed between the "Simple" report which shows
the amino acid and its respective position or the "Full" report which gives added information about each
amino acid. The user can also change between the common name and scientific name displayed on the
Heat Map.
Optional Selections
The "Optional Selections" for the Heat Map highlight the names of the species that belong to the
selected group: Ortholog Candidates, Threatened Species, Endangered Species, or Common Model
Organisms. Only one optional selection can be highlighted at a time.
[Screenshot: Heat Map with the "Threatened Species" optional selection highlighted. Legend: Total Match, Partial Match, Not a Match, Threatened Species. Cell colors are not reproduced here.]
Common Name                  Amino Acid 1  Amino Acid 2  Amino Acid 3  Amino Acid 4
Human                        32K           46S           55P           64A
Diamondback terrapin         32K           46S           55P           64T
Western painted turtle       32K           46S           55P           64T
Chinese soft-shelled turtle  32K           46S           55P           64T
Terrapins                    32K           46S           55P           64T
Goodes thornscrub tortoise   32K           46S           55P           64T
Pacific ridley               32K           46S           55P           64T
Painted turtle               32K           46S           55P           64T
Green sea turtle             32K           46S           55P           64T
Three-toed box turtle        61K           75S           84P           93T
Heat Map Settings
Changing the "Heat Map Settings" will give the user the option to display specific information in the Heat
Map. The user can select or deselect a variety of the settings to have a customized Heat Map. The user
can choose to display Species Names as Common Name or Scientific Name, choose to highlight special
groups such as Ortholog candidates, Threatened and Endangered Species, or Common Model Organisms.
Additionally, the user may choose options to remove the text from the Susceptibility Prediction, Amino
acid abbreviation or position, and further remove sections of the Heat Map.
[Screenshot: Heat Map controls. "Report Options": Report Type (Simple or Full) and Species Name Type (Common Name or Scientific Name). "Optional Selections": Ortholog Candidates, Threatened Species, Endangered Species, Common Model Organisms. "Heat Map Settings": Susceptibility Prediction Heat Map, Susceptibility Prediction Text, Alignment Prediction Heat Map, Amino Acid, Amino Acid Position.]
[Screenshot: simple report Heat Map. Legend: Total Match, Partial Match, Not a Match. Cell colors are not reproduced here.]
Common Name                Similar Susceptibility  Amino Acid 1  Amino Acid 2  Amino Acid 3  Amino Acid 4
Human                      Y                       133E          135E          137S          584I
Western painted turtle     Y                       127E          129D          131S          577V
Lappet-faced vulture       N                       125E          127E          129G          575I
Nile crocodile             Y                       127E          129D          131S          575I
Split-tongued squamates    Y                       127E          129E          131N          576V
Japanese giant salamander  [illegible]             126E          128E          130G          573L
West African lungfish      Y                       131E          133E          135S          580G
Above is an example of a simple report, which shows each amino acid and its position. Each
amino acid is compared to the template species and is colored dark blue (Total match), light
blue (Partial match), or yellow (Not a match). To access more information about an
amino acid, the user can hover over the amino acid box to bring up a box with added data.
Amino acid alignments are compared to the user-selected template amino acids by side chain
classification (e.g., acidic, basic, aromatic) and by molecular weight as a surrogate for size (> 30 g/mol
difference = N). If the side chain classification matches and the molecular weight is within 30 g/mol, the
amino acid is a total match; if only one of these characteristics is similar to the template, it is
labeled a partial match; and if both characteristics differ from the template, the alignment is not a
match.
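The three Heat Map categories can be expressed as a count of how many of the two criteria agree with the template. The sketch below is illustrative only: `heat_map_category` is a hypothetical name, residues are passed as plain `(side chain class, molecular weight)` tuples using values printed in this guide, and counting a difference of exactly 30 g/mol as a mismatch is an assumption.

```python
# Colors as described in this guide.
COLORS = {
    "Total match": "dark blue",
    "Partial match": "light blue",
    "Not a match": "yellow",
}


def heat_map_category(template, hit, mw_threshold=30.0):
    """Classify a hit residue against the template residue.

    Both arguments are (side_chain_class, molecular_weight) tuples.
    """
    same_class = template[0] == hit[0]
    similar_size = abs(template[1] - hit[1]) < mw_threshold
    agreeing = int(same_class) + int(similar_size)
    return {2: "Total match", 1: "Partial match", 0: "Not a match"}[agreeing]


glycine = ("Aliphatic", 75.067)
alanine = ("Aliphatic", 89.094)
proline = ("Aliphatic", 115.132)
glutamine = ("Amidic", 146.146)

for name, hit in [("alanine", alanine), ("proline", proline), ("glutamine", glutamine)]:
    category = heat_map_category(glycine, hit)
    print(f"glycine vs {name}: {category} ({COLORS[category]})")
# glycine vs alanine: Total match (dark blue)
# glycine vs proline: Partial match (light blue)
# glycine vs glutamine: Not a match (yellow)
```

Proline illustrates the partial case: it shares glycine's side chain class, but its molecular weight differs by about 40 g/mol.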
Below is an example of a full report, which shows each amino acid and its position along with
the amino acid's side chain classification, molecular weight, and whether it is a Total match (dark blue)
or Not a match (yellow) to the template species.
[Screenshot: full report Heat Map. Legend: Match, Not a Match. Cell colors are not reproduced here; "?" marks a cell whose text is not legible in the screenshot.]
Common Name                Similar         Amino  Side       MW 1     Total    Amino  Side       MW 2     Total    Amino  Side        MW 3     Total
                           Susceptibility  Acid 1 Chain 1             Match 1  Acid 2 Chain 2             Match 2  Acid 3 Chain 3              Match 3
Human                      Y               274G   Aliphatic  75.067   Y        275E   Acidic     147.131  Y        276G   Aliphatic   75.067   Y
Western painted turtle     N               268Q   Amidic     146.146  N        269D   Acidic     133.104  Y        270A   Aliphatic   89.094   Y
Nile crocodile             N               268Q   Amidic     146.146  N        269D   Acidic     133.104  ?        270A   Aliphatic   89.094   ?
Split-tongued squamates    N               268Q   Amidic     146.146  N        269D   Acidic     133.104  Y        270S   Hydroxylic  105.093  N
Japanese giant salamander  N               267P   Aliphatic  115.132  ?        268D   Acidic     133.104  Y        269Q   Amidic      146.146  N
The example below shows only the "Alignment Prediction" (amino acid match against the template) for each
amino acid, in order of position.
[Screenshot: Heat Map showing only the "Alignment Prediction" colors (Total Match, Partial Match, Not a Match) for Amino Acid 1 through Amino Acid 4 in Human, Western painted turtle, Lappet-faced vulture, Nile crocodile, Split-tongued squamates, Japanese giant salamander, and West African lungfish. Cell colors are not reproduced here.]
Added information is available for each species (NCBI Accession, Protein Name, Scientific Name, and
Taxonomic Group) and for each amino acid (Amino Acid Name, Abbreviation, Side Chain, and
Molecular Weight). It can be viewed by hovering over the species name or the amino acid.
[Screenshot: Heat Map tooltips. Hovering over the species name "Human" shows NCBI Accession: NP_000116.2, Protein Name: estrogen receptor isoform 1, Scientific Name: Homo sapiens, Taxonomic Group: Mammalia, and Ortholog Candidate. Hovering over amino acid 274G shows Name: Glycine, Abv: G, Side Chain: Aliphatic, MW: 75.067.]
To push the customized Level 3 Heat Map to the Decision Summary Report as a visualization, press the
"Push Level 3 Heatmap to DS Report" button. It will then be active within the DS Report Level 3 section.
To download the Heat Map, press the "Download Heatmap..." button. The Heat Map can be downloaded
as an SVG, JPG, or PNG.
[Screenshot: the "Download Heatmap..." and "Push Level 3 Heatmap To DS Report" buttons.]
Decision Summary Report
The "Decision Summary (DS) Report" lets the user design a single output
page to concisely view results from all Levels of the SeqAPASS evaluation for completed jobs. The
output can be customized to include visualizations and susceptibility predictions and can be downloaded in
PDF format. The "DS Report" page becomes active when the user takes action on a result page to push
tables or visualizations to the DS Report. The "DS Report" page contains a maximum of one Level 1
output (and visualization) and one Level 3 output (and visualization) but can contain multiple Level 2
domain outputs (and their respective visualizations).
[Screenshot: page tabs "Main", "Level 1", "Level 2", "Level 3", and "DS Report".]
To push results from any Level to the DS Report, the user must press the "Push Level # To DS Report"
button. The "DS Report" button then becomes active so the user can view the report settings. The DS
Report can be updated as the user changes settings in Level 1, Level 2, and Level 3 (for example, adding or removing
amino acids), but the user must push the updated report to the DS Report again using the "Push Level # To DS
Report" button. A notification appears next to the button when settings have been updated, as a
reminder to push the report. If the user changes
to a different SeqAPASS job (e.g., a different protein accession), the "DS Report" button becomes
inactive and the user must push the data from the new job to the DS Report as described previously.
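The rules for what the DS Report can hold (one Level 1 output, one Level 3 output, any number of Level 2 domains, all from the same job) can be pictured as a small container. This is a conceptual sketch only and says nothing about how SeqAPASS stores the report: the `DSReport` class, its field names, and the reset-on-new-job behavior are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DSReport:
    """Hypothetical model of what has been pushed to the DS Report."""
    job_id: Optional[int] = None
    level1: Optional[dict] = None                 # at most one Level 1 output
    level2: dict = field(default_factory=dict)    # domain name -> Level 2 output
    level3: Optional[dict] = None                 # at most one Level 3 output

    @property
    def active(self) -> bool:
        """The report is active once anything has been pushed."""
        return self.level1 is not None or bool(self.level2) or self.level3 is not None

    def push(self, job_id: int, level: int, output: dict,
             domain: Optional[str] = None) -> None:
        if job_id != self.job_id:
            # A different SeqAPASS job: earlier pushes no longer apply.
            self.job_id, self.level1, self.level2, self.level3 = job_id, None, {}, None
        if level == 1:
            self.level1 = output              # replaces any earlier Level 1 push
        elif level == 2:
            if domain is None:
                raise ValueError("a Level 2 push needs a domain name")
            self.level2[domain] = output      # one entry per domain
        elif level == 3:
            self.level3 = output              # replaces any earlier Level 3 push
        else:
            raise ValueError(f"unknown level: {level}")


report = DSReport()
report.push(1631, 1, {"cut_off_percent": 34.43})
report.push(1631, 3, {"residues": ["351D", "353E"]})
report.push(1631, 3, {"residues": ["351D", "353E", "362K"]})  # re-push after adding a residue
print(report.active, report.level3["residues"])
```

Re-pushing Level 3 overwrites the earlier entry, which mirrors the need to push again after settings change; the values 1631 and 34.43 are taken from the example screenshots in this guide.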
Level 1 of the Decision Summary Report
Upon clicking the "DS Report" button, the user is brought to a new page containing the "Level 1
Report" section of the DS Report, which shows the pertinent information for the query protein and the
report settings that were pushed to the report. The user can also include the Level 1 visualization in the
DS Report by going to the "Level 1 Visualization" page and clicking "Push Level 1 Boxplot To DS Report".
The default visualization or a user customized visualization will then be inserted in the downloadable DS
Report PDF once the radio button is selected.
[Screenshot: "Level 1 Info" section of the DS Report page, with the "Add Level 1 Info to Report" option.]
Level 1 Query Protein Information
SeqAPASS ID: 1631
Query Species: Homo sapiens
Query Protein: estrogen receptor isoform 1
Query Accession: NP_000116.2
Ortholog Count: 410
Protein and Taxonomy Data: 06/08/2020
BLAST Version: 2.10.0
Software Version: 4.1
Report Settings
Report Type: Primary
E-Value: 0.01
Sorted By Taxonomic Group: CLASS
Common Domains: 1
Species Read-Across: Y
Cut-off %: 34.43
Show Only Eukaryotes: Y
Optional Components
Component: Level 1 Visualization (Add to Report)
Once the user is satisfied with the data that has been pushed to the DS Report, the "DS Report"
button brings the user to the "Level 1 Report" section, which offers customizable
options. In the "Level 1 Report" section, there is a series of checkboxes in the "Select Taxonomic
Groups (CLASS)" box, where the user can select which taxonomic group(s) to display in the DS Report.
After selecting the taxonomic group(s), the user can customize the report in the "Select Species"
box by selecting the checkbox next to each species for which Level 1 data should be
displayed in the "Final Decision Summary Report" table at the bottom of the page. The template species is
always selected and cannot be deselected. Species are active only when at least one taxonomic group is
selected in the "Select Taxonomic Groups (CLASS)" box. Level 1 results for the species selected in the
"Select Species" box are integrated in the "Final Decision Summary Report" table at the bottom of the
page (Note: if the user does not push a Level 1 job to the "DS Report" page, there will be no
information in that section).
Level 1 Info
The Level 1 information section appears when either a Level 1 report or a Level 1 visualization
is pushed to the DS Report. The section includes the "Level 1 Query Protein
Information" (i.e., SeqAPASS ID, Query Species, Query Protein, Query Accession, Ortholog Count, Protein and
Taxonomy Data, BLAST Version, and Software Version), the "Report Settings" (i.e., Report Type,
E-Value, Sorted By Taxonomic Group, Common Domains, Species Read-Across, Cut-off %, and Show
Only Eukaryotes), and the "Optional Components" section, which contains the option to add
the "Level 1 Visualization" to the report.
Including Visualizations in DS Reports
The user can also include the "Level 1 Visualization" by going to the visualization page and
pushing either the default visualization or a user-modified visualization, which is then attached to the
downloaded PDF once the radio button is selected. In the scroll-down lists, the template species is always
selected and cannot be deselected. Species are not active until a taxonomic group box is selected.
Once that occurs, the respective species become active and can be deselected individually or with the
select all function. The selected species become active in the "Final Decision Summary Report"
table at the bottom of the page (Note: if a user pushed only a boxplot to the DS Report with the
"Push Level 1 Boxplot To DS Report" button, then only the "Level 1 Info" and the "Optional Components"
will be active).
[Screenshot: "Level 1 Report" section of the DS Report page, with "Select All", the Common Name / Scientific Name toggle, and the "Add Level 1 Info to Report" option.]
Select Taxonomic Groups (CLASS): Mammalia, Testudines, Aves, Crocodylia, Lepidosauria, Amphibia, Chondrichthyes, Ceratodontimorpha, Coelacanthiformes, Actinopteri, Cladistia, Petromyzontiformes
Select Species: Western gorilla, Chimpanzee, Western lowland gorilla, Pygmy chimpanzee, Sumatran orangutan, Bornean orangutan, Rhesus monkey, Sooty mangabey, Crab-eating macaque, Pig-tailed macaque, Ugandan red colobus
Level 1 Query Protein Information
SeqAPASS ID: 1306
Query Species: Homo sapiens
Query Protein: estrogen receptor isoform 1
Query Accession: NP_000116.2
Ortholog Count: 348
Protein and Taxonomy Data: 02/28/2019
BLAST Version: 2.8.1
Software Version: 3.2
Report Settings
Report Type: Primary
E-Value: 0.01
Sorted By Taxonomic Group: CLASS
Common Domains: 1
Species Read-Across: Y
Cut-off %: 33.93
Show Only Eukaryotes: Y
Optional Components
Component: Level 1 Visualization (Add to Report)
Level 2 of DS Report
The Level 2 section of the DS Report contains all the domains that have been pushed to the report. Multiple
domains can be present in the section once they have been run and pushed individually to the
report. The user can also include each respective "Level 2 Visualization" by going to the visualization
page and pushing either the default visualization or a user-modified visualization, which can then
be attached to the downloaded PDF. Once a domain is selected, it appears in the "Final
Decision Summary Report" table at the bottom of the page. (Notes: if the user does not push a Level 2 run
to the DS Report page, there will be no information in that section. If a visualization is pushed to the DS
Report before a Level 2 report, the domain will be present and the "Add Visualization to Report"
button will be active.)
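[Screenshot: "Level 2 Report" section of the DS Report page. The "Select Level 2 Domains" table has the columns "Add to Final Decision Report" (with "Select All"), "Domain", and "Optional Components" ("Add Info to Report" and "Add Visualization to Report").]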
Level 3 of DS Report
The Level 3 section of the DS Report contains all the information for the query protein and report settings
that were pushed to the report. It also contains the amino acids that were updated in the report and pushed
over. New amino acids added after data has been pushed to the DS Report will need to be pushed over again.
The Yes (Y) or No (N) susceptibility will be displayed in the "Final Decision Summary Report" table.
The user can also include the "Level 3 Visualization" by going to the visualization page and pushing a
user-modified visualization, which can then be attached to the downloaded PDF. (Notes: if
the user does not push a Level 3 run to the DS Report page, there will be no information in that section.
Also, if a "Level 3 Visualization" (Heat Map) is pushed before a Level 3 report, the "Level 3 Info" will be
populated with that respective run's information.)
[Screenshot: "Level 3 Report" section of the DS Report page, with the "Add to Final Risk Report" and "Add Level 3 Info to Report" options.]
Selected amino acids: 351D, 353E, 362K, 364V, 394R, 524H
SeqAPASS ID: 1306
Template Species: Homo sapiens
Template Protein: [NP_000116.2] estrogen receptor isoform 1
Protein and Taxonomy Data: 02/28/2019
BLAST Version: 2.8.1
Software Version: 3.2
Optional Components
Component: Level 3 Visualization (Add to Report)
Final Decision Summary Report Table
The "Final Decision Summary Report" table contains the important data and susceptibility predictions for
each level run, for all the species selected in the Level 1 section. The table collects the susceptibility
prediction for each run and displays the results for quick interpretation. The complete table can be
saved as either an Excel spreadsheet or a .csv file. It is also added to the PDF when downloaded.
Each selected species has its own row, which contains the information that has been
pushed to the "Final Decision Summary Report" table. The columns show the Data Version, NCBI
Accession, Filtered Taxonomic Group, Species, Protein Name, Level 1 Susceptibility Prediction as Yes
(Y) or No (N), Level 2 Common Domain(s) Name and respective Susceptibility Prediction as Yes (Y) or
No (N), Level 3 Template Species, and Level 3 Amino Acid Susceptibility Prediction as Yes (Y) or No
(N). (A few things to note: if multiple domains are pushed to the "Final Decision Summary Report"
table, each domain has its own column. Also, for a species to have either a Yes (Y) or No (N)
Level 3 susceptibility prediction in the table, it must be pushed to the report from the Level 3 run as well as
selected in the Level 1 taxonomic groups/species selection. If a species was not included in the Level 3
report that was pushed but is included in the "Final Decision Summary Report" table, it will receive
NA for its Level 3 susceptibility prediction.)
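The way the table combines the three Levels, including the NA rule for Level 3, can be sketched as a simple merge keyed on species. The function name `build_summary_rows` and the dictionary-based inputs are hypothetical; only the column meanings and the NA rule for Level 3 come from this guide, and using NA for a species missing from a Level 2 domain is an additional assumption.

```python
def build_summary_rows(selected_species, level1, level2_by_domain, level3):
    """Combine per-level predictions into one row per selected species.

    level1 and level3 map species -> "Y"/"N"; level2_by_domain maps
    domain name -> {species: "Y"/"N"}.
    """
    rows = []
    for species in selected_species:
        row = {
            "Species": species,
            "Level 1 Susceptible (Y/N)": level1[species],
        }
        # Each pushed Level 2 domain gets its own column.
        for domain, predictions in level2_by_domain.items():
            row[domain] = predictions.get(species, "NA")  # assumption for Level 2
        # Species absent from the pushed Level 3 report receive NA.
        row["Level 3 Amino Acids (Y/N)"] = level3.get(species, "NA")
        rows.append(row)
    return rows


rows = build_summary_rows(
    selected_species=["Human", "Western gorilla"],
    level1={"Human": "Y", "Western gorilla": "Y"},
    level2_by_domain={"cd06157, NR_LBD": {"Human": "Y", "Western gorilla": "Y"}},
    level3={"Human": "Y"},  # Western gorilla was not part of the pushed Level 3 run
)
for row in rows:
    print(row)
```

In this example the Western gorilla row ends with NA in the Level 3 column because the species was selected in Level 1 but was not part of the pushed Level 3 report.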
[Screenshot: "Final Decision Summary Report" table with a keyword search box and a "Download Table" control. The single row shown contains:]
Data Version: 5
NCBI Accession: NP_000116.2
Filtered Taxonomic Group: Mammalia
Species: Human
Protein: estrogen receptor isoform 1
Susceptible (Y/N): Y
(345) cd06157, NR_LBD, The ligand binding domain of nuclear receptors, a family of ligand-activated transcription regulators: [not legible]
Level 3 Template: Homo sapiens
Level 3 Amino Acids (Y/N): Y
Download DS Report as PDF
To capture all the data pushed to the DS Report as a PDF, press the "Download DS Report" button. The
DS Report PDF will match the data on the DS Report page and will include the visualizations if selected
by the user. The information for each Level that is pushed to the downloaded DS Report PDF includes all
the Query Protein Information for that respective protein, domain(s), and template protein. (Note: if
the DS Report page is updated after the PDF is created, the user must download the PDF again to have
the most up-to-date version of the page.)
Level 1 of DS Report PDF
The Level 1 section of the DS Report PDF contains all the "Level 1 Query Protein Information" along
with the Level 1 "Report Settings" for that respective protein's run. This information will not be present if
no Level 1 run information or Level 1 visualization is pushed to the DS Report PDF.
[Screenshot: "Level 1" section of the DS Report PDF, followed by the Level 1 visualization (not reproduced here).]
Level 1 Query Protein Information           Report Settings
SeqAPASS ID: 1679                           Report Type: Primary
Query Species: Homo sapiens                 E-value: 0.01
Query Protein: estrogen receptor isoform 1  Sorted By Taxonomic Group: CLASS
Query Accession: NP_000116.2                Common Domains: 1
Ortholog Count: 410                         Species Read-Across: Y
Protein and Taxonomy Data: 06/08/2020       Cut-off %: 34.43
BLAST Version: 2.10.0                       Show Only Eukaryotes: Y
Software Version: 4.1

[Screenshot: two "Level 2" sections of the DS Report PDF, one per pushed domain.]
Level 2 Query Protein Information           Report Settings
SeqAPASS ID: 1653                           Report Type: Primary
Query Species: Homo sapiens                 E-value: 10.0
Query Domain: (345) cd06157, NR_LBD, The ligand binding domain of nuclear receptors, a family of ligand-activated transcription regulators
Query Accession: NP_000116.2                Sorted By Taxonomic Group: CLASS
Ortholog Count: 410                         Species Read-Across: Y
Protein and Taxonomy Data: 06/08/2020       Cut-off %: 55.00
BLAST Version: 2.10.0                       Show Only Eukaryotes: Y
Software Version: 4.1

Level 2 Query Protein Information           Report Settings
SeqAPASS ID: 1653                           Report Type: Primary
Query Species: Homo sapiens                 E-value: 10.0
Query Domain: (341) cd06929, NR_LBD_F1, Ligand-binding domain of nuclear receptor family 1
Query Accession: NP_000116.2                Sorted By Taxonomic Group: CLASS
Ortholog Count: 409                         Species Read-Across: Y
Protein and Taxonomy Data: 06/08/2020       Cut-off %: 42.03
BLAST Version: 2.10.0                       Show Only Eukaryotes: Y
Software Version: 4.1
Level 3 of DS Report PDF
The Level 3 section of the DS Report PDF will contain all the "Level 3 Template Protein Information"
along with the Level 3 "Selected Amino Acids" for that respective run. This information will not be
present if no Level 3 run information or Level 3 visualization is pushed to the DS Report PDF. The run
can have a visualization "Heat Map" that can be added to the DS Report PDF by selecting the "Add
Visualization to Report" radio button.
Level 3

Level 3 Template Protein Information
  SeqAPASS ID: 1653
  Template Species: Homo sapiens
  Template Protein: [NP_000116.2] estrogen receptor isoform 1
  Protein and Taxonomy Data: 06/08/2020
  BLAST Version: 2.10.0
  Software Version: 4.1
Selected Amino Acids
  5L, 57G, 120F, 177E

Final DS Report Table in DS Report PDF
The Final Decision Summary Report table will display the species that were selected for the Level 1 set of
the DS Report. It can display each species' respective "Protein", "Level 1 Susceptibility (Y/N)", common
domain(s), "Level 3 Template", and "Level 3 Susceptibility" all depending on what is selected from the
DS Report set up.
Final Decision Summary Report

Species                    Protein                                   Level 1 Susceptible (Y/N)   (345) cd06157, NR_LBD*
Human                      estrogen receptor isoform 1               Y                           Y
Western gorilla            estrogen receptor alpha                   Y                           Y
Chimpanzee                 estrogen receptor isoform X2              Y                           Y
Western lowland gorilla    estrogen receptor isoform X2              Y                           Y
Pygmy chimpanzee           estrogen receptor isoform X2              Y                           Y
Bornean orangutan          estrogen receptor alpha                   Y                           Y
Sumatran orangutan         estrogen receptor isoform X2              Y                           Y
Sooty mangabey             PREDICTED: estrogen receptor isoform X2   Y                           Y
Rhesus monkey              estrogen receptor isoform X2              Y                           Y

* Full column heading: (345) cd06157, NR_LBD, The ligand binding domain of nuclear receptors, a family of
ligand-activated transcription regulators

Moving Between Level 1, Level 2, Level 3, and Decision Report Data Pages
As a user chooses to view Level 1, Level 2, or Level 3 data in the "View SeqAPASS Reports" tab, new
buttons become available for allowing the user to move between Levels of an analysis. The Decision
Report data page will become active once a user pushes a finished run using the "Push Level # To DS
Report" button. Please see snapshot below.
[Screenshot: "View SeqAPASS Reports" tab (Version 4.1) showing the "Main," "Level 1," and "Level 3" buttons]

The user can use the "Main" button to return to the list of completed Level 1 runs and select a different
query accession to view. The "Level 1" button brings the user to the Level 1 data page, where the user can
set up queries for Level 2 and Level 3, as well as select the button to view Level 2 and Level 3 data pages.
Open Level 1, Level 2, and Level 3 pages remain open until the user selects a different run to view on the
"Main" page. Moving between tabs, such as "Home," "Request SeqAPASS Run," and "SeqAPASS Run
Status", does not close the Level 1, Level 2, or Level 3 pages that have been opened.
Note: If the user logs out of the SeqAPASS tool, upon logging back in, the data will reset to default
settings. Therefore, the View SeqAPASS Reports tab will not display the "Main," "Level 1," "Level 2,"
or "Level 3" buttons, until a query is chosen and Level 2 and Level 3 pages are opened.
Search, View, and Download Data Tables
The user can use the "Search" box to enter text to search the table. Further, the user can use the arrow
buttons and page numbers on the bottom of the screen to view all data and the drop-down to expand the
table to 10, 20, or 50 rows. There are also left and right scroll bars at the bottom of the tables to allow the
user to view all columns of the table.
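Search using the text box at the top of tables:
[Screenshot: "Search: Enter keyword" text box]
Options for viewing data:
[Screenshot: page navigation controls ("1 of 95", page numbers 1 to 10), the rows-per-page drop-down, and the "Download Table" icons]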
All data tables in the SeqAPASS tool can be downloaded as Excel or csv files. The icons for downloading
the files are present on the bottom right-hand side of all tables. Click the icon to download data.
Download Table:
Upon selecting a csv file, the user can choose to save or open the file. Each file is appropriately named by
Level of the SeqAPASS evaluation and report type.
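
Once a table has been saved as a csv file, it can also be inspected outside the SeqAPASS interface. The following is a minimal Python sketch, not part of the SeqAPASS tool, that reads a downloaded table and filters it by keyword in the same spirit as the "Search" box. The column header "Common Name" and the helper names load_report and filter_rows are assumptions made for illustration; check the header row of the actual downloaded file before relying on any column name.

```python
import csv
import io


def load_report(handle):
    """Read a downloaded SeqAPASS table (csv) into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle))


def filter_rows(rows, column, keyword):
    """Keep rows whose value in `column` contains `keyword`, ignoring case."""
    needle = keyword.lower()
    return [row for row in rows if needle in (row.get(column) or "").lower()]


# Small in-memory stand-in for a downloaded file. The header names are assumed;
# a real SeqAPASS download may label its columns differently.
sample = io.StringIO(
    "NCBI Accession,Common Name,Taxonomic Group\n"
    "NP_000116.2,Human,Mammalia\n"
    "XP_003811544.1,Pygmy chimpanzee,Mammalia\n"
)

rows = load_report(sample)
matches = filter_rows(rows, "Common Name", "chimp")
print([row["NCBI Accession"] for row in matches])
```

With a real download, the same functions could be applied to a file opened with open("SeqAPASS_Level2_Primary_Report.csv", newline=""), assuming the file was saved under its default name.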
[Screenshot: Level 2 Data table with the browser download dialog for "SeqAPASS_Level2_Primary_Report.csv," offering the "Open with" and "Save File" options]
Upon selecting a .xls file, the user can save the report to their desired location. Each file is appropriately
named by Level of the SeqAPASS evaluation and report type.
[Screenshot: "Save As" dialog for the Level 2 Data - Primary table, with the file name "SeqAPASS_Level2_Primary_Report.xls" and the file type "Microsoft Excel 97-2003 Worksheet (*.xls)"]

Log out
The user can log out from any page in SeqAPASS by clicking the "Log out" link on the upper right-hand
side of the page. If a user logs out and then logs back in, all settings will be set back to default. Any
successfully submitted queries that were requested prior to logging out will continue running and, when
completed, will be available to the user in the "View SeqAPASS Reports" tab.
[Screenshot: SeqAPASS Home tab with the "Log out" link above the tab bar]
Pop-up Messages
The Spinning Wheel pop-up is used as an indicator to alert the user that an action is taking place, where
the interface of the SeqAPASS tool is contacting the backend database. For example, upon clicking the
"SeqAPASS Run Status" tab, "Refresh Data" button, "View Level 2 Data" button, or "View Level 3
Data" button the Spinning Wheel will pop-up and disappear from the screen. There are multiple other
instances where the spinning wheel is used as an indicator to the user that an action is occurring.
Querying database ... Please wait

Pop-up messages are meant to guide the user to submit the correct information for a query, inform the
user of a successful or failed query submission, or otherwise inform the user of an error. All pop-up
messages will appear for 10 seconds on the upper right-hand side of the screen, and then disappear. If the
user would like to close the message before the 10 seconds are up, click on the message and an "x" will
appear in the upper right-hand corner of the message box. Click the "x" to close the message.
In the "Request SeqAPASS Run" tab, Compare Primary Amino Acid Sequences "By Species" page, a
successful Level 1 query submission will display a pop-up message indicating that the query has been
submitted to the run queue. If an "existing" message appears instead, the accession has already been run
previously by a user and is available to view.
Success
Submitted NP_064393.2: submitted
OR
Success
NP_000116.2: existing
If the user does not select any query proteins in the "Request SeqAPASS Run" tab, Compare Primary Amino
Acid Sequences "By Species" or "By Accession" page, and clicks the "Request Run" button, one of the
error messages below will pop up.
Error
Must select query proteins
OR
Error
Must enter NCBI accession
If the user enters nonsense text (or any text that is not an NCBI accession) into the "NCBI Protein
Accession" text box for submitting a Level 1 query in the "Request SeqAPASS Run" tab, in the Compare
Primary Amino Acid Sequences "By Accession" page, and clicks the "Request Run" button, the message
below will pop up indicating that the accession entered is not in the SeqAPASS database.
Success
fgafgaf: not in database
In the "View SeqAPASS Reports" tab, Level 1 page, if a user clicks "View Level 2 Data," a successful
Level 2 query submission will display a pop-up message indicating that the query has entered the run
queue.
Level 2 Run Requested
Status queued
In the "View SeqAPASS Reports" tab, Level 1 page, if a user selects a domain that has already been
submitted (but not completed) and clicks "Request Domain Run," a pop-up message will indicate that the
Level 2 run has already been requested or could not be submitted.
Level 2 Run Requested
Status Already run or could not submit
In the "View SeqAPASS Reports" tab, Level 1 page, if a user clicks "View Level 2 Data" without
selecting a domain to view from the drop-down, the message below will pop-up to indicate that the user
must select a domain.
Error
Must select domain from drop-down
In the "View SeqAPASS Reports" tab, Level 1 page, a successful Level 3 query submission will display a
pop-up message indicating that the query has entered the run queue.
Level 3 Run Requested
Status queued
In the "View SeqAPASS Reports" tab, Level 1 page, if a user fails to type a user defined Level 3 Run
Name, the message below will pop-up to indicate that the user must do so.
Error
You must specify a Template Sequence and Level 3 Run Name
In the "View SeqAPASS Reports" tab, Level 1 page, if a user fails to select species from the Level 1 Data
table to be compared with the template sequence, the message below will pop-up.
Error
You must select sequences from the Level 1 Data table to request a Level 3 Run
In the "View SeqAPASS Reports" tab, Level 1 page, if a user fails to select a Level 3 Run Name from the
"Choose Query to View" drop-down and clicks the "View Level 3 Data" button, the message below will
pop-up.
Error
Must select level 3 run from drop-down
In the "View SeqAPASS Reports" tab, "Level 3 Template Protein Information" data page, if a user fails
to select amino acid residues using the "Select Amino Acid Residues" shuttle and clicks the "View Level
3 Data" button, the message below will pop-up.
No Residues Selected
User must select residues
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) Documentation
Query Species: The selection of the query species for a SeqAPASS analysis is dependent upon the
question the user is addressing. For example, the query species can be the target species (i.e., human or
companion animal in the case of drugs; or insect, plant, fungus, or pest in the case of pesticides) or,
depending on the application of the susceptibility prediction, the query species may be a species known or
hypothesized to be sensitive to a chemical acting on the protein molecular target of interest. There may be
instances where a protein for the species of interest has not been sequenced; in this case it may serve the
user's purpose to identify another taxonomically related species from the same organism Class, Order,
Family, or Genus as a surrogate query species. In certain cases, when there is interest in the susceptibility
of a particular species (e.g., honey bee) and in the case that there are numerous potential target species
(e.g., neonicotinoids are intended to cause mortality in a number of pest insects) the species of particular
concern may serve as the query species.
Query Protein: SeqAPASS can be queried with any protein sequence available in the NCBI protein
GenBank database, by protein name, or NCBI Accession. It is suggested that the user of SeqAPASS
examines their query protein and species in the NCBI protein database prior to submitting a run to
SeqAPASS (use NCBI link on query page). It is not uncommon for a protein of a specific species to be
represented by more than one sequence. In such cases there are some guiding principles for identification
of the best sequence available for the SeqAPASS run.
General guidelines: These guidelines describe best practices for identifying the most useful sequence for a
species susceptibility prediction in SeqAPASS. However, in some cases, limited sequence information is
available and therefore less desirable sequences may be used. It is up to the user of SeqAPASS to
recognize the quality and limitations of the sequence chosen for the SeqAPASS query. The information
about a particular protein can be found on the Protein page in the NCBI database
(http://www.ncbi.nlm.nih.gov/protein/).
[Screenshot: NCBI Protein database home page (http://www.ncbi.nlm.nih.gov/protein/) with "androgen receptor, homo sapiens" entered in the search box]
Search for a protein of interest using protein name and/or species of interest: For the example above,
multiple hit proteins were identified.
[Screenshot: NCBI Protein search results for "androgen receptor, homo sapiens" (540 results, 531 of them from Homo sapiens), listing hits such as P10275.2, AAA51772.1, AAA51780.1, and AAA51771.1. Search details: androgen receptor[All Fields] AND ("Homo sapiens"[Organism] OR homo sapiens[All Fields])]
Select one of the proteins by clicking on the link shown above to see detailed information about the
protein.
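
The same kind of search can also be expressed as a URL for NCBI's E-utilities "esearch" service, which may be convenient when many proteins need to be looked up. The sketch below only builds the URL and does not contact NCBI. It is an illustration rather than part of the SeqAPASS workflow: the function name protein_search_url is hypothetical, and the search term is a simplified form of the query shown in the search details, so it may return a slightly different set of hits than the web page.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base address of the NCBI E-utilities esearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def protein_search_url(protein_name, organism, retmax=20):
    """Build an esearch URL that looks up a protein name within one organism."""
    # Simplified query: protein name in any field, organism in the Organism field.
    term = f'{protein_name}[All Fields] AND "{organism}"[Organism]'
    params = {"db": "protein", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


url = protein_search_url("androgen receptor", "Homo sapiens")
print(url)

# Decode the query string again to confirm what would be sent.
print(parse_qs(urlsplit(url).query)["term"][0])
```

Reviewing each candidate record on the NCBI Protein page, as described in this section, is still needed before choosing a query sequence.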
[Screenshot: GenPept record for androgen receptor [Homo sapiens], GenBank AAA51771.1 (917 aa), showing the LOCUS, DEFINITION, ACCESSION, VERSION, DBSOURCE, KEYWORDS, SOURCE, REFERENCE, COMMENT, and FEATURES rows]
Guiding principles: On the NCBI protein page, rows to examine include: "DEFINITION,"
"REFERENCES," "COMMENTS," and "FEATURES." The information provided in these rows can aid a
SeqAPASS user in the identification of an ideal query sequence for SeqAPASS.
It is desirable to:
a.	Use accessions with the following prefix: NP_
b.	Avoid use of protein sequences labeled "partial," "PREDICTED," "PROVISIONAL," "INFERRED,"
or "hypothetical"
c.	Avoid using those labeled "TPA" (Third Party Annotation); however, if TPA is all that is available,
"TPA: experimental" would be preferred over "TPA: inferential"
d.	Look at the date associated with the protein in the "LOCUS" row of the detailed protein page. A more
recent date can have the most up-to-date annotation of the protein. Under the "DBSOURCE" row of the
detailed protein page other accessions associated with past protein sequences can be viewed. Many times,
if the "xrefs" row is heavily populated and has the most recent annotation update date, it is likely to be the
best sequence to use as a query sequence in SeqAPASS.
d.	Short sequences should be avoided when possible as query sequences. Many times, if one selects the
protein from the protein output derived from the NCBI protein database query, they will find that the
short sequence is actually a partial sequence described in the "DEFINITION" row of the Protein page.
e.	Unless there is reason for doing so (based on the question the user is trying to address), splice-variants
labeled in "FEATURES" rows of the Protein page as "alternatively spliced" would be less desirable
f.	It is important to check the references associated with the selected query protein. In some cases, certain
sequences are associated with sensitivity to a given chemical. This can be particularly useful when
predicting susceptibility to pesticides, where certain strains of insects are produced to be readily sensitive
or insensitive to a chemical.
g. A secondary check of the sequence used in the SeqAPASS run would be to look at the output derived
and see whether ortholog candidates were detected. Ideally a preferential sequence would have more
ortholog candidates identified.
Important Note: To identify which query protein has the greatest number of Ortholog Candidates, the user
can choose to submit multiple sequences for the same species and protein. Once the Level 1 runs
complete for those similar proteins, the user can select the "View SeqAPASS Reports" tab and
look at the "Ortholog Count" column of the table; the protein with the highest number is likely to be the
most appropriate query protein for a SeqAPASS evaluation.
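
Some of the checks above lend themselves to a quick automated screen before a manual review. The following Python sketch flags candidates that conflict with guidelines a through c and then picks the candidate with the highest "Ortholog Count," as suggested in the Important Note. It is an illustration only: the function names, the placeholder accession "XP_EXAMPLE.1," and the ortholog counts for "candidate_A" and "candidate_B" are all hypothetical, and the screen does not replace reading the DEFINITION, REFERENCES, COMMENTS, and FEATURES rows.

```python
# Labels that guideline b suggests avoiding in the DEFINITION row.
DISCOURAGED_LABELS = ("partial", "PREDICTED", "PROVISIONAL", "INFERRED", "hypothetical")


def screen_candidate(accession, definition):
    """Return a list of cautions for one candidate query sequence."""
    cautions = []
    # Guideline a: prefer accessions with the NP_ prefix.
    if not accession.startswith("NP_"):
        cautions.append("accession lacks the NP_ prefix")
    # Guideline b: avoid sequences carrying any of the discouraged labels.
    lowered = definition.lower()
    for label in DISCOURAGED_LABELS:
        if label.lower() in lowered:
            cautions.append(f'definition contains "{label}"')
    # Guideline c: avoid Third Party Annotation records where possible.
    if definition.startswith("TPA"):
        cautions.append("Third Party Annotation (TPA) record")
    return cautions


def best_by_ortholog_count(counts):
    """Return the candidate with the highest Level 1 Ortholog Count."""
    return max(counts, key=counts.get)


print(screen_candidate("NP_000116.2", "estrogen receptor isoform 1"))
# "XP_EXAMPLE.1" is a placeholder, not a real accession.
print(screen_candidate("XP_EXAMPLE.1", "PREDICTED: estrogen receptor"))

# Hypothetical ortholog counts copied by hand from the reports table.
print(best_by_ortholog_count({"candidate_A": 410, "candidate_B": 12}))
```

A candidate that returns an empty list has passed only these three checks; the remaining guidelines (annotation date, sequence length, splice variants, references) still call for judgment.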
Example: Androgen receptor, Homo sapiens
[Screenshot, continued over two pages: annotated GenPept record for androgen receptor [Homo sapiens], GenBank AAA51771.1 (917 aa, gene AR), showing the DEFINITION, ACCESSION, VERSION, DBSOURCE, KEYWORDS, SOURCE, REFERENCE, COMMENT, and FEATURES rows. The FEATURES rows list the regions Androgen_recep (pfam02166, residues 6..446), NR_DBD_AR (cd07173, residues 552..633), and NR_LBD_AR (cd07073, residues 670..915), followed by the amino acid sequence.]
h. If multiple proteins appear to be the best query protein for SeqAPASS, the sequences can be aligned
using NCBI's COBALT. Enter (copy and paste from NCBI protein search list) accessions and align.
[Screenshot: COBALT (Constraint-based Multiple Protein Alignment Tool) "Enter Query Sequences" page, which accepts at least 2 protein accessions, gis, or FASTA sequences]
An alignment page will be generated:
[Screenshot: COBALT alignment results for 7 sequences (P10275.2, AAA51772.1, AAA51780.1, AAA51771.1, AAA51729.1, AAD45921.1, AAA51886.1), displayed in the "Compact" view format with "Conservation Setting" at "2 Bits"]
To evaluate sequences, change settings for "Conservation Setting" from "2 Bits" to "Identity"
[Screenshot: COBALT (Constraint-based Multiple Alignment Tool) results page for seven human androgen receptor sequences, showing the Descriptions panel and the "Conservation Setting" drop-down menu in the Alignments panel.]
Look for differences in the sequence (e.g., conserved residues, gaps) and start by eliminating sequences
that have gaps.
i. If, after the suggested evaluations of the proteins are performed, questions remain as to which sequence
would be best to run in SeqAPASS, run all relevant sequences in SeqAPASS for the evaluation. The
individual residue differences between commonly named sequences will become most important when
evaluating residues known to be important for binding the chemical or activating the protein (Level 3
SeqAPASS analysis). After completing the SeqAPASS run, select the data that has the greatest number of
ortholog candidates for your evaluation of conservation and further predictions of cross species
susceptibility. Depending on the protein of interest, multiple subunits may be associated with a protein. In
this case, all relevant subunits can be queried using SeqAPASS.
Level 1 Calculated Percent Similarity
The SeqAPASS algorithms submit the query to NCBI's standalone BLASTp (using default settings,
including BLOSUM-62 matrix), which aligns the query protein with all proteins available in the NCBI
protein database and provides a variety of metrics associated with each pairwise alignment between the
query and hit sequences. SeqAPASS selectively captures output from BLASTp, including one sequence
per species with the highest bit score. Detailed descriptions of metrics derived from BLASTp (e.g.,
BLASTp Bitscore, E-Value, Positives, Identity, Hit length) can be found in:
The NCBI Handbook: (http://www.ncbi.nlm.nih.gov/books/NBK21106/);
BLAST® Help: (http://www.ncbi.nlm.nih.gov/books/NBK62051/) and the
NCBI Glossary Field Guide: (http://www.ncbi.nlm.nih.gov/Class/FieldGuide/glossary.html)
The top row of the Level 1 data corresponds to the queried protein selected by the user. For each sequence
queried, the Level 1, top row query sequence is used to determine the maximum bitscore for the analysis,
which is derived from aligning the query sequence to itself using BLASTp. To calculate percent
similarity, the bitscore for each hit sequence is normalized to the maximum bit score and then multiplied
by 100.
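The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SeqAPASS implementation; the function name and the bit scores are hypothetical.

```python
def percent_similarity(hit_bitscore: float, query_self_bitscore: float) -> float:
    """Normalize a hit's bit score to the query's self-alignment bit score.

    query_self_bitscore is the maximum bit score for the analysis, obtained
    by aligning the query sequence to itself with BLASTp.
    """
    if query_self_bitscore <= 0:
        raise ValueError("query self-alignment bit score must be positive")
    return hit_bitscore / query_self_bitscore * 100.0


# Hypothetical bit scores, for illustration only.
query_self = 1000.0
hits = {"query (top row)": 1000.0, "species A": 850.0, "species B": 420.0}
similarities = {
    name: percent_similarity(score, query_self) for name, score in hits.items()
}
```

Because the query is normalized to itself, the top row always has a percent similarity of 100.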
Note: SeqAPASS v2.0 and newer parse the BLASTp query and hit accessions to identify all the
species/accessions from the identical proteins. Therefore, if a hit sequence represents multiple species, all
species with the identical sequence will be found in the data table for Level 1 and Level 2. To determine
which sequence/species was identified from BLASTp as a hit and which sequence/species was parsed
from the identical sequence, view the "Full Report" for Level 1 or Level 2, column "Identical Protein,"
where "N" indicates the original hit sequence and "Y" indicates the parsed sequence.
Common Domain Count
Reverse Position-Specific BLAST (RPS-BLAST) is used to compare each query and hit sequence to
conserved domains defined in NCBI's Conserved Domain Database. A hit domain is considered in
common with the query domain if it contains the same domain accession as the query and it aligns with
the NCBI curated domain with the same or greater amino acid residue coverage than the query sequence.
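The rule above can be illustrated with a short Python sketch. It assumes that coverage is expressed as the fraction of the curated domain covered by the alignment; the class name, function name, and domain accessions are hypothetical and are not taken from SeqAPASS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    accession: str   # conserved domain accession (placeholder values below)
    coverage: float  # assumed: fraction of the curated domain covered (0 to 1)


def common_domain_count(
    query_domains: list[DomainHit], hit_domains: list[DomainHit]
) -> int:
    """Count query domains that the hit sequence shares under the stated rule."""
    count = 0
    for q in query_domains:
        # Same accession, and coverage at least as great as the query's.
        if any(
            h.accession == q.accession and h.coverage >= q.coverage
            for h in hit_domains
        ):
            count += 1
    return count


# Hypothetical domains, for illustration only.
query = [DomainHit("domain_A", 0.90), DomainHit("domain_B", 0.80)]
hit = [DomainHit("domain_A", 0.95), DomainHit("domain_B", 0.50)]
shared = common_domain_count(query, hit)  # domain_B fails the coverage test
```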
Ortholog Candidate Identification
Orthologous sequences are those that diverged through a speciation event and are therefore more likely to
maintain similar function. SeqAPASS uses reciprocal best hit (RBH) BLAST for ortholog detection:
each hit protein is automatically compared to all protein sequences available for the query species. If the
original query protein, or one of its identical protein matches, is identified as the best match to the hit or
has the same bitscore as the best match, the hit sequence is considered an ortholog candidate. The
"Ortholog Candidate" column indicates the result with a yes (Y) or no (N).
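The reciprocal best hit test can be sketched as follows. This is a minimal illustration that assumes the reverse BLASTp results are already available as a mapping from query-species accession to bit score; the function name and accessions are hypothetical, and the sketch does not reproduce the SeqAPASS code.

```python
def is_ortholog_candidate(
    reverse_hits: dict[str, float], query_accessions: set[str]
) -> bool:
    """Apply the reciprocal best hit rule to one hit protein.

    reverse_hits: bit scores from aligning the hit protein against all
        proteins of the query species, keyed by accession.
    query_accessions: the original query accession plus the accessions of
        its identical protein matches.
    """
    if not reverse_hits:
        return False
    best = max(reverse_hits.values())
    # The query (or an identical protein) must be the best match or tie with it.
    return any(reverse_hits.get(acc) == best for acc in query_accessions)


# Hypothetical accessions and bit scores, for illustration only.
reverse = {"Q1": 500.0, "Q2": 500.0, "Q3": 300.0}
tied_best = is_ortholog_candidate(reverse, {"Q2"})   # ties with the best match
not_best = is_ortholog_candidate(reverse, {"Q3"})    # a better match exists
```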
Note: Many NCBI protein accessions represent multiple identical protein sequences in the BLASTp
output. This is due to BLASTp querying and presenting data from the non-redundant protein database.
Sometimes the identical sequences are from different species. This can be checked by following the link
for the top row "NCBI Accession" in the table to the NCBI protein page. Below the protein name
[species] title will be a link to "Identical Proteins."
Click the "Identical Proteins" link and look for a sequence in the list from the user-defined query species.
[Screenshot: NCBI Protein page for estrogen receptor isoform 1 [Homo sapiens] (NCBI Reference Sequence: NP_000116.2), showing the "Identical Proteins" link below the title.]
Note: If the top hit from RBH BLAST is a Protein Data Bank (PDB) code (e.g., 1AHRA), no ortholog
candidates will be identified, because BLASTp, when run against all accessions for a given species, does not
return PDB codes. It is recommended that the user identify a sequence similar or identical to the PDB code
and use that sequence as the query sequence.
Susceptibility cut-off
The susceptibility cut-off values listed on the "Level 1 (and Level 2) Susceptibility Cut-off" page are
determined by plotting the % similarity data from the "Primary Report" or "Full Report" and identifying
the local minima in the data. The default cut-off is determined by taking the first local minimum and
moving up in percent similarity until the next ortholog candidate is found. The susceptibility cut-off
displayed in the list is the percent similarity of the identified ortholog candidate.
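The last step of this procedure can be sketched in Python. The sketch assumes the first local minimum has already been identified from the plotted data (how SeqAPASS locates it is not reproduced here) and that "moving up" means taking the lowest ortholog candidate at or above that minimum; the function name and values are hypothetical.

```python
from typing import Optional


def default_cutoff(
    first_local_minimum: float, hits: list[tuple[float, bool]]
) -> Optional[float]:
    """Return the default susceptibility cut-off.

    hits: (percent similarity, is ortholog candidate) for each hit sequence.
    Returns the percent similarity of the first ortholog candidate found when
    moving up from the local minimum, or None if there is no such candidate.
    """
    candidates = [
        similarity
        for similarity, is_ortholog in hits
        if is_ortholog and similarity >= first_local_minimum  # assumed inclusive
    ]
    return min(candidates) if candidates else None


# Hypothetical data, for illustration only.
example_hits = [(100.0, True), (62.5, True), (45.0, False), (41.0, False), (38.0, True)]
cutoff = default_cutoff(40.0, example_hits)
```

In this example the hits at 41.0 and 45.0 are skipped because they are not ortholog candidates, so the cut-off is 62.5.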
Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-Across:"
Yes)
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off, but identified as an Ortholog Candidate = Y, for
"yes," then the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off but belongs to any organism class found above the
susceptibility cut-off, the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for
"yes". This criterion allows susceptibility predictions to be made across taxonomic groups based on the
likelihood that the sequences above the cut-off are better matches to the query.
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate = N, for "no,") and does not belong to any organism class found above the susceptibility cut-
off, the hit is predicted to not be susceptible; therefore, Susceptibility Prediction = N for "no"
Note that the "Primary Report" may yield different Susceptibility Predictions than the "Full Report," as
the predictions are based on the data in the different reports. The Primary Report is filtered to only display
E-value <0.01 and Common Domain Count > 1.
Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-Across:"
No)
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off, but identified as an Ortholog Candidate = Y, for
"yes," then the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate = N, for "no"), the hit is predicted to not be susceptible; therefore, Susceptibility Prediction =
N for "no"
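The two sets of criteria above differ only in the organism class rule, which applies when "Species Read-Across" is set to Yes. The following Python sketch illustrates the decision logic under that reading. The class and function names are hypothetical, the data are invented, and treating a hit exactly at the cut-off as "above" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    percent_similarity: float
    ortholog_candidate: bool
    organism_class: str


def predict_susceptibility(
    hits: list[Hit], cutoff: float, read_across: bool
) -> list[str]:
    """Return "Y" or "N" for each hit, following the listed criteria."""
    # Organism classes with at least one hit at or above the cut-off (assumed inclusive).
    classes_above = {h.organism_class for h in hits if h.percent_similarity >= cutoff}
    predictions = []
    for h in hits:
        if h.percent_similarity >= cutoff:
            predictions.append("Y")  # above the cut-off
        elif h.ortholog_candidate:
            predictions.append("Y")  # below the cut-off, but an ortholog candidate
        elif read_across and h.organism_class in classes_above:
            predictions.append("Y")  # read-across within the organism class
        else:
            predictions.append("N")
    return predictions


# Hypothetical hits, for illustration only.
example = [
    Hit(90.0, True, "Mammalia"),
    Hit(50.0, False, "Mammalia"),
    Hit(45.0, True, "Aves"),
    Hit(30.0, False, "Insecta"),
]
with_read_across = predict_susceptibility(example, cutoff=60.0, read_across=True)
without_read_across = predict_susceptibility(example, cutoff=60.0, read_across=False)
```

In this example only the second hit changes: it falls below the cut-off and is not an ortholog candidate, so it is predicted susceptible only when read-across is enabled.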
Level 2 Calculated Percent Similarity
Data obtained from the Level 1 RPS BLAST evaluation is used to assign sequence ranges that aligned
with a user-selected domain (from the NCBI CDD database) to each accession from the Level 1 Full
Report. BLASTp is then used to align the query domain range to each hit domain range. The percent
similarity is calculated based on the bit scores from the BLASTp alignment of the domain regions. For
each sequence queried, the Level 2, top row query species is used to determine the maximum bitscore for
the analysis, which is derived from aligning the query sequence to itself using BLASTp. To calculate
percent similarity, the bitscore for each hit sequence is normalized to the maximum bit score and then
multiplied by 100.
Susceptibility cut-off (same method as used in Level 1)
The susceptibility cut-offs listed on the "Level 2 Susceptibility Cut-off" page are determined by plotting
the % similarity data from the "Primary Report" or "Full Report" and identifying the local minima in
the data. The default cut-off is determined by taking the first local minimum and moving up in percent
similarity until the next ortholog candidate is found. The susceptibility cut-off displayed in the list is the
percent similarity of the identified ortholog candidate.
Level 2 Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to
"Species Read-Across:" Yes)
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off, but identified as an Ortholog Candidate = Y, for
"yes," then the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off but belongs to any organism class found above the
susceptibility cut-off, the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for
"yes". This criterion allows susceptibility predictions to be made across taxonomic groups based on the
likelihood that the sequences above the cut-off are better matches to the query.
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate = N, for "no,") and does not belong to any organism class found above the susceptibility cut-
off, the hit is predicted to not be susceptible; therefore, Susceptibility Prediction = N for "no"
Note that the "Primary Report" may yield different Susceptibility Predictions than the "Full Report," as
the predictions are based on the data in the different reports. The Primary Report is filtered to only display
E-value <0.01 and Common Domain Count > 1.
Level 2 Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to
"Species Read-Across:" No)
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off, but identified as an Ortholog Candidate = Y, for
"yes," then the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate = N, for "no,"), the hit is predicted to not be susceptible; therefore, Susceptibility Prediction =
N for "no"
Level 3 Sequence Alignments
COBALT is used to align all user selected sequences (from Level 1 hits) with a user defined template
sequence. Because COBALT algorithms align all sequences, it is recommended that the user align the
template sequence with sequences that are most similar to one another. To capture the most similar
sequences from the SeqAPASS data it is recommended that the user filter the Level 1 data by taxonomic
group and step through the Level 1 data pages one by one while selecting sequences. It is recommended
that the user look at the name of the sequence and exclude "partial" sequences when possible. Requesting
a query for one taxonomic group at a time breaks the data down into manageable alignments.
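For users who work from a downloaded Level 1 table, the selection advice above can be approximated with a short Python sketch. It assumes each row provides an accession, a protein name, and a taxonomic group; the function name and the rows are hypothetical, and the sketch does not replace selecting sequences within SeqAPASS itself.

```python
from collections import defaultdict


def group_for_alignment(
    rows: list[tuple[str, str, str]]
) -> dict[str, list[str]]:
    """Group accessions by taxonomic group, skipping sequences named "partial".

    rows: (accession, protein name, taxonomic group) for each Level 1 hit.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for accession, name, taxonomic_group in rows:
        if "partial" in name.lower():
            continue  # exclude partial sequences when possible
        groups[taxonomic_group].append(accession)
    return dict(groups)


# Hypothetical rows, for illustration only.
level1_rows = [
    ("ACC_1", "androgen receptor", "Mammalia"),
    ("ACC_2", "androgen receptor, partial", "Mammalia"),
    ("ACC_3", "androgen receptor", "Aves"),
]
batches = group_for_alignment(level1_rows)
```

Each resulting group can then be requested as a separate Level 3 query alongside the template sequence.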
Selecting Amino Acid Residues to Align
The user may select up to 50 amino acid residues to compare across selected species in Level 3.