EPA/600/R-19/156
Sequence Alignment to Predict Across
Species Susceptibility
(SeqAPASS)
VERSION 4.0
User Guide
Authors:
Carlie A. LaLone
Donovan J. Blatz
Colin P. Finnegan

-------
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS): User Guide
Updated 09/10/19; Contact Carlie LaLone with Questions: LaLone.Carlie@epa.gov
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) User Guide
Quick Notes: Use Chrome for optimal performance and PLEASE DO NOT submit more than 10 Level 1
queries at a time. Wait until they run to completion prior to submitting more.
Table of Contents
Background
Accessing SeqAPASS
    Returning Users
    First Time Users
Messages from the SeqAPASS Development Team
SeqAPASS Home Tab
Request SeqAPASS Run Tab
    Identify a Protein Target
    Query "By Species"
    Query "By Accession"
SeqAPASS Run Status
View SeqAPASS Reports
    View Report
    Save Report(s)
Level 1: Primary Amino Acid Sequence Alignment
    Primary Report Settings
Susceptibility Cutoff Box for Level 1
    No Ortholog Candidate
Level 2: Functional Domain(s) Alignment
View Level 2 Data Page
    Primary Report Settings
Susceptibility Cutoff Box for Level 2
    No Ortholog Candidate
Level 1 and Level 2 Data Visualization
    Level 1 and 2 Information Page
    Level 1 and 2 BoxPlot Page - Controls
Level 3: Individual Amino Acid Residue Alignment
View Level 3 Individual Amino Acid Query and Data Page
    Level 3 Data - Primary Report
    Level 3 Data - Full Report
Moving Between Level 1, Level 2, and Level 3 Data Pages
Search, View, and Download Data Tables
Log out
Pop-up Messages
SeqAPASS Documentation

-------
Background
The SeqAPASS tool has been developed to predict across species relative intrinsic susceptibility
to chemicals with known molecular targets (e.g., pharmaceuticals, pesticides) as well as evaluate
conservation of molecular targets from high-throughput screening assays (i.e., U.S. Environmental
Protection Agency ToxCast Program) and molecular initiating events (MIEs) and early key events in the
adverse outcome pathway framework, as a means to extrapolate such knowledge across species. The term
"relative" is used because it is recognized that molecular target similarity is one consideration, though an
important one, for making predictions of susceptibility to a chemical. Other important considerations for
susceptibility that are not evaluated using the SeqAPASS methodology include how well a chemical is
absorbed, distributed, metabolized, and eliminated, life stage, and other life history traits. Also, "relative"
indicates that the determination of sequence similarity between proteins is based on comparison to a
single protein sequence for a specific species. Additionally, we describe "intrinsic susceptibility" as the
vulnerability (or lack thereof) of an organism to chemical perturbation due to its inherent biological
composition.
Cross-species comparisons of proteins can be conducted through examination of sequence and
structural information, depending on how well the protein has been characterized and what is known
about a chemical-protein interaction. SeqAPASS allows the user to assess various levels of protein
sequence detail across species including comparisons of primary amino acid sequence (including ortholog
detection), functional domain(s), and individual amino acid residue positions. Each level requires a
greater understanding of the protein and its interaction with a chemical of interest (or similar ligand).
Because human and veterinary drugs, as well as pesticides, are designed to act specifically on well
characterized molecular targets, these chemical classes have proven useful for demonstrating the utility of
the SeqAPASS tool and its application to various hazard assessment/research scenarios.
The pertinent information necessary to begin a SeqAPASS query includes: the identification of a single
(or multiple) query species and a query protein, which would be the molecular target(s) of interest (e.g.,
receptor or enzyme).
The SeqAPASS algorithms mine, collect, and collate information from the National Center for
Biotechnology Information (NCBI) protein database (http://www.ncbi.nlm.nih.gov/protein/), conserved
domains database (http://www.ncbi.nlm.nih.gov/cdd/), and taxonomy database
(http://www.ncbi.nlm.nih.gov/taxonomy/). SeqAPASS also strategically utilizes the Stand-Alone Basic Local
Alignment Search Tool for proteins (BLASTp;
http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download)
and the Constraint-based Multiple Alignment Tool (COBALT;
http://www.st-va.ncbi.nlm.nih.gov/tools/cobalt/re_cobalt.cgi).

-------
Accessing SeqAPASS
For optimal SeqAPASS performance use Chrome
Access SeqAPASS using the following URL: https://www.seqapass.epa.gov/seqapass/
Returning Users
Click "Login"
New to SeqAPASS Version 4 (See user guide for more details)
•	New EPA compliant login through the Web Application Access
•	Integrated information and help buttons
•	Links to guide user to an appropriate query protein
•	Level 1, Level 2, and Level 3 data summary reports
•	Interoperability with the ECOTOX Knowledgebase to compare sequence-based susceptibility predictions to existing empirical toxicity data
•	Expedited identification of literature to support Level 3, critical individual amino acid residue, comparisons using Reference Explorer
•	Ability to create Level 3 Data reports with combined taxonomic groups
•	Seleno-cysteine (U) added to Level 3, critical individual amino acid residue comparisons
[Screenshot: SeqAPASS Version 4.0 login page showing the "Login" button and the "Want an account? Click here for instructions." link]

Select either "Login with EPA LAN User ID & Password" or "Login with PIV card" with two-step
verification to log in.
[Screenshot: EPA Enterprise Authentication page with the "EPA LAN User ID & Password" and "PIV Card" login options]

-------
First Time Users
To request a username and password to access the SeqAPASS tool, select "here" below the login and
follow the directions on the next page. The directions differ for internal EPA users and
external non-EPA users; however, the user type does not limit access to the tool. Everyone who requests an
account will be given one in a timely manner. An individual account allows users to store all previous
SeqAPASS runs. Once they have obtained their username, external users will select "Login with EPA
LAN User ID and Password."
EPA Users
1.	Go to https://waa.epa.gov and log in with your existing EPA LAN ID and password.
2.	Under the "Community Access" menu, select "Request Web Community Access"
3.	Select the "SeqAPASS Users" community and click submit.
4.	Return to the SeqAPASS login page to access SeqAPASS
External Users
1.	Go to https://waa.epa.gov and click on the "Self Register" link.
2.	Fill out the form using the following EPA Contact information:
o EPA Contact Name - Carlie LaLone
o EPA Contact's Email Address - lalone.carlie@epa.gov
o EPA Contact's Phone Number - 218 529-5038
3.	Select the "SeqAPASS Users" community from the dropdown menu at the bottom of the page.
4.	Once you submit the form you will receive an email confirming your request and a follow-up email with your username once
your account has been activated.
On the Log in screen the user will provide the necessary Login information:
EPA User: EPA LAN User ID & Password or PIV card with two step verification
External User: Username and Password
Upon creating your password, log in to SeqAPASS as described above for Returning Users. To change a
password at any time, go to waa.epa.gov and select "User Profile" to reset it. The user will then use the new
password to log in.
Messages from the SeqAPASS development team
Look for messages about planned version releases, data updates, and/or fixes to the SeqAPASS tool.
These will occasionally be displayed below the SeqAPASS banner when the development team has
information to share with SeqAPASS users.
[Screenshot: SeqAPASS banner with the "New to SeqAPASS Version 4" message list displayed beneath it]

-------
SeqAPASS Home Tab
The "Home" tab indicates who is logged in to the tool (right-hand side of the screen) and contains links to
information about the SeqAPASS tool (About SeqAPASS), including contact information for
support and references to published articles describing the SeqAPASS tool and its applications. Other
relevant databases and tools are also referenced. A link to the SeqAPASS User Guide can
also be found on this page. To submit a comment or question, click the "Submit Comment/Question"
link to email the developer. The "Log out" icon in the upper right-hand corner of the screen can be clicked at any time
to log out. "Information" buttons are present throughout SeqAPASS to give the user additional
information or instruction regarding features and functionality of the tool. "Exit" buttons are also present
by each external (non-EPA) link that takes the user to a page NOT maintained by the EPA.
[Screenshot: "SeqAPASS Home" page showing the tab bar (Home, Request SeqAPASS Run, SeqAPASS Run Status, View SeqAPASS Reports, Settings), the logged-in user, and the "About SeqAPASS", "SeqAPASS User Guide", and "Submit Comment/Question or Report a Problem" links]


Request SeqAPASS Run Tab
Clicking the "Request SeqAPASS Run" tab opens a page to enter the query information necessary for a
SeqAPASS run. Each section of the "Request SeqAPASS Run" will be described below:
[Screenshot: "Request Level 1 SeqAPASS Run" page header with the tab bar and the logged-in user]

-------
Identify a Protein Target
SeqAPASS is designed to predict cross-species chemical susceptibility. Protein targets are often chosen
based on the chemical, adverse outcome pathway (AOP), or high-throughput screening (HTS) assay target of interest.
Resources are provided as links to aid the user in searching for appropriate protein targets and can
be accessed by selecting the drop-downs found in the "Identify a Protein Target" box.
[Screenshot: "Identify a Protein Target" box. The box lists the following resources; all links open in a new tab and exit the EPA site.]
Pharmaceutical protein targets:
https://www.drugbank.ca
http://sitem.herts.ac.uk/aeru/vsdb/index.htm
http://bidd.nus.edu.sg/group/cjttd/TTD_HOME.asp
Pesticides and other chemical protein targets:
http://www.t3db.ca
AOP chemical initiators:
https://aopwiki.org
ToxCast HTS results by chemical:
https://comptox.epa.gov/dashboard
Select Search
There are two options for entering query information: "By Species" or "By Accession" (see the radio buttons
to the right of "Select Search"). Selecting "By Species" allows the user to enter text, select a species from a
dropdown list, and then select a protein from any sequence available for that species in the
NCBI protein database. Selecting "By Accession" allows the user to enter an NCBI protein accession.
[Screenshot: "Request Level 1 SeqAPASS Run" page showing the "Identify a Protein Target" box and the "Compare Primary Amino Acid Sequences" box, with the "By Species" and "By Accession" radio buttons next to "Select Search"]

-------
Query "By Species"
Type the name of the query species of interest in the "Query Species Search" text box. The species
common name, scientific name, or Taxid (ID number derived from the NCBI taxonomy database) may be
typed into the search bar. This is the species you would like to compare all other species to. The search
bar has an auto-complete function and will generate a list of species with corresponding Taxid. When text
is typed into the search bar, the auto-complete function queries the database in the order of "starts with"
then "contains." If an integer is typed in the search bar the auto-complete function queries the database in
the order of "Taxid", "starts with", then "contains."
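The matching order described above can be sketched in a few lines of Python. This is only an illustration of the ordering, not the tool's actual implementation: the function name `autocomplete`, the in-memory `SPECIES` sample, and the choice to apply "starts with" and "contains" to the species name are all assumptions made for the example.

```python
def autocomplete(query, species):
    """Order (name, taxid) matches: Taxid (integers only), starts with, contains."""
    q = query.strip().lower()
    if not q:
        return []
    taxid_hits, starts_with, contains = [], [], []
    for name, taxid in species:
        lowered = name.lower()
        if q.isdigit() and str(taxid) == q:
            # Integer input: an exact Taxid match is listed first.
            taxid_hits.append((name, taxid))
        elif lowered.startswith(q):
            starts_with.append((name, taxid))
        elif q in lowered:
            contains.append((name, taxid))
    return taxid_hits + starts_with + contains


# Small illustrative sample; the tool itself searches the full NCBI taxonomy data.
SPECIES = [
    ("Homo sapiens x Mus musculus hybrid cell line", 1131344),
    ("Homo sapiens", 9606),
    ("Bos taurus", 9913),
    ("Mus musculus", 10090),
]

print(autocomplete("mus", SPECIES))   # "starts with" match precedes "contains" match
print(autocomplete("9913", SPECIES))  # integer input is matched against Taxid first
```

Typing "mus" lists Mus musculus before the hybrid cell line entry because "starts with" matches are returned ahead of "contains" matches.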
[Screenshot: "Query Species Selection" box. Typing "Homo sap" in the "Query Species Search" box opens a drop-down list of matches with their Taxids, for example Homo sapiens (Taxid:9606), Homo sapiens neanderthalensis (Taxid:63221), and Homo sapiens x Mus musculus hybrid cell line (Taxid:1131344)]

Note: The user can also use the NCBI taxonomy database to identify query species using the NCBI link
on the right-hand side of the "Add Query Species" button.
Select the species of interest by clicking on its name in the drop-down box. Once the species is selected, click the
"Add Query Species" button. This advances the species of interest to the "Query Species" box and fills
the "Query Proteins" box with all available protein sequences for that species from the NCBI protein
database (although the box displays only the first 200 proteins per species, ordered by lowest numerical
accession number). The protein list includes the protein NCBI accession, protein name, and species
scientific name.
[Screenshot: "Query Species Selection" box with Homo sapiens (Taxid:9606) added, and the "Query Protein Selection" box listing the available proteins by accession, for example [NP_000005.2] alpha-2-macroglobulin isoform a precursor and [NP_000006.2] arylamine N-acetyltransferase 2, above the "Add Selected Protein(s)" button]

-------
To filter the query protein list, type the query protein name or partial name in the "Query Protein Search"
box and click the "Filter Protein" button. This action filters the protein list in the "Query Proteins" box
to display only proteins that contain the user-defined text (this search box does not have an autofill
feature because of the filter feature). Proteins are listed in alphabetical order by NCBI accession.
Example: typing "estrogen" retrieves all proteins that contain the word "estrogen" in the protein name
(the user can scroll to identify proteins of interest).
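The filter behavior can be pictured as a case-insensitive substring match followed by a sort on accession. The sketch below is a hypothetical illustration only; `filter_proteins` and the small `PROTEINS` list are names invented for this example, and the case-insensitive matching is an assumption rather than documented behavior.

```python
def filter_proteins(proteins, text):
    """Keep proteins whose name contains `text`, sorted by accession string."""
    needle = text.strip().lower()
    hits = [(acc, name) for acc, name in proteins if needle in name.lower()]
    # Alphabetical order by accession, as in the "Query Proteins" box.
    return sorted(hits, key=lambda pair: pair[0])


# Illustrative entries taken from the protein lists shown in this guide.
PROTEINS = [
    ("NP_001035365.1", "estrogen receptor beta isoform 2"),
    ("NP_000005.2", "alpha-2-macroglobulin isoform a precursor"),
    ("NP_001035055.1", "G-protein coupled estrogen receptor 1"),
    ("NP_000116.2", "estrogen receptor isoform 1"),
]

for accession, name in filter_proteins(PROTEINS, "estrogen"):
    print(f"[{accession}] {name}")
```

Because the match is on any part of the name, "G-protein coupled estrogen receptor 1" is returned alongside the estrogen receptor isoforms.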
[Screenshot: "Query Protein Selection" box with "estrogen" typed in the "Query Protein Search" box and the filtered "Query Proteins" list, for example [NP_000116.2] estrogen receptor isoform 1 and [NP_001035365.1] estrogen receptor beta isoform 2]
Note: To explore details associated with a protein of interest, click the "NCBI Protein Database" link to
the right of the "Filter Protein" button to open the NCBI protein database (see the SeqAPASS Documentation
section of the user guide for details about searching for query proteins using the NCBI database).
Highlight the protein or proteins of interest (Ctrl + left-click to select multiple proteins) in the "Query
Proteins" box and click the "Add Selected Protein(s)" button. This moves the protein(s) of interest to the
"Final Query Protein(s)" box. To remove proteins from the "Final Query Protein(s)" box, highlight those
to be removed and click the "Remove Selected Protein(s)" button. Select "Remove All Proteins" to
discard all proteins from the "Final Query Protein(s)" box. The "Clear" button removes all information
previously entered on the "Request SeqAPASS Run" page.
[Screenshot: "Query Protein Selection" box with several estrogen receptor isoforms highlighted, and the "SeqAPASS Submission" box listing the chosen accessions under "Final Query Protein(s)" above the "Remove Selected Protein(s)" and "Remove All Proteins" buttons]

-------
Once the user identifies the protein(s) to be queried, select "Request Run." A message will appear
in the upper right-hand corner of the screen for 10 seconds to alert the user of the request status.
[Screenshot: "Request Level 1 SeqAPASS Run" page with a "Submitted" notification for each requested accession displayed in the upper right-hand corner]
Multiple proteins can be added to the final list for multiple SeqAPASS runs. If another query species is
desired, return to "Query Species Search" to select the next species. Follow the process described above
for selecting the proteins associated with this species. The proteins populated in the "Query Proteins" box
will always be associated with the species highlighted in the "Query Species" box.
Note: In the current version of SeqAPASS, PLEASE do not request more than 10 query proteins at a
time to avoid longer wait times for the completion of a run.
[Screenshot: "Query Species Selection" box with Homo sapiens (Taxid:9606) and Bos taurus (Taxid:9913) in the "Query Species" box, and the "Query Protein Selection" box listing proteins for the highlighted species]
Note: A user may check the progress of the run by clicking on the "SeqAPASS Run Status" tab. (See the
SeqAPASS Run Status section of the user guide for more information.)

-------
Query "By Accession"
Users familiar with the NCBI database can utilize NCBI protein accessions (e.g., NP_000116.2) to query
the SeqAPASS tool. This is done by selecting the "By Accession" radio button to the right of the "Select
Search" text on the "Request SeqAPASS Run" page.
[Screenshot: "Request Level 1 SeqAPASS Run" page with the "By Accession" radio button next to "Select Search"]
Upon selecting the "By Accession" radio button, a new query page will be displayed. Type the NCBI
protein accession (e.g., NP_000116.2) for the protein of interest (this Accession comes from the NCBI
protein database; See "SeqAPASS Documentation" for details) in the "NCBI Protein Accession" box. If
desired, more than one NCBI Accession may be entered into the "NCBI Protein Accession" box by
clicking the enter key after each additional NCBI Accession entry.
Upon clicking the "NCBI Protein Accession" text box, a pop-up message will appear in the middle of the
text box, to provide an example for the proper format of Accessions to be entered.
[Screenshot: "SeqAPASS Submission" box with the "NCBI Protein Accession" text box, the "Request Run" and "Clear" buttons, and the "NCBI Protein Database" link]

Note: To avoid longer wait times for the completion of a run, in the current version of SeqAPASS, please
do not request more than 10 NCBI Accessions at a time.
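For users who keep a longer list of accessions in a file or script, one way to respect this limit is to split the list into groups of at most 10 and submit each group only after the previous one has run to completion. The helper below is a hypothetical convenience for preparing such groups by hand; SeqAPASS itself is used through the web interface, and nothing here calls the tool.

```python
def batches(accessions, size=10):
    """Yield consecutive groups of at most `size` accessions."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(accessions), size):
        yield accessions[start:start + size]


# Placeholder identifiers standing in for a user's own accession list.
my_accessions = [f"ACCESSION_{i:02d}" for i in range(1, 24)]

for number, group in enumerate(batches(my_accessions), start=1):
    # Each group can be pasted into the "NCBI Protein Accession" box,
    # one accession per line, once the previous group has finished.
    print(f"Batch {number}: {len(group)} accessions")
```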

-------
[Screenshot: "SeqAPASS Submission" box with the accession NP_000116 typed in the "NCBI Protein Accession" box, above the "Request Run" button]
After the NCBI accession(s) of interest have been typed in the "NCBI Protein Accession" box, click the
"Request Run" button. To remove proteins from the "NCBI Protein Accession" box, click the "Clear"
button. A message will briefly appear in the upper right-hand corner of the screen to alert the user of their
run request status.
[Screenshot: "Request Level 1 SeqAPASS Run" page after a "By Accession" request, with a "Success" notification ("NP_001315029: submitted") in the upper right-hand corner]
Note: All NCBI Accessions can include the version number (one digit after the decimal place, e.g.,
NP_000116.2). If the version is not included, the most recent version of the accession will be
queried automatically.
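To make the accession-plus-version format concrete, the sketch below splits an accession into its base and optional version. The regular expression is a simplified assumption written for this example (a short letter prefix, an optional underscore, digits, and an optional ".version" suffix); it is not the validation rule used by SeqAPASS or NCBI and will not cover every accession style.

```python
import re

# Simplified, assumed pattern: letters, optional underscore, digits, optional ".version".
ACCESSION_RE = re.compile(r"^([A-Za-z]{1,6}_?\d+)(?:\.(\d+))?$")


def split_accession(text):
    """Return (base, version); version is None when no version was typed."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized accession format: {text!r}")
    base, version = match.groups()
    return base, (int(version) if version else None)


print(split_accession("NP_000116.2"))  # version given explicitly
print(split_accession("NP_000116"))    # no version: the tool queries the most recent one
```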

-------
SeqAPASS Run Status
Level 1 SeqAPASS (primary amino acid sequence comparisons) status is displayed as the default. The
Accession in the column "Level 1 Query Accession" is that selected and queried by the user. For a query
to finish it must display "complete" in the BLASTp column, 100% in the "Common Domains" column,
and 100% in the "Ortholog Candidate" column. The "Common Domains" column displays the %
completion for running Reverse Position Specific (RPS)-BLAST (Default E-value of <0.01) on the
Accessions from the Level 1 Full Report. RPS-BLAST, and therefore "Common Domains" status, will
take the longest to complete. The "Ortholog Candidate" column displays the % completion for running a
reciprocal best hit BLAST evaluation for each hit sequence. The status for the "BLASTp" column is
described as "started," "analyzing," or "complete." If the user's successfully submitted query has entered
the run queue, the position of the submitted query in the queue will be indicated in the column (e.g., 2nd in
queue). The "Common Domains" and "Ortholog Candidate" columns will also describe the position of
the user's submitted query in the run queue. Once the run has begun processing, the % completed for
RPS-BLAST or reciprocal best hit BLAST, respectively, will be displayed. Please see example below:
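The completion rule stated above (BLASTp "complete", Common Domains 100%, Ortholog Candidate 100%) can be written as a single check. This is a minimal sketch for reading the status table, for example after downloading it; the function name and the idea of passing the three column values as strings are assumptions for illustration.

```python
def run_is_finished(blastp, common_domains, ortholog_candidate):
    """True only when all three Level 1 status columns report completion."""
    return (
        blastp.strip().lower() == "complete"
        and common_domains.strip() == "100%"
        and ortholog_candidate.strip() == "100%"
    )


# A finished run, a run still evaluating ortholog candidates, and a queued run.
print(run_is_finished("complete", "100%", "100%"))
print(run_is_finished("complete", "100%", "0%"))
print(run_is_finished("analyzing", "2nd in queue", "2nd in queue"))
```

Any other value in a column, including a queue position such as "2nd in queue" or a partial percentage, means the run is still in progress.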
[Screenshot: "SeqAPASS Level 1 Run Status" table with the columns SeqAPASS Run Id, Data Version, User, Level 1 Query Accession, BLASTp, Common Domains, Ortholog Candidate, Start Date, Date Completed, and SeqAPASS Run Duration. In the example, finished runs show "complete", 100%, and 100%, while one run still in progress shows 0% under Ortholog Candidate and "Not Finished" under Date Completed. The "Level 1 Status", "Level 2 Status", and "Level 3 Status" radio buttons and the "Refresh Data" button appear above the table.]
The user can view the status of requested SeqAPASS runs. Each run is assigned a unique "SeqAPASS
Run Id." A run is a query that was requested either individually or as a batch in the "Request
SeqAPASS Run" tab. The user can view run start and end dates/times and the duration of the run. (See the
Search, View, and Download Data Tables section of the user guide for more information.) The "Data
Version" column indicates which version of NCBI data is being used (see the "About" page for details on
Data Versions).
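The relationship between the start date, completion date, and run duration columns is simple subtraction. The sketch below assumes timestamps written as "YYYY MM DD HH:MM:SS", which is how they appear in the example status table in this guide; the function name and the exact output wording are illustrative assumptions, not the tool's own code.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "2019 09 04 10:24:21".
STAMP = "%Y %m %d %H:%M:%S"


def run_duration(start, end):
    """Format the elapsed time between two status-table timestamps."""
    delta = datetime.strptime(end, STAMP) - datetime.strptime(start, STAMP)
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes} minute(s) {seconds} second(s)"


print(run_duration("2019 09 04 10:24:21", "2019 09 04 10:27:04"))
```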
The user is also able to view the status of Level 2 (Functional domain(s)) and Level 3 (individual amino
acid residue alignments).

-------
View Level 2 status by selecting the "Level 2 Status" radio button. While viewing the page, the user can click the
"Refresh Data" button to refresh the data. The "Level 1 Query Accession" column displays the NCBI
accession selected and queried by the user. Please see below:
[Screenshot: "SeqAPASS Level 2 Run Status" table with columns including SeqAPASS Run Id, Data Version, User, Level 1 Query Accession, Query NCBI Accession, Domain Type, BLASTp, Start Date, Date Completed, and SeqAPASS Run Duration. The example rows show completed Level 2 runs for domains such as p450, CypX, and NR_LBD_ER.]
View Level 3 Status by selecting the radio button. "Level 1 Query Accession" column displays the NCBI
accession selected and queried by the user. The "Job Name" is the user defined name chosen to describe
the Level 3 alignment. Also, while viewing the page, the user can click the "Refresh Data" button to
refresh the data. Please see below:
[Screenshot: "SeqAPASS Level 3 Run Status" table with the columns SeqAPASS Run Id, Data Version, User, Job Name, Level 1 Query Accession, Template Accession, COBALT, Start Date, Date Completed, and SeqAPASS Run Duration. The example rows show completed Level 3 runs with user-defined job names such as "User Guide test" and "Test Case study".]
To return to a previous tab, click "Home," "Request SeqAPASS Run," or "SeqAPASS Run Status."
View SeqAPASS Reports Tab
The "View SeqAPASS Reports" tab provides a table of completed SeqAPASS runs. From this page the
user can choose to either "View Report" or "Save Report(s)."
[Screenshot: "View SeqAPASS Reports" tab showing the "View Report" and "Save Report(s)" radio buttons, the "Request Selected Report" and "Refresh Available Reports" buttons, and the "Partial Protein Sequence" highlight legend.]
By default, the completed runs are listed in the order in which they were completed, with the most recent
runs at the top. The table includes information for each run, such as SeqAPASS Run ID (unique for every
run, even if the same protein/species combination is run twice), Data Version, Ortholog Count
(number of orthologs detected from the aligned hit sequences in Level 1; see Detailed Documentation
page 79), NCBI Accession, Query Protein Name, taxonomy information for the query species, and the
date/time of run completion.
While viewing the page, the user can click the "Refresh Available Reports" button to refresh the table
with additional completed runs. Partial protein sequences are highlighted in yellow as illustrated in the
example below. (See Search, View, and Download Data Tables section of user guide for more
information).
[Screenshot: "Available Reports" table on the "View SeqAPASS Reports" tab. Columns shown: SeqAPASS Run Id, Data Version, Ortholog Count, Level 1 Query Accession, Query Protein Name, NCBI Taxonomy ID, and Query Species Name. Example rows include human estrogen receptor isoforms (NCBI Taxonomy ID 9606), cytochrome P450 aromatase (CAC38767.1, Taxonomy ID 90988), aromatase (NP_571229.3, Taxonomy ID 7955), and a partial PsbA plastid sequence (APO40848.1, Taxonomy ID 93036) with an Ortholog Count of 0.]
View Report
To select a completed run and view Level 1 data, select the corresponding radio button in the first column
of the table and click "Request Selected Report." This will open the Level 1 page to view the Level 1 data
and to set up queries for Level 2 and Level 3.
Note: The user MUST select a radio button PRIOR to clicking "Request Selected Report." If the user
clicks "Request Selected Report" without selecting a radio button, a spinning wheel will appear and
disappear, and no completed run will be opened. No pop-up message indicates that a radio button was
not selected.
[Screenshot: "Available Reports" table with the "View Report" radio button selected and a radio button in the first column selected for one run, ready for "Request Selected Report." Columns shown: SeqAPASS Run Id, Data Version, Ortholog Count, Level 1 Query Accession, Query Protein Name, NCBI Taxonomy ID, and Query Species Name (Homo sapiens, Pimephales promelas, Danio rerio, Poa annua).]
Save Report(s)
To download completed Level 1, 2, and/or 3 data, select the "Save Report(s)" radio button. The user
can then select which accession(s) to download by clicking the checkbox in the first column of the
table for each desired accession and then clicking "Save Selected Report(s)."
[Screenshot: "Available Reports" table with the "Save Report(s)" radio button selected. Each row has a checkbox in the first column, and the "Save Selected Report(s)" and "Refresh Available Reports" buttons appear above the table.]
The user can also deselect data that is not wanted in the download by scrolling to the far right of the table
and deselecting the checkboxes for the different levels of the SeqAPASS analysis. By default, all
available data for the selected accession will be downloaded in a zip file.
[Screenshot: right side of the "Available Reports" table with "Save Report(s)" selected. Columns shown: Query Protein Name, NCBI Taxonomy ID, Query Species Name, Query Common Name, Taxonomy, and checkbox columns for Level 1, Level 2, and Level 3.]
A WinZip file will be created for all the selected Reports.
[Screenshot: "Save As" dialog opened from the "View SeqAPASS Reports" tab, saving to the Downloads folder with "Save as type: WinZip File (*.zip)."]
A pop-up seqapass.zip file should appear with data files for each selected report. The naming convention
is the NCBI Protein Accession and the Data Version (e.g., AAG31441.2_v2).
[Screenshot: seqapass.zip opened in WinZip, showing one folder per selected report: AAG31441.2_v2, AAK85198.1_v2, AAQ03208.1_v2, ACD44939.1_v2, CAA10110.1_v2, NP_001267576.1_v2, and P68279.2_v2.]
Clicking one of the report folders (named by Protein Accession and Data Version) shows all available
files for each Level of the SeqAPASS evaluation.
Note: This download includes default settings only. If the susceptibility cut-off or any other defaults
were manipulated on the Level 1 or Level 2 pages, those results will NOT be downloaded here and can ONLY
be downloaded directly from the Level 1 or Level 2 page where the setting was manipulated by the user.
Also, data visualizations can ONLY be downloaded from the Level 1 and 2 pages. They DO NOT populate in
the zip file folders.
[Screenshot: the AAB53939.1_v2 folder opened in WinZip, containing three subfolders: Level1Reports, Level2Reports, and Level3Reports.]
Selecting "Level 1 Reports" shows both the full and primary reports as csv files, as well as a graphic
of the density plot used for determining the susceptibility cut-off.
[Screenshot: Level1Reports folder in WinZip with four files: AAB53939.1_Full_v2.csv, AAB53939.1_Full_v2_cutoff.png, AAB53939.1_Primary_v2.csv, and AAB53939.1_Primary_v2_cutoff.png.]
Selecting "Level2Reports" shows all completed domain comparisons, each named by NCBI
domain accession with the starting amino acid residue position for the domain (e.g., pfam00001(54)).
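The folder naming conventions described above (accession plus data version for a report, domain accession plus starting residue position for a Level 2 comparison) can be split apart programmatically when organizing downloads. The following is a minimal sketch, not part of SeqAPASS: the function and class names are hypothetical, and the patterns assume the names look exactly like the examples in this guide.

```python
import re
from typing import NamedTuple


class ReportFolder(NamedTuple):
    accession: str     # NCBI protein accession, e.g. "AAG31441.2"
    data_version: int  # SeqAPASS data version, e.g. 2


class DomainFolder(NamedTuple):
    domain_accession: str  # NCBI domain accession, e.g. "pfam00001"
    start_position: int    # starting amino acid residue position of the domain


# Assumed patterns, inferred from the examples "AAG31441.2_v2" and "pfam00001(54)".
_REPORT_RE = re.compile(r"^(?P<acc>.+)_v(?P<ver>\d+)$")
_DOMAIN_RE = re.compile(r"^(?P<dom>.+)\((?P<start>\d+)\)$")


def parse_report_folder(name: str) -> ReportFolder:
    """Split a report folder name into accession and data version."""
    match = _REPORT_RE.match(name)
    if match is None:
        raise ValueError(f"not a report folder name: {name!r}")
    return ReportFolder(match["acc"], int(match["ver"]))


def parse_domain_folder(name: str) -> DomainFolder:
    """Split a Level 2 folder name into domain accession and start position."""
    match = _DOMAIN_RE.match(name)
    if match is None:
        raise ValueError(f"not a domain folder name: {name!r}")
    return DomainFolder(match["dom"], int(match["start"]))


if __name__ == "__main__":
    print(parse_report_folder("AAG31441.2_v2"))
    print(parse_domain_folder("pfam00001(54)"))
```

Accessions that themselves contain an underscore (for example NP_001267576.1_v2) are handled because only the trailing "_v" plus digits is treated as the version suffix.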
[Screenshot: Level2Reports folder in WinZip with one subfolder per domain comparison: pfam00001(54), pfam10320(54), and pfam13853(54).]
Upon selecting a domain file to view, both full and primary reports are available as csv files as well as a
graphic of the density plot for determining the susceptibility cut-off.
[Screenshot: pfam00001(54) folder in WinZip with four files: pfam00001(54)_Full_v2.csv, pfam00001(54)_Full_v2_cutoff.png, pfam00001(54)_Primary_v2.csv, and pfam00001(54)_Primary_v2_cutoff.png.]
Selecting "Level3Reports" shows all user-defined Level 3 alignments as csv files.
Note: These csv files show the alignments across the entire sequence, not just those amino acid residues
selected by the user.
[Screenshot: Level3Reports folder in WinZip listing one csv file per user-defined Level 3 job, each named by the job name followed by a number in parentheses and the data version (e.g., "Should be 3(319)_v2.csv").]
Level 1: Primary Amino Acid Sequence Alignment
From the "View SeqAPASS Reports" tab, selecting a radio button and clicking "Request Selected
Report" displays the Level 1 data.
The "Level 1 Query Protein Information" box contains the SeqAPASS Run ID, Query Accession,
Ortholog Count (number of hits identified as ortholog candidates to the query species protein sequence),
NCBI data updates ("Protein and Taxonomy Data:" displays the date that NCBI databases were downloaded
and incorporated into the SeqAPASS database; "BLAST Version:" and "Software Version:" display the
versions being used by the SeqAPASS tool for the selected data), Query Species, and Query Protein. Other
information in this box will be described below.
[Screenshot: "Level 1 Query Protein Information" box for SeqAPASS ID 1306: Query Accession NP_000116.2, Ortholog Count 348, Query Species Homo sapiens, Query Protein estrogen receptor isoform 1, Protein and Taxonomy Data 02/28/2019, BLAST Version 2.8.1, Software Version 3.2.]
The default table displayed at the bottom of the page is the "Primary Report", which includes query
protein information in the first row below the column titles, followed by hit proteins whose sequences
aligned with the query protein. The hit proteins are ordered from the highest to lowest percent similarity
(maximum percent similarity = 100%). For each hit protein, the Data Version, NCBI Accession, and species
information are provided, including the "Protein Count" (the number of protein records per species in
the NCBI protein database), taxonomic information (see Primary Report Settings section below in user
guide for more detail on "Taxonomic Group" versus "Filtered Taxonomic Group" columns), and species
names. Also included are the NCBI protein accession, protein name, BLASTp bitscore (describes overall
quality of the alignment; see NCBI BLASTp tutorials), and percent similarity
([hit bitscore / query bitscore] * 100). If the hit protein has been identified as an ortholog candidate
(using the reciprocal best hit BLAST method), it is noted with a "Y" for yes; if not, it is noted with an
"N" for no. If the hit protein is predicted to be susceptible according to the susceptibility cut-off
criteria, that is also noted with a "Y" for yes or an "N" for no. The date the analysis was completed
is also identified. The data also include a column giving the number of ortholog candidates identified
using the reciprocal best hit BLAST method, and a column listing the susceptibility cut-off. The cut-off
is determined by identifying local minima in the density plot of the percent similarity values
for the primary report data set and by evaluating ortholog candidates. Additionally, a column identifies
whether the species is a eukaryote, noted with a "Y" for yes or an "N" for no. Links out
to the NCBI Protein Database, NCBI Taxonomy Database, and ECOTOX Knowledgebase (specific to the
data row) are embedded in the Level 1 data table for the "NCBI Accession," "Species Tax ID," "Scientific
Name," "Protein Name", and "ECOTOX" columns. (See Search, View, and Download Data Tables
section of user guide for more information).
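The percent similarity formula given above, [hit bitscore / query bitscore] * 100, can be checked by hand for any row of a downloaded report. The snippet below is a small illustration only; the function name is hypothetical and the sample bitscores are illustrative values for a query aligned to itself and to one close hit.

```python
def percent_similarity(hit_bitscore: float, query_bitscore: float) -> float:
    """Percent similarity as described in the guide: hit / query bitscore * 100.

    The query bitscore is the score of the query sequence aligned to itself,
    so the query row is 100 by construction.
    """
    if query_bitscore <= 0:
        raise ValueError("query bitscore must be positive")
    return hit_bitscore / query_bitscore * 100.0


if __name__ == "__main__":
    query = 1241.87  # illustrative query self-alignment bitscore
    hit = 1229.54    # illustrative bitscore of a closely related hit
    print(round(percent_similarity(query, query), 2))
    print(round(percent_similarity(hit, query), 2))
```

A hit whose bitscore exceeds the query bitscore yields a value above 100, which is the situation the guide flags with a default highlight.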
Default highlights identify partial protein sequences, sequences with a bitscore higher than the query
sequence and therefore percent similarity greater than 100% (commonly synthetic constructs), and cases
where zero ortholog candidates are identified (in this case a user should consider a different query
sequence or check the full report). Please see Susceptibility Cutoff Box for Level 1 section of user guide
for details when no orthologs are detected. Additionally, if a eukaryote is selected as the query protein,
the default setting for the report shows only eukaryote data, excluding prokaryote data from the table,
with the "Show Only Eukaryotes" checkbox checked. To view prokaryote data, deselect this checkbox. If a
prokaryote is selected as the query protein, the default setting will include both eukaryote and
prokaryote data and the "Show Only Eukaryotes" checkbox will not be selected. To limit the data to
eukaryotes only, the user would check the "Show Only Eukaryotes" checkbox.
Columns in left side of table:
[Screenshot: left side of the "Level 1 Data - Primary" table. Columns shown: Data Version, NCBI Accession, Protein Count, Species Tax ID, Taxonomic Group, Filtered Taxonomic Group, Scientific Name, Common Name, and Protein Name. The first row is the query (NP_000116.2, Homo sapiens, Human), followed by primate hits such as Gorilla gorilla, Pan troglodytes, and Pan paniscus.]
Columns in right side of table:
[Screenshot: right side of the "Level 1 Data - Primary" table. Columns shown: Protein Name, BLASTp Bitscore, Ortholog Candidate, Ortholog Count, Cut-off, Percent Similarity, Susceptibility Prediction, Analysis Completed, Eukaryote, and EcoTox. In the example, the query row has a bitscore of 1241.87 and a percent similarity of 100.00, the next hit has a bitscore of 1229.54 and a percent similarity of 99.01, and all rows shown have an ortholog count of 348 and a cut-off of 33.93. Below the table: the "Partial Hit Protein Sequence" highlight legend, the "View Level 1 Summary Report" button, and the "Show Only Eukaryotes" checkbox.]
Level 1: Primary Report Settings
Default settings
The "Primary Report Settings" drop down allows the user to view the default settings applied to the table
below and to manipulate certain settings. "Primary Report Settings" are only available on the "Primary
Report" display, not the "Full Report." The default settings show data for hits whose E-values are < 0.01
and that have been identified to have > 1 domain in common with the query sequence. The default setting
for "Sorted by Taxonomic Group" is "class," therefore the "Filtered Taxonomic Group" column in the table
is set to identify and report the taxonomic lineage of "class" from the NCBI Taxonomy Database.
However, if class is not identified in the NCBI Taxonomic Hierarchy associated with the hit accession,
the algorithm reports the next available Taxonomic Group, moving from class to subclass, to
superorder, to order, to suborder, to superfamily, to family, to subfamily, to genus. Finally, the
susceptibility predictions are set by using species read-across (please view Documentation Section of
the User Guide for details on Read-Across settings). Briefly, species read-across sets the
susceptibility prediction as follows: all ortholog candidates are Susceptible = Y; all species listed
above the susceptibility cut-off are Susceptible = Y; all species below the cut-off from the same
taxonomic group as one or more species above the cut-off are Susceptible = Y; and those below the cut-off
that are not ortholog candidates and do not belong to a taxonomic group above the cut-off are
Susceptible = N.
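The species read-across rules summarized above can be written out as a short decision procedure. The sketch below is an illustration of those stated rules, not the SeqAPASS implementation: the Hit class, the function name, and the read_across flag are hypothetical, and treating a percent similarity exactly equal to the cut-off as "above" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Hit:
    species: str
    taxonomic_group: str       # value of the "Filtered Taxonomic Group" column
    percent_similarity: float
    ortholog_candidate: bool


def predict_susceptibility(hits: Sequence[Hit], cutoff: float,
                           read_across: bool = True) -> List[str]:
    """Return "Y" or "N" per hit, following the rules described in the guide."""

    def above(hit: Hit) -> bool:
        # Assumption: a value equal to the cut-off counts as above it.
        return hit.percent_similarity >= cutoff

    # Taxonomic groups with at least one species above the cut-off.
    groups_above = {h.taxonomic_group for h in hits if above(h)}

    predictions = []
    for h in hits:
        # Ortholog candidates and hits above the cut-off are always "Y".
        susceptible = h.ortholog_candidate or above(h)
        # With read-across on, a group with any species above the cut-off
        # makes every species in that group "Y".
        if read_across and h.taxonomic_group in groups_above:
            susceptible = True
        predictions.append("Y" if susceptible else "N")
    return predictions


if __name__ == "__main__":
    example = [
        Hit("species A", "Mammalia", 90.0, True),
        Hit("species B", "Mammalia", 20.0, False),
        Hit("species C", "Insecta", 10.0, False),
        Hit("species D", "Insecta", 12.0, True),
    ]
    print(predict_susceptibility(example, cutoff=33.93))
    print(predict_susceptibility(example, cutoff=33.93, read_across=False))
```

Passing read_across=False corresponds to setting "Species Read-Across" to "No": only hits above the cut-off or flagged as ortholog candidates remain "Y".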
[Screenshot: "Primary Report Settings" panel with fields for E-value (default 0.01), Sorted by Taxonomic Group, Common Domains, and Species Read-Across.]
Changing Default Settings
The "E-value" and "Common Domains" settings can be changed by entering the desired
E-value or number of Common Domains in the respective text boxes and clicking "Update Report." The
table and data visualization will automatically be updated after a few seconds. The user may also choose
to change the level of the taxonomic hierarchy that is used for the susceptibility prediction. From the
"Sorted by Taxonomic Group" dropdown, the user may choose to display a different taxonomic group in the
"Filtered Taxonomic Group" column of the data table.
[Screenshot: "Primary Report Settings" panel with the "Sorted by Taxonomic Group" dropdown open, listing class, subclass, superorder, order, suborder, superfamily, family, subfamily, and genus, alongside the "Update Report" button.]
If the user chooses "order," for example, the "Filtered Taxonomic Group" column in the data table will
report the taxonomic lineage of "order" from the NCBI Taxonomy Database, and all species read-across
for the susceptibility prediction will be based on order instead of class. The data visualization will also
update. As described previously, if order is not identified in the NCBI Taxonomic Hierarchy associated
with the hit accession, then the algorithm will report the next available Taxonomic Group, moving from
order to suborder, to superfamily, to family, to subfamily, to genus. Upon selecting the Taxonomic Group
from the dropdown and clicking "Update Report," the Level 1 Data for the Primary report will update to the
selected taxonomic level.
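The fallback through taxonomic ranks described above amounts to walking an ordered list of ranks, starting at the selected one, until a rank is present in the lineage. The following is a minimal sketch under stated assumptions: the lineage is represented as a plain dictionary from rank to name, and the function name is hypothetical rather than anything exposed by SeqAPASS or NCBI.

```python
from typing import Dict, Optional

# Rank order taken from the guide's description of the fallback.
RANK_ORDER = [
    "class", "subclass", "superorder", "order", "suborder",
    "superfamily", "family", "subfamily", "genus",
]


def filtered_taxonomic_group(lineage: Dict[str, str],
                             selected_rank: str = "class") -> Optional[str]:
    """Return the name at the selected rank, or at the next available lower rank.

    `lineage` maps rank names to taxon names (an assumed representation of the
    NCBI taxonomic hierarchy for one hit). Returns None if no rank from the
    selected one down to genus is present.
    """
    start = RANK_ORDER.index(selected_rank)  # raises ValueError for unknown ranks
    for rank in RANK_ORDER[start:]:
        name = lineage.get(rank)
        if name:
            return name
    return None


if __name__ == "__main__":
    human = {"class": "Mammalia", "order": "Primates", "family": "Hominidae", "genus": "Homo"}
    print(filtered_taxonomic_group(human))            # class is present
    print(filtered_taxonomic_group(human, "order"))   # order is present
    no_class = {"order": "Testudines", "family": "Emydidae"}
    print(filtered_taxonomic_group(no_class))         # falls through to order
```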

[Screenshot: "Level 1 Data - Primary" table after updating the report settings, with the "Primary Report" and "Full Report" radio buttons, the "View Level 1 Summary Report" button, the "Show Only Eukaryotes" checkbox, and the "Download Current Level 1 Report Settings" button. Columns shown: Data Version, NCBI Accession, Protein Count, Species Tax ID, Taxonomic Group, Filtered Taxonomic Group, Scientific Name, Common Name, and Protein Name.]
Level One Summary Report
The user can view a summary of the data for each taxonomic group by clicking on the "View Level 1
Summary Report" button. The data include the number of species, mean percent similarity, median percent
similarity, and susceptibility prediction. These data can also be downloaded.
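The per-group statistics in the summary report (number of species, mean and median percent similarity) can be recomputed from a downloaded Primary Report when a custom grouping is needed. The sketch below is illustrative only: the function name is hypothetical, the input is assumed to be (taxonomic group, percent similarity) pairs with one row per species, and it is not claimed to reproduce exactly how SeqAPASS counts species.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, List, Tuple


def summarize_by_group(rows: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Group percent similarity values by taxonomic group and summarize them.

    Assumes each row is (taxonomic_group, percent_similarity) and that each
    species contributes exactly one row.
    """
    by_group: Dict[str, List[float]] = defaultdict(list)
    for group, similarity in rows:
        by_group[group].append(similarity)

    return {
        group: {
            "number_of_species": len(values),
            "mean_percent_similarity": round(mean(values), 2),
            "median_percent_similarity": round(median(values), 2),
        }
        for group, values in by_group.items()
    }


if __name__ == "__main__":
    sample = [
        ("Mammalia", 100.0), ("Mammalia", 80.0), ("Mammalia", 60.0),
        ("Aves", 70.0),
    ]
    for group, stats in summarize_by_group(sample).items():
        print(group, stats)
```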

[Screenshot: "Level One Summary Report" table, opened with the "View Level 1 Summary Report" button (page 1 of 6).]
Taxonomic Group     Filtered Taxonomic Group   Number of Species   Mean Percent Similarity   Median Percent Similarity   Susceptibility Prediction
Mammalia            Mammalia                   173                 71.33                     86.23                       Y
Testudines          Testudines                 10                  67.34                     79.16                       Y
Aves                Aves                       95                  63.01                     77.88                       Y
Crocodylia          Crocodylia                 7                   69.23                     78.29                       Y
Lepidosauria        Lepidosauria               21                  61.16                     71.68                       Y
Amphibia            Amphibia                   21                  44.96                     52.79                       Y
Chondrichthyes      Chondrichthyes             7                   37.18                     39.24                       Y
Ceratodontimorpha   Ceratodontimorpha          3                   43.11                     57.01                       Y
Coelacanthiformes   Coelacanthiformes          2                   46.56                     46.56                       Y
Actinopteri         Actinopteri                169                 34.21                     40.73                       Y
The user may also choose to turn species read-across off by using the "Species Read-Across" drop-down,
selecting "No," and clicking "Update Report." When "No" is selected, the susceptibility prediction
in the table below will be "Y" only if the Percent Similarity is above the Cut-off or if the hit is
identified as an Ortholog Candidate ("Y"). Any other hit below the cut-off will yield a susceptibility
prediction of no, or "N."
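[Screenshot: "Primary Report Settings" box showing the "E-value," "Sorted by Taxonomic Group," "Common
Domains," and "Species Read-Across" settings, with the "Species Read-Across" drop-down expanded to show
"No" and "Yes," and the "Update Report" and "Use Default Settings" buttons.]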

The user can select the "Full Report" on the "Level 1" page, which includes the same information as the
"Primary Report" and additional information pertaining to the alignment of the protein sequence using
BLASTp. Additional information includes the number of amino acid residues in the sequence (Hit
Length), the number of exact matching amino acids between the hit and query sequence (Identity), the
number of exact and similar matches in amino acids between the hit and the query sequence (Positives),
the expect value (E-value) describing the number of different alignments expected to occur in the
database search by chance, and the conserved domain count. The conserved domain count identifies all
domains associated with the query protein in the NCBI conserved domains database (Specific hits, Non-
specific hits, Superfamilies, and Multi-domains; See NCBI conserved domains database for details).
SeqAPASS algorithms record the query sequence coverage of each curated domain and compare that
coverage to that of the hit sequence. If the hit sequence covers the curated domain to an extent greater
than or equal to the query sequence, then the domain is considered a common domain between the hit and
query. The number of common domains between each hit sequence and the query sequence is summed and
reported. This column displays "0" when the hit protein and query protein do not have any common
domains. (See the Search, View, and Download Data Tables section of the user guide for more information.)
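The common domain rule described above can be illustrated with a short sketch. This is not the SeqAPASS
code: the function name is hypothetical, and representing coverage as a fraction (0 to 1) of each curated
domain is an assumption, since the guide does not state how coverage is measured.

```python
def common_domain_count(query_coverage, hit_coverage):
    """Count the domains that the hit covers at least as well as the query.

    Both arguments map a domain accession to the fraction of the curated
    domain covered by the sequence (hypothetical representation).
    """
    count = 0
    for domain, query_fraction in query_coverage.items():
        # A domain absent from the hit can never be a common domain.
        if domain in hit_coverage and hit_coverage[domain] >= query_fraction:
            count += 1
    return count


query = {"cd06949": 1.00, "cd06916": 0.95}
hit_a = {"cd06949": 1.00, "cd06916": 0.90}   # second domain covered less
hit_b = {"cd06157": 1.00}                    # no domain shared with query
count_a = common_domain_count(query, hit_a)
count_b = common_domain_count(query, hit_b)
```

Under these assumptions, hit_a shares one common domain with the query and hit_b shares none, which
corresponds to a "0" in the common domain column.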
The user can also download the currently applied report settings by selecting "Download Current
Level 1 Report Settings." This csv file allows the user to track which settings were used or changed
when a data table was downloaded.
[Screenshot: "Level 1 Data - Full" table for the "Full Report" selection (page 1 of 97), with the columns
Hit Length, Identity, Positives, BLASTp E-value, BLASTp Bitscore, Ortholog Candidate, Ortholog Count,
Cut-off, Common Domain Count, Percent Similarity, Susceptibility Prediction, Analysis Completed,
Eukaryote, and ECOTOX, and the "Download Current Level 1 Report Settings" link. The first (query) row
reads: Hit Length 595; Identity 595; Positives 595; E-value 0.000E0; Bitscore 1241.87; Ortholog Count 348;
Cut-off 33.93; Percent Similarity 100.00; Susceptibility Prediction Y; Analysis Completed
2019 05 16 11:04:08; Eukaryote Y.]

Note: SeqAPASS v2.0 and newer parse the BLASTp query and hit accessions to identify all the
species/accessions from identical proteins. Therefore, if a hit sequence represents multiple species, all
species with the identical sequence will be found in the data tables for Level 1. To determine which
sequence/species was identified from BLASTp as a hit and which sequence/species was parsed from the
identical sequence, view the "Full Report" for Level 1, column "Identical Protein," where "N" indicates
the original hit sequence and "Y" indicates the parsed sequence.
Level 1 Report Settings (contents of the downloaded csv)

Analysis TimeStamp          2019 05 16 11:04:08
SeqAPASS version            3.2
Query Species               Homo sapiens
Query Protein               estrogen receptor isoform 1
Query Accession             NP_000116.2
Ortholog Count              348
L1 Cutoff                   Default
L1 Cutoff Value             33.93221513
E-value                     0.01
Sorted by Taxonomic Group   CLASS
Common Domains              1
Species Read Across         Y
Show Only Eukaryotes        Checked
Report                      Primary
The downloaded Level 1 report settings csv contains the information shown above. If the user changes
the default settings, the csv provides a quick record of the settings that were applied, even when the
SeqAPASS page is no longer accessible.
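A downloaded settings file can be read back into a simple lookup so the settings travel with the data
table. The sketch below assumes the csv holds one setting per row as a name in the first column and a
value in the second, preceded by a one-cell title row; that layout is inferred from the example above and
is not confirmed. The function name and the sample text are hypothetical.

```python
import csv
import io


def read_report_settings(csv_text):
    """Return a dict of setting name -> value from settings csv text.

    Rows that do not hold both a name and a value (for example the title
    row or blank spacer rows) are skipped.
    """
    settings = {}
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row]
        if len(cells) >= 2 and cells[0] and cells[1]:
            settings[cells[0]] = cells[1]
    return settings


# Hypothetical file content modeled on the example settings above.
example_csv = """Level 1 Report Settings
,
Analysis TimeStamp,2019 05 16 11:04:08
SeqAPASS version,3.2
Query Accession,NP_000116.2
L1 Cutoff,Default
L1 Cutoff Value,33.93221513
E-value,0.01
Species Read Across,Y
"""
example_settings = read_report_settings(example_csv)
cutoff_value = float(example_settings["L1 Cutoff Value"])
```

Values are kept as text and converted only where a number is needed, so entries such as "Default" or
"Checked" are preserved exactly as downloaded.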
Susceptibility Cutoff Box for Level 1
The susceptibility prediction is determined by identifying ortholog candidates, sequences above a defined
susceptibility cut-off, or species below the susceptibility cut-off that belong to an organism class with
members above the susceptibility cut-off. The default susceptibility cut-off is set by plotting the
distribution of percent similarities calculated for each hit protein. From this plot, the critical points are
identified, and the local minimums and maximums are reported. Using the ortholog candidate data, a
susceptibility cut-off is automatically determined by identifying the first ortholog candidate at a percent
similarity equal to or higher than the first local minimum. The user can view this graph by clicking the
"Cutoff Settings" button in the "Susceptibility Cut-off" box, which will open a new tab in the web browser.
The "Select Cut-off" drop-down allows the user to select between the default cut-off, the 2nd local
minimum, or a user-defined cut-off. The 2nd susceptibility cut-off is identified in the density plot by
finding the first ortholog candidate at a percent similarity equal to or higher than that of the 2nd local
minimum. Upon selecting the user-defined cut-off from the drop-down, the user can closely examine the
density plot and manipulate the cut-off: the "Enter Cut-off" text box becomes active and the user can enter
a number from 1 to 100. To update the cut-off in the Level 1 data report and/or close the cut-off tab and
return to the Level 1 page, click the "Update Cut-off" button.
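The cut-off selection described above can be sketched as follows. This is an illustration under stated
assumptions, not the SeqAPASS algorithm: the density curve is taken as already computed on a grid, "first
ortholog candidate" is read as the ortholog candidate with the lowest percent similarity at or above the
local minimum, and the function names are hypothetical. The fallback to the local minimums themselves when
no ortholog candidates exist follows the behavior the guide describes for an Ortholog Count of 0.

```python
def local_minima(xs, ys):
    """Return the x positions of interior local minimums of a curve."""
    return [
        xs[i]
        for i in range(1, len(ys) - 1)
        if ys[i] < ys[i - 1] and ys[i] <= ys[i + 1]
    ]


def candidate_cutoffs(minima, ortholog_similarities):
    """Map each local minimum to a potential susceptibility cut-off."""
    cutoffs = []
    for minimum in sorted(minima):
        if not ortholog_similarities:
            # No ortholog candidates: the local minimum itself is used.
            cutoffs.append(minimum)
            continue
        at_or_above = [s for s in ortholog_similarities if s >= minimum]
        if at_or_above:
            cutoffs.append(min(at_or_above))
    return cutoffs


# Hypothetical density curve over percent similarity (0 to 100).
grid = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
density = [0.5, 1.0, 0.4, 0.9, 1.5, 0.7, 1.2, 2.0, 1.1, 3.0, 5.0]
orthologs = [33.93, 51.64, 85.11, 96.53]

minima = local_minima(grid, density)
cutoffs = candidate_cutoffs(minima, orthologs)
default_cutoff = cutoffs[0]              # based on the 1st local minimum
second_cutoff = cutoffs[1]               # based on the 2nd local minimum
no_ortholog_cutoffs = candidate_cutoffs(minima, [])
```

In this sketch the first entry corresponds to the "Default" selection and the second to the "2nd local
minimum" selection; a user-defined cut-off would simply replace the chosen value with a number from 1 to
100.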
[Screenshot: "Susceptibility Cut-off" box with a thumbnail density plot (x-axis: Percent Similarity) and
the "Cutoff Settings" button. This will open in a separate tab.]
Note: The user should have a justification for changing the susceptibility cut-off, either based on
evaluation of Ortholog cutoffs in the data visualization or from empirical evidence.
[Screenshot: "Level 1 Susceptibility Cut-off: Primary Report" page for SeqAPASS ID 1290 (Query Accession
NP_000116.2; Query Species Homo sapiens; Query Protein estrogen receptor isoform 1; Ortholog Count 348;
Protein and Taxonomy Data 02/28/2019; BLAST Version 2.8.1; Software Version 3.2). The page reads: "Local
minimums are identified and susceptibility cut-off is set based on % similarity of next ortholog
candidate. Use update cut-off button to go back to Level 1 data." It shows the "Select Cut-off" drop-down
set to "Default: Identify 1st local minimum and find next ortholog candidate," the "Enter Cut-off" text
box, the "Update Cut-off" button, a density plot (x-axis: Percent Similarity), and the following table:]

Cut-off #   Susceptibility Cut-off
1           33.93
2           51.64
3           61.97
4           71.68
5           85.11
6           96.53
All potential susceptibility cut-offs generated by the data distribution and ortholog candidate
identification are reported in the table with columns "Cut-off #" and "Susceptibility Cut-off." The user
can use these numbers to define a cut-off if empirical evidence suggests that the "Default" or "2nd local
minimum" cut-offs are not supported.
[Density plot: "Cut-off Based on Ortholog Candidates" (x-axis: Percent Similarity), with a legend for
Density, Local Max, Local Min, and Inflection Point.]
No Orthologs Detected
[Screenshot: "Level 1 Query Protein Information" page for SeqAPASS ID 1292 (Query Accession
NP_001317544.1; Query Species Homo sapiens; Query Protein peroxisome proliferator-activated receptor gamma
isoform 3; Protein and Taxonomy Data 02/28/2019; BLAST Version 2.8.1; Software Version 3.2) showing
"Ortholog Count: 0" above the "Level 1 Data - Primary" table (page 1 of 82).]
If no orthologs are detected from the reciprocal best hit BLAST analysis, the "Ortholog Count" will be "0"
at the top of the "Level 1 Query Protein Information" page. The cut-off will be set by the local minimums
only; therefore, the susceptibility prediction will NOT take ortholog candidates into account. It is
recommended that the user check the full report for ortholog candidates or identify a different query
sequence for the susceptibility predictions. In this case, the susceptibility predictions will be
highlighted in dark pink in the Level 1 data table to indicate that 0 orthologs were detected and that the
susceptibility cut-off was determined by plotting the distribution of percent similarities and identifying
the local minimums.
[Screenshot: "Level 1 Query Protein Information" header for SeqAPASS ID 1299 (Query Accession APO40848.1;
Query Species Poa annua; Query Protein PsbA, partial (plastid); Protein and Taxonomy Data 02/28/2019;
BLAST Version 2.8.1; Software Version 3.2) showing "Ortholog Count: 0."]
Note: De-select the "Show Only Eukaryotes" checkbox to see if prokaryotes were identified as orthologs.
When no orthologs are detected, clicking the "Cutoff Settings" button opens a page on which the
"Cut-off #" and "Susceptibility Cut-off" columns report only the local minimum values.
[Screenshot: "Level 1 Susceptibility Cut-off" page for SeqAPASS ID 19 (Query Accession CAA74340.1; Query
Species Bubalus bubalis; Query Protein insulin receptor; Ortholog Count 0; NCBI Data 02/01/2015), with the
density plot "Cut-off Based on Ortholog Candidates" (x-axis: Percent Similarity).]
From the "Level 1" page, the user can return to the list of completed SeqAPASS runs by clicking the
"Main" button on the upper left-hand side of the "Level 1 Query Protein Information" page.
Level 2: Functional Domain(s) Alignment
In the "View SeqAPASS Reports" tab, on the "Level 1 Query Protein Information" page, there is a
"Level 2" box for comparing hit domains to the query domain. In the "Level 2" drop-down box, there is a
link out to the "NCBI Conserved Domain Database" for the query protein of interest. Below this link, the
user will find a drop-down containing the functional domains associated with the query sequence for
comparison across species.
[Screenshot: "Level 1 Query Protein Information" page for SeqAPASS ID 1290 (Query Accession NP_000116.2;
Query Species Homo sapiens; Query Protein estrogen receptor isoform 1; Ortholog Count 348) showing the
"Level 2" box: the "NCBI Conserved Domain Database" link, the "Functional Domains" drop-down
("-Select Domain-"), the "Choose Domain to View" drop-down ("-Select Completed Domain-"), the "View
Level 2 Data" button, and the "Refresh Level 2 and 3 runs" button.]
In the drop-down box (below the words "Functional Domains"), the user will find all domains associated
with the query protein that are listed in the NCBI Conserved Domains Database. To compare a domain from
the query protein to domains of the hit proteins, the user will use the drop-down to highlight a domain
and click the "Request Domain Run" button.
Note: Domains in the drop-down are listed with the first amino acid residue position that aligns with the
NCBI curated domain in parentheses, followed by the NCBI domain Accession, domain name, and
description.
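For users who copy domain entries from the drop-down into their own records, the entry format described in
the note above can be split into its parts. The following is a minimal sketch; the function name, the
regular expression, and the dictionary keys are hypothetical and are not part of SeqAPASS.

```python
import re

# "(start residue) accession, name, description"
ENTRY_PATTERN = re.compile(r"^\((\d+)\)\s*(\S+?),\s*([^,]+),\s*(.*)$")


def parse_domain_entry(text):
    """Split one drop-down entry into its four documented parts."""
    match = ENTRY_PATTERN.match(text.strip())
    if match is None:
        raise ValueError("not a recognized domain entry: " + repr(text))
    start, accession, name, description = match.groups()
    return {
        "start_residue": int(start),
        "accession": accession,
        "name": name.strip(),
        "description": description.strip(),
    }


example_entry = parse_domain_entry(
    "(310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen receptor"
)
```

The description is captured as everything after the third field, so commas inside a description do not
break the parse.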
[Screenshot: "Level 2" box with the "Functional Domains" drop-down expanded. The entries shown
(descriptions are cut off in the screenshot) are:]
-Select Domain-
(243) cd06157, NR_LBD, The ligand binding domain of nuclear rec
(105) cd06916, NR_DBD_like, DNA-binding domain of nuclear rece
(245) cd06929, NR_LBD_F1, Ligand-binding domain of nuclear rec
(242) cd06930, NR_LBD_F2, Ligand-binding domain of nuclear rec
(215) cd06931, NR_LBD_HNF4_like, The ligand binding domain of.
Note: The user can also use the text box at the top of the drop-down to search the "Functional Domain"
list in the drop-down.
It is recommended that the user click on the "NCBI Conserved Domains Database" link
(http://www.ncbi.nlm.nih.gov/cdd/) to identify which domains are "Specific hits" in the NCBI
Conserved Domains Database. On the NCBI page, the user can hover over the graphical representation of
the domains associated with the query sequence to highlight and identify the Accession associated with
domain "Specific hits." The example below shows the user hovering over the NR_LBD_ER domain with
the computer mouse.


[Screenshot: NCBI Conserved Domains page for estrogen receptor isoform 1 [Homo sapiens] (NP_000116.2),
with the cursor hovering over the NR_LBD_ER specific hit in the graphical summary. The pop-up and the
"List of domain hits" show Accession cd06949, "Ligand binding domain of Estrogen receptor, which are
activated by the hormone 17beta-estradiol (estrogen)," Pssm-ID 132747, Cd Length 235, Bit Score 426.07,
and E-value 1.46e-146, followed by the sequence alignment between the query and cd06949.]
After identifying the domain(s) of interest and the corresponding starting residue and domain Accession,
the user can return to the SeqAPASS tool and scroll to the domain of interest in the drop-down. If that
domain has not been previously run by the user, the "Request Domain Run" button will become active
and the user can click it to submit the domain query.
[Screenshot: "Level 2" box with "(243) cd06157, NR_LBD, The ligand b" selected in the "Functional Domains"
drop-down and the "Request Domain Run" button active, above the "Choose Domain to View" drop-down
("-Select Completed Domain-") and the "View Level 2 Data" button.]
When the user clicks the "Request Domain Run" button, the following message will appear if the run has
been submitted successfully.
[Screenshot: SeqAPASS page banner displaying the message "Level 2 Run Requested" with "Status queued,"
above the Home, Request SeqAPASS Run, SeqAPASS Run Status, View SeqAPASS Reports, and Settings tabs.]

When sequence comparisons have completed for the selected functional domain, the domain will be
present in the drop-down in the "View Level 2 Data" area. The drop-down is not automatically populated
with the completed domain run. The user must click the "Refresh Level 2 and 3 runs" button to
update the page so that the newly completed domain appears in the "Choose Domain to View" drop-down.
To view a completed Level 2 domain, highlight the domain of interest in the drop-down box and click the
"View Level 2 Data" button. This will bring the user to the "Level 2" data page for the selected query
protein/domain.
Note: The user can also use the text box at the top of the drop-down to search the "Completed Domain"
list.
[Screenshot: "Level 2" box with the "Choose Domain to View" drop-down expanded. The completed entries
shown are "(316) cd06931, NR_LBD_HNF4_like, The ligand binding domain of h" and "(310) cd06949, NR_LBD_ER,
Ligand binding domain of Estrogen rec" (both cut off in the screenshot); the second entry is selected
above the "View Level 2 Data" button.]
[Screenshot: "Level 2 Query Domain Information" page for SeqAPASS ID 1290 (Query Accession NP_000116.2;
Query Species Homo sapiens; Query Protein estrogen receptor isoform 1; Query Domain (310) cd06949,
NR_LBD_ER, Ligand binding domain of Estrogen receptor, which are activated by the hormone
17beta-estradiol (estrogen); Ortholog Count 348; Protein and Taxonomy Data 02/28/2019; BLAST Version
2.8.1; CDD Data 12/08/2016; Software Version 3.2), with the "Susceptibility Cut-off," "Visualization," and
"Primary Report Settings" boxes. The "Primary Report Settings" box shows E-value 10.0, Sorted by Taxonomic
Group "class," and Species Read-Across "Yes."]
The default "Level 2" table is the "Primary Report," which includes query domain information in the first
row below the column titles, followed by hit domains whose sequences aligned with the selected query
domain. The hit domains are ordered from the highest to lowest percent similarity (maximum percent
similarity = 100%). For each hit domain, the Data Version, NCBI Accession, and species information are
provided, including the "Protein Count," which indicates the number of protein records per species in the
NCBI protein database, taxonomic information, and species names. Also included are the NCBI accession
for the query protein, query protein name, Domain Type, BLASTp bitscore (describes the overall quality of
the alignment; see NCBI BLASTp tutorials), and Domain percent similarity ([hit bitscore/query
bitscore] * 100). If the hit protein has been identified as an ortholog candidate (using the reciprocal
best hit BLAST method), it will be noted with a "Y" for yes; if it is not an ortholog candidate, it will be
noted with an "N" for no.
A prediction of susceptibility is displayed based on the susceptibility cut-off, identified with a "Y" for
yes or an "N" for no. The date/time the analysis was completed is also identified. (See the Search, View,
and Download Data Tables section of the user guide for more information.) There is a column that
identifies whether the species is a eukaryote, noted with a "Y" for yes or an "N" for no if the hit is a
prokaryote.
Additionally, a column with a link to the U.S. EPA ECOTOX Knowledgebase
(https://cfpub.epa.gov/ecotox/help.cfm) is available when there are empirical toxicity data curated for the
species identified in the row. This link allows the user to view available single chemical toxicity data
from the literature for specific species.
Default highlights identify partial protein sequences, sequences with a bitscore higher than the query
domain and therefore a percent similarity greater than 100% (commonly synthetic constructs), and cases
where zero ortholog candidates are identified (in this case the user should consider a different query
sequence). Additionally, the default setting for the report shows only eukaryote data, excluding
prokaryote data from the table, with the "Show Only Eukaryotes" checkbox checked. To view prokaryote data,
deselect this checkbox.
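The percent similarity formula given above, [hit bitscore/query bitscore] * 100, can be checked by hand
against a downloaded table. The sketch below is illustrative only: the function names are hypothetical,
and the bitscores are example values of the kind reported in the Full Report (a query bitscore of 1241.87
and a hit bitscore of 1229.54).

```python
def percent_similarity(hit_bitscore, query_bitscore):
    """Percent similarity as [hit bitscore / query bitscore] * 100."""
    if query_bitscore <= 0:
        raise ValueError("query bitscore must be positive")
    return hit_bitscore / query_bitscore * 100


def exceeds_query(hit_bitscore, query_bitscore):
    """True when a hit scores higher than the query aligned to itself,
    the case the default highlight flags as greater than 100%."""
    return percent_similarity(hit_bitscore, query_bitscore) > 100


example_similarity = round(percent_similarity(1229.54, 1241.87), 2)
query_similarity = percent_similarity(1241.87, 1241.87)
flagged = exceeds_query(1300.00, 1241.87)
```

Because the query bitscore is the denominator, the query row is always 100%, and any hit whose bitscore is
higher than the query's exceeds 100%.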
[Screenshot: "Level 2 Data - Primary" table (page 1 of 95), with the columns Data Version, NCBI Accession,
Protein Count, Species Tax ID, Taxonomic Group, Filtered Taxonomic Group, Scientific Name, Common Name,
and Protein Name. The first (query) row reads: 4; NP_000116.2; 1265506; 9606; Mammalia; Mammalia; Homo
sapiens; Human; estrogen receptor isoform 1. The page also shows the "Partial Hit Protein Sequence"
highlight key, the "Primary Report" and "Full Report" options, the "Show Only Eukaryotes" checkbox, the
"View Level 2 Summary Report" button, and the "Download Current Level 2 Report Settings" link.]
Level Two Summary Report
The user can view a summary of the data for each taxonomic group by clicking on the "View Level 2
Summary Report" button. The summary includes the number of species, mean percent similarity, median
percent similarity, and susceptibility prediction for each group. This data table can also be downloaded.
Table: Level Two Summary Report (page 1 of 6 of the on-screen table)

Taxonomic Group     Filtered Taxonomic Group   Number of Species   Mean Percent Similarity   Median Percent Similarity   Susceptibility Prediction
Mammalia            Mammalia                   176                 80.60                     97.63                       Y
Aves                Aves                       96                  83.78                     95.73                       Y
Crocodylia          Crocodylia                 7                   84.98                     95.97                       Y
Testudines          Testudines                 9                   86.30                     94.55                       Y
Lepidosauria        Lepidosauria               22                  71.14                     92.21                       Y
Amphibia            Amphibia                   22                  60.74                     81.03                       Y
Chondrichthyes      Chondrichthyes             7                   55.68                     67.59                       Y
Coelacanthiformes   Coelacanthiformes          2                   70.43                     70.43                       Y
Actinopteri         Actinopteri                179                 51.66                     62.13                       Y
Ceratodontimorpha   Ceratodontimorpha          3                   53.96                     71.15                       Y

Level 2: Primary Report Settings
Default settings
The "Primary Report Settings" box allows the user to view the default settings applied to the table below
and to change certain settings. The "Primary Report Settings" box is only available on the "Primary
Report" display. The default settings show data for hits whose E-values are <10. The default setting for
"Sorted by Taxonomic Group" is "class"; therefore, the "Filtered Taxonomic Group" column in the table is
set to identify and report the taxonomic lineage of "class" from the NCBI Taxonomy Database. However, if
class is not identified in the NCBI Taxonomic Hierarchy associated with the hit accession, then the
algorithm will report the next available Taxonomic Group, moving from class to subclass, to superorder,
to order, to suborder, to superfamily, to family, to subfamily, to genus. Finally, the susceptibility
predictions are set by using Species Read-Across. (Please view the SeqAPASS Documentation section of
the User Guide for details on Read-Across settings.) Briefly, Species Read-Across sets the
susceptibility prediction as follows: all ortholog candidates are Susceptible = Y; all species listed
above the susceptibility cut-off are Susceptible = Y; all species below the cut-off that belong to the
same taxonomic group as one or more species above the cut-off are Susceptible = Y; and those below the
cut-off that are not ortholog candidates and do not belong to a taxonomic group with a species above the
cut-off are Susceptible = N.
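The Species Read-Across rules summarized above, together with the effect of turning read-across off, can
be written as a short sketch. This is an illustration and not the SeqAPASS code: the function name and
the input layout are hypothetical, and treating a percent similarity equal to the cut-off as "above" it
is an assumption.

```python
def predict_susceptibility(hits, cutoff, read_across=True):
    """Return {species: "Y" or "N"} following the read-across rules.

    hits: list of dicts with the keys "species", "group" (filtered
    taxonomic group), "similarity" (percent), and "ortholog" (bool).
    """
    # Groups with at least one species at or above the cut-off.
    groups_above = {h["group"] for h in hits if h["similarity"] >= cutoff}

    predictions = {}
    for h in hits:
        susceptible = (
            h["ortholog"]
            or h["similarity"] >= cutoff
            or (read_across and h["group"] in groups_above)
        )
        predictions[h["species"]] = "Y" if susceptible else "N"
    return predictions


example_hits = [
    {"species": "A", "group": "Mammalia", "similarity": 90.0, "ortholog": True},
    {"species": "B", "group": "Mammalia", "similarity": 20.0, "ortholog": False},
    {"species": "C", "group": "Insecta", "similarity": 15.0, "ortholog": False},
    {"species": "D", "group": "Insecta", "similarity": 10.0, "ortholog": True},
]
with_read_across = predict_susceptibility(example_hits, cutoff=33.93)
without_read_across = predict_susceptibility(
    example_hits, cutoff=33.93, read_across=False
)
```

Species B shows the difference between the two settings: it is below the cut-off and not an ortholog
candidate, so it is "Y" only when read-across extends the prediction from species A in the same group.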
[Screenshot: "Primary Report Settings" box with the default values: E-value 10.0; Sorted by Taxonomic
Group "class"; Species Read-Across "Yes"; and the "Update Report" and "Use Default Settings" buttons.]
Changing Default Settings
The user may choose to change the level of the taxonomic hierarchy that is used for the susceptibility
prediction. From the "Sorted by Taxonomic Group" drop-down, the user may choose to display a different
taxonomic group in the "Filtered Taxonomic Group" column of the data table.
[Screenshot: "Primary Report Settings" box with the "Sorted by Taxonomic Group" drop-down expanded,
listing class, subclass, superorder, order, suborder, superfamily, family, subfamily, and genus.]
If the user chooses "order," for example, the "Filtered Taxonomic Group" column in the data table will
report the taxonomic lineage of "order" from the NCBI Taxonomy Database, and all species read-across
for the susceptibility prediction will be based on order instead of class. As described previously, if
order is not identified in the NCBI Taxonomic Hierarchy associated with the hit accession, then the
algorithm will report the next available Taxonomic Group, moving from order to suborder, to superfamily,
to family, to subfamily, to genus. Upon selecting the Taxonomic Group from the drop-down and clicking
"Update Report," the Level 2 Data for the Primary Report will update to the selected taxonomic level. The
user can also download the currently applied report settings by selecting "Download Current Level 2
Report Settings." This csv file allows the user to track which settings were used or changed when a data
table was downloaded.
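The fallback through the taxonomic hierarchy described above can be sketched as a simple ordered search.
The rank order is taken from the guide; the function name and the dictionary representation of a lineage
are hypothetical and are used only for illustration.

```python
# Rank order as listed in the guide, from broadest to narrowest.
RANKS = [
    "class", "subclass", "superorder", "order", "suborder",
    "superfamily", "family", "subfamily", "genus",
]


def filtered_taxonomic_group(lineage, selected_rank="class"):
    """Return the name at the selected rank, or at the next available
    rank further down the list when the selected rank is missing.

    lineage: dict mapping rank name -> taxon name for one hit.
    Returns None when no rank from the selected one onward is present.
    """
    start = RANKS.index(selected_rank)
    for rank in RANKS[start:]:
        name = lineage.get(rank)
        if name:
            return name
    return None


human = {"class": "Mammalia", "order": "Primates", "genus": "Homo"}
no_order = {"class": "Mammalia", "family": "Hominidae"}
by_class = filtered_taxonomic_group(human)
by_order = filtered_taxonomic_group(human, "order")
fallback = filtered_taxonomic_group(no_order, "order")
```

Note that the search only moves toward narrower ranks, as in the guide's description; a hit with a class
but nothing from order downward yields no group when "order" is selected.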
[Screenshot: "Level 2 Data - Primary" table (page 1 of 95) after selecting "order." For each row shown,
the "Taxonomic Group" column reads Mammalia and the "Filtered Taxonomic Group" column reads Primates; the
first (query) row reads: 4; NP_000116.2; 1265506; 9606; Mammalia; Primates; Homo sapiens; Human.]
The user may also choose to turn species read-across off by using the "Species Read-Across" drop-down,
selecting "No," and clicking "Update Report." When "No" is selected, the susceptibility prediction
will be "Y" in the table below only if the Percent Similarity is above the cut-off or if the hit is
identified as an Ortholog Candidate ("Y"). Any other hit below the cut-off will yield a susceptibility
prediction of no, or "N."
[Screenshot: "Primary Report Settings" box with Sorted by Taxonomic Group set to "order" and the "Species
Read-Across" drop-down expanded to show "No" and "Yes," with the "Update Report" and "Use Default
Settings" buttons.]




The user can select the "Full Report" on the "Level 2" data page, which includes the same information as
the "Primary Report" and additional information pertaining to the alignment of the protein sequence using
BLASTp and domain information. Additional information includes the NCBI PSSM ID, NCBI Domain
ID, Domain Name, number of amino acid residues in the sequence (Hit Length), the number of exact
matching amino acids between the hit and query sequence (Identity), the number of exact and similar
(similar side-chain substitutions) matches in amino acids between the hit and the query sequence
(Positives), and the expect value (E-value) describing the number of different alignments expected to
occur in the database search by chance. (See Search, View, and Download Data Tables section of user
guide for more information).
Level 2 Data - Full (screenshot of the data table, page 1 of 95, scrolled to the right-hand columns; the "Download Current Level 2 Report Settings" link appears above the table)

Domain Name  Hit Length  Identity  Positives  E-value     BLASTp Bitscore  Ortholog Candidate  Ortholog Count
NR_LBD_ER    238         238       238        1.621E-179  487.26           Y                   348
NR_LBD_ER    238         237       238        9.910E-179  485.34           Y                   348

The remaining eight rows on the page repeat the second row. The headers of the columns further to the right are not legible in this copy; their values are 41.50, 100.00 (99.60 in all rows after the first), Y, 2019-08-23 09:47:27, and Y.
Note: SeqAPASS v2.0 and newer parse the BLASTp query and hit accessions to identify all the
species/accessions from identical proteins. Therefore, if a hit sequence represents multiple species, all
species with the identical sequence will be found in the data tables for Level 2. To determine which
sequence/species was identified from BLASTp as a hit and which sequence/species was parsed from the
identical sequence, view the "Full Report" for Level 2, column "Identical Protein," where "N" is indicative
of the original hit sequence and "Y" is the parsed sequence.

Level 2 Report Settings (example contents of the downloaded csv)

Analysis TimeStamp           2019-05-16 11:04:08
SeqAPASS version             3.2
Query Species                Homo sapiens
Query Protein                estrogen receptor isoform 1
Query Domain                 (310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen receptor, which are activated by the hormone 17beta-estradiol (estrogen)
Query Accession              NP_000116.2
Ortholog Count               348
L2 Cutoff                    Default
L2 Cutoff Value              41.5003807
E-value                      10
Sorted by Taxonomic Group    CLASS
Species Read Across          Y
Show Only Eukaryotes         Checked
Report                       Primary

When the current Level 2 report settings are downloaded, the information shown above will be present in
the csv. If the user changes the default settings, the csv provides a quick record of the settings used
if the SeqAPASS page is no longer accessible.
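For users who keep these files alongside downloaded data tables, the settings csv can be read back programmatically. The following is a minimal Python sketch, assuming the file has a title row followed by two-column name/value rows as in the example above; the exact layout of the real download may differ, and the function name and sample text are hypothetical.

```python
import csv
import io

# Sample text modelled on the example settings above (layout is an assumption).
SAMPLE_SETTINGS = """Level 2 Report Settings
,
,
SeqAPASS version,3.2
Query Species,Homo sapiens
Query Accession,NP_000116.2
Ortholog Count,348
L2 Cutoff,Default
L2 Cutoff Value,41.5003807
E-value,10
Species Read Across,Y
Report,Primary
"""


def read_report_settings(text):
    """Return a {setting name: value} dictionary from settings csv text."""
    settings = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip the one-cell title row and empty spacer rows;
        # keep rows that carry both a name and a value.
        if len(row) >= 2 and row[0].strip():
            settings[row[0].strip()] = row[1].strip()
    return settings


settings = read_report_settings(SAMPLE_SETTINGS)
cutoff_value = float(settings["L2 Cutoff Value"])
```

To read a saved file instead of the sample string, the same function could be given the result of open(path, newline="").read().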
Susceptibility Cutoff Box for Level 2
The susceptibility prediction is set by identifying ortholog candidates, sequences above a defined
susceptibility cut-off, or species below the susceptibility cut-off that belong to an organism class
above the cut-off. The default susceptibility cut-off is set by plotting the distribution of percent
similarities calculated for each hit protein. From this plot, the critical points are identified and the
local minimums and maximums are reported. Using the ortholog candidate data, a susceptibility cut-off is
automatically determined by identifying the first ortholog candidate at an equal or higher percent
similarity than the first local minimum. The user can view this graph by clicking the "View Cutoff"
button in the "Susceptibility Cut-off" box. Radio buttons located to the right of the graphical display
indicate which cut-off has been applied for the evaluation of susceptibility in the report. These radio
buttons can be selected to change the cut-off in the table to the 2nd local minimum, in which case the
first ortholog candidate at an equal or higher percent similarity than the second local minimum in the
density plot is used to set the cut-off. Alternatively, the user can define the cut-off by clicking the
"User Defined" radio button, or can closely examine the density plot and manipulate the cut-off by
clicking the "View Cutoff" button.
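The cut-off logic described above can be illustrated with a short Python sketch. It is not the SeqAPASS implementation: the Gaussian kernel density estimate, the bandwidth of 5, the integer 0-100 grid, and all function names are assumptions chosen for illustration, and "first ortholog candidate at an equal or higher percent similarity" is interpreted as the lowest ortholog-candidate similarity at or above the chosen local minimum. The example values are made up.

```python
import math


def density(values, grid, bandwidth=5.0):
    """Gaussian kernel density estimate evaluated at each grid point."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm
        for x in grid
    ]


def local_minima(grid, dens):
    """Grid positions where the density is lower than at both neighbours."""
    return [
        grid[i]
        for i in range(1, len(grid) - 1)
        if dens[i] < dens[i - 1] and dens[i] < dens[i + 1]
    ]


def cutoff_from_minimum(minimum, ortholog_similarities):
    """Lowest ortholog-candidate similarity at or above the chosen minimum.

    With no eligible ortholog candidate, fall back to the minimum itself,
    mirroring the "local minimums only" behaviour when Ortholog Count is 0.
    """
    eligible = [s for s in ortholog_similarities if s >= minimum]
    return min(eligible) if eligible else minimum


# Made-up percent similarities forming two clusters, for illustration only.
similarities = [18, 19, 20, 21, 22, 78, 79, 80, 81, 82]
ortholog_similarities = [82, 79, 55, 30]

grid = list(range(0, 101))
minima = local_minima(grid, density(similarities, grid))
default_cutoff = cutoff_from_minimum(minima[0], ortholog_similarities)
no_ortholog_cutoff = cutoff_from_minimum(minima[0], [])
```

Selecting the 2nd local minimum would correspond to passing minima[1] instead of minima[0] when the density has more than one minimum.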
[Screenshot: "Level 2 Query Domain Information" page for SeqAPASS ID 1290 (Query Accession: NP_000116.2; Query Species: Homo sapiens; Query Protein: estrogen receptor isoform 1; Query Domain: (310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen receptor, which are activated by the hormone 17beta-estradiol (estrogen); Ortholog Count: 348; Protein and Taxonomy Data: 02/28/2019; BLAST Version: 2.8.1; CDD Data: 12/08/2016; Software Version: 3.2), showing the "Susceptibility Cut-off" box with its "View Cutoff" button (opens in a separate tab) and the "Visualization" box.]
Upon clicking the "View Cutoff" button, a new page is displayed with a drop-down that allows the user to
set the susceptibility cut-off using the first local minimum and the identified ortholog candidate, the
second local minimum and the identified ortholog candidate, or the "User defined cut-off" (where the user
selects the cut-off). To update the cut-off in the Level 2 data report and/or return to the Level 2 page,
click the "Update Cut-off" button.
Note: The user should have direct empirical evidence that species above the user defined cutoff are
susceptible via the protein of interest, or that the species below the user defined cutoff are not susceptible.
Upon selecting the User defined cut-off from the dropdown, the Enter Cut-off text box becomes active
and the user can enter a number 1-100.
[Screenshot: "Level 2 Susceptibility Cut-off: Primary Report" page for SeqAPASS ID 1290 (Query Accession: NP_000116.2; Ortholog Count: 348), showing the "Select Cut-Off" drop-down (set to "Default: Identify 1st local minimum and find next ortholog candidate"), the "Enter Cut-Off" text box, the "Update Cut-off" button, and the density plot "Cut-off Based on Ortholog Candidates" (x-axis: Percent Similarity; legend: Density, Local Max, Local Min, Inflection Point, Susceptibility Cut-off).]
All potential susceptibility cut-offs generated by the data distribution and ortholog candidate
identification are reported in the table with columns "Cut-off #" and "Susceptibility Cut-off." The user
can use these numbers to define a cut-off if empirical evidence suggests that the "Default" or "2nd local
minimum" cut-offs are not supported.
No Orthologs Detected
[Screenshot: "Level 2 Query Domain Information" page for SeqAPASS ID 1326 (Query Accession: NP_001317544.1; Query Species: Homo sapiens; Query Protein: peroxisome proliferator-activated receptor gamma isoform 3; Query Domain: (110) cd06965, NR_DBD_Ppar, DNA-binding domain of peroxisome proliferator-activated receptors (PPAR) is composed of two C4-type zinc fingers; Ortholog Count: 0), showing the "Susceptibility Cut-off" box, the report options ("Primary Report," "Full Report," "Partial Hit Protein Sequence," "Show Only Eukaryotes," "View Level 2 Summary Report"), and the first rows of the "Level 2 Data - Primary" table.]
If no orthologs are detected from the reciprocal best hit BLAST analysis, the "Ortholog Count" will be "0"
at the top of the "Level 2 Query Domain Information" page. The cut-off will be set by the local minimums
only; therefore, the susceptibility prediction will NOT take into account ortholog candidates. It is
recommended that the user check the full report for ortholog candidates or identify a different query
sequence for the susceptibility predictions. Here, the susceptibility predictions will be highlighted in
dark pink in the Level 2 data table to indicate that 0 orthologs were detected and that the susceptibility
cut-off was determined from plotting the distribution of percent similarities and identifying the local
minimums.
[Screenshot: "Level 2 Query Domain Information" page for SeqAPASS ID 1321 (Query Accession: BAF57671.1; Query Species: Mus caroli; Query Protein: cytochrome b, partial (mitochondrion); Query Domain: (24) CHL00070, petB, cytochrome b6; Ortholog Count: 0; Protein and Taxonomy Data: 02/28/2019; BLAST Version: 2.8.1; CDD Data: 12/08/2016; Software Version: 4.0), showing the "Susceptibility Cut-off," "Primary Report Settings," and "Visualization" boxes.]
By clicking on the "View Cutoff" button when no orthologs are detected, the "Cut-off #" and
"Susceptibility Cut-off" columns will report only the local minimum values.
[Screenshot: "Level 2 Susceptibility Cut-off: Primary Report" page for SeqAPASS ID 1326 (Query Accession: NP_001317544.1; Ortholog Count: 0; Software Version: 4.0), showing the "Select Cut-off" drop-down, the "Enter Cut-off" text box, the "Update Cut-off" button, the "Cut-off #" and "Susceptibility Cut-off" table, and the density plot "Cut-off Based on Ortholog Candidates" (x-axis: Percent Similarity; legend: Density, Local Max, Local Min, Inflection Point).]
The user can return to the "Level 2" data page by clicking the "Update Cut-off" button or exiting the tab.
Level 1 and Level 2: Data Visualization
From the Level 1 or Level 2 results page, SeqAPASS users can access an interactive data visualization for
either the "Primary Report" or the "Full Report" by clicking on the "Visualize Data" button.
Example of Level 1 page:
[Screenshot: Level 1 page under the "View SeqAPASS Reports" tab, showing the "Level 1 Query Protein Information," "Susceptibility Cut-off," "Primary Report Settings," and "Visualization" boxes.]
Example of Level 2 page:
[Screenshot: Level 2 page under the "View SeqAPASS Reports" tab for SeqAPASS ID 1290 (Query Accession: NP_000116.2; Query Species: Homo sapiens; Ortholog Count: 348), showing the "Level 2 Query Domain Information," "Susceptibility Cut-off," and "Primary Report Settings" boxes ("E-value," "Sorted by Taxonomic Group," and "Species Read-Across" controls with the "Update Report" and "Use Default Settings" buttons) and the "Visualize Data" button, which opens in a separate tab.]
The data visualization will then open in a new web browser tab, one for Level 1 and a different one for
Level 2. The visualization will display for the report selected by the user on the Level 1 or Level 2 report
page and be identified as "Level One Visualization - Primary Report" or "Level One Visualization - Full
Report" and "Level Two Visualization - Primary Report" or "Level Two Visualization - Full Report."
Note: One report type at a time, either "Primary Report" or "Full Report," can be displayed in the
visualization tab for Level 1 and Level 2. Therefore, if the user is viewing the "Level One Visualization -
Primary Report" page and returns to the Level 1 results page and clicks the radio button for "Full Report,"
the data visualization tab will update to "Level One Visualization - Full Report."
Level 1 and 2 Information Page
The initial page that opens upon clicking the "Visualize Data" button provides the respective level query
protein information, including SeqAPASS ID, query protein, query species, ortholog count, and query
accession information. A link out to the NCBI protein database page corresponding to the queried
accession is available by clicking the query accession. Information on the visualization is provided in
the "Visualization Info" text box. To view the data visualization boxplots, click the BoxPlot icon. The
boxplot will then generate below the "Visualization Info" box.
[Screenshots: "Level One Visualization - Primary Report" and "Level Two Visualization - Primary Report" information pages for SeqAPASS ID 1290 (Query Accession: NP_000116.2; Query Protein: estrogen receptor isoform 1; Query Species: Homo sapiens; Ortholog Count: 348). Both pages display the same "Visualization Info" text box, reproduced here.]

Visualization Info
The following data visualization is available for Level 1 and Level 2 data:
• BoxPlot - Boxplots depicting SeqAPASS data illustrating the percent similarity across species compared to the query species examining the primary amino acid sequences (Level 1 Visualization) or functional domain (Level 2 Visualization).
   o The open circle, o, represents the query species and closed circles, •, represent the species with the highest percent similarity within the specified taxonomic group.
   o The top and bottom of each box represent the 75th and 25th percentiles, respectively. The top and bottom whiskers extend to 1.5 times the interquartile range.
   o The mean and median values for each taxonomic group are represented by horizontal thick and thin black lines on the box, respectively.
   o The dashed line indicates the cut-off for susceptibility predictions (based on ortholog analysis).
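
The quantities drawn for each box can be reproduced from a list of percent similarities. The Python sketch below is illustrative only: it uses the standard library's inclusive quartile method and the common Tukey convention in which each whisker ends at the most extreme value lying within 1.5 times the interquartile range of the box. Whether SeqAPASS uses exactly these conventions is an assumption, and the function name and example values are hypothetical.

```python
import statistics


def box_stats(similarities):
    """Summary values for one taxonomic group's box."""
    # Quartiles: 25th percentile, median, 75th percentile.
    q1, median, q3 = statistics.quantiles(similarities, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values inside the fences.
    inside = [s for s in similarities if low_fence <= s <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.mean(similarities),
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Made-up percent similarities for one group, including one outlying value.
stats = box_stats([10, 20, 30, 40, 50, 200])
```

With this data the value 200 falls outside the upper fence, so the upper whisker ends at the next-highest value rather than at the maximum.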

Level 1 and 2 BoxPlot Page - Controls
Upon clicking the "BoxPlot" icon on either the Level 1 or Level 2 Visualization Information page, a box
for the boxplot "Controls" and a box for the interactive boxplot will open.
[Screenshot: "Level Two Visualization - Primary Report" page for SeqAPASS ID 1290 with the boxplot "Controls" box open, showing the "Taxonomic Groups (x-axis labels)" list (e.g., Mammalia, Aves, Testudines), the "Common Name," "Scientific Name," and "Group by Common Name" legend options, the "Optional Selections" checkboxes ("Ortholog Candidates," "Threatened Species," "Endangered Species," "Common Model Organisms"), and the "Download BoxPlot..." and "Open Size Controls..." buttons (x-axis of the boxplot: Taxon).]
Manipulating Taxonomic Groups on x-axis
The boxplot controls allow the user to edit the taxonomic groups that are displayed on the x-axis by
clicking on the "X" for the taxonomic group name (e.g., Aves). This action removes the selected group
from the x-axis. To the right of the "Taxonomic Groups" controls box is a drop-down that allows the user
to remove or add back taxonomic groups to the x-axis of the boxplot graphic by deselecting or selecting
check-boxes in the drop-down. Similarly, unwanted taxonomic groups may be removed directly from the
boxplot by hovering the cursor over the taxonomic groups listed along the x-axis. The user will notice
that the selection arrow changes to a black arrow with a red "x" next to it; clicking the taxonomic group
will then remove it from the boxplot and the "Taxonomic Groups" controls box, and the checkbox for that
taxonomic group will be deselected in the "Taxonomic Groups" controls box drop-down list. The user can
delete multiple species by pressing CTRL and either clicking individual species or slowly dragging across
multiple species.
[Screenshot: boxplot "Controls" box with the sections "Taxonomic Groups (x-axis labels)," "Select Species for Legend," "Species Legend Options" ("Common Name," "Scientific Name," "Group by Common Name"), and "Optional Selections" ("Ortholog Candidates," "Threatened Species," "Endangered Species," "Common Model Organisms"), and the "Download BoxPlot..." and "Open Size Controls..." buttons, above the boxplot (x-axis: Taxon).]
Customize BoxPlot Legend
The user may customize the boxplot by adding a legend that will pinpoint species of interest on the
boxplot. Upon clicking the drop-down for "Select Species for Legend" in the controls box the user may
search in the text box for specific species to display in the boxplot legend. Upon identifying a species
from the drop-down menu and selecting the checkbox the species name will be placed in the boxplot
legend and a corresponding data point will be produced on the graph. The default settings display the
species common name both in the "Select Species for Legend" dropdown and on the boxplot. However, if
the species scientific name is desired, the user can select the radio button for "Scientific Name" in the
controls box for "Species Legend Options." This action will change the drop-down menu and species in
the legend to display the species scientific name.
Note: The database will take a brief moment to update the list upon changing between "Common Name"
and "Scientific Name."
[Screenshot: controls box with the "Select Species for Legend" drop-down open, listing species by common name (e.g., Aardvark, Abalones, Acorn worms, Adelie penguin, African clawed frog, African cotton leafworm), and the boxplot with a legend of selected species (Abalones, American beaver, Anna's hummingbird, Bactrian camel, Chimpanzee, Chum salmon).]
Change Species Display on Plot
Multiple scientific names can be represented by only one common name (e.g., Common name: Teleost
fishes; corresponding scientific names: Spinibarbus denticulatus, Sinocyclocheilus rhinocerous,
Sinocyclocheilus grahami, Sinocyclocheilus anshuiensis, Gobiocypris rarus, Thamnaconus
septentrionalis). Therefore, if a species common name that represents multiple species was used to create
the legend, and the user decides to instead select "Scientific Name," by default the boxplot legend will
change to display the multiple scientific names that represent the individual common name, and each
scientific name will be represented by a unique color/shape point on the plot. However, if the user selects
the checkbox "Group by Common Name" in the "Species Legend Options" control box, then the
scientific names that are represented by one common name will all display the same color/shape point on
the plot.
The user has the option of removing selected species from the legend either by removing them directly
from the "Select Species for Legend" drop-down box or by hovering the mouse directly over the species
name in the legend. The mouse will change to a black arrow with a red "x" next to it. Clicking the name
while this arrow is displayed will remove the species from the legend and from the control box.
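The difference between the default legend and the "Group by Common Name" option can be sketched as a marker assignment keyed either on the scientific name or on the common name. This Python sketch is only an illustration of that idea; the marker names, the function name, and the data structure are hypothetical and do not describe how SeqAPASS assigns colors or shapes.

```python
from itertools import cycle

# Hypothetical marker styles; the real tool combines colors and shapes.
MARKERS = ["circle", "square", "triangle", "diamond"]


def assign_markers(species, group_by_common_name=False):
    """Map each scientific name to a marker.

    species: list of (scientific_name, common_name) tuples.
    """
    markers = cycle(MARKERS)
    marker_for_key = {}
    assignment = {}
    for scientific, common in species:
        # Grouping keys the marker on the shared common name,
        # otherwise every scientific name gets its own marker.
        key = common if group_by_common_name else scientific
        if key not in marker_for_key:
            marker_for_key[key] = next(markers)
        assignment[scientific] = marker_for_key[key]
    return assignment


legend_species = [
    ("Spinibarbus denticulatus", "Teleost fishes"),
    ("Gobiocypris rarus", "Teleost fishes"),
    ("Pan troglodytes", "Chimpanzee"),
]
separate = assign_markers(legend_species)
grouped = assign_markers(legend_species, group_by_common_name=True)
```

In the grouped result both teleost species share one marker, while in the default result each scientific name has its own.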
[Screenshot: controls box with "Scientific Name" selected under "Species Legend Options," and the boxplot legend listing Haliotis diversicolor, Castor canadensis, Calypte anna, Camelus bactrianus, Pan troglodytes, Oncorhynchus keta, Gymnogyps californianus, Aplysia californica, Sinocyclocheilus anshuiensis, Sinocyclocheilus rhinocerous, Sinocyclocheilus grahami, Spinibarbus denticulatus, and Gobiocypris rarus.]
Customize the Legend to Display Species Groups of Interest
In the "Optional Selections" controls box, the user has the option of displaying "Ortholog Candidates,"
"Threatened Species," "Endangered Species," or "Common Model Organisms." Upon selecting one of
the checkboxes, red data points corresponding to species will be displayed on the boxplot. By hovering
the mouse over a single red point, a pop-up box will appear with the corresponding species name,
taxonomic ID, query protein, and percent similarity.
Note: The user can select to display either species common name or scientific name in the hover-over
information box by selecting from the "Species Legend Options."
If the user selects either "Threatened Species" or "Endangered Species," clicking on an individual red dot
will open a new web browser tab and link to the corresponding species page on the US Fish and Wildlife
Service's Environmental Conservation Online System (USFWS ECOS; e.g.,
https://ecos.fws.gov/ecp0/profile/speciesProfile?sId=1506).
[Screenshot: "Optional Selections" checkboxes ("Ortholog Candidates," "Threatened Species," "Endangered Species," "Common Model Organisms") and the boxplot (x-axis: Taxon) with the legend entry "Endangered Species" and a hover-over box reading "Rainbow trout (taxid: 8022), Estrogen receptor isoform X3, 64.51% similarity."]
BoxPlot Controls Widget for Bar Width, Zoom and Pan
By clicking the "Open Size Controls" button, a "BoxPlot Controls" widget opens that allows the user to
adjust the size of the bars on the boxplot by increasing or decreasing the "Bar Width" using the up and
down arrows. The minimum and maximum sizes for bars are 6 and 60, respectively. To reset the bar width
on the boxplot to the default size, click the "Reset" button to the right of the "Bar Width" adjustment
box in the "BoxPlot Controls" widget. The user can also zoom and pan the boxplot by toggling the on/off
button under the "Zoom" heading. The user can then zoom in or out by clicking the up or down arrows or
by entering a number in the text box and pressing Enter. To reset the zoom on the boxplot to the default
size, click the "Reset" button to the right of the "Zoom" adjustment box in the "BoxPlot Controls" widget.
The pan option is available when the "Zoom and Pan" option is toggled to the "on" position, which
allows the user to click on the boxplot and drag the plot around the screen to reposition it. To reset all
BoxPlot Controls to default settings, click the "Reset All" button.
Note: Upon exiting out of the BoxPlot Controls widget, the Zoom and Pan options are automatically
turned off.
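[Screenshot: "BoxPlot Controls" widget with the "Bar Width" (18) and "Zoom" (125) adjustment boxes, each with a "Reset" button, the "Zoom & Pan" on/off toggle, and the "Reset All" button.]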
Download BoxPlot Widget
To download the boxplot, click the "Download BoxPlot" button in the controls box. A "Download Boxplot"
widget will pop up. Specify which type of file (SVG, PNG, or JPG) to download by clicking on the desired
radio button for "Image Type." The user may customize the resolution of the boxplot for PNG and JPG
files prior to download by altering the "Width" and "Height" of the boxplot. To change "Width" or
"Height," enter the desired number in the text boxes. Click the "Download Image" button to download the
file. To close the "Download Boxplot" widget, click the "x" on the top right of the widget.
[Screenshot: "Download Boxplot" widget with the "Image Type" radio buttons (SVG, PNG, JPG), the "Width" (1,236) and "Height" (755) text boxes, and the "Download Image" button.]
Hover-over Features in the BoxPlot
By hovering over a taxonomic group name on the x-axis of the boxplot, an information box will pop up
listing the top three species in order by highest percent similarity. If only one or two species are
represented in the taxonomic group, then only those species will be displayed. Hovering the mouse over
any species in the boxplot that is present in the legend will generate a pop-up box with the
corresponding species name, taxonomic ID, query protein, and percent similarity. The susceptibility
cut-off is displayed in a pop-up text box upon hovering over the dashed horizontal cut-off line.
Summary Table for Species in a Specific Taxonomic Group
Clicking on a box representing a taxonomic group in the boxplot opens a pop-up table providing
summary information for that particular group. The table header provides the Taxonomic Group name,
summary statistics (i.e., mean and median percent similarity), the number of species represented in
the box, and the overall susceptibility prediction for the selected taxonomic group. The data table
includes protein and species information along with metrics for evaluating protein similarity and
predicting susceptibility. Also included in the table are columns indicating whether a species belongs
to a certain group of interest (e.g., Threatened Species, Endangered Species, Model Organism). The
table can be downloaded by clicking on the icon for an Excel or csv file.
Interactive Visualization with Level 1 Data Page and Level 2 Data Page
The data visualization is programmed to update with changes made to the Level 1 Data page and Level 2
Data page, respectively. Therefore, if the user updates the Susceptibility Cut-off (See user guide section
Susceptibility Cutoff Box for Level 1 and Susceptibility Cutoff Box for Level 2) to the "Second Local
Minimum" or "User Defined Cut-off," the previously opened data visualization boxplot tab will update
the cut-off accordingly. Similarly, if the user modifies the Primary Report Settings (See user guide
section Level 1: Primary Report Settings and Level 2: Primary Report Settings), the data visualization
will update accordingly.
Note: If the user updates the "Primary Report Settings" for "Sorted by Taxonomic Group" the boxplot
will update to display the new taxonomic group selection that is present in the "Filtered Taxonomic
Group" column in the data table. The user should be aware that manipulating the "Sorted by Taxonomic
Group" to a different level in the taxonomic lineage (e.g., from class to order; from class to genus) adds a
larger number of taxonomic groups to the x-axis. Therefore, the plot may require greater user
manipulation using the BoxPlot Controls to view the data.
Level 3: Individual Amino Acid Residue Alignment
In the "View SeqAPASS Reports" tab, on the "Level 1 Query Protein Information" page, there is a
"Level 3" dropdown for setting up the query for comparing individual amino acid residues to a template
sequence. It is anticipated that the choice of template sequence and residues that are selected to align will
be derived from the published literature in most cases. Publications evaluating homology models, protein
crystal structures, pesticide field resistance, or utilizing site-directed mutagenesis are a few examples of
the types of studies that may contain such information to guide a Level 3 SeqAPASS evaluation.
[Screenshot: "Level 3" box containing the "Reference Explorer," the "Level 3 Query Amino Acid Residues" entry fields ("Select Template Sequence," "Additional Comparisons (optional)," "Enter Level 3 Run Name," "Choose Taxonomic Group(s)"), the "Request Residue Run" button, and the "View Single Report" and "View Combined Report" boxes]
Relevant literature containing these data can be identified using the SeqAPASS "Reference Explorer."
The user can search for literature with the protein(s) of interest with an auto-populated search term that is
integrated into a predefined Boolean string and generate a Google Scholar link that will take them to
scientific articles containing their protein(s).
[Screenshot: "Reference Explorer" with the "Additional Names" text box, the "Add Protein Name" button, the protein name list (estrogen receptor isoform 1), and the "Remove Selected Protein," "Restore Default Proteins," and "Generate Google Scholar Link" buttons]
The user can modify the Boolean search string by adding text to the "Additional Names" text box and
clicking the "Add Protein Name" button. By selecting a name that is currently in the text box and clicking
the "Remove Selected Protein" button, the user can delete names from the text box and therefore these
names will not be included in the Boolean string for the Google Scholar search.
[Screenshot: "Reference Explorer" after adding the name "oestrogen" to the protein name list below "estrogen receptor isoform 1"]
When satisfied with the protein names to be included in the Boolean search string, the user will select the
"Generate Google Scholar Link" button. A pop-up will appear displaying the Boolean string to be
searched in Google Scholar. The user can continue to modify the Boolean string by clicking in the text
and adding additional information. The Boolean string can be copied and pasted elsewhere by
clicking the "Copy to Clipboard" button. The user can also choose to use the generated Boolean string to
search Google Scholar. To do so the user will select the "Search Google Scholar" button.
Google Scholar
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C34&q=(estrogen receptor isoform 1 )AND("site-directed mutagenesis"
OR "molecular docking" OR "docking analysis" OR "docking simulations" OR "x-ray crystallography" OR "crystal structure"
OR "homology modeling" OR "protein structure" OR "protein binding" OR "molecular model" OR "binding" OR "field
resistance" OR "amino acid" OR "amino acid residues" OR "mutation" OR "mutations" OR "molecular dynamics" OR
"transcriptional activation" OR "3D-pharmacophore" OR "pharmacophore" OR "structure-based" OR "chemo-bioinformatics"
OR "3D-stuctures" OR "3D-QSAR")
Search Google Scholar	Copy to Clipboard
Upon selecting the "Search Google Scholar" button, a new browser tab will open for Google Scholar
with the Boolean string entered in the search box, listing publications and articles that matched
the SeqAPASS generated Boolean string. The literature displayed by Google Scholar should
be evaluated to identify appropriate articles for determining Level 3 template sequences and critical
individual amino acids for comparisons across species.
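For readers who want to reproduce a similar search outside the tool, the following is a minimal Python sketch of how such a Boolean string and Google Scholar link could be assembled. It is not the SeqAPASS implementation. The function names are hypothetical, the method terms are an abridged copy of the string shown in the pop-up, and joining multiple protein names with OR is an assumption, since the guide does not state how added names are combined.

```python
from urllib.parse import quote

# Abridged list of the method terms shown in the guide's Google Scholar pop-up.
METHOD_TERMS = [
    "site-directed mutagenesis",
    "molecular docking",
    "x-ray crystallography",
    "crystal structure",
    "homology modeling",
    "field resistance",
]


def build_boolean_string(protein_names, method_terms=METHOD_TERMS):
    """Combine protein names and quoted method terms into one Boolean string.

    Assumption: multiple protein names are joined with OR.
    """
    names = " OR ".join(protein_names)
    methods = " OR ".join(f'"{term}"' for term in method_terms)
    return f"({names})AND({methods})"


def scholar_url(boolean_string):
    """Percent-encode the Boolean string into a Google Scholar search URL."""
    return "https://scholar.google.com/scholar?hl=en&q=" + quote(boolean_string)


if __name__ == "__main__":
    query = build_boolean_string(["estrogen receptor isoform 1", "oestrogen"])
    print(query)
    print(scholar_url(query))
```

The printed Boolean string can be pasted into any literature search engine that accepts AND/OR operators, which mirrors the purpose of the "Copy to Clipboard" button.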
[Screenshot: Google Scholar results opened in a new browser tab for the generated Boolean string, reporting about 18,500 results]
In the "Level 3" box, there is a link out to the "NCBI Protein Database" for identifying the template
sequence of interest. Below this link the user will find a text box where the user can enter an NCBI
Protein Accession with the version number (e.g., NP_000116.2) or a FASTA formatted sequence, for example:
>gi|62821794|ref|NP_000116.2| estrogen receptor isoform 1 [Homo sapiens]
MTMTLHTKASGMALLHQIQGNELEPLNRPQLKIPLERPLGEVYLDSSKPAVYNYPEGAAYEFNAAAAANA
QVYGQTGLPYGPGSEAAAFGSNGLGGFPPLNSVSPSPLMLLHPPPQLSPFLQPHGQQVPYYLENEPSGYT
VREAGPPAFYRPNSDNRRQGGRERLASTNDKGSMAMESAKETRYCAVCNDYASGYHYGVWSCEGCKAFFK
RSIQGHNDYMCPATNQCTIDKNRRKSCQACRLRKCYEVGMMKGGIRKDRRGGRMLKHKRQRDDGEGRGEV
GSAGDMRAANLWPSPLMIKRSKKNSLALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMGLLTNLA
DRELVHMINWAKRVPGFVDLTLHDQV
Upon clicking in the "Select Template Sequence" text box, a pop-up message will appear to provide
examples for the proper format of Accessions or FASTA files to be entered. The link out to the NCBI
Protein Database is found above the template entry text box.
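The two accepted entry formats can be told apart programmatically before a template is pasted into the text box. The Python sketch below is illustrative only and does not reflect how SeqAPASS validates input. The function name and the accession pattern (characters, a period, then a version number) are assumptions based on the NP_000116.2 example, and the allowed letters are the one-letter codes listed in the guide's "Amino Acid info" table.

```python
import re

# Assumed shape of an accession with version, e.g. NP_000116.2.
ACCESSION_WITH_VERSION = re.compile(r"^[A-Za-z0-9_]+\.\d+$")
# One-letter codes from the guide's "Amino Acid info" table.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTUVWXY")


def classify_template_entry(text):
    """Return ("accession", id) or ("fasta", (description, sequence)).

    Raises ValueError when the entry fits neither format.
    """
    text = text.strip()
    if text.startswith(">"):
        lines = text.splitlines()
        description = lines[0][1:].strip()
        # Rejoin the wrapped sequence lines into one string.
        sequence = "".join(line.strip() for line in lines[1:]).upper()
        if not sequence or not set(sequence) <= AMINO_ACIDS:
            raise ValueError("FASTA body is empty or has non-amino-acid characters")
        return "fasta", (description, sequence)
    if ACCESSION_WITH_VERSION.match(text):
        return "accession", text
    raise ValueError("expected an accession with version or a FASTA entry")


if __name__ == "__main__":
    print(classify_template_entry("NP_000116.2"))
    print(classify_template_entry(">Sequence description in first line\nMTMTLHTKASGMALLHQ"))
```

An accession entered without its version number (for example NP_000116) is rejected by this sketch, which matches the guide's instruction to include the version.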
[Screenshot: "Select Template Sequence" pop-up message, "Enter NCBI Protein Accession OR FASTA Sequence," giving the example accession NP_000116.2 and an example FASTA entry that begins with a ">Sequence description in first line" header followed by the sequence]
Additional sequences can also be incorporated into the Level 3 alignment using the "Additional
Comparisons (optional)" text box; this field may be left empty. Upon clicking on the
"Additional Comparisons (optional)" text box, a pop-up message will appear to provide examples for the
proper format of Accessions or FASTA files to be entered.
Note: In the "Additional Comparisons (optional)" text box, any NCBI Protein Accession(s) must be
entered before any FASTA sequence(s) if both are to be included in the Level 3 alignment.
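The ordering rule in the note can be checked before submission. The following Python sketch is a hypothetical helper, not part of SeqAPASS: it splits the field contents into accessions and FASTA entries and raises an error if something that is not plain sequence letters (such as an accession) appears after the first FASTA header. Treating one accession per line is an assumption made for illustration.

```python
def split_additional_comparisons(text):
    """Split field contents into (accessions, fasta_entries).

    Accessions are expected first, one per line (an assumption), followed by
    zero or more FASTA entries. Raises ValueError if a non-sequence line,
    such as an accession, appears after a FASTA header.
    """
    accessions = []
    fasta_entries = []  # each item is [description, sequence]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fasta_entries.append([line[1:].strip(), ""])
        elif fasta_entries:
            # After the first header, only sequence letters are acceptable.
            if not line.isalpha():
                raise ValueError(f"{line!r} must be listed before any FASTA sequence")
            fasta_entries[-1][1] += line
        else:
            accessions.append(line)
    return accessions, [tuple(entry) for entry in fasta_entries]


if __name__ == "__main__":
    field = "NP_000116.2\n>first FASTA\nMTMTLHTKAS\nGMALLHQ\n>second FASTA\nXAGLPVIM"
    print(split_additional_comparisons(field))
```

An empty field returns two empty lists, consistent with the field being optional.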
[Screenshot: "Additional Comparisons (optional)" pop-up message, "Enter 0 or more NCBI Protein Accession(s) followed by 0 or more FASTA Sequence(s)," listing example accessions (e.g., NP_000116.2) followed by two example FASTA entries, each beginning with a ">Sequence description" header line]
Below the text box where the user can choose to add additional sequences for comparison, is a link to
NCBI COBALT (Constraint-based Multiple Protein Alignment Tool). The NCBI COBALT allows the
user to align multiple sequences and is the alignment tool that SeqAPASS algorithms utilize to set up the
query of individual amino acid residues across species.
Note: The user does not need to use the COBALT link to run a Level 3 evaluation, however the link is
available in case the user chooses to further evaluate or compare multiple potential template sequences.
Under the text "Enter Level 3 Run Name," there is a text box where the user can enter a user defined
name for the run. The user may only enter letters or integers as text for the name. The user defined name
will appear in the "View Level 3 Data" dropdown upon completion of the Level 3 sequence alignment.
[Screenshot: "Level 3" box showing the "Enter Level 3 Run Name" text box and the "Choose Taxonomic Group(s)" drop-down set to "All Groups," with "0 species selected" above the "Request Residue Run" button]
To complete the set-up for a Level 3 query the user must select which sequences to compare to the
identified template sequence. Listed in the "Choose Taxonomic Group(s)" drop-down are all Taxonomic
Groups that were identified as hits in the "Level 1" primary amino acid sequence alignment data. Because
COBALT is used to align all sequences that are selected, it is recommended that the user selectively
identify sequences from the hit table below to align. For example, selecting sequences with low similarity
to the template sequence along with sequences sharing high similarity to the template sequence can skew
the alignment because COBALT is trying to align all the sequences together. It is recommended that the
user select sequences by first selecting a taxonomic group from the "Choose Taxonomic Group(s)"
drop-down. The user can also use the NCBI taxonomy link to type in the name of the "Taxonomic Groups"
found in the drop-down to look up which species fall in that group.
[Screenshot: "Choose Taxonomic Group(s)" drop-down expanded to show class-level groups (Actinopteri, Amphibia, Anthozoa, Appendicularia, Arachnida, ...)]
Note: The "Choose Taxonomic Group(s)" drop-down will display the level of the taxonomic hierarchy
being displayed in the "Filtered Taxonomic Group" column of the Level 1 Data table. For example, if the
user changes the default option from "class" to "order," then orders will be displayed in the drop-down.
[Screenshot: "Choose Taxonomic Group(s)" drop-down listing orders rather than classes (Acipenseriformes, Actiniaria, Amphipoda, Anabantiformes, Anguilliformes, ...)]
By choosing a group from the drop-down menu, the "Level 1" table below will be filtered by the selected
Taxonomic Group (see column "Taxonomic Group" in Level 1 data table). When a "Taxonomic Group" is
selected from the drop-down, it can take up to a few seconds for the Level 1 data table to filter
completely, depending on the size of the table. The user can then examine each hit protein in the Level 1
table and select those that they would like to compare to the template sequence. To select
sequences/species from the filtered Level 1 data table, the user will select the check boxes in the first
column of the table. Although it is not typically recommended, the user may also select the header check
box in the first column to select all sequences/species in the filtered table.
Note: The user can also type the "Taxonomic Group" of interest in the text search box at the top of the
drop-down for quick filtering.
Below is an example where the user selected the "Taxonomic Group" Actinopteri from the drop-down
and then selected individual sequences/species to align with the template sequence. The number of
selected species will be shown in the text above the "Request Residue Run" button.
[Screenshot: Level 1 page with "Actinopteri" selected in the "Choose Taxonomic Group(s)" drop-down, the Level 1 data table filtered to Actinopteri, and "3 species selected" shown above the "Request Residue Run" button]
(See Search, View, and Download Data Tables section of user guide for more information)
The user can choose to align sequences/species from multiple taxonomic groups with the template
sequence, by going back to the "Choose Taxonomic Group" drop-down and selecting another group,
which filters the Level 1 table based on the group selected, and then the user can select additional species
from the newly filtered table. As before, the number of selected species can be tracked in the text above
the "Request Residue Run" button that reads "X species selected."
When the user has selected all sequences they want to align, click the "Request Residue Run" button.
Upon successful submission of a Level 3 query the user will see the following pop-up message. If
submission is unsuccessful, a message will appear describing the reason for the unsuccessful submission.
[Screenshot: pop-up message reading "Level 3 Run Requested, Status queued" displayed on the "View SeqAPASS Reports" tab]
To update the "Choose Query to View" drop-down menu with the completed Level 3 alignments, the user
can click on the "Refresh Level 2 and 3 runs" button.
[Screenshot: "SeqAPASS Run Status" tab with radio buttons for "Level 1 Status," "Level 2 Status," and "Level 3 Status," and the "Refresh Data" button]
Additionally, the user can check the status of the Level 3 run by clicking the "SeqAPASS Run Status" tab
and the radio button for "Level 3 Status." Typically, Level 3 alignments complete in a few seconds. When
the Level 3 query completes and the Level 1 page has been updated, the user defined Level 3 Run Name
will be available in the "Choose Query to View" drop-down menu. After selecting the desired Run Name
from the drop-down, click the "View Level 3 Data" button to view the aligned sequences and set up the
individual amino acid residue alignments with the selected sequences/species.
[Screenshot: "Choose Query to View" drop-down listing completed Level 3 run names (e.g., Actinopteri, Amphibia, Chondrichthyes), followed by the same drop-down with "Actinopteri" selected above the "View Level 3 Data" button]
Upon a successful Level 3 query submission a pop-up message will be displayed as follows in the upper
right-hand side of the screen:
[Screenshot: pop-up message reading "Level 3 Run Requested, Status queued"]
[Screenshot: "View Single Report" box with the "Select Level 3 Run Name" drop-down and "View Level 3 Data" button, above the "View Combined Report" box with the "Combine Level 3 Data" button]
Once the Level 3 run has completed, the user can select the "Select Level 3 Run Name" drop-down in the
"View Single Report" box to view an individual user defined Level 3 run. If the user has completed multiple
Level 3 alignments between a template sequence and more than one taxonomic group, the user can
combine Level 3 reports by selecting the "Combine Level 3 Data" button. A pop-up titled "Combine
Level 3 Reports" will appear. There are three steps to combine Level 3 reports. First, the user will
choose a Level 3 template from the drop-down that lists all templates the user has used to generate
Level 3 alignments. The template sequence must be common to all Level 3 runs that will
be combined.

[Screenshot: "Combine Level 3 Reports" pop-up, step 1 ("Level 3 Templates"), with the "Choose a level 3 Template" drop-down listing NP_000116.2 estrogen receptor isoform 1 [Homo sapiens]]
After selecting the template, the user will click the "Next" button. At this point the user will select all
Level 3 Jobs that are to be combined by selecting the check box in the "Level 3 Jobs" drop-down next to
the user defined names. After all jobs that are to be combined are selected the user will click the "Next"
button. Note that as the user moves through each step of the Combine Level 3 Reports feature, the step the
user is currently on is indicated by highlighting the button in blue (for example, the "Level 3 Jobs"
button is highlighted while selecting jobs to combine).
[Screenshot: "Combine Level 3 Reports" pop-up, step 2 ("Level 3 Jobs"), with check boxes for the Amphibia, Aves, and Actinopteri jobs]
The next step in the Combine Level 3 Reports feature is to put the jobs in order as to how they should be
displayed in the output. Typically, sequences from an individual taxonomic group are aligned to a
template sequence and named accordingly (e.g., Actinopteri, Amphibia, Aves, etc.). It may be useful to
order the combined report similarly to how the taxonomic groups are displayed on the x-axis of the Level
1 or Level 2 data visualization. Therefore, the user can select the user defined name from the Order Level
3 Jobs: text box and drag and drop the name to the desired order from top to bottom. To move on to select
individual amino acids for sequence comparisons the user will select the "View Level 3 Data" button.
[Screenshot: "Combine Level 3 Reports" pop-up, step 3 ("Order Level 3 Jobs"), listing Amphibia and Aves with the "View Level 3 Data" and "Back" buttons]
The order selected will translate to the top to bottom order displayed in the data table, with the template
sequence displayed only once, in the first row, and all selected jobs below.
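The row ordering described above can be sketched in Python for users who post-process downloaded tables. This is an illustrative reconstruction, not SeqAPASS code: the function name and the dictionary keys ("accession", "job") are hypothetical, and it assumes each job's rows include a copy of the shared template that should be dropped after the first row.

```python
def combine_level3_reports(template_row, jobs, job_order):
    """Return the template row once, then each job's rows in the chosen order.

    template_row: dict with an "accession" key (hypothetical layout).
    jobs: mapping of job name to a list of row dicts.
    job_order: job names from top to bottom, as arranged by the user.
    """
    template_accession = template_row["accession"]
    combined = [template_row]
    for job_name in job_order:
        for row in jobs[job_name]:
            if row["accession"] == template_accession:
                continue  # keep only the first copy of the template
            combined.append({**row, "job": job_name})
    return combined


if __name__ == "__main__":
    template = {"accession": "NP_000116.2"}
    jobs = {
        "Aves": [template, {"accession": "XP_019468458.1"}],
        "Amphibia": [template, {"accession": "OCT77903.1"}],
    }
    for row in combine_level3_reports(template, jobs, ["Amphibia", "Aves"]):
        print(row)
```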
Level 3 Data - Primary (combined report, page 1 of 2; columns to the right are cut off in the screenshot)
Data Version	Job Name	NCBI Accession	Protein Count	Species Tax ID	Taxonomic Group	Scientific Name
4	Amphibia	NP_000116.2	1265506	9606	Mammalia	Homo sapiens
4	Amphibia	OCT77903.1	130454	8355	Amphibia	Xenopus laevis
4	Amphibia	BAF30926.1	83	166789	Amphibia	Andrias japonicus
4	Amphibia	AUW64608.1	1591	141262	Amphibia	Andrias davidianus
4	Amphibia	BAE81788.1	94392	8364	Amphibia	Xenopus tropicalis
4	Amphibia	BAJ05031.1	18	2040589	Amphibia	Sclerophrys capensis
4	Aves	XP_019468458.1	34219	9103	Aves	Meleagris gallopavo
4	Aves	XP_025978017.1	31563	8790	Aves	Dromaius novaehollandiae
4	Aves	KFQ02396.1	30590	8969	Aves	Haliaeetus albicilla
4	Aves	XP_010580195.1	25311	52644	Aves	Haliaeetus leucocephalus
View Level 3 Individual Amino Acid Query and Data Page
Upon clicking the "View Level 3 Data" button, the Level 3 data page opens. The "Level 3 Template Protein
Information" box contains the SeqAPASS Run ID, Query Accession (with link out to NCBI), Ortholog
Count (# of hits identified as ortholog candidates to the query species protein sequence), NCBI Data
(displays the date that NCBI databases and executables were downloaded and incorporated into
SeqAPASS), Level 3 Run Name (defined by user), Template Species (entered by user in Level 3 query),
Template Protein, and Query Residues (this field is populated with residues upon selection and successful
table update).
[Screenshot: Level 3 data page. The "Level 3 Template Protein Information" box is shown above the "Select Amino Acid Residues" shuttle, the "Enter Amino Acid Residue Positions" text box, and the "Level 3 Data - Primary" table. In this example (Level 3 Run Name: Actinopteri; Template Species: Homo sapiens; Template Protein: [NP_000116.2] estrogen receptor isoform 1; Query Residues: No Residues Selected), the "Similar Susceptibility as Template" column reads "TBD" for every row.]
For additional information on Amino Acid Residues, including definition of the acronym, the amino acid
residue name, the classification for the amino acid side chain and the size of the amino acid residue based
on molecular weight, the user can click the "Show Amino Acid Info..." button. A pop-up table, "Amino
Acid info," will be displayed providing this information.
[Screenshot: "Amino Acid info" pop-up table displayed over the Level 3 data page]
ID	Name	Side Chain	Size
A	Alanine	Aliphatic	89.094
C	Cysteine	Sulfur-Containing	121.154
D	Aspartic Acid	Acidic	133.104
E	Glutamic Acid	Acidic	147.131
F	Phenylalanine	Aromatic	165.192
G	Glycine	Aliphatic	75.067
H	Histidine	Basic	155.156
I	Isoleucine	Aliphatic	131.175
K	Lysine	Basic	146.189
L	Leucine	Aliphatic	131.175
M	Methionine	Sulfur-Containing	149.208
N	Asparagine	Amidic	132.119
P	Proline	Aliphatic	115.132
Q	Glutamine	Amidic	146.146
R	Arginine	Basic	174.203
S	Serine	Hydroxylic	105.093
T	Threonine	Hydroxylic	119.119
U	Seleno-cysteine	Sulfur-Containing	168.064
V	Valine	Aliphatic	117.148
W	Tryptophan	Aromatic	204.228
X	Unknown	Unknown	
Y	Tyrosine	Aromatic	181.191
To obtain individual amino acid residue alignment data in the Level 3 data table, the user must use the
shuttle in the "Level 3 Template Protein Information" box to select positions and amino acid residues from
the chosen template sequence to align with the sequences/species that were selected by taxonomic group.
Single letter abbreviations are used for the amino acid sequences.
G: Glycine	A: Alanine	S: Serine	T: Threonine	C: Cysteine	V: Valine
L: Leucine	I: Isoleucine	M: Methionine	P: Proline	F: Phenylalanine	U: Seleno-cysteine
Y: Tyrosine	W: Tryptophan	D: Aspartic Acid	E: Glutamic Acid
N: Asparagine	Q: Glutamine	H: Histidine	K: Lysine	R: Arginine
The user can select one residue at a time by clicking and highlighting the residue of interest and then
clicking the top right arrow shuttle button to move the residue to the right-hand box for inclusion in the
alignment. Each time a residue is added to the right-hand box, the left-hand box resets itself to the 1st
residue. Or the user can select multiple residues at the same time by holding the Ctrl button, clicking on
residues, and then clicking the top right arrow shuttle button to move the residues to the right-hand box.
The user can choose to remove selected residues by using the left arrow button to clear one at a time or
the double left arrow button to remove all selected residues at once. When residues of interest (likely
defined from the literature as described above) have been selected, click the "Update Report" button,
which then updates the Level 3 Data table with the individual residue alignment data.
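[Screenshot: "Select Amino Acid Residues" shuttle with template residues (1M, 2T, 3M, 4T, ...) in the left-hand box and the selected residues (e.g., 219Y, 267H, 272D, 594T) in the right-hand box, above the "Update Report" button]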
Alternatively, the user can enter the amino acid positions in the "Enter Amino Acid Residue Positions"
text box (e.g., 351,353,362) and click the "Copy to Residue List" button.
[Screenshot: "Enter Amino Acid Residue Positions" text box containing 351,353,362,364,394,524, with the "Copy to Residue List" button]
Upon clicking "Copy to Residue List," the "Select Amino Acid Residues" shuttle box is populated with
the positions and residues typed. The user can then click the "Update Report" button to produce Level 3
results in the table below.
[Screenshot: "Select Amino Acid Residues" shuttle after clicking "Copy to Residue List," populated with 351D, 353E, 362K, 364V, 394R, and 524H]
The individual amino acid residue alignment data will then be updated on the right most columns of the
Level 3 Data table. The user can submit a maximum of 50 individual amino acid residues from the
template sequence to compare to the other selected sequences. The individual amino acid residues will be
listed in numerical order starting with the 1st position in the template sequence to the last position in the
template sequence.
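The handling of the position list described above can be sketched in a few lines of Python. This is only an
illustration of the documented behavior (comma-separated positions, a 50-residue limit, numerical ordering),
not the SeqAPASS implementation; the function name, the de-duplication, and the tolerance for stray commas
are assumptions made for the example.

```python
MAX_RESIDUES = 50  # maximum number of residues the guide says can be submitted


def parse_residue_positions(text, template_sequence):
    """Convert '351,353,362' into labels like '351D', ordered by position.

    Hypothetical helper: duplicates are dropped and empty entries ignored,
    which is an assumption, not documented SeqAPASS behavior.
    """
    positions = set()
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate a trailing comma or doubled commas
        try:
            position = int(token)
        except ValueError:
            raise ValueError(f"not a residue position: {token!r}") from None
        if not 1 <= position <= len(template_sequence):
            raise ValueError(f"position {position} is outside the template sequence")
        positions.add(position)
    if len(positions) > MAX_RESIDUES:
        raise ValueError(f"at most {MAX_RESIDUES} residues can be submitted")
    # Positions are 1-based, so subtract 1 to index the Python string.
    return [f"{p}{template_sequence[p - 1]}" for p in sorted(positions)]


# First ten residues of the template shown in this guide (1M, 2T, ..., 10S).
template_start = "MTMTLHTKAS"
print(parse_residue_positions("9, 1,3", template_start))  # ['1M', '3M', '9A']
```

Preparing the list this way before pasting it into the text box makes it easy to reuse the same positions
across several Level 3 runs.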
Level 3 Data - Primary Report
The default report is the "Primary Report" and can be recognized as such because the radio button for
"Primary Report" above the "Level 3 Data" table is selected.
The "Primary Report" columns for the alignment will be titled "Similar Susceptibility as Template" ("Y"
or "N" for yes or no, respectively), followed by Position 1, Amino Acid 1, Total Match 1, Position 2,
Amino Acid 2, Total Match 2, Position 3, Amino Acid 3, Total Match 3, and so on. The template sequence
will always be in the top row of the Level 3 Data table, followed by the previously selected sequences.
Further, the residues selected in the shuttle will also be displayed in the top row corresponding to the
template sequence. The Position and Amino Acid in each of the following rows are those of the Protein
Accession identified in that row that align with the template sequence. The Total Match X describes
whether the amino acid residue matches the template based on side-chain classification and molecular
weight: "Y" for yes, or "N" for not a match to the template. The user can evaluate these data to understand
how well conserved an amino acid residue is across species or in a species of interest, adding an additional
line of evidence to support (or question) susceptibility predictions. The user can also download the current
report settings by selecting the "Download Current Level 3 Report Settings" link. This csv allows the user
to track which settings were used or changed when downloading a data table.
[Screenshot: "Primary Report" radio button selected above the "Level 3 Data - Primary" table, with the
"View Level 3 Summary Report" button, the "Download Current Level 3 Report Settings" link, the search
box, and the "Download Table" icons. The visible columns are reproduced below; further columns are cut
off in the screenshot.]
Protein Name | Analysis Completed | Similar Susceptibility as Template | Position 1 | Amino Acid 1 | Total Match 1 | Position 2 | Amino Acid 2 | Total Match 2 | Position 3 | Amino Acid 3 | Total Match 3
estrogen receptor isoform 1 | 2019 08 29 14:55:59 | Y | 351 | D | Y | 353 | E | Y | 362 | K | Y
estrogen receptor alpha | 2019 08 29 14:55:59 | Y | 320 | D | Y | 322 | E | Y | 331 | K | Y
PREDICTED: estrogen receptor isoform X2 | 2019 08 29 14:55:59 | Y | 316 | D | Y | 318 | E | Y | 327 | K | Y
estrogen receptor | 2019 08 29 14:55:59 | Y | 355 | D | Y | 357 | E | Y | 366 | K | Y
estrogen receptor isoform X3 | 2019 08 29 14:55:59 | Y | 319 | D | Y | 321 | E | Y | 330 | K | Y
Estrogen receptor 1 | 2019 08 29 14:55:59 | Y | 319 | D | Y | 321 | E | Y | 330 | K | Y
When downloading the current Level 3 report settings, the following information will be present in the
csv. If the user decides to change the default settings, the csv can serve as a quick record of the settings
used if the SeqAPASS page is no longer accessible.

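[Spreadsheet view of the downloaded csv: column A holds the setting name and column B its value; rows 2
and 3 are blank.]
Row | A | B
1 | Level 3 Report Settings |
4 | Analysis TimeStamp | 2019 05 16 11:04:08
5 | SeqAPASS version | 3.2
6 | Level 3 Run Name | Actinopteri
7 | Template Species | Homo sapiens
8 | Template Protein | [NP_000116.2] estrogen receptor isoform 1
9 | Query Residues | 1M, 2T, 3M, 4T, 5L, 6H, 7T, 8K, 9A, 10S
10 | Query Accession | NP_000116.2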
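For users who keep these settings files alongside downloaded tables, a short Python sketch such as the one
below could read a settings csv back into a dictionary. The sample text mirrors the rows shown above, but
the exact layout of the real file (quoting, blank rows, delimiter) is an assumption, and the function names
are hypothetical.

```python
import csv
import io

# Assumed file content, modeled on the example rows shown in this guide.
SAMPLE = """Level 3 Report Settings,
,
,
Analysis TimeStamp,2019 05 16 11:04:08
SeqAPASS version,3.2
Level 3 Run Name,Actinopteri
Template Species,Homo sapiens
Template Protein,[NP_000116.2] estrogen receptor isoform 1
Query Residues,"1M, 2T, 3M, 4T, 5L, 6H, 7T, 8K, 9A, 10S"
Query Accession,NP_000116.2
"""


def read_report_settings(csv_text):
    """Return {setting name: value}, skipping the title row and blank rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    settings = {}
    for row in rows[1:]:  # rows[0] is the "Level 3 Report Settings" title
        if len(row) >= 2 and row[0].strip():
            settings[row[0].strip()] = row[1].strip()
    return settings


def split_query_residues(value):
    """Split '1M, 2T, 3M' into ['1M', '2T', '3M']."""
    return [item.strip() for item in value.split(",") if item.strip()]


settings = read_report_settings(SAMPLE)
print(settings["Level 3 Run Name"])
print(split_query_residues(settings["Query Residues"]))
```

For a file saved to disk, the same function can be applied to the text returned by reading the file.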
Level 3 Data - Full Report
The user may choose to view the Full Report for Level 3 data by selecting the "Full Report" radio button
above the "Level 3 Data" table. The table below will automatically update to display all of the
alignment details.
The "Full Report" columns for the alignment will be titled "Similar Susceptibility as Template" ("Y" or
"N" for yes or no), followed by Position 1, Amino Acid 1, Direct Match 1, Side Chain 1, Side Chain
Match 1, MW 1, MW Match 1, Total Match 1, Position 2, Amino Acid 2, Direct Match 2, Side Chain 2,
Side Chain Match 2, MW 2, MW Match 2, Total Match 2, and so on. The template sequence will always be
in the top row of the Level 3 Data table, followed by the previously selected sequences. Further, the
residues selected in the shuttle will also be displayed in the top row corresponding to the template
sequence. The Position and Amino Acid in each of the following rows are those of the Protein Accession
identified in that row that align with the template sequence. The Total Match X describes whether the
amino acid residue matches the template based on side-chain classification and molecular weight: "Y" for
yes, or "N" for not a match to the template. The user can evaluate these data to understand how well
conserved an amino acid residue is across species or in a species of interest, adding an additional line of
evidence to support (or question) susceptibility predictions.
[Screenshot: "Level 3 Data - Full" table with the "Download Current Level 3 Report Settings" link, the
search box, and the "Download Table" icons. The visible columns are reproduced below; further columns
are cut off in the screenshot.]
Analysis Completed | Similar Susceptibility as Template | Position 1 | Amino Acid 1 | Direct Match 1 | Side Chain 1 | Side Chain Match 1 | MW 1 | MW Match 1 | Total Match 1 | Position 2 | Amino Acid 2
2019 08 29 14:55:59 | Y | 351 | D | Y | Acidic | Y | 133.104 | Y | Y | 353 | E
2019 08 29 14:55:59 | Y | 320 | D | Y | Acidic | Y | 133.104 | Y | Y | 322 | E
2019 08 29 14:55:59 | Y | 316 | D | Y | Acidic | Y | 133.104 | Y | Y | 318 | E
2019 08 29 14:55:59 | Y | 355 | D | Y | Acidic | Y | 133.104 | Y | Y | 357 | E
2019 08 29 14:55:59 | Y | 319 | D | Y | Acidic | Y | 133.104 | Y | Y | 321 | E
2019 08 29 14:55:59 | Y | 319 | D | Y | Acidic | Y | 133.104 | Y | Y | 321 | E

The "Direct Match X" column describes whether the hit amino acid is an exact match to the template
amino acid, providing a "Y" or "N" for yes or no, respectively. The "Side Chain X" column indicates the
side chain classification for the amino acid residue (click on "Show Amino Acid Info..." for more
information on classifications). The "Side Chain Match X" column indicates whether the hit side chain
has the same classification as the template amino acid, providing a "Y" or "N" for yes or no, respectively.
The "MW X" column indicates the molecular weight (g/mol) of the amino acid residue, and the "MW
Match X" column indicates whether the hit molecular weight is comparable to that of the template amino
acid: "N" when the difference in molecular weight is greater than or equal to 30 g/mol, and "Y" otherwise.
For "Total Match X" to be "Y," at least one of "Side Chain Match X" and "MW Match X" must be "Y."
Only if both "Side Chain Match X" and "MW Match X" are "N" is "Total Match X" "N" for no.
Ultimately, Total Match 1, 2, 3, 4, and so on are used to inform the "Similar Susceptibility as Template"
column. If there is one or more "N" for Total Match when comparing any amino acid residue to the
template across a row for a given species, then "Similar Susceptibility as Template" is "N" for no,
indicating that the hit species is predicted NOT to have the same susceptibility as the template sequence.
However, if all "Total Match X" are "Y" for yes, then "Similar Susceptibility as Template" is "Y,"
indicating that the hit species is predicted to have the same susceptibility as the template sequence.
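The decision rule described above can be written out as a small Python sketch. It takes the "Y"/"N" flags
from the Full Report as input and reproduces only the logic stated in this guide; the function names are
hypothetical, and this is not the SeqAPASS source code.

```python
def total_match(side_chain_match, mw_match):
    """Total Match is "N" only when both Side Chain Match and MW Match are "N"."""
    return "Y" if "Y" in (side_chain_match, mw_match) else "N"


def similar_susceptibility(residue_flags):
    """Return "Y" only if every compared residue has a Total Match of "Y".

    residue_flags is a list of (Side Chain Match, MW Match) pairs,
    one pair per amino acid residue compared to the template.
    """
    totals = [total_match(side_chain, mw) for side_chain, mw in residue_flags]
    return "Y" if all(flag == "Y" for flag in totals) else "N"


# One residue differs in side chain class but has a comparable molecular weight.
print(similar_susceptibility([("Y", "Y"), ("N", "Y")]))  # Y
# One residue differs in both side chain class and molecular weight.
print(similar_susceptibility([("Y", "Y"), ("N", "N")]))  # N
```

A single residue with a Total Match of "N" is enough to set the row's prediction to "N," which is why the
choice of residues submitted to Level 3 matters.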
Multiple Level 3 Runs Requiring the Same Amino Acid Residue Comparisons
Typically, Level 3 individual amino acid residue alignments are submitted repetitively, comparing species
from one taxonomic group at a time to the template amino acid residue(s).
[Screenshot: "View Level 3 Data" panel with the "Choose Query to View" drop-down ("-Select Level 3 Run
Name-") listing user-defined run names: Amphibia, Aves, Crocodyliadae, Dipnoi, Lepidosauria, mammalia,
and Testudines.]
Therefore, to submit the same alignments in Level 3 repeatedly with less effort, the user can take
advantage of the "Copy to Residue List" button. For the first alignment of amino acid residues, the user
would select the amino acid residues to align and click the "Update Report" button.
[Screenshot: "Select Amino Acid Residues" shuttle box listing template residues 1M through 9A, with the
"Update Report" button.]
By clicking "Update Report," the residues that were selected will be copied into the "Enter Amino Acid
Residue Positions" text box. When the user selects a new Level 3 Run Name (from the same Level 1 query
accession) to view by using the "View Level 3 Data" drop-down and clicking the "View Level 3 Data"
button on the Level 1 Query Protein Information page, the "Enter Amino Acid Residue Positions" text
box will be populated with the amino acid residues selected in the previous run.
[Screenshot: "Enter Amino Acid Residue Positions" text box containing 351,353,362,364,394,524, with the
hint "Enter residue positions as a comma separated list" and the "Copy to Residue List" button.]

The user can keep, add, or delete residue positions in this box and then click the "Copy to Residue List"
button. The amino acid residues will then be moved to the "Select Amino Acid Residues" shuttle, and the
user can click "Update Report" to view the data in the table below.
[Screenshot: "Enter Amino Acid Residue Positions" text box containing 351,355,356,375,400, and the shuttle
box populated with 351D, 355V, 356H, 375Q, and 400G.]
Moving Between Level 1, Level 2, and Level 3 Data Pages
As a user chooses to view Level 1, Level 2, or Level 3 data in the "View SeqAPASS Reports" tab, new
buttons become available that allow the user to move between Levels of an analysis. Please see the
snapshot below.
[Screenshot: SeqAPASS Reports page (Version 4.0) showing the "Main," "Level 1," "Level 2," and "Level 3"
buttons beneath the "Home," "Request SeqAPASS Run," "SeqAPASS Run Status," "View SeqAPASS
Reports," and "Settings" tabs.]
The user can use the "Main" button to return to the list of completed Level 1 runs and select a different
query accession to view. The "Level 1" button brings the user to the Level 1 data page, where the user can
set up queries for Level 2 and Level 3, as well as select the buttons to view the Level 2 and Level 3 data
pages. Open Level 1, Level 2, and Level 3 pages remain open until the user selects a different run to view
on the "Main" page. Moving between tabs, such as "Home," "Request SeqAPASS Run," and "SeqAPASS
Run Status," does not close the Level 1, Level 2, or Level 3 pages that have been opened.
Note: If the user logs out of the SeqAPASS tool, upon logging back in, the data will reset to default
settings. Therefore, the View SeqAPASS Reports tab will not display the "Main," "Level 1," "Level 2,"
or "Level 3" buttons until a query is chosen and the Level 2 and Level 3 pages are opened.
Search, View, and Download Data Tables
The user can use the "Search" box to enter text to search the table. Further, the user can use the arrow
buttons and page numbers on the bottom of the screen to view all data and the drop-down to expand the
table to 10, 20, or 50 rows. There are also left and right scroll bars at the bottom of the tables to allow the
user to view all columns of the table.
Search using the text box at the top of the tables:
[Screenshot: "Search: Enter keyword" text box.]
Options for viewing data:
[Screenshot: paging controls showing "(1 of 95)," page numbers 1 through 10, the rows-per-page drop-down
set to 10, and the "Download Table" icons.]
All data tables in the SeqAPASS tool can be downloaded as Excel or csv files. The icons for downloading
the files are present on the bottom right-hand side of all tables. Click an icon to download the data.
Upon selecting a csv file, the user can choose to save or open the file. Each file is appropriately named by
Level of the SeqAPASS evaluation and report type.
[Screenshot: "Level 2 Data - Primary" table with the browser dialog "Opening
SeqAPASS_Level2_Primary_Report.csv," which offers "Open with: Microsoft Excel (default)" or "Save
File."]

Upon selecting a .xls file, the user can save the report to their desired location. Each file is appropriately
named by Level of the SeqAPASS evaluation and report type.
[Screenshot: "Save As" dialog for SeqAPASS_Level2_Primary_Report.xls (save as type: Microsoft Excel
97-2003 Worksheet (*.xls)) displayed over the "Level 2 Data - Primary" table, with the tooltip "Download
table in excel(.xls) format" on the download icon.]

Log out
The user can log out from any page in SeqAPASS by clicking the "Log out" link on the upper right-hand
side of the page. If a user logs out and then logs back in, all settings will be reset to default. Any
successfully submitted queries that were requested prior to logging out will continue running and, when
completed, will be available to the user in the "View SeqAPASS Reports" tab.
[Screenshot: SeqAPASS Home tab (Version 4.0) with the "Log out" link in the upper right-hand corner.]
Pop-up Messages
The Spinning Wheel pop-up alerts the user that an action is taking place in which the interface of the
SeqAPASS tool is contacting the backend database. For example, upon clicking the "SeqAPASS Run
Status" tab, "Refresh Data" button, "View Level 2 Data" button, or "View Level 3 Data" button, the
Spinning Wheel will pop up and then disappear from the screen. There are multiple other instances where
the Spinning Wheel is used to indicate to the user that an action is occurring.
[Pop-up: "Querying database ... Please wait"]

Pop-up messages are meant to guide the user to submit the correct information for a query, inform the
user of a successful or failed query submission, or otherwise inform the user of an error. All pop-up
messages will appear for 10 seconds on the upper right-hand side of the screen and then disappear. To
close a message before the 10 seconds are up, click on the message and an "x" will appear in the upper
right-hand corner of the message box. Click the "x" to close the message.
In the "Request SeqAPASS Run" tab, Compare Primary Amino Acid Sequences "By Species" page, a
successful Level 1 query submission will display a pop-up message indicating either that the query has
been submitted to the run queue ("submitted") or that the accession has been run previously by a user and
is available to view ("existing").
[Pop-up: Success, "NP_064393.2: submitted" OR Success, "NP_000116.2: existing"]


If the user does not select any query proteins in the "Request SeqAPASS Run" tab, Compare Primary
Amino Acid Sequences "By Species" or "By Accession" page, and clicks the "Request Run" button, one of
the messages below will pop up.
[Pop-up: Error, "Must select query proteins" OR Error, "Must enter NCBI accession"]


If the user enters nonsense text (or any text that is not an NCBI accession) into the "NCBI Protein
Accession" text box for submitting a Level 1 query in the "Request SeqAPASS Run" tab, Compare
Primary Amino Acid Sequences "By Accession" page, and clicks the "Request Run" button, the message
below will pop up, indicating that the accession entered is not in the SeqAPASS database.
[Pop-up: Success, "fgafgaf: not in database"]
In the "View SeqAPASS Reports" tab, Level 1 page, if a user clicks "View Level 2 Data," a successful
Level 2 query submission will display a pop-up message indicating that the query has entered the run
queue.
[Pop-up: Level 2 Run Requested, "Status queued"]
In the "View SeqAPASS Reports" tab, Level 1 page, if a user selects a domain that has already been
submitted (but not completed) and clicks "Request Domain Run," a pop-up message will indicate that the
run has already been requested or could not be submitted.
[Pop-up: Level 2 Run Requested, "Status Already run or could not submit"]
In the "View SeqAPASS Reports" tab, Level 1 page, if a user clicks "View Level 2 Data" without
selecting a domain to view from the drop-down, the message below will pop up to indicate that the user
must select a domain.
[Pop-up: Error, "Must select domain from drop-down"]
In the "View SeqAPASS Reports" tab, Level 1 page, a successful Level 3 query submission will display a
pop-up message indicating that the query has entered the run queue.

[Pop-up: Level 3 Run Requested, "Status queued"]

In the "View SeqAPASS Reports" tab, Level 1 page, if a user fails to type a user-defined Level 3 Run
Name, the message below will pop up to indicate that the user must do so.
[Pop-up: Error, "You must specify a Template Sequence and Level 3 Run Name"]
In the "View SeqAPASS Reports" tab, Level 1 page, if a user fails to select species from the Level 1 Data
table to be compared with the template sequence, the message below will pop-up.
[Pop-up: Error, "You must select sequences from the Level 1 Data table to request a Level 3 Run"]
In the "View SeqAPASS Reports" tab, Level 1 page, if a user fails to select a Level 3 Run Name from the
"Choose Query to View" drop-down and clicks the "View Level 3 Data" button, the message below will
pop up.
[Pop-up: Error, "Must select level 3 run from drop-down"]
In the "View SeqAPASS Reports" tab, "Level 3 Template Protein Information" data page, if a user fails
to select amino acid residues using the "Select Amino Acid Residues" shuttle and clicks the "View Level
3 Data" button, the message below will pop up.
[Pop-up: No Residues Selected, "User must select residues"]
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) Documentation
Query Species: The selection of the query species for a SeqAPASS analysis depends on the question the
user is addressing. For example, the query species can be the target species (i.e., human or companion
animal in the case of drugs; or insect, plant, fungus, or pest in the case of pesticides) or, depending on the
application of the susceptibility prediction, a species known or hypothesized to be sensitive to a chemical
acting on the protein molecular target of interest. There may be instances where a protein for the species
of interest has not been sequenced. In this case, it may serve the user's purpose to identify another
taxonomically related species from the same organism Class, Order, Family, or Genus as a surrogate query
species. In certain cases, when there is interest in the susceptibility of a particular species (e.g., honey bee)
and there are numerous potential target species (e.g., neonicotinoids are intended to cause mortality in a
number of pest insects), the species of particular concern may serve as the query species.
Query Protein: SeqAPASS can be queried with any protein sequence available in the NCBI protein
GenBank database, by protein name, or NCBI Accession. It is suggested that the user of SeqAPASS
examines their query protein and species in the NCBI protein database prior to submitting a run to
SeqAPASS (use NCBI link on query page). It is not uncommon for a protein of a specific species to be
represented by more than one sequence. In such cases there are some guiding principles for identification
of the best sequence available for the SeqAPASS run.
General guidelines: These guidelines describe best practices for identifying the most useful sequence for a
species susceptibility prediction in SeqAPASS. However, in some cases limited sequence information is
available, and therefore less desirable sequences may be used. It is up to the user of SeqAPASS to
recognize the quality and limitations of the sequence chosen for the SeqAPASS query. The information
about a particular protein can be found on the Protein page in the NCBI database
(http://www.ncbi.nlm.nih.gov/protein/).
[Screenshot: NCBI Protein database home page (http://www.ncbi.nlm.nih.gov/protein/) with "androgen
receptor, homo sapiens" entered in the search box.]
Search for a protein of interest using protein name and/or species of interest: For the example above,
multiple hit proteins were identified.
[Screenshot: NCBI Protein search results for "androgen receptor, homo sapiens" (results 1 to 20 of 540),
listing entries such as P10275.2 (919 aa), AAA51780.1 (906 aa), and AAA51771.1 (917 aa), as well as a
2 aa entry labeled "androgen receptor, partial."]
Select one of the proteins by clicking on its link in the results list to see detailed information about the
protein.
[Screenshot: NCBI GenPept record for androgen receptor [Homo sapiens], GenBank: AAA51771.1 (917 aa,
dated 31-OCT-1994), showing the LOCUS, DEFINITION, ACCESSION, VERSION, DBSOURCE,
KEYWORDS, SOURCE, ORGANISM, REFERENCE, COMMENT, and FEATURES rows.]
Guiding principles: On the NCBI protein page, rows to examine include "DEFINITION,"
"REFERENCE," "COMMENT," and "FEATURES." The information provided in these rows can aid a
SeqAPASS user in the identification of an ideal query sequence for SeqAPASS.
It is desirable to:
a.	Use accessions with the following prefix: NP_
b.	Avoid use of protein sequences labeled "partial," "PREDICTED," "PROVISIONAL," "INFERRED,"
or "hypothetical."
c.	Avoid using those labeled "TPA" (Third Party Annotation). However, if TPA is all that is available,
"TPA: experimental" would be preferred over "TPA: inferential."
d.	Look at the date associated with the protein in the "LOCUS" row of the detailed protein page. A more
recent date can indicate the most up-to-date annotation of the protein. Under the "DBSOURCE" row of the
detailed protein page, other accessions associated with past protein sequences can be viewed. Often,
if the "xrefs" row is heavily populated and the record has the most recent annotation update date, it is
likely to be the best sequence to use as a query sequence in SeqAPASS.
e.	Avoid short sequences as query sequences when possible. Often, when a short sequence is selected
from the output of the NCBI protein database query, it turns out to be a partial sequence, as described in
the "DEFINITION" row of the Protein page.
f.	Unless there is reason for doing so (based on the question the user is trying to address), avoid splice
variants labeled in the "FEATURES" rows of the Protein page as "alternatively spliced."
g.	Check the references associated with the selected query protein. In some cases, certain
sequences are associated with sensitivity to a given chemical. This can be particularly useful when
predicting susceptibility to pesticides, where certain strains of insects are produced to be readily sensitive
or insensitive to a chemical.
h.	As a secondary check of the sequence used in the SeqAPASS run, look at the output derived
and see whether ortholog candidates were detected. Ideally, a preferential sequence would have more
ortholog candidates identified.
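Some of these guidelines can be applied as a quick first-pass screen before opening each record in detail.
The Python sketch below checks only the accession prefix and the labels named above against a record's
accession and definition text. It is a simplified illustration with hypothetical names; it does not query
NCBI, and it does not replace reading the record itself.

```python
# Labels the guidelines suggest avoiding, compared case-insensitively.
DISCOURAGED_LABELS = ("partial", "predicted", "provisional", "inferred", "hypothetical")


def screen_candidate(accession, definition):
    """Return a list of guideline concerns for one candidate query sequence."""
    concerns = []
    if not accession.startswith("NP_"):
        concerns.append("accession does not start with NP_")
    lowered = definition.lower()
    for label in DISCOURAGED_LABELS:
        if label in lowered:
            concerns.append(f"definition is labeled {label!r}")
    if lowered.startswith("tpa"):
        concerns.append("third party annotation (TPA)")
    return concerns


# Accessions and names taken from examples shown elsewhere in this guide.
print(screen_candidate("NP_000116.2", "estrogen receptor isoform 1"))
print(screen_candidate("XP_008993525.1", "PREDICTED: estrogen receptor"))
```

An empty list means none of the screened concerns were found; the date, references, and splice-variant
checks still need to be made by hand.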
Important Note: To identify which query protein has the greatest number of Ortholog Candidates, the user
can choose to submit multiple sequences for the same species and protein. Once the Level 1 runs for those
similar proteins have completed, the user can select the "View SeqAPASS Reports" tab and look at the
"Ortholog Count" column of the table. The protein with the highest number is likely to be the most
appropriate query protein for a SeqAPASS evaluation.
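As a small illustration of this comparison, the Python sketch below picks the accession with the highest
ortholog count from values the user has copied out of the reports table. The counts shown are made-up
placeholder numbers, not SeqAPASS results, and the function name is hypothetical.

```python
def pick_by_ortholog_count(ortholog_counts):
    """Return the accession with the highest ortholog count.

    ortholog_counts maps accession -> "Ortholog Count" as read from the
    "View SeqAPASS Reports" table. Ties go to the first accession listed.
    """
    if not ortholog_counts:
        raise ValueError("no completed Level 1 runs to compare")
    return max(ortholog_counts, key=ortholog_counts.get)


# Placeholder counts for illustration only.
counts = {"AAA51771.1": 120, "AAA51772.1": 118, "P10275.2": 131}
print(pick_by_ortholog_count(counts))  # P10275.2
```

The highest count is a guide rather than a rule; the sequence quality checks in the guidelines still apply.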
Example: Androgen receptor, Homo sapiens
[Screenshot: NCBI GenPept record for androgen receptor [Homo sapiens], GenBank: AAA51771.1, showing
the DEFINITION, DBSOURCE, REFERENCE, COMMENT, and FEATURES rows, including the
"Androgen_recep" and "NR_DBD_AR" regions and the zinc binding and DNA binding sites.]
75

-------
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS): User Guide
Updated 09/10/19; Contact Carlie LaLone with Questions: LaLone.Carlie@epa.gov
Example cont:
/note="dimer interlace
/db_xre i-"CDD:143S47 "
[polypeptide binding]"
670.
.915
/re gi on_name-"HE_LED_AE"
/note:"Ligand binding donain oi the rruclear receptor
androgen receptor, ligand activated transcription
regulator; cd07073"
/ db_xre f = " CDD: ism® "
order(699,70t..703,705..706,709,739..740,743..744,747,750,
762,778,785,871,875)
/site_type s "othe r"
/note = " 1 igand binding site [chemical binding]"
/db_xref="CDD:13*758"
order(711,714,71$,7£4,7£8,73i,736,891..89*,895..896)
/j ite_type = "othe r"
/note="coactivator recognition site [polypeptide binding]"
/db_xre i-"CDD:13*758"
1..917
/gene = "AE"
/coded_by="Hil748.1:163..4916"
/db_xref = "HJE: &00-l*0-556"
1
mevqlglgrv
61
qqqqqqqqqq
1*1
echpergcvp
181
iljea;tmql
241
jvimglgvea
301
edtaeyjpfk
361
yynfplalag
421
gjp* aaa.* _p.
481
trppqglagq
541
rdhvlpidyy
601
idkfrrkncp
661
hiegyecqpi
7*1
pgf mlhvdd
781
qcvimrhliq
841
ackrknptjc
901
vpkiljgkvk
yprppjktyr
qqqqqqqet;
lqqqqqeav;
lehlspgegl
ggytkglege
pppppppphp
jwhtlftaee
es dftapdwo
fppqktclic
scrlrkcyea
gnaviqyjom
ei gwlqitpq
jrrfyqltkl
piyfhtq
galqnlfqsv
prqqqqqqge
glpqqlpapp
rgdcmyapll
slgcsgsaaa
hariklenpl
gqlygpcggg
ypgyriv; rvp
gdeasgchyg
^ntlgarklk
gwcaghdrm
glrrivf amg»r
etlcmkalll
ldsvqpiare
reviqnpgpr
dgspgahrrg
deddsaapst
rsgaptsskd
gvppavrptp
gsjgtlelpj
9999999999
ypjptcvkse
altcgsckvf
klgnlklqee
qpdjiaalls
jftnvnjiwil
fjiipvdglk
lhqftfdlli
hpeaasaapp
ptgylvldee
1sllgptfpg
nylggtjtis
caplaeckgs
tlslyksgal
aqcrygdla;
9999999999
rngpomdjyj g
fkraaegkgk
geasJttspt
jlnelgerql
yfapdlvfne
ngkf f delim
kshrrvjvdfp
gajllllqqq
qqp;qpq;al
lsscsadlkd
dnokelckav
llddsagkst
deaaayq;rd
lhgagaagpg
e agavapygy
pyginrleta
ylcajrndct
e ettqkltvj
vhwkwakal
yimhki imy j
nyikeldrii
erifriaeii jvq
More about the AR gene	*
The androgen receptor gene is more than 90
kb long and codes for a protein that has 3
major functional domains: the N-terminal
domain, DNAb...
Aso ttiown As: RP11-383C12.1. AJS. DHT...
Homologs of the AR gene
The AR gene is conserved in Rhesus
monkey, dog. cow, mouse, rat, and chicken.
Link Out to external resources
Aselection of literature about the proteins
[GoPubMed Proteins]
Transcript/Protein Information
[PANTHER Classification System]
Transcript/Protein Information
[PANTHER Classification System]
biochemicals
[Exact ATtigen/Labome]
antibody review
others
antibody
cDNA clone
protein and peptide
ELI SA and assay kit
[Exact Aitigen/Labcme]
[Exact Artigen/Labome]
[Exact Artigen/Labome]
[EoctAitigen/Labome]
[E^actAnigen/Labome]
[Exact Aitigen/Labome]
h. If multiple proteins appear to be equally suitable as the query protein for SeqAPASS, the sequences can be
aligned using NCBI's COBALT. Enter the accessions (copied and pasted from the NCBI protein search list) and
click "Align."
[Screenshot: NCBI COBALT (Constraint-based Multiple Protein Alignment Tool) "Enter Query Sequences" form with seven accessions entered (P10275.2, AAA51772.1, AAA51780.1, AAA51771.1, AAA51729.1, AAD45921.1, AAA51886.1) and the "Align" button.]
An alignment page will be generated.
[Screenshot: COBALT results page for the seven androgen receptor sequences, listing each accession with its description and showing the multiple sequence alignment in "Compact" view with the "Conservation Setting" at "2 Bits." The sequences differ in the length of a glutamine (Q) repeat within the first 80 aligned positions, which appears as gaps in the alignment.]
To evaluate sequences, change the "Conservation Setting" from "2 Bits" to "Identity."
[Screenshot: COBALT results page with the "Conservation Setting" drop-down menu open (options shown include 1 Bit, 2 Bits, 3 Bits, and 4 Bits).]
Look for differences in the sequence (e.g., conserved residues, gaps) and start by eliminating sequences
that have gaps.
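The gap check described above can also be done outside the browser. The following is a minimal Python sketch, assuming the alignment has been saved as aligned FASTA text in which gaps are written as "-". The function names and the toy alignment are made up for illustration and are not part of SeqAPASS or COBALT.

```python
def parse_aligned_fasta(text):
    """Parse aligned FASTA text into a dict of {identifier: aligned sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the identifier.
            parts = line[1:].split()
            current = parts[0] if parts else "unnamed"
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def gap_counts(records):
    """Count gap characters ("-") in each aligned sequence."""
    return {name: seq.count("-") for name, seq in records.items()}


# Made-up toy alignment (assumption for illustration, not real COBALT output).
toy_alignment = """>seqA
MEVQLGLGRVQQQQET
>seqB
MEVQLGLGRVQQ--ET
>seqC
MEVQLGLGRVQQQ-ET
"""

counts = gap_counts(parse_aligned_fasta(toy_alignment))
gap_free = [name for name, n in counts.items() if n == 0]
print(counts)    # gaps per sequence
print(gap_free)  # sequences to keep as a starting point
```

Sequences with no gaps are only a starting point; individual residue differences still need to be reviewed as described in the next step.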
i. If, after the suggested evaluations of the proteins are performed, questions remain as to which sequence
would be best to run in SeqAPASS, run all relevant sequences in SeqAPASS for the evaluation. The
individual residue differences between commonly named sequences will become most important when
evaluating residues known to be important for binding the chemical or activating the protein (Level 3
SeqAPASS analysis). After completing the SeqAPASS run, select the data that has the greatest number of
ortholog candidates for your evaluation of conservation and further predictions of cross species
susceptibility. Depending on the protein of interest, multiple subunits may be associated with a protein. In
this case, all relevant subunits can be queried using SeqAPASS.
Level 1 Calculated Percent Similarity
The SeqAPASS algorithms submit the query to NCBI's standalone BLASTp (using default settings,
including the BLOSUM-62 matrix), which aligns the query protein with all proteins available in the NCBI
protein database and provides a variety of metrics associated with each pairwise alignment between the
query and hit sequences. SeqAPASS selectively captures output from BLASTp, including one sequence
per species with the highest bit score. Detailed descriptions of metrics derived from BLASTp (e.g.,
BLASTp Bitscore, E-Value, Positives, Identity, Hit length) can be found in:
The NCBI Handbook: (http://www.ncbi.nlm.nih.gov/books/NBK21106/);
BLAST® Help: (http://www.ncbi.nlm.nih.gov/books/NBK62051/) and the
NCBI Glossary Field Guide: (http://www.ncbi.nlm.nih.gov/Class/FieldGuide/glossary.html)
The top row of the Level 1 data corresponds to the queried protein selected by the user. For each sequence
queried, the Level 1, top row query sequence is used to determine the maximum bitscore for the analysis,
which is derived from aligning the query sequence to itself using BLASTp. To calculate percent
similarity, the bitscore for each hit sequence is normalized to the maximum bit score and then multiplied
by 100.
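The normalization described above can be illustrated with a short Python sketch. The function name and the bitscores are hypothetical values chosen for illustration; they are not taken from a real SeqAPASS run.

```python
def percent_similarity(hit_bitscore, query_self_bitscore):
    """Normalize a hit bitscore to the query's self-alignment bitscore, times 100."""
    if query_self_bitscore <= 0:
        raise ValueError("query self-alignment bitscore must be positive")
    return hit_bitscore / query_self_bitscore * 100.0


# Hypothetical bitscores: the first entry is the query aligned to itself.
query_self = 1000.0
hit_bitscores = [1000.0, 750.0, 250.0]

similarities = [percent_similarity(b, query_self) for b in hit_bitscores]
print(similarities)
```

Because the query aligned to itself defines the maximum bitscore, the top row of the Level 1 data always has a percent similarity of 100.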
Note: SeqAPASS v2.0 and newer parse the BLASTp query and hit accessions to identify all the
species/accessions from the identical proteins. Therefore, if a hit sequence represents multiple species, all
species with the identical sequence will be found in the data table for Level 1 and Level 2. To determine
which sequence/species was identified from BLASTp as a hit and which sequence/species was parsed
from the identical sequence, view the "Full Report" for Level 1 or Level 2, column "Identical Protein,"
where "N" indicates the original hit sequence and "Y" indicates a parsed sequence.
Common Domain Count
Reverse Position-Specific BLAST (RPS-BLAST) is used to compare each query and hit sequence to
conserved domains defined in NCBI's Conserved Domain Database. A hit domain is considered in
common with the query domain if it contains the same domain accession as the query and it aligns with
the NCBI curated domain with the same or greater amino acid residue coverage than the query sequence.
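One possible reading of this rule is sketched below in Python. It assumes each sequence's RPS-BLAST results have already been reduced to a mapping from domain accession to the fraction of the curated domain covered by the alignment. That data layout, the function name, and the domain labels are assumptions made for illustration only.

```python
def common_domain_count(query_domains, hit_domains):
    """Count query domains that the hit shares with equal or greater coverage.

    Both arguments map a domain accession to the fraction (0 to 1) of the
    curated domain covered by the alignment (assumed representation).
    """
    count = 0
    for accession, query_coverage in query_domains.items():
        hit_coverage = hit_domains.get(accession)
        # Same accession as the query, and coverage at least as large as the query's.
        if hit_coverage is not None and hit_coverage >= query_coverage:
            count += 1
    return count


# Hypothetical domain labels and coverage values.
query = {"domain_A": 0.95, "domain_B": 0.90}
hit = {"domain_A": 0.97, "domain_B": 0.60, "domain_C": 1.00}

print(common_domain_count(query, hit))
```

In this example only domain_A counts as common: domain_B is present in the hit but with lower coverage than in the query, and domain_C is absent from the query.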
Ortholog Candidate Identification
Orthologs are sequences in different species that diverged from a common ancestral sequence through a
speciation event and are therefore more likely to maintain similar function. SeqAPASS uses reciprocal best
hit (RBH) BLAST for ortholog detection: each hit protein is automatically compared to all protein sequences
available for the query species. If the original query protein, or one of its identical protein matches, is
identified as the best match to the hit or has the same bitscore as the best match, the hit sequence is
considered an ortholog candidate. The "Ortholog Candidate" column indicates the result with a yes (Y) or no (N).
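The reciprocal best hit decision can be sketched in Python as follows. The sketch assumes the reverse search (hit protein against all proteins of the query species) has already been run and reduced to a list of (accession, bitscore) pairs. The function name and the accessions are hypothetical.

```python
def is_ortholog_candidate(reverse_hits, query_accessions):
    """Return "Y" if the query (or an identical protein) ties for the best reverse hit.

    reverse_hits: list of (accession, bitscore) pairs from searching the hit
    protein against the query species (assumed to be precomputed).
    query_accessions: set holding the query accession and its identical proteins.
    """
    if not reverse_hits:
        return "N"
    best_bitscore = max(bitscore for _, bitscore in reverse_hits)
    for accession, bitscore in reverse_hits:
        if accession in query_accessions and bitscore == best_bitscore:
            return "Y"
    return "N"


# Hypothetical accessions and bitscores.
query_accessions = {"QUERY_1", "QUERY_1_IDENTICAL"}
hit_a = [("QUERY_1", 950.0), ("OTHER_2", 400.0)]            # query is the best match
hit_b = [("OTHER_2", 700.0), ("QUERY_1", 650.0)]            # another protein scores higher
hit_c = [("OTHER_3", 500.0), ("QUERY_1_IDENTICAL", 500.0)]  # identical protein ties for best

results = [is_ortholog_candidate(h, query_accessions) for h in (hit_a, hit_b, hit_c)]
print(results)
```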
Note: Many NCBI protein accessions represent multiple identical protein sequences in the BLASTp
output. This is due to BLASTp querying and presenting data from the non-redundant protein database.
Sometimes the identical sequences are from different species. This can be checked by following the link
for the top row "NCBI Accession" in the table to the NCBI protein page. Below the protein name
[species] title will be a link to "Identical Proteins."
Click the "Identical Proteins" link and look for a sequence in the list from the user defined query species.
[Screenshot: NCBI Protein page for estrogen receptor isoform 1 [Homo sapiens], NCBI Reference Sequence NP_000116.2, showing the "Identical Proteins" link below the title.]
Note: If the top hit is a Protein Data Bank (PDB) code (e.g., 1AHRA), no ortholog candidates will be
identified from RBH BLAST, because BLASTp, when run against all accessions for a given species, does not
return PDB codes. It is recommended that the user identify a sequence similar or identical to the PDB entry
and use that sequence as the query sequence.
Susceptibility cut-off
The susceptibility cut-off values listed on the "Level 1 (and Level 2) Susceptibility Cut-off" page are
determined by plotting the % similarity data from the "Primary Report" or "Full Report" and identifying
the local minima in the data. The default cut-off is determined by taking the first local minimum and
moving up in percent similarity until the next ortholog candidate is found. The susceptibility cut-off
displayed in the list is the percent similarity of the identified ortholog candidate.
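The second half of this procedure (moving up from a local minimum to the next ortholog candidate) can be sketched in Python. How SeqAPASS locates the local minima is not detailed here, so the sketch takes the first local minimum as an input. It also assumes that a candidate exactly at the local minimum qualifies. The function name and data values are hypothetical.

```python
def default_cutoff(rows, first_local_minimum):
    """Return the lowest percent similarity of an ortholog candidate at or above
    the given local minimum, or None if there is no such candidate.

    rows: list of (percent_similarity, ortholog_candidate) pairs, where
    ortholog_candidate is "Y" or "N".
    """
    candidates = [
        pct for pct, ortholog in rows
        if ortholog == "Y" and pct >= first_local_minimum
    ]
    return min(candidates) if candidates else None


# Hypothetical Level 1 rows and a hypothetical first local minimum of 45.0.
rows = [(100.0, "Y"), (82.5, "Y"), (61.0, "N"), (58.3, "Y"), (40.2, "N"), (35.0, "Y")]
cutoff = default_cutoff(rows, 45.0)
print(cutoff)
```

Here the ortholog candidate at 35.0 lies below the local minimum and is skipped, so the cut-off becomes the percent similarity of the next candidate found when moving up.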
Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-Across:" Yes)
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off, but identified as an Ortholog Candidate = Y, for
"yes," then the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off but belongs to any organism class found above the
susceptibility cut-off, the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for
"yes". This criterion allows susceptibility predictions to be made across taxonomic groups based on the
likelihood that the sequences above the cut-off are better matches to the query.
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate = N, for "no,") and does not belong to any organism class found above the susceptibility cut-
off, the hit is predicted to not be susceptible; therefore, Susceptibility Prediction = N for "no"
Note that the "Primary Report" may yield different Susceptibility Predictions than the "Full Report," as
the predictions are based on the data in the different reports. The Primary Report is filtered to only display
E-value <0.01 and Common Domain Count > 1.
Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-Across:" No)
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off, but identified as an Ortholog Candidate = Y, for
"yes," then the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate = N, for "no,"), the hit is predicted to not be susceptible; therefore, Susceptibility Prediction =
N for "no"
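The two sets of criteria above can be summarized in one Python sketch with a flag for the "Species Read-Across" setting. The field names, the function name, and the example data are assumptions for illustration. The sketch also assumes that a sequence exactly at the cut-off counts as "above" it.

```python
def predict_susceptibility(hits, cutoff, read_across):
    """Return a list of "Y"/"N" susceptibility predictions, one per hit.

    hits: list of dicts with keys "percent_similarity", "ortholog_candidate"
    ("Y" or "N"), and "organism_class" (assumed field names).
    read_across: True for "Species Read-Across: Yes", False for "No".
    """
    # Organism classes with at least one sequence at or above the cut-off.
    classes_above = {
        h["organism_class"] for h in hits if h["percent_similarity"] >= cutoff
    }
    predictions = []
    for h in hits:
        if h["percent_similarity"] >= cutoff:
            predictions.append("Y")   # above the cut-off
        elif h["ortholog_candidate"] == "Y":
            predictions.append("Y")   # below the cut-off but an ortholog candidate
        elif read_across and h["organism_class"] in classes_above:
            predictions.append("Y")   # read-across within an organism class
        else:
            predictions.append("N")
    return predictions


# Hypothetical hits and cut-off.
hits = [
    {"percent_similarity": 90.0, "ortholog_candidate": "Y", "organism_class": "Mammalia"},
    {"percent_similarity": 70.0, "ortholog_candidate": "N", "organism_class": "Aves"},
    {"percent_similarity": 40.0, "ortholog_candidate": "N", "organism_class": "Aves"},
    {"percent_similarity": 30.0, "ortholog_candidate": "Y", "organism_class": "Insecta"},
    {"percent_similarity": 20.0, "ortholog_candidate": "N", "organism_class": "Arachnida"},
]

with_read_across = predict_susceptibility(hits, 60.0, read_across=True)
without_read_across = predict_susceptibility(hits, 60.0, read_across=False)
print(with_read_across)
print(without_read_across)
```

The only difference between the two settings in this example is the third hit: it is below the cut-off and not an ortholog candidate, but it belongs to an organism class (Aves) that also appears above the cut-off.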
Level 2 Calculated Percent Similarity
Data obtained from the Level 1 RPS BLAST evaluation is used to assign sequence ranges that aligned
with a user selected domain (from the NCBI CDD database) to each accession from the Level 1 Full
report. BLASTp is then used to align the query domain range to each hit domain range. The percent
similarity is calculated based on the bit scores from the BLASTp alignment of the domain regions. For
each sequence queried, the Level 2, top row query species is used to determine the maximum bitscore for
the analysis, which is derived from aligning the query sequence to itself using BLASTp. To calculate
percent similarity, the bitscore for each hit sequence is normalized to the maximum bit score and then
multiplied by 100.
Susceptibility cut-off (same method as used in Level 1)
The susceptibility cut-offs listed on the "Level 2 Susceptibility Cut-off" page are determined by plotting
the % similarity data from the "Primary Report" or "Full Report" and identifying the local minima in
the data. The default cut-off is determined by taking the first local minimum and moving up in percent
similarity until the next ortholog candidate is found. The susceptibility cut-off displayed in the list is the
percent similarity of the identified ortholog candidate.
Level 2 Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-Across:" Yes)
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off, but identified as an Ortholog Candidate = Y, for
"yes," then the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off but belongs to any organism class found above the
susceptibility cut-off, the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for
"yes". This criterion allows susceptibility predictions to be made across taxonomic groups based on the
likelihood that the sequences above the cut-off are better matches to the query.
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate = N, for "no,") and does not belong to any organism class found above the susceptibility cut-
off, the hit is predicted to not be susceptible; therefore, Susceptibility Prediction = N for "no"
Note that the "Primary Report" may yield different Susceptibility Predictions than the "Full Report," as
the predictions are based on the data in the different reports. The Primary Report is filtered to only display
E-value <0.01 and Common Domain Count > 1.
Level 2 Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-Across:" No)
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off, but identified as an Ortholog Candidate = Y, for
"yes," then the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate = N, for "no,"), the hit is predicted to not be susceptible; therefore, Susceptibility Prediction =
N for "no"
Level 3 Sequence Alignments
COBALT is used to align all user selected sequences (from Level 1 hits) with a user defined template
sequence. Because COBALT algorithms align all sequences, it is recommended that the user align the
template sequence with sequences that are most similar to one another. To capture the most
similar sequences from the SeqAPASS data, it is recommended that the user filter the Level 1 data by
taxonomic group and step through the Level 1 data pages one by one while selecting sequences. It is
also recommended that the user check the name of each sequence and exclude "partial" sequences when
possible. Requesting a query for one taxonomic group at a time breaks the data down into manageable
alignments.
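For users working from a downloaded Level 1 table, the selection logic above can be sketched in Python. The field names, the function name, and the example rows are hypothetical and do not reflect the actual SeqAPASS export format.

```python
def select_for_alignment(rows, taxonomic_group):
    """Pick accessions from one taxonomic group, skipping sequences named "partial".

    rows: list of dicts with keys "accession", "protein_name", and
    "taxonomic_group" (assumed field names).
    """
    return [
        r["accession"]
        for r in rows
        if r["taxonomic_group"] == taxonomic_group
        and "partial" not in r["protein_name"].lower()
    ]


# Hypothetical Level 1 rows.
level1_rows = [
    {"accession": "ACC_1", "protein_name": "androgen receptor", "taxonomic_group": "Mammalia"},
    {"accession": "ACC_2", "protein_name": "androgen receptor, partial", "taxonomic_group": "Mammalia"},
    {"accession": "ACC_3", "protein_name": "androgen receptor", "taxonomic_group": "Aves"},
]

selected = select_for_alignment(level1_rows, "Mammalia")
print(selected)
```

Running the selection once per taxonomic group yields one small accession list per alignment request.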
Selecting Amino Acid Residues to Align
The user may select up to 50 amino acid residues to compare across selected species in Level 3.