EPA/600/R-18/062
Sequence Alignment to Predict Across
Species Susceptibility
(SeqAPASS)
VERSION 3.0
User Guide
-------
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS): User Guide
Updated 3/06/18; Contact Carlie LaLone with Questions: LaLone.Carlie@epa.gov
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) User Guide
Quick Notes: Use Mozilla Firefox or Chrome for optimal performance and PLEASE DO NOT submit
more than 10 Level 1 queries at a time. Wait until they run to completion prior to submitting more.
Table of Contents
Background
Accessing SeqAPASS
   Returning Users
   First Time Users
Messages from the SeqAPASS Development Team
SeqAPASS Home Tab
Request SeqAPASS Run Tab
   Query "By Species"
   Query "By Accession"
SeqAPASS Run Status
View SeqAPASS Reports
   View Report
   Save Report(s)
Level 1: Primary Amino Acid Sequence Alignment
   Primary Report Settings
Susceptibility Cutoff Box for Level 1
   No Ortholog Candidate
Level 2: Functional Domains Alignment
View Level 2 Data Page
   Primary Report Settings
Susceptibility Cutoff Box for Level 2
   No Ortholog Candidate
Level 1 and Level 2 Data Visualization
   Level 1 Information Page
   Level 2 Information Page
   Level 1 and 2 BoxPlot Page - Controls
Level 3: Individual Amino Acid Residue Alignment
View Level 3 Individual Amino Acid Query and Data Page
   Level 3 Data - Primary Report
Moving Between Level 1, Level 2, and Level 3 Data Pages
Search, View, and Download Data Tables
Log out
Pop-up Messages
Detailed Documentation
Note: Screen-shots displayed throughout User Guide may depict a different version than the current
version if functionality did not change in SeqAPASS Version 3.0.
Background
The SeqAPASS tool was developed to predict relative intrinsic susceptibility across species to
chemicals with known molecular targets (e.g., pharmaceuticals, pesticides). It can also be used to evaluate
conservation of molecular targets from high-throughput screening assays (i.e., the U.S. Environmental
Protection Agency ToxCast Program), molecular initiating events (MIEs), and early key events in the
adverse outcome pathway framework, as a means to extrapolate such knowledge across species. The term
"relative" is used because molecular target similarity is only one consideration, though an important
one, for making predictions of susceptibility to a chemical. Other important considerations that are not
evaluated using the SeqAPASS methodology include how well a chemical is absorbed, distributed,
metabolized, and eliminated, as well as life stage and other life history traits. "Relative" also
indicates that the determination of sequence similarity between proteins is based on comparison to a
single protein sequence for a specific species. "Intrinsic susceptibility" is described here as the
vulnerability (or lack thereof) of an organism to chemical perturbation due to its inherent biological
composition.
Cross-species comparisons of proteins can be conducted through examination of sequence and structural
information, depending on how well the protein has been characterized and what is known about a
chemical-protein interaction. SeqAPASS allows the user to assess various levels of protein sequence
detail across species including comparisons of primary amino acid sequence (including ortholog
detection), functional domain(s), and individual amino acid residue positions. Each level requires a
greater understanding of the protein and its interaction with a chemical of interest (or similar ligand).
Because human and veterinary drugs, as well as pesticides, are designed to act specifically on well
characterized molecular targets, these chemical classes have proven useful for demonstrating the utility of
the SeqAPASS tool and its application to various hazard assessment/research scenarios.
The pertinent information necessary to begin a SeqAPASS query includes: the identification of a single
(or multiple) query species and a query protein, which would be the molecular target(s) of interest (e.g.,
receptor or enzyme).
The SeqAPASS algorithms mine, collect, and collate information from the National Center for
Biotechnology Information (NCBI) protein, conserved domains, and taxonomy databases, and
strategically utilize the Stand-Alone Basic Local Alignment Search Tool for proteins (BLASTp) and
the Constraint-based Multiple Alignment Tool (COBALT).
http://www.ncbi.nlm.nih.gov/protein/
http://www.ncbi.nlm.nih.gov/cdd/
http://www.ncbi.nlm.nih.gov/taxonomy/
http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
http://www.st-va.ncbi.nlm.nih.gov/tools/cobalt/re_cobalt.cgi?
Accessing SeqAPASS
For optimal SeqAPASS performance, use Chrome or Mozilla Firefox.
Access SeqAPASS using the following URL: https://www.seqapass.epa.gov/seqapass/
Returning Users
Enter Username and Password
[Screenshot: SeqAPASS login page with Username and Password fields, the Login button, and the "To request an account, click here" link.]
Note: If the user enters incorrect login information, the following message is displayed on the login page:
Login Error Invalid credentials (Contact: LaLone.Carlie@epa.gov)
First time users
To request a username and password to access the SeqAPASS tool, select "click here" below the login
fields; a pop-up email will be presented. Send the email to LaLone.Carlie@epa.gov requesting a password.
A reply email will be delivered to you with your temporary password. The login does not limit access to
the tool: everyone who requests an account will be given one in a timely manner. Individual accounts
allow users to store all previous SeqAPASS runs.
On the login screen, enter the provided login information:
Username: Email address
Password: Temporary password
Upon receiving your temporary password, log in to SeqAPASS as described above. Click on the "Settings"
tab and change the temporary password to a user-defined password: enter the password from the reply
email as the "Current Password," then type a new password and re-enter it. Click "Change Password."
Use the new password for subsequent logins.
[Screenshot: "Settings" tab showing the Change Password form with the Current Password, Enter New Password, and Re-Enter New Password fields and the "Change Password" button.]
Note: Users can change their password at any time in the "Settings" tab.
Messages from the SeqAPASS development team
Look for messages about planned version releases, data updates, and/or fixes to the SeqAPASS tool.
These will occasionally be displayed below the SeqAPASS banner when the development team has
information to share with SeqAPASS users.
Example message:
New to SeqAPASS Version 2 (See user guide for more details)
- Data version descriptions and timely downloads of recent protein, taxonomy, and conserved domain data from NCBI, as well as new versions of BLAST+ and COBALT executables (See About SeqAPASS page for details)
- Capability to change default settings for Level 1 and Level 2 primary reports, such as changing the taxonomic lineage descriptions and choosing to not use species read-across for susceptibility predictions
- Level 3: individual amino acid residue comparisons now allow the user to enter additional sequences using NCBI protein accessions or FASTA format to include in the alignments (including those not found in the original SeqAPASS output)
SeqAPASS Home Tab
The "Home" tab indicates who is logged in to the tool (right-hand side of the screen) and contains a link
to information about the SeqAPASS tool (About SeqAPASS), including contact information for
support and references to published articles describing the SeqAPASS tool and its applications.
References to other relevant databases and tools are also provided. A link to the SeqAPASS User Guide
can also be found on this page. To submit a comment or question, click the "Submit Comment/Question"
link to email the developer. The "Log out" icon in the upper right-hand corner of the screen can be
clicked at any time to log out.
[Screenshot: "SeqAPASS Home" page showing the "About SeqAPASS," "SeqAPASS User Guide," and "Submit Comment/Question" links, with the logged-in user and "Log out" in the upper right-hand corner.]
Request SeqAPASS Run Tab
Clicking the "Request SeqAPASS Run" tab opens a page to enter the query information necessary for a
SeqAPASS run. Each section of the "Request SeqAPASS Run" page is described below.
[Screenshot: "Request Level 1 SeqAPASS Run" page showing the Select Search, Query Species Selection, Query Protein Selection, SeqAPASS Submission, and Final Query Protein(s) sections.]
Select Search
There are two options for entering query information: "By Species" or "By Accession" (see the radio
buttons to the right of "Select Search"). Selecting "By Species" allows the user to enter text, select
a species from a dropdown list, and then select a protein from any sequence available for that species
in the NCBI protein database. Selecting "By Accession" allows the user to enter an NCBI protein accession.
[Screenshot: "Select Search" row of the "Request Level 1 SeqAPASS Run" page with the "By Species" and "By Accession" radio buttons.]
Query "By Species"
Type the name of the query species of interest in the "Query Species Search" text box. The species
common name, scientific name, or Taxid (ID number derived from the NCBI taxonomy database) may be
typed into the search bar. This is the species to which all other species will be compared. The search
bar has an auto-complete function and will generate a list of species with corresponding Taxids. When
text is typed into the search bar, the auto-complete function queries the database in the order of
"starts with" then "contains." If an integer is typed into the search bar, the auto-complete function
queries the database in the order of "Taxid", "starts with", then "contains."
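The search order described above can be illustrated with a short Python sketch. This is not the SeqAPASS implementation: the function name `rank_species`, the small in-memory species list, and the choice to compare integer queries against the Taxid text are all assumptions made for illustration.

```python
# Illustrative only: a tiny in-memory stand-in for the species list.
SPECIES = [
    ("Homo sapiens", 9606),
    ("Homo sapiens neanderthalensis", 63221),
    ("Homo sapiens x Mus musculus hybrid cell line", 1131344),
    ("Mus musculus", 10090),
]


def rank_species(query, species=SPECIES):
    """Return (name, taxid) pairs ordered by match tier, then alphabetically."""
    q = query.strip().lower()
    if not q:
        return []
    is_integer = q.isdigit()
    ranked = []
    for name, taxid in species:
        # Assumption: integer queries are compared with the Taxid text,
        # all other queries with the species name.
        field = str(taxid) if is_integer else name.lower()
        if is_integer and field == q:
            tier = 0  # exact "Taxid" match
        elif field.startswith(q):
            tier = 1  # "starts with"
        elif q in field:
            tier = 2  # "contains"
        else:
            continue
        ranked.append((tier, name, taxid))
    ranked.sort()
    return [(name, taxid) for _tier, name, taxid in ranked]


for name, taxid in rank_species("mus"):
    print(f"{name} (Taxid:{taxid})")
```

With this sample list, the query "mus" lists Mus musculus (a "starts with" match) before the hybrid cell line (a "contains" match), and the query "9606" returns only Homo sapiens.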
[Screenshot: typing "Homo sa" in the "Query Species Search" box produces auto-complete suggestions: Homo sapiens Linnaeus, 1758 (Taxid:9606); Homo sapiens neanderthalensis (Taxid:63221); Homo sapiens ssp. 'Denisova' (Taxid:741158); Homo sapiens x Mus musculus hybrid cell line (Taxid:1131344).]
Note: The user can also use the NCBI taxonomy database to identify query species using the NCBI link
on the right-hand side of the "Add Query Species" button.
Select the species of interest by clicking on its name in the drop-down box. Once the species is selected,
click the "Add Query Species" button. This advances the species of interest to the "Query Species" box
and fills the "Query Proteins" box with all available protein sequences for that species from the NCBI
protein database (although the box displays only the first 200 proteins per species). The protein list
includes the protein NCBI accession, protein name, and species scientific name.
[Screenshot: Query Species Selection and Query Protein Selection sections; the "Query Proteins" box lists entries such as "[NP_005711.1]actin-related protein 2/3 complex subunit 1B".]
To filter the query protein list, type the query protein name or partial name in the "Query Protein Search"
box and click the "Filter Protein" button. This action filters the protein list in the "Query Proteins" box
to display only proteins that contain the user-defined text. Proteins are listed in alphabetical order
based on NCBI accession. Example: typing "estrogen" retrieves all proteins that contain the word
"estrogen" in the protein name (the user can scroll to identify proteins of interest).
[Screenshot: "Query Protein Search" box containing "estrogen" with the filtered "Query Proteins" list, e.g., "[AAA36523.1]estrogen sulfotransferase", "[AAA52399.1]estrogen receptor", "[AAA52402.1]estrogen receptor, partial", and "[AAA58461.1]estrogen receptor-related protein", above the "Add Selected Protein(s)" button.]
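The filtering behavior described above can be sketched in Python. This is only an illustration under stated assumptions: entries are taken to have the "[accession]name" form shown in the guide's examples, matching is assumed to be case-insensitive, and the names `parse_entry` and `filter_proteins` are hypothetical rather than part of SeqAPASS.

```python
import re

# Assumption: each list entry has the form "[ACCESSION]protein name".
ENTRY = re.compile(r"^\[([^\]]+)\](.*)$")


def parse_entry(line):
    """Split one "[accession]name" entry into (accession, name)."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    accession, name = match.groups()
    return accession.strip(), name.strip()


def filter_proteins(entries, text):
    """Keep entries whose name contains `text`, sorted by accession."""
    needle = text.strip().lower()
    parsed = (parse_entry(entry) for entry in entries)
    hits = [(acc, name) for acc, name in parsed if needle in name.lower()]
    return sorted(hits)  # tuples sort by accession first


entries = [
    "[NP_005711.1]actin-related protein 2/3 complex subunit 1B",
    "[AAA52399.1]estrogen receptor",
    "[AAA36523.1]estrogen sulfotransferase",
    "[AAA58461.1]estrogen receptor-related protein",
]
for accession, name in filter_proteins(entries, "estrogen"):
    print(f"[{accession}]{name}")
```

Sorting the (accession, name) tuples keeps the result in accession order, mirroring the ordering the guide describes for the "Query Proteins" box.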
Note: To explore details associated with a protein of interest, click the "NCBI Protein Database" link
to the right of the "Filter Protein" button to open the NCBI protein database (see the
"SeqAPASS Documentation" section of the user guide for details about searching for query proteins using
the NCBI database).
Highlight the protein or proteins of interest (Ctrl + left-click to select multiple proteins) in the
"Query Proteins" box and click the "Add Selected Protein(s)" button. This moves the protein(s) of interest
to the "Final Query Protein(s)" box. To remove proteins from the "Final Query Protein(s)" box, highlight
those to be removed and click the "Remove Selected Protein(s)" button. Select "Remove All Proteins" to
discard all proteins from the "Final Query Protein(s)" box. The "Clear" button removes all information
previously entered on the "Request SeqAPASS Run" page.
[Screenshot: protein list filtered with "estrogen receptor"; entries such as "[NP_001258805.1]estrogen receptor beta isoform 5", "[NP_001278159.1]estrogen receptor isoform 2", and "[NP_001278641.1]estrogen receptor beta isoform 2" have been moved to the "Final Query Protein(s)" box, with the "Remove Selected Protein(s)," "Remove All Proteins," "Request Run," and "Clear" buttons shown.]
Once the protein(s) to be queried have been identified, select "Request Run." A message will appear in
the upper right-hand corner of the screen for 10 seconds to alert the user of the request status.
[Screenshot: request-status messages "NP_001278159.1: submitted" and "NP_001278641.1: submitted" displayed in the upper right-hand corner of the "Request Level 1 SeqAPASS Run" page.]
Multiple proteins can be added to the final list for multiple SeqAPASS runs. If another query species is
desired, then move to the top to select the next species. Follow the process described above for selecting
the proteins associated with this species. The proteins populated in the "Query Proteins" box will always
be associated with the species highlighted in the "Query Species" box.
Note: In the current version of SeqAPASS, PLEASE do not request more than 10 query proteins at a
time to avoid longer wait times for the completion of a run.
[Screenshot: "Query Species" box showing "Homo sapiens (Taxid:9606)" with the "Query Proteins" box listing proteins for that species, e.g., "[AAF00714.1]GTPase", above the "Add Selected Protein(s)" button.]
Note: A user may check the progress of the run by clicking on the "SeqAPASS Run Status" tab (see the
"SeqAPASS Run Status" section of the user guide for more information).
Query "By Accession"
Users familiar with the NCBI database can utilize NCBI protein accessions (e.g., NP_000116.2) to query
the SeqAPASS tool. This is done by selecting the "By Accession" radio button to the right of the "Select
Search" text on the "Request SeqAPASS Run" page.
[Screenshot: "Request Level 1 SeqAPASS Run" page with the "By Accession" radio button selected next to "Select Search."]
Upon selecting the "By Accession" radio button, a new query page will be displayed. Type the NCBI
protein accession (e.g., NP_000116.2) for the protein of interest in the "NCBI Protein Accession" box
(this accession comes from the NCBI protein database; see "SeqAPASS Documentation" for details). If
desired, more than one NCBI accession may be entered into the "NCBI Protein Accession" box by
pressing the Enter key after each accession.
Upon clicking the "NCBI Protein Accession" text box, a pop-up message will appear at the lower
right-hand side of the text box to provide an example of the proper format for accessions.
[Screenshot: "NCBI Protein Accession" text box with the pop-up format hint "Example: NP_000116.2".]
Note: To avoid longer wait times for the completion of a run, in the current version of SeqAPASS, please
do not request more than 10 NCBI Accessions at a time.
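For users who prepare long accession lists outside the tool, the 10-at-a-time guidance can be followed by splitting the list into groups before entering them. The Python sketch below only organizes text; it does not submit anything to SeqAPASS. The helper names `parse_accessions` and `batches` and the placeholder identifiers are hypothetical.

```python
def parse_accessions(text):
    """Split text-box style input (one accession per line) into a clean list."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def batches(accessions, size=10):
    """Yield consecutive groups of at most `size` accessions."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(accessions), size):
        yield accessions[start:start + size]


# Hypothetical placeholder identifiers, not real NCBI accessions.
pending = parse_accessions("\n".join(f"EXAMPLE_{i:03d}.1" for i in range(1, 24)))
for number, group in enumerate(batches(pending), start=1):
    print(f"batch {number}: {len(group)} accession(s)")
```

Each group would be entered and allowed to run to completion before the next group is requested, as the note above asks.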
[Screenshot: "By Accession" query page with NP_000116 entered in the "NCBI Protein Accession" box and the "Request Run" and "Clear" buttons below it.]
Clicking the "Clear" button will clear the "NCBI Protein Accession" text box.
After the NCBI accession(s) of interest have been typed in the "NCBI Protein Accession" box, click the
"Request Run" button. To remove proteins from the "NCBI Protein Accession" box click the "Clear"
button. A message will briefly appear in the upper right-hand corner of the screen to alert the user of their
run request status.
[Screenshot: "By Accession" query page after clicking "Request Run," with the request-status message displayed in the upper right-hand corner of the screen.]
Note: All NCBI Accessions can include the version number (one digit after the decimal place, e.g.,
NP_000116.2). Otherwise, if the version is not included, the most recent version of the accession will be
queried automatically.
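[Screenshot: request-status message "NP_001315029: submitted: NP_001315029.1", showing an accession entered without a version number being resolved to a specific version.]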
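The accession format described in the note above (a base accession with an optional version after the decimal point) can be checked with a small Python sketch. The regular expression is an assumption about the general shape of an accession, not an official NCBI grammar; it accepts one or more version digits, which is slightly more permissive than the guide's description. The function name `split_accession` is hypothetical.

```python
import re

# Assumption: letters, digits, and underscores, optionally followed by
# "." and a numeric version.
ACCESSION = re.compile(r"^([A-Za-z0-9_]+)(?:\.(\d+))?$")


def split_accession(text):
    """Return (base, version); version is None when it was not supplied."""
    match = ACCESSION.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable accession: {text!r}")
    base, version = match.groups()
    return base, (int(version) if version is not None else None)


for raw in ("NP_000116.2", "NP_001315029"):
    base, version = split_accession(raw)
    if version is None:
        print(f"{base}: no version given; the most recent version would be queried")
    else:
        print(f"{base}: version {version}")
```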
SeqAPASS Run Status
Level 1 SeqAPASS (primary amino acid sequence comparisons) status is displayed by default. The
accession in the "Level 1 Query Accession" column is the one selected and queried by the user. For a
query to be finished, it must display "complete" in the "BLASTp" column, 100% in the "Common Domains"
column, and 100% in the "Ortholog Candidate" column. The "Common Domains" column displays the %
completion for running Reverse Position Specific (RPS)-BLAST (default e-value of <0.01) on the
accessions from the Level 1 Full Report. RPS-BLAST, and therefore "Common Domains" status, will
take the longest to complete. The "Ortholog Candidate" column displays the % completion for running a
reciprocal best hit BLAST evaluation for each hit sequence. The status for the "BLASTp" column is
described as "started," "analyzing," or "complete." If the user's successfully submitted query has entered
the run queue, the position of the submitted query in the queue will be indicated in the column (e.g., 2nd
in queue). The "Common Domains" and "Ortholog Candidate" columns will also describe the position of
the user's submitted query in the run queue. Once the run has begun processing, the % completed for
RPS-BLAST or reciprocal best hit BLAST, respectively, will be displayed. Please see example below:
[Screenshot: "SeqAPASS Run Status" page with the "Level 1 Status" radio button selected. The "SeqAPASS Level 1 Run Status" table lists, for each run, the SeqAPASS Run Id, Data Version, submitting user, Level 1 Query Accession, BLASTp status, Common Domains and Ortholog Candidate percentages, Start Date, Date Completed, and SeqAPASS Run Duration. All example rows show "complete," 100%, and 100%; for instance, NP_001496.1 started 2017 05 09 11:40:40 and completed 2017 05 09 11:45:01, with a duration of 4 minute(s) 21 second(s). Paging controls and a "Download Table" option appear below the table.]
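The completion rule for a Level 1 run can be expressed as a small Python check. This is a sketch for readers who copy status values out of the table by hand; the function name `is_finished` and the string formats are assumptions based on the values quoted in this guide, and the second and third sample rows are invented for illustration.

```python
def is_finished(blastp, common_domains, ortholog_candidate):
    """True only when all three Level 1 status columns report completion."""
    return (
        blastp.strip().lower() == "complete"
        and common_domains.strip() == "100%"
        and ortholog_candidate.strip() == "100%"
    )


rows = [
    ("NP_001496.1", "complete", "100%", "100%"),
    ("EXAMPLE_001.1", "complete", "85%", "2nd in queue"),    # hypothetical row
    ("EXAMPLE_002.1", "analyzing", "2nd in queue", "2nd in queue"),  # hypothetical row
]
for accession, blastp, domains, orthologs in rows:
    state = "finished" if is_finished(blastp, domains, orthologs) else "still running"
    print(f"{accession}: {state}")
```

Any queue position or percentage below 100% in either of the last two columns means the run has not finished, even if the "BLASTp" column already reads "complete."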
The user can view the status of requested SeqAPASS runs. Each run is assigned a unique "SeqAPASS
Run Id." A run is a query that was requested either individually or as a batch in the "Request
SeqAPASS Run" tab. The user can view run start and end dates/times and the duration of the run (see the
"Search, View, and Download Data Tables" section of the user guide for more information). The "Data
Version" column indicates which version of NCBI data is being used (see the "About" page for details on
data versions).
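The relationship between the start time, completion time, and run duration columns can be reproduced with a few lines of Python. This sketch assumes timestamps written as they appear in this guide's Level 1 Run Status example (e.g., "2017 05 09 11:40:40"); the separator used by the live tool may differ, and the function name `run_duration` is hypothetical.

```python
from datetime import datetime

# Assumption: timestamps look like "2017 05 09 11:40:40".
TIMESTAMP_FORMAT = "%Y %m %d %H:%M:%S"


def run_duration(start, end):
    """Return the elapsed time between two timestamps as (minutes, seconds)."""
    started = datetime.strptime(start, TIMESTAMP_FORMAT)
    finished = datetime.strptime(end, TIMESTAMP_FORMAT)
    total_seconds = int((finished - started).total_seconds())
    return divmod(total_seconds, 60)


minutes, seconds = run_duration("2017 05 09 11:40:40", "2017 05 09 11:45:01")
print(f"{minutes} minute(s) {seconds} second(s)")  # prints "4 minute(s) 21 second(s)"
```

The computed value matches the duration listed for that run in the guide's example table.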
The user is also able to view the status of Level 2 (Functional domain(s)) and Level 3 (individual amino
acid residue alignments).
View Level 2 Status by selecting the radio button. Also, while viewing the page, the user can click the
"Refresh Data" button to refresh the data. "Level 1 Query Accession" column displays the NCBI
accession selected and queried by the user. Please see below:
[Screenshot: "SeqAPASS Run Status" page with the "Level 2 Status" radio button selected. The "SeqAPASS Level 2 Run Status" table lists, for each run, the SeqAPASS Run Id, Data Version, submitting user, Level 1 Query Accession, NCBI Accession, Domain Type (e.g., NR_DBD_GR_PR, p450, Ribosomal_L11), BLASTp status, Start Date, Date Completed, and SeqAPASS Run Duration.]
View SeqAPASS Reports Tab
The "View SeqAPASS Reports" tab provides a table of completed SeqAPASS runs. From this page the
user can choose to either "View Report" or "Save Report(s)."
[Screenshot: "SeqAPASS Reports" page with the "Refresh Available Reports" button, the "View Report" and "Save Report(s)" radio buttons, and the "Partial Protein Sequence" legend.]
By default, completed runs are listed in the order in which they were completed, with the most recent
runs at the top. The table includes information for each run, such as SeqAPASS Run ID (unique for every
run, even if the same protein/species combination is run twice), Data Version, Ortholog Count
(number of orthologs detected from the aligned hit sequences in Level 1; see the "Detailed
Documentation" section), NCBI Accession, Query Protein Name, taxonomy information for the query
species, and the date/time of run completion.
While viewing the page, the user can click the "Refresh Available Reports" button to refresh the table
with additional completed runs. Partial protein sequences are highlighted in yellow, as illustrated in
the example below (see the "Search, View, and Download Data Tables" section of the user guide for more
information).
[Screenshot: "Available Reports" table on the "SeqAPASS Reports" page with the columns SeqAPASS Run Id, Data Version, NCBI Accession, Query Protein Name, NCBI Taxonomy ID, Query Species Name, Query Common Name, and Taxonomy. Example rows include NP_000116.2 (estrogen receptor isoform 1; 9606; Homo sapiens; human; Mammalia) and ABI38425.1 (wingless, partial; Insecta). Paging controls and a "Download Table" option appear below the table.]
View Report
To select a completed run to view Level 1 data, select the corresponding radio button in the first column
of the table and click "Request Selected Report." This will open the Level 1 page to view the Level 1 data
and to set up queries for Level 2 and Level 3.
Note: The user MUST select a radio button PRIOR to clicking "Request Selected Report." If the user
fails to select a radio button and clicks "Request Selected Report" a Spinning Wheel will appear and
disappear and no completed run will be opened. Further, there is no pop-up message indicating that the
user did not select a radio button.
[Screenshot: "SeqAPASS Reports" page with the "View Report" radio button selected and the "Request Selected Report" button displayed above the "Available Reports" table.]
-------
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS): User Guide
Updated 3/06/18; Contact Carlie LaLone with Questions: LaLone.Carlie@epa.gov
Save Report(s)
To download completed Level 1, 2, and/or 3 data, select the "Save Report(s)" radio button. The user can then
choose which accession(s) to download by checking the box in the first column of the table for each desired
accession and clicking "Save Selected Report(s)."
The user can also deselect data that is not wanted in the download by scrolling to the far right of the table
and deselecting the checkboxes for the different levels of the SeqAPASS analysis. By default, all
available data for the selected accession will be downloaded in a zip file.
A zip file will be created containing all of the selected reports. In the pop-up, make sure WinZip is
selected to open the file and click OK.
A pop-up seqapass.zip file should appear with data files for each selected report. The naming convention
is the NCBI Protein Accession and the Data Version (e.g., AAG31441.2_v2).
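The naming convention above can be split back into its parts when post-processing a download. The following
is a minimal sketch, not part of the SeqAPASS tool; the helper name parse_report_name is hypothetical, and
the pattern assumes the data version always appears as a trailing "_v" followed by digits.

```python
import re

# Hypothetical helper: splits a report folder name such as "AAG31441.2_v2"
# into the NCBI protein accession and the SeqAPASS data version.
# Assumption: the data version is always a trailing "_v<digits>" suffix.
REPORT_NAME = re.compile(r"^(?P<accession>.+)_v(?P<version>\d+)$")


def parse_report_name(name):
    match = REPORT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a SeqAPASS report folder name: {name!r}")
    return match.group("accession"), int(match.group("version"))


print(parse_report_name("AAG31441.2_v2"))
print(parse_report_name("NP_001267576.1_v2"))
```

Because accessions may themselves contain underscores (e.g., NP_ prefixes), the sketch anchors on the final
"_v" suffix instead of splitting on the first underscore.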
Clicking on one of the report folders (named by Protein Accession and Data Version) shows all available files
for each Level of the SeqAPASS evaluation.
Note: This download includes default settings only. If the susceptibility cut-off or any other defaults were
changed on the Level 1 or Level 2 pages, the modified data will NOT be downloaded here and can ONLY be
downloaded directly from the Level 1 or Level 2 page where the user changed the setting. Also, data
visualizations can ONLY be downloaded from the Level 1 and Level 2 pages. They DO NOT populate in the
zip file folders.
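Selecting Level1Reports shows both the full and primary reports as csv files (e.g., AAB53939.1_Full_v2.csv
and AAB53939.1_Primary_v2.csv), as well as a graphic of the density plot used to determine the
susceptibility cut-off for each report (e.g., AAB53939.1_Full_v2_cutoff.png).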
By selecting Level2Reports, all completed domain comparisons will be available and named by NCBI
domain accession with the starting amino acid residue position for the domain (e.g., pfam00001(54)).
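The Level 2 folder names can be read programmatically in the same spirit. This is a small sketch under the
assumption that every domain folder follows the pattern shown in the example, with the starting residue
position in parentheses; the helper name parse_domain_folder is hypothetical.

```python
import re

# Assumption: folder names look like "<domain accession>(<start position>)".
DOMAIN_FOLDER = re.compile(r"^(?P<domain>[^()]+)\((?P<start>\d+)\)$")


def parse_domain_folder(name):
    """Return (domain accession, starting amino acid residue position)."""
    match = DOMAIN_FOLDER.match(name)
    if match is None:
        raise ValueError(f"not a Level 2 domain folder name: {name!r}")
    return match.group("domain"), int(match.group("start"))


print(parse_domain_folder("pfam00001(54)"))
```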
Upon selecting a domain file to view, both full and primary reports are available as csv files as well as a
graphic of the density plot for determining the susceptibility cut-off.
By selecting Level3Reports, all user-defined Level 3 alignments are available as csv files.
Note: These csv files show the alignments across the entire sequence, not just those amino acid residues
selected by the user.
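For users who prefer to inspect a download without WinZip, the folder layout described above can be indexed
with a short script. The sketch below is illustrative only: it assumes the archive nests files as
report folder / level folder / file, the function name index_reports is hypothetical, and the in-memory
archive stands in for a real seqapass.zip with made-up member paths modeled on the naming shown in this
section.

```python
import io
import zipfile
from collections import defaultdict


def index_reports(zip_file):
    """Group file paths in a download by report folder and level folder."""
    index = defaultdict(lambda: defaultdict(list))
    with zipfile.ZipFile(zip_file) as archive:
        for member in archive.namelist():
            parts = member.strip("/").split("/")
            if member.endswith("/") or len(parts) < 3:
                continue  # skip directory entries and unexpected shallow files
            report, level = parts[0], parts[1]
            index[report][level].append("/".join(parts[2:]))
    return index


# Stand-in archive for demonstration; a real call would pass the path
# to the downloaded seqapass.zip instead of this buffer.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for path in [
        "AAB53939.1_v2/Level1Reports/AAB53939.1_Primary_v2.csv",
        "AAB53939.1_v2/Level2Reports/pfam00001(54)/pfam00001(54)_Primary_v2.csv",
        "AAB53939.1_v2/Level3Reports/four(316)_v2.csv",
    ]:
        archive.writestr(path, "")
buffer.seek(0)

for report, levels in index_reports(buffer).items():
    for level, files in sorted(levels.items()):
        print(report, level, files)
```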
Level 1: Primary Amino Acid Sequence Alignment
From the "View SeqAPASS Reports" tab, selecting a radio button and clicking "Request Selected Report"
displays the Level 1 data.
The "Level 1 Query Protein Information" box contains the SeqAPASS Run ID, Query Accession, Ortholog Count
(number of hits identified as ortholog candidates to the query species protein sequence), NCBI data update
information, Query Species, and Query Protein. For the NCBI data updates, "Protein and Taxonomy Data:"
displays the date that the NCBI databases were downloaded and incorporated into the SeqAPASS database, and
"BLAST Version:" and "Software Version:" display the versions used by the SeqAPASS tool for the selected
data. Other information in this box will be described below.
The default table displayed at the bottom of the page is the "Primary Report." It shows the query protein
information in the first row below the column titles, followed by the hit proteins whose sequences aligned
with the query protein, ordered from highest to lowest percent similarity (maximum percent similarity =
100%). For each hit protein, the table provides the Data Version, NCBI Accession, and species information,
including the "Protein Count" (the number of protein records per species in the NCBI protein database),
taxonomic information (see the "Primary Report Settings" section below for more detail on the "Taxonomic
Group" versus "Filtered Taxonomic Group" columns), and species names. Also included are the protein name,
the BLASTp bitscore (which describes the overall quality of the alignment; see BLASTp tutorials), and the
percent similarity ([hit bitscore/query bitscore] * 100).
A hit protein identified as an ortholog candidate (using the reciprocal best hit BLAST method) is marked "Y"
for yes; otherwise it is marked "N" for no. Likewise, a hit protein predicted to be susceptible according to
the susceptibility cut-off criteria is marked "Y" and all others "N." Additional columns report the number of
ortholog candidates identified using the reciprocal best hit BLAST method, the susceptibility cut-off, the
date the analysis was completed, and whether the species is a eukaryote ("Y" or "N"). The cut-off is
determined by identifying local minima in the density plot of the percent similarity values for the primary
report data set and by evaluating ortholog candidates.
Links out to the NCBI Protein Database and NCBI Taxonomy Database (specific to the data row) are embedded in
the Level 1 data table for the "NCBI Accession," "Species Tax ID," "Scientific Name," and "Protein Name"
columns. (See the "Search, View, and Download Data Tables" section of the user guide for more information.)
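The percent similarity calculation described above is simple enough to reproduce by hand when checking a
downloaded table. The sketch below applies the stated formula; the function name is hypothetical and the
bitscores are illustrative values for a query aligned against itself and against two hits.

```python
def percent_similarity(hit_bitscore, query_bitscore):
    """Percent similarity = (hit bitscore / query bitscore) * 100."""
    if query_bitscore <= 0:
        raise ValueError("query bitscore must be positive")
    return hit_bitscore / query_bitscore * 100


query_bitscore = 1241.9  # bitscore of the query sequence aligned to itself
for hit_bitscore in (1241.9, 1229.5, 1227.2):
    print(hit_bitscore, round(percent_similarity(hit_bitscore, query_bitscore), 1))
```

A hit whose bitscore exceeds the query bitscore yields a value above 100%, which is why such rows are
flagged in the report.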
Default highlights identify partial protein sequences, sequences with a bitscore higher than the query
sequence and therefore a percent similarity greater than 100% (commonly synthetic constructs), and cases
where zero ortholog candidates are identified (in this case the user should consider a different query
sequence or check the full report). Please see the "Susceptibility Cutoff Box for Level 1" section of the
user guide for details when no orthologs are detected. Additionally, if the query protein is from a
eukaryote, the report by default shows only eukaryote data: the "Show Only Eukaryotes" checkbox is checked
and prokaryote data are excluded from the table. To view prokaryote data, deselect this checkbox. If the
query protein is from a prokaryote, the default setting includes both eukaryote and prokaryote data and
the "Show Only Eukaryotes" checkbox is not selected. To limit the data to eukaryotes only, check the
"Show Only Eukaryotes" checkbox.
Level 1: Primary Report Settings
Default settings:
The "Primary Report Settings" box allows the user to view the default settings applied to the table below and
to manipulate certain settings. The "Primary Report Settings" box is only available on the "Primary Report"
display, not the "Full Report." The default settings show data for hits whose E-values are ≤ 0.01 and that
have been identified to have ≥ 1 domain in common with the query sequence. The default setting for "Sorted
by Taxonomic Group" is "class"; therefore, the "Filtered Taxonomic Group" column in the table is set to
identify and report the taxonomic lineage of "class" from the NCBI Taxonomy Database. However, if class is
not identified in the NCBI Taxonomic Hierarchy associated with the hit accession, the algorithm reports the
next available Taxonomic Group, moving from class to subclass, to superorder, to order, to suborder, to
superfamily, to family, to subfamily, to genus. Finally, the susceptibility predictions are set by using
species read-across. (Please view the Documentation section of the User Guide for details on Read-Across
settings.) Briefly, species read-across sets the susceptibility prediction as follows: all ortholog
candidates are Susceptible = Y; all species listed above the susceptibility cut-off are Susceptible = Y; all
species below the cut-off that belong to the same taxonomic group as one or more species above the cut-off
are Susceptible = Y; and those below the cut-off that are not ortholog candidates and do not belong to a
taxonomic group above the cut-off are Susceptible = N.
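The read-across rules just listed can be restated as a few lines of logic. The following is a hedged sketch
of those rules as described in this guide, not the SeqAPASS implementation: the Hit class, the function name,
and the example hits are all hypothetical, and treating a hit exactly at the cut-off as "above" it is an
assumption. The read_across flag mirrors the "Species Read-Across" setting in the Primary Report Settings
box.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    species: str
    taxonomic_group: str        # value of the "Filtered Taxonomic Group" column
    percent_similarity: float
    ortholog_candidate: bool


def predict_susceptibility(hits, cutoff, read_across=True):
    """Return one "Y"/"N" prediction per hit, in the same order as `hits`."""
    # Taxonomic groups with at least one hit at or above the cut-off.
    # Assumption: a hit exactly at the cut-off counts as above it.
    groups_above = {h.taxonomic_group for h in hits if h.percent_similarity >= cutoff}
    predictions = []
    for h in hits:
        susceptible = (
            h.ortholog_candidate
            or h.percent_similarity >= cutoff
            or (read_across and h.taxonomic_group in groups_above)
        )
        predictions.append("Y" if susceptible else "N")
    return predictions


# Made-up hits for illustration only.
hits = [
    Hit("species A", "Mammalia", 100.0, True),
    Hit("species B", "Actinopteri", 45.0, False),
    Hit("species C", "Actinopteri", 20.0, False),
    Hit("species D", "Insecta", 15.0, False),
]
print(predict_susceptibility(hits, cutoff=33.9))
print(predict_susceptibility(hits, cutoff=33.9, read_across=False))
```

In this example, species C falls below the cut-off but shares a taxonomic group with species B, so it is
predicted susceptible only while read-across is on.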
Changing Default Settings:
The "E-value" and "Common Domains" settings can be changed by entering the desired E-value or number of
Common Domains in the respective text boxes and clicking "Update Report." The table and data visualization
will automatically update after a few seconds. The user may also change the level of the taxonomic hierarchy
that is used for the susceptibility prediction: from the "Sorted by Taxonomic Group" dropdown, the user may
choose a different taxonomic group to display in the "Filtered Taxonomic Group" column of the data table.
If the user chooses "order" for example, the "Filtered Taxonomic Group" column in the data table will
report the taxonomic lineage of "order" from the NCBI Taxonomy Database and all species read-across
for the susceptibility prediction will be based on order instead of class. The data visualization will also
update. As described previously, if order is not identified in the NCBI Taxonomic Hierarchy associated
with the hit accession, then the algorithm will report the next available Taxonomic Group moving from
suborder, to superfamily, to family, to subfamily, to genus. Upon selecting the Taxonomic Group from
the dropdown and clicking "Update Report," the Level 1 Data for the Primary report will update to the
selected taxonomic level.
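The fallback through the taxonomic hierarchy can be pictured as a walk down an ordered list of ranks. This
is a minimal sketch of that behavior as described in this guide; the function name, the dictionary
representation of a lineage, and the choice to return None when no rank is available are assumptions, and the
second lineage is deliberately incomplete for illustration.

```python
# Rank order taken from the fallback sequence described in this guide.
RANKS = ["class", "subclass", "superorder", "order", "suborder",
         "superfamily", "family", "subfamily", "genus"]


def filtered_taxonomic_group(lineage, selected_rank="class"):
    """Return the name at the selected rank, or at the next available lower rank."""
    start = RANKS.index(selected_rank)
    for rank in RANKS[start:]:
        if rank in lineage:
            return lineage[rank]
    return None  # assumption: nothing to report if no rank is available


human = {"class": "Mammalia", "order": "Primates", "family": "Hominidae", "genus": "Homo"}
print(filtered_taxonomic_group(human))            # Mammalia
print(filtered_taxonomic_group(human, "order"))   # Primates

# Hypothetical record with no "order" entry: the walk falls through to family.
partial = {"family": "Hominidae", "genus": "Homo"}
print(filtered_taxonomic_group(partial, "order")) # Hominidae
```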
The user may also turn species read-across off by selecting "No" in the "Species Read-Across" drop-down and
clicking "Update Report." When "No" is selected, the susceptibility prediction in the table below will be "Y"
only if the Percent Similarity is above the Cut-off or if the hit is identified as an Ortholog Candidate
("Y"). Any other hit below the cut-off will yield a susceptibility prediction of "N."
The user can select the "Full Report" on the "Level 1" page, which includes the same information as the
"Primary Report" plus additional information pertaining to the alignment of the protein sequence using
BLASTp. The additional information includes the number of amino acid residues in the sequence (Hit Length),
the number of exactly matching amino acids between the hit and query sequence (Identity), the number of exact
and similar amino acid matches between the hit and the query sequence (Positives), the expect value (E-value)
describing the number of different alignments expected to occur in the database search by chance, and the
conserved domain count. The conserved domain count identifies all domains associated with the query protein
in the NCBI conserved domains database (Specific hits, Non-specific hits, Superfamilies, and Multi-domains;
see the NCBI conserved domains database for details). SeqAPASS algorithms record the query sequence coverage
of each curated domain and compare that coverage to that of the hit sequence. If the hit sequence covers the
curated domain to an extent greater than or equal to the query sequence, the domain is considered a common
domain between the hit and query. The number of common domains between each hit sequence and the query
sequence is summed and reported. This column displays "0" when the hit protein and query protein do not have
any common domains. (See the "Search, View, and Download Data Tables" section of the user guide for more
information.)
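The common domain comparison, and the way the E-value and Common Domains settings then decide which hits
reach the Primary Report, can be sketched as follows. This is an illustration under stated assumptions, not
the SeqAPASS algorithm: coverage is represented here as a fraction per domain, the domain identifiers and
function names are hypothetical, and both threshold comparisons are assumed to be inclusive.

```python
def common_domain_count(query_coverage, hit_coverage):
    """Count domains that the hit covers at least as fully as the query does."""
    return sum(
        1
        for domain, query_cov in query_coverage.items()
        if hit_coverage.get(domain, 0.0) >= query_cov
    )


def in_primary_report(evalue, common_domains, max_evalue=0.01, min_common_domains=1):
    """Apply the E-value and Common Domains settings to a single hit."""
    # Assumption: both comparisons are inclusive.
    return evalue <= max_evalue and common_domains >= min_common_domains


# Made-up coverage values keyed by hypothetical domain identifiers.
query = {"domain_1": 1.00, "domain_2": 0.90, "domain_3": 0.75}
hit = {"domain_1": 1.00, "domain_2": 0.80}

count = common_domain_count(query, hit)
print(count)
print(in_primary_report(evalue=1e-50, common_domains=count))
```

Here only domain_1 is covered by the hit at least as fully as by the query, so the count is 1; a hit with a
count of 0 would be excluded from the Primary Report under the default setting.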
Note: SeqAPASS v2.0 and newer parse the BLASTp query and hit accessions to identify all the
species/accessions from identical proteins. Therefore, if a hit sequence represents multiple species, all
species with the identical sequence will be found in the data tables for Level 1. To determine which
sequence/species was identified by BLASTp as a hit and which was parsed from the identical sequence, view
the "Identical Protein" column of the "Full Report" for Level 1, where "N" indicates the original hit
sequence and "Y" indicates the parsed sequence.
Susceptibility Cutoff Box for Level 1
The susceptibility prediction is determined by identifying ortholog candidates, sequences above a defined
susceptibility cut-off, or species below the susceptibility cut-off that belong to an organism class above
the susceptibility cut-off. The default susceptibility cut-off is set by plotting the distribution of
percent similarities calculated for each hit protein. From this plot, the critical points are identified and
the local minima and maxima reported. Using the ortholog candidate data, a susceptibility cut-off is
automatically determined by identifying the first ortholog candidate at an equal or higher percent
similarity than the first local minimum percent similarity. The user can view this graph by clicking the
"View Cutoff" button in the "Susceptibility Cut-off" box, which will open a new tab in the web browser.
Radio buttons located to the right of the graphical display indicate which cut-off has been applied for the
evaluation of susceptibility in the report. These radio buttons can be used to change the cut-off in the
table to the second local minimum, in which case the second local minimum is identified in the density plot
and the first ortholog candidate at an equal or higher percent similarity than that minimum is used to set
the cut-off. The user can instead define the cut-off by clicking the "User Defined" radio button, which
again opens a new tab in the web browser. Alternatively, the user can view and closely examine the density
plot and manipulate the cut-off by clicking the "View Cutoff" button.
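The step from a local minimum to a cut-off can be illustrated with a short function. This is one reading of
the rule described above, offered as a sketch only: the local minima are taken as given inputs (no density
estimation is performed), "first" is interpreted as the ortholog candidate with the lowest percent
similarity at or above the chosen minimum, minima are assumed to be numbered from lowest to highest percent
similarity, and the function name and input values are hypothetical.

```python
def susceptibility_cutoff(local_minima, ortholog_similarities, which=1):
    """Return the cut-off for the first (which=1) or second (which=2) local minimum."""
    # Assumption: minima are numbered from lowest to highest percent similarity.
    local_min = sorted(local_minima)[which - 1]
    candidates = [s for s in ortholog_similarities if s >= local_min]
    if not candidates:
        # With no ortholog candidate to anchor on, fall back to the local minimum itself.
        return local_min
    return min(candidates)


# Illustrative inputs: two local minima from a density plot and the
# percent similarities of the hits flagged as ortholog candidates.
minima = [30.2, 61.0]
orthologs = [100.0, 99.0, 63.6, 33.9, 28.0]

print(susceptibility_cutoff(minima, orthologs))           # 33.9
print(susceptibility_cutoff(minima, orthologs, which=2))  # 63.6
print(susceptibility_cutoff(minima, []))                  # 30.2
```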
Clicking the "View Cutoff" button opens a new tab in the web browser with a page that displays a drop-down
that allows the user to set the susceptibility cut-off using the first local minimum and the identified
ortholog candidate, the second local minimum and the identified ortholog candidate, or the "User defined
cut-off" (where the user selects the cut-off). To update the cut-off in the Level 1 data report and/or close
the cut-off tab and return to the Level 1 page, click the "Update Cut-off" button.
Note: The user should have a justification for changing the susceptibility cut-off, either based on
evaluation of ortholog cut-offs in the data visualization or from empirical evidence.
Upon selecting the User defined cut-off from the dropdown, the Enter Cut-off text box becomes active
and the user can enter a number 1-100.
All potential susceptibility cut-offs generated by the data distribution and ortholog candidate
identification are reported in the table with columns "Cut-off #" and "Susceptibility Cut-off". The user
can use these numbers to define a cut-off if empirical evidence suggests that the "Default" or "2
minimum" cut-offs are not supported.
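The way the cut-off table is filled in can be sketched in Python. This is a minimal illustration under stated assumptions, not the SeqAPASS implementation: the function names, the mode strings, and the input values are hypothetical, and the sketch assumes that "the first ortholog candidate at an equal or higher percent similarity" means the lowest-scoring ortholog candidate at or above the local minimum.

```python
from typing import List, Optional, Sequence, Tuple


def cutoff_for_minimum(local_min: float,
                       ortholog_similarities: Sequence[float]) -> float:
    """Cut-off derived from one local minimum of the density plot.

    Assumption: the cut-off is the percent similarity of the lowest-scoring
    ortholog candidate at or above the local minimum. When no such candidate
    exists (for example, zero orthologs), the local minimum itself is used.
    """
    at_or_above = [s for s in ortholog_similarities if s >= local_min]
    return min(at_or_above) if at_or_above else local_min


def cutoff_table(local_minima: Sequence[float],
                 ortholog_similarities: Sequence[float]) -> List[Tuple[int, float]]:
    """Rows of (Cut-off #, Susceptibility Cut-off), one per local minimum."""
    return [(number, cutoff_for_minimum(local_min, ortholog_similarities))
            for number, local_min in enumerate(sorted(local_minima), start=1)]


def resolve_cutoff(table: Sequence[Tuple[int, float]],
                   mode: str = "default",
                   user_value: Optional[float] = None) -> float:
    """Pick the cut-off for a hypothetical mode: 'default', 'second', or 'user'."""
    if mode == "user":
        # The guide states that the user can enter a number from 1 to 100.
        if user_value is None or not 1 <= user_value <= 100:
            raise ValueError("a user-defined cut-off must be between 1 and 100")
        return float(user_value)
    index = {"default": 0, "second": 1}.get(mode)
    if index is None:
        raise ValueError(f"unknown mode: {mode!r}")
    if index >= len(table):
        raise ValueError("the requested local minimum was not found")
    return table[index][1]


# Invented inputs: two local minima and four ortholog candidate similarities.
example_table = cutoff_table([30.0, 60.0], [33.9, 45.2, 63.6, 88.0])
```

With an empty list of ortholog similarities, `cutoff_table` returns the local minimum values themselves, which mirrors the zero-ortholog behavior described in this guide.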
Note: If 0 orthologs are identified from the hit data, the cut-off will be set by the local
minimums only; therefore, the susceptibility prediction will NOT take into account ortholog candidates. It
is recommended that the user check the full report for ortholog candidates or identify a different
query sequence for the susceptibility predictions. In this case, the susceptibility predictions will be
highlighted in dark pink in the Level 1 data table to indicate that 0 orthologs were detected and the
susceptibility cut-off was determined from plotting the distribution of percent similarities and identifying
the local minimums.
[Screenshot: "Level 1 Query Protein Information" page showing the "Susceptibility Cut-off", "Level 2 Query Domain", and "Level 3 Query Amino Acid Residue(s)" boxes above the "Level 1 Data - Primary" table.]
No Orthologs Detected
If no orthologs are detected from the reciprocal best hit BLAST analysis, "Ortholog Count" will be "0" at the top
of the "Level 1 Query Protein Information" page. Further, dark pink will highlight the entire query/hits
table, indicating that the "Susceptibility Prediction" columns were determined from the local minimums
identified in the "View Cutoff" density plot, without consideration of orthologs.
Note: De-select the "Show Only Eukaryotes" checkbox to see if prokaryotes were identified as orthologs.
[Screenshot: "Level 1 Query Protein Information" header showing "Ortholog Count: 0" and "NCBI Data: 02/01/2015".]
When the user clicks the "View Cutoff" button and no orthologs are detected, the "Cut-off #" and
"Susceptibility Cut-off" columns will report only the local minimum values.
[Screenshot: "Level 1 Susceptibility Cut-off" page for SeqAPASS ID 19 (Query Accession CAA74340.1, Bubalus bubalis, insulin receptor; Ortholog Count 0; NCBI Data 02/01/2015), showing the "Select Cut-off" drop-down, the "Cut-off Based on Ortholog Candidates" table, and the density plot (x-axis: Percent Similarity; legend: Density, Local Max, Local Min, Inflection Point).]
From the "Level 1" page the user can return to the list of completed SeqAPASS runs by clicking the
"Main" button on the upper left-hand side of the "Level 1 Query Protein Information" page.
[Screenshot: "SeqAPASS Reports" view (Version 2.0) showing the "Main" and "Level 1" buttons above the "Level 1 Query Protein Information" page for SeqAPASS ID 1570 (Query Species: Homo sapiens).]
Level 2: Functional Domain(s) Alignment
In the "View SeqAPASS Reports" tab, on the "Level 1 Query Protein Information" page, there is a
"Level 2" box for comparing hit domains to the query domain. In the "Level 2" box, there is a link out to
the "NCBI Conserved Domain Database" for the query protein of interest. Below this link the user will
find a drop-down containing functional domains associated with the query sequence for comparison
across species.
[Screenshot: "Level 1 Query Protein Information" page for SeqAPASS ID 581 (Query Accession NP_000116.2, Homo sapiens, estrogen receptor isoform 1; Ortholog Count 281; Protein and Taxonomy Data 01/04/2017; BLAST Version 2.5.0; Software Version 2.0), showing the "Susceptibility Cut-off", "Primary Report Settings", "Level 2", and "Level 3" boxes. The "Level 2" box contains the "Functional Domains" drop-down ("-Select Domain-") and the "View Level 2 Data" area with the "-Select Completed Domain-" drop-down.]
In the drop-down box (below the words "Functional Domains") the user will find all domains associated
with the query protein that are listed in the NCBI Conserved Domains Database. To compare a domain from the
query protein to domains of the hit proteins, the user highlights a domain in the drop-down and
clicks the "Request Domain Run" button.
Note: Domains in the drop-down are listed with the first amino acid residue position that aligns with the
NCBI curated domain in parentheses, followed by the NCBI domain Accession, domain name, and
description.
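For users who copy drop-down entries into their own notes or scripts, the entry layout described in the note above can be split into its parts as follows. This is only a sketch: SeqAPASS does not provide this helper, the names `DomainEntry` and `parse_domain_entry` are hypothetical, and the pattern assumes the "(position) accession, name, description" layout with comma separators.

```python
import re
from typing import NamedTuple


class DomainEntry(NamedTuple):
    start_residue: int   # first query residue that aligns with the curated domain
    accession: str       # NCBI domain Accession, e.g. "cd06949"
    name: str            # domain name, e.g. "NR_LBD_ER"
    description: str     # free-text description (may itself contain commas)


# Assumed layout: "(310) cd06949, NR_LBD_ER, Ligand binding domain of ..."
_ENTRY_PATTERN = re.compile(r"^\((\d+)\)\s*([^,]+),\s*([^,]+),\s*(.*)$")


def parse_domain_entry(text: str) -> DomainEntry:
    """Split one drop-down entry into position, accession, name, and description."""
    match = _ENTRY_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized domain entry: {text!r}")
    start, accession, name, description = match.groups()
    return DomainEntry(int(start), accession.strip(), name.strip(), description.strip())


entry = parse_domain_entry(
    "(310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen receptor"
)
```

The starting residue and Accession recovered this way are the same two values the guide asks the user to note when matching a drop-down entry to a "Specific hit" on the NCBI page.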
[Screenshot: "Functional Domains" drop-down in the "Level 2" box, listing entries such as the following (descriptions are cut off in the screenshot):
(345) cd06157, NR_LBD, The ligand binding domain of nuclear re...
(185) cd06916, NR_DBD_like, DNA-binding domain of nuclear rec...
(341) cd06929, NR_LBD_F1, Ligand-binding domain of nuclear re...
(343) cd06930, NR_LBD_F2, Ligand-binding domain of nuclear re...
(316) cd06931, NR_LBD_HNF4_like, The ligand binding domain o...]
Note: The user can also use the text box on the top of the drop-down to search the "Functional Domain"
list in the drop-down.
It is recommended that the user click on the "NCBI Conserved Domains Database" link to identify which
domains are "Specific hits" in the NCBI Conserved Domains Database. On the NCBI page, the user can
hover over the graphical representation of the domains associated with the query sequence to highlight
and identify the Accession associated with each "Specific hit." The example below shows the user
hovering over the NR_LBD_ER domain with the computer mouse.
[Screenshot: NCBI Conserved Domains page, "Conserved domains on [gi|62821794|ref|NP_000116.2|] estrogen receptor isoform 1 [Homo sapiens]". The graphical summary shows the "Specific hits" and "Superfamilies" rows, with the mouse hovering over NR_LBD_ER (cd06949, Ligand binding domain of Estrogen receptor; Specific hit, E-value 1.46e-146). The list of domain hits reports Pssm-ID 132747, Cd Length 235, Bit Score 426.07, E-value 1.46e-146, followed by the alignment of query residues 310-547 against cd06949.]
After identifying the domain(s) of interest and the corresponding starting residue and domain Accession,
the user can return to the SeqAPASS tool and scroll to the domain of interest in the drop-down. If that
domain has not been previously run by the user, the "Request Domain Run" button will become active
and the user can click it to submit the domain query.
[Screenshot: "Level 2" box with "(310) cd06949, NR_LBD_ER, Ligand bindin..." selected in the "Functional Domains" drop-down, the "Request Domain Run" button, and the "View Level 2 Data" area with the "-Select Completed Domain-" drop-down.]
When the user clicks the "Request Domain Run" button, the following message will appear if the run has
been submitted successfully.
[Screenshot: "Level 2 Run Requested" message with "Status: queued".]
When sequence comparisons have completed for the selected functional domain, the domain will be
present in the drop-down in the "View Level 2 Data" area. The drop-down is not automatically populated
with the completed domain run. The user must click on the "Level 1" button to update the page so that
the newly completed domain appears in the "Choose Domain to View" drop-down.
[Screenshot: "SeqAPASS Reports" navigation bar (Version 2.0) with the "Main" and "Level 1" buttons.]
To view a completed Level 2 Domain, highlight the domain of interest in the drop-down box and click the
"View Domains" button. This will bring the user to the "Level 2" data page for the selected query
protein/domain.
Note: The user can also use the text box on the top of the drop-down to search the "Completed Domain"
list.
[Screenshot: "Level 2" box shown twice: first with the "-Select Completed Domain-" list open under "Choose Domain to View", and then with "(310) cd06949, NR_LBD_ER, Ligand bindin..." selected above the "View Level 2 Data" button.]
View Level 2 Data Page
The "Level 2 Query Domain Information" box contains the SeqAPASS Run ID, Query Accession,
Ortholog Count (number of hits identified as ortholog candidates to the query species protein sequence), NCBI
data updates ("Protein and Taxonomy Data:" and "CDD Data:" display the dates that the NCBI databases
were downloaded and incorporated into the SeqAPASS database; "BLAST Version:" and "Software
Version:" display the versions used by the SeqAPASS tool for the selected data), Query Species,
Query Domain (with a link out to the NCBI domain page), and Query Protein name.
[Screenshot: "Level 2 Query Domain Information" box for SeqAPASS ID 581 (Query Accession NP_000116.2; Query Species: Homo sapiens; Query Domain: (310) cd06949, NR_LBD_ER, Ligand binding domain of Estrogen receptor, which are activated by the hormone 17beta-estradiol (estrogen); Query Protein: estrogen receptor isoform 1; Protein and Taxonomy Data: 01/04/2017; CDD Data: 04/25/2015; BLAST Version: 2.5.0; Software Version: 2.0), with the "Susceptibility Cut-off" and "Primary Report Settings" boxes below it.]
The default "Level 2" table is the "Primary Report", which includes query domain information in the first
row below the column titles, followed by hit domains whose sequences aligned with the selected query
domain. The hit domains are ordered from the highest to lowest percent similarity (maximum percent
similarity = 100%). For each hit domain, the Data Version, NCBI Accession, and species information are
provided, including the "Protein Count," which indicates the number of protein records per species in the
NCBI protein database, taxonomic information, and species names. Also included are the NCBI accession
for the query protein, query protein name, Domain Type, BLASTp bitscore (describes the overall quality of
the alignment; see NCBI BLASTp tutorials), and Domain percent similarity ([hit bitscore/query
bitscore] * 100). If the hit protein has been identified as an ortholog candidate (using the reciprocal best hit
BLAST method), it is noted with a "Y" for yes; if it is not an ortholog candidate, with an "N" for no. A
prediction of susceptibility is displayed based on the susceptibility cut-off, identified with a "Y" for yes or
an "N" for no. The date/time the analysis was completed is also identified. (See "Search, View, and
Download Data Tables" section of user guide for more information). Additionally, there is a column that
identifies whether the species is a eukaryote, noted with a "Y" for yes or an "N" for no if the hit is a
prokaryote.
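The percent similarity formula quoted above is simple enough to restate as a short Python sketch. The function name and the bitscore values are hypothetical, and the sketch assumes that the query bitscore is the score of the query domain aligned to itself.

```python
def percent_similarity(hit_bitscore: float, query_bitscore: float) -> float:
    """Percent similarity = (hit bitscore / query bitscore) * 100."""
    if query_bitscore <= 0:
        raise ValueError("query bitscore must be positive")
    return hit_bitscore / query_bitscore * 100


# Invented bitscores, for illustration only.
half_as_similar = percent_similarity(240.0, 480.0)   # hit scores half of the query
query_vs_itself = percent_similarity(480.0, 480.0)   # query row of the table
```

A hit whose bitscore exceeds the query bitscore yields a value above 100 under this formula, which is the situation the default highlights flag.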
Default highlights identify partial protein sequences, sequences with a bitscore higher than that of the query
domain and therefore a percent similarity greater than 100% (commonly synthetic constructs), and cases in which
zero ortholog candidates are identified (in this case the user should consider a different query sequence).
Additionally, the default setting for the report shows only eukaryote data, excluding prokaryote data from
the table, with the "Show Only Eukaryotes" checkbox checked. To view prokaryote data, deselect this
checkbox.
[Screenshot: "Level 2 Data - Primary" table with the "Primary Report" and "Full Report" radio buttons, the "Partial Hit Protein Sequence" highlight key, the "Show Only Eukaryotes" checkbox, the keyword search box, page navigation (page 1 of 81), and the "Download Table" control.]
Level 2: Primary Report Settings
Default settings:
The "Primary Report Settings" box allows the user to view the default settings applied to the table below and
manipulate certain settings. The "Primary Report Settings" box is only available on the "Primary Report"
display. The default settings show data for hits whose E-values are < 10. The default setting for "Sorted
by Taxonomic Group" is "class"; therefore, the "Filtered Taxonomic Group" column in the table is set to
identify and report the taxonomic lineage of "class" from the NCBI Taxonomy Database. However, if
class is not identified in the NCBI Taxonomic Hierarchy associated with the hit accession, then the
algorithm will report the next available Taxonomic Group, moving from class to subclass, to superorder,
to order, to suborder, to superfamily, to family, to subfamily, to genus. Finally, the susceptibility
predictions are set by using species read-across. (Please view Documentation Section of the User Guide
for details on Read-Across settings). Briefly, species read-across is used to set the susceptibility
prediction, where all ortholog candidates are Susceptible = Y; all species listed above the susceptibility
cut-off are Susceptible = Y; all species below the cut-off from the same taxonomic group as one or more
species above the cut-off are Susceptible = Y; and those below the cut-off that are not ortholog candidates
and do not belong to a taxonomic group above the cut-off are Susceptible = N.
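The read-across rules summarized above can be written out as a short Python sketch. It is an illustration only, not the SeqAPASS code: the `Hit` record, the function name, the `read_across` flag (standing in for the "Species Read-Across" setting), and the example data are hypothetical, and a hit exactly at the cut-off is assumed to count as above it.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    species: str
    taxonomic_group: str        # value of the "Filtered Taxonomic Group" column
    percent_similarity: float
    ortholog_candidate: bool


def predict_susceptibility(hits: Iterable[Hit], cutoff: float,
                           read_across: bool = True) -> List[str]:
    """Return "Y" or "N" for each hit, in input order."""
    hits = list(hits)
    # Taxonomic groups with at least one species at or above the cut-off.
    groups_above = {h.taxonomic_group for h in hits
                    if h.percent_similarity >= cutoff}
    predictions = []
    for h in hits:
        susceptible = (
            h.ortholog_candidate                      # ortholog candidates are Y
            or h.percent_similarity >= cutoff         # above the cut-off is Y
            or (read_across and h.taxonomic_group in groups_above)
        )
        predictions.append("Y" if susceptible else "N")
    return predictions


# Invented hits, for illustration only.
example_hits = [
    Hit("species A", "Mammalia", 95.0, True),
    Hit("species B", "Mammalia", 40.0, False),   # below cut-off, group is above
    Hit("species C", "Insecta", 35.0, False),    # below cut-off, group is not
    Hit("species D", "Aves", 45.0, True),        # below cut-off, ortholog candidate
]
with_read_across = predict_susceptibility(example_hits, cutoff=60.0)
without_read_across = predict_susceptibility(example_hits, cutoff=60.0,
                                             read_across=False)
```

In this example only "species B" changes between the two calls, because its prediction depends entirely on another member of its taxonomic group lying above the cut-off.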
[Screenshot: "Primary Report Settings" box with "Sorted by Taxonomic Group" set to "class", the "Species Read-Across" drop-down, and the "Update Report" and "Use Default Settings" buttons.]
Changing Default Settings:
The user may choose to change the level of the taxonomic hierarchy that is used for the susceptibility
prediction. From the "Sorted by Taxonomic Group" drop-down the user may choose to display a different
taxonomic group in the "Filtered Taxonomic Group" column of the data table.
[Screenshot: "Primary Report Settings" box with the "Sorted by Taxonomic Group" drop-down open and "order" selected; the options are class, subclass, superorder, order, suborder, superfamily, family, subfamily, and genus.]
If the user chooses "order," for example, the "Filtered Taxonomic Group" column in the data table will
report the taxonomic lineage of "order" from the NCBI Taxonomy Database, and all species read-across
for the susceptibility prediction will be based on order instead of class. As described previously, if order
is not identified in the NCBI Taxonomic Hierarchy associated with the hit accession, then the algorithm
will report the next available Taxonomic Group, moving from order to suborder, to superfamily, to family, to
subfamily, to genus. Upon selecting the Taxonomic Group from the drop-down and clicking "Update
Report," the Level 2 Data for the Primary Report will update to the selected taxonomic level.
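The fallback through the taxonomic hierarchy can be sketched in Python as follows. This is a hedged illustration: the rank order is taken from this guide, while the function name and the dictionary representation of a lineage are hypothetical and do not reflect how SeqAPASS stores NCBI Taxonomy data.

```python
from typing import Mapping, Optional

# Rank order as listed in this guide, from broadest to narrowest.
RANKS = ("class", "subclass", "superorder", "order", "suborder",
         "superfamily", "family", "subfamily", "genus")


def filtered_taxonomic_group(lineage: Mapping[str, str],
                             start_rank: str = "class") -> Optional[str]:
    """Return the name at start_rank, or at the next available lower rank.

    lineage maps a rank name to a taxon name for one hit (hypothetical format).
    Returns None when no rank from start_rank down to genus is present.
    """
    start = RANKS.index(start_rank)   # raises ValueError for an unknown rank
    for rank in RANKS[start:]:
        name = lineage.get(rank)
        if name:
            return name
    return None


# Illustrative lineage with no "order" entry, to show the fallback.
lineage_without_order = {"class": "Mammalia", "family": "Hominidae", "genus": "Homo"}
by_class = filtered_taxonomic_group(lineage_without_order)            # class is present
by_order = filtered_taxonomic_group(lineage_without_order, "order")   # falls to family
```

Because the search only moves toward narrower ranks, choosing "order" never falls back to class, which matches the order of ranks given in the text.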
[Screenshot: "Level 2 Query Domain Information" page for SeqAPASS ID 581 (Query Accession NP_000116.2, Homo sapiens, estrogen receptor isoform 1; Query Domain: (310) cd06949, NR_LBD_ER) with "Sorted by Taxonomic Group" changed in the "Primary Report Settings" box. The first columns of the "Level 2 Data - Primary" table read:

Data Version  NCBI Accession   Protein Count  Species Tax ID  Taxonomic Group  Filtered Taxonomic Group  Scientific Name
2             NP_000116.2      1058918        9606            Mammalia         Primates                  Homo sapiens
2             XP_005552209.1   121048         9541            Mammalia         Primates                  Macaca fascicularis
2             ABY64721.1       909            9534            Mammalia         Primates                  Chlorocebus aethiops
2             XP_011751932.1   65086          9545            Mammalia         Primates                  Macaca nemestrina
2             NP_001158059.1   77385          9555            Mammalia         Primates                  Papio anubis
2             XP_011922091.1   66739          9531            Mammalia         Primates                  Cercocebus atys
2             ABY64717.1       2004           9593            Mammalia         Primates                  Gorilla gorilla
2             XP_014992596.1   86552          9544            Mammalia         Primates                  Macaca mulatta
2             XP_003255939.1   38938          61853           Mammalia         Primates                  Nomascus leucogenys
2             XP_008005788.1   62274          60711           Mammalia         Primates                  Chlorocebus sabaeus]
The user may also choose to turn species read-across off by selecting "No" in the "Species Read-Across"
drop-down and clicking "Update Report." When "No" is selected, the susceptibility prediction
in the table below will be "Y" only if the Percent Similarity is above the cut-off or if the hit is identified as
an ortholog candidate ("Y"). Any other hit below the cut-off will yield a susceptibility prediction of no,
or "N."
[Screenshot: "Primary Report Settings" box with the "Species Read-Across" drop-down open, showing the options "No" and "Yes", and the "Update Report" and "Use Default Settings" buttons.]
The user can select the "Full Report" on the "Level 2" data page, which includes the same information as
the "Primary Report" plus additional information pertaining to the alignment of the protein sequence using
BLASTp and to the domain. Additional information includes the NCBI PSSM ID, NCBI Domain
ID, Domain Name, number of amino acid residues in the sequence (Hit Length), the number of exact
matching amino acids between the hit and query sequence (Identity), the number of exact and similar
(similar side-chain substitutions) matches in amino acids between the hit and the query sequence
(Positives), and the expect value (E-value) describing the number of different alignments expected to
occur in the database search by chance. (See "Search, View, and Download Data Tables" section of user
guide for more information).
[Screenshot: "Level 2 Data - Full" report table.]
Susceptibility Cutoff Box for Level 2
The susceptibility prediction is set by identifying ortholog candidates, sequences above a defined
susceptibility cut-off, or those species below the susceptibility cut-off that belong to an organism
class above the susceptibility cut-off. The default susceptibility cut-off is set by plotting the distribution of
percent similarities calculated for each hit protein. From this plot, the critical points are identified and the
local minimums and maximums are reported. Using the ortholog candidate data, a susceptibility cut-off is
automatically determined by identifying the first ortholog candidate at an equal or higher percent
similarity than the first local minimum percent similarity. The user can view this graph by clicking the
"View Cutoff" button in the "Susceptibility Cut-off" box. Radio buttons located to the right of the
graphical display indicate which cut-off has been applied for the evaluation of susceptibility in the report.
These radio buttons can be selected to change the cut-off in the table to the 2nd local minimum, where the
2nd local minimum is identified in the density plot and the first ortholog candidate at an equal or higher
percent similarity than the second local minimum percent similarity is used to set the cut-off. The user
can also define the cut-off by clicking on the "User Defined" radio button. Alternatively, the user can
view and closely examine the density plot and manipulate the cut-off by clicking the "View Cutoff"
button.
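The first step described above, finding local minimums in the distribution of percent similarities, can be sketched in plain Python. This guide does not state how SeqAPASS estimates the density, so the Gaussian kernel, the bandwidth of 5, the one-unit grid, the function names, and the data below are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def gaussian_density(values: Sequence[float], grid: Sequence[float],
                     bandwidth: float) -> List[float]:
    """Gaussian kernel density estimate of values, evaluated at each grid point."""
    scale = len(values) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / scale
        for x in grid
    ]


def local_minima(grid: Sequence[float], density: Sequence[float]) -> List[float]:
    """Grid positions where the density is lower than at both neighbors."""
    return [
        grid[i]
        for i in range(1, len(grid) - 1)
        if density[i] < density[i - 1] and density[i] < density[i + 1]
    ]


# Invented percent similarities forming a low-similarity and a high-similarity cluster.
similarities = [20.0, 22.0, 24.0, 26.0, 28.0, 80.0, 82.0, 84.0, 86.0, 88.0]
grid = [float(x) for x in range(0, 101)]          # 0, 1, ..., 100 percent
density = gaussian_density(similarities, grid, bandwidth=5.0)
minima = local_minima(grid, density)              # valley between the two clusters
```

Each local minimum found this way would then be paired with the first ortholog candidate at an equal or higher percent similarity to give the corresponding susceptibility cut-off.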
[Screenshot: "Level 2 Query Domain Information" page for SeqAPASS ID 581 (Query Accession NP_000116.2, Homo sapiens, estrogen receptor isoform 1; Query Domain: (310) cd06949, NR_LBD_ER; Ortholog Count 281), showing the "Susceptibility Cut-off" box with the "Default", "Second Local Minimum", and "User Defined" radio buttons next to the "Primary Report Settings" box.]
Upon clicking the "View Cutoff" button, a new page is displayed with a drop-down that allows the user to set
the susceptibility cut-off using the first local minimum and the identified ortholog candidate, the second
local minimum and the identified ortholog candidate, or the "User defined cut-off" (where the user
selects the cut-off). To update the cut-off in the Level 2 data report and/or return to the Level 2 page, click the
"Update Cut-off" button.
Note: The user should have direct empirical evidence that species above the user defined cut-off are
susceptible via the protein of interest, or that the species below the user defined cut-off are not susceptible.
Upon selecting "User defined cut-off" from the drop-down, the "Enter Cut-off" text box becomes active
and the user can enter a number from 1 to 100.
[Screenshot: "Level 2 Susceptibility Cut-off" page for SeqAPASS ID 581 (Query Accession NP_000116.2, Homo sapiens, estrogen receptor isoform 1; Query Domain: (310) cd06949, NR_LBD_ER; Ortholog Count 281), showing the density plot (x-axis: Percent Similarity; legend: Density, Local Max, Local Min, Inflection Point) and the "Select Cut-off" drop-down open with these options:
Default: Identify 1st local minimum and find next ortholog candidate
2 minimum: Identify 2nd local minimum and find next ortholog candidate
User defined cut-off
The "Cut-off Based on Ortholog Candidates" table reads:

Cut-off #    Susceptibility Cut-off
1            56.0
2            79.8]
Note: If 0 orthologs are identified, the cut-off will be set by the local minimums only;
therefore, the susceptibility prediction will NOT take into account ortholog candidates. It is
recommended that the user identify a different query sequence for the susceptibility predictions.
In this case, the susceptibility predictions will be highlighted in dark pink to indicate that 0 orthologs were
detected and the susceptibility cut-off was determined from plotting the distribution of percent similarities
and identifying the local minimums.
All potential susceptibility cut-offs generated by the data distribution and ortholog candidate
identification are reported in the table with columns "Cut-off #" and "Susceptibility Cut-off". The user
can use these numbers to define a cut-off if empirical evidence suggests that the "Default" or "2
minimum" cut-offs are not supported.
Note: If 0 orthologs are identified from the hit data, the cut-off will be set by the local
minimums only; therefore, the susceptibility prediction will NOT take into account ortholog candidates. It
is recommended that the user identify a different query sequence for the susceptibility predictions.
In this case, the susceptibility predictions will be highlighted in dark pink in the Level 2 data table to indicate
that 0 orthologs were detected and the susceptibility cut-off was determined from plotting the distribution
of percent similarities and identifying the local minimums.
[Screenshot: "Level 2 Query Domain Information" page for SeqAPASS ID 1570 (Query Accession NP_001317544.1; Query Species: Homo sapiens; Query Domain: NR_DBD_Ppar, DNA-binding domain of peroxisome proliferator-activated receptors (PPAR); Query Protein: peroxisome proliferator-activated receptor gamma isoform 3; Ortholog Count: 0; BLAST Version: 2.5.0; CDD Data: 02/05/2016; Software Version: 2.0), with the "Susceptibility Cut-off" and "Primary Report Settings" boxes above the "Level 2 Data - Primary" table.]
No Orthologs Detected
If no orthologs are detected from the reciprocal best hit BLAST analysis, "Ortholog Count" will be "0" at the top
of the "Level 2 Query Domain Information" page. Further, dark pink will highlight the entire query/hits
table, indicating that the "Susceptibility Prediction" columns were determined from the local minimums
identified in the "View Cutoff" density plot, without consideration of orthologs.
When the "View Cutoff" button is clicked and no orthologs have been detected, the "Cut-off #" and
"Susceptibility Cut-off" columns will report only the local minimum values.
[Screenshot: "Level 2 Susceptibility Cut-off" page for SeqAPASS ID 1570 (Query Accession NP_001317544.1; Ortholog Count: 0), showing the "Select Cut-Off" drop-down, the "Update Cutoff" button, and the density plot of percent similarity.]
The user can return to the "Level 2" data page by clicking the "Update Cut-off" button.
[Figure: density plot titled "Cut-off Based on Ortholog Candidates"; legend entries: Density, Local Max, Local Min, Inflection Point.]
Level 1 and Level 2: Data Visualization
From the Level 1 or Level 2 results page, SeqAPASS users can access an interactive data visualization for
either the "Primary Report" or the "Full Report" by clicking on the "Visualize Data" button.
Example of Level 1 page:
[Screenshot: "Level 1 Query Protein Information" page for SeqAPASS ID 588 (Query Accession NP_000116.2; Query Species: Homo sapiens; Query Protein: estrogen receptor isoform 1; Ortholog Count: 306), showing the "View Cutoff" and "Visualize Data" buttons together with the "Primary Report Settings," "Level 2 Query Domain," and "Level 3 Query Amino Acid Residue(s)" boxes.]
Example of Level 2 page:
[Screenshot: "Level 2 Query Domain Information" page for SeqAPASS ID 1042 (Query Accession NP_000116.2; Query Species: Homo sapiens; Query Protein: estrogen receptor isoform 1; Query Domain: cd06949, NR_LBD_ER; Ortholog Count: 305), showing the "View Cutoff" and "Visualize Data" buttons and the "Primary Report Settings" box.]
The data visualization will then open in a new web browser tab, one for Level 1 and a different one for
Level 2. The visualization will display the report selected by the user on the Level 1 or Level 2 report
page and be identified as "Level One Visualization - Primary Report" or "Level One Visualization - Full
Report" and "Level Two Visualization - Primary Report" or "Level Two Visualization - Full Report."
Note: One report type at a time, either "Primary Report" or "Full Report," can be displayed in the
visualization tab for Level 1 and Level 2. Therefore, if the user is viewing the "Level One Visualization -
Primary Report" page and returns to the Level 1 results page and clicks the radio button for "Full Report,"
the data visualization tab will update to "Level One Visualization - Full Report."
Level 1 Information Page
The initial page that opens upon clicking the "Visualize Data" button provides Level 1 query protein
information, including SeqAPASS ID, query protein, query species, ortholog count, and query accession
information. A link out to the NCBI protein database page corresponding to the queried accession is
available by clicking the query accession. Information on the visualization is provided in the
"Visualization Info" text box. To view the data visualization boxplots click the BoxPlot icon. The
BoxPlot will then generate below the Visualization Info box.
[Screenshot: Level 1 visualization information page for SeqAPASS ID 758 (Query Accession NP_000116.2; Query Protein: estrogen receptor isoform 1; Query Species: Homo sapiens; Ortholog Count: 306), with the "BoxPlot" and "Info" icons under "Select to Open Information or Data Visualization."]
Visualization Info
The following data visualization is available for Level 1 and Level 2 data:
• BoxPlot - Boxplots depicting SeqAPASS data illustrating the percent similarity across species compared to the
query species examining the primary amino acid sequences (Level 1 Visualization) or functional domain (Level 2
Visualization).
  o The open circle, o, represents the query species and closed circles, •, represent the species with the
  highest percent similarity within the specified taxonomic group.
  o The top and bottom of each box represent the 75th and 25th percentiles, respectively. The top and
  bottom whiskers extend to 1.5 times the interquartile range.
  o The mean and median values for each taxonomic group are represented by horizontal thick and thin
  black lines on the box, respectively.
  o The dashed line indicates the cut-off for susceptibility predictions (based on ortholog analysis).
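The quantities drawn for each box can be sketched in a few lines. This is an illustration only and makes several assumptions that the guide does not state: the quartiles use the "inclusive" method of Python's statistics module, the whisker ends are taken as fences 1.5 times the interquartile range beyond the box edges (one common reading of the description above), and the function name box_stats and the data are hypothetical.

```python
import statistics

def box_stats(percent_similarities):
    """Summary values drawn for one taxonomic group's box (illustrative only)."""
    q1, median, q3 = statistics.quantiles(percent_similarities, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,                                        # bottom of the box (25th percentile)
        "q3": q3,                                        # top of the box (75th percentile)
        "median": median,                                # thin horizontal line
        "mean": statistics.mean(percent_similarities),   # thick horizontal line
        "lower_whisker": q1 - 1.5 * iqr,                 # assumed whisker convention
        "upper_whisker": q3 + 1.5 * iqr,
    }

# Hypothetical percent similarities for one taxonomic group.
group = [40.0, 50.0, 60.0, 70.0, 80.0]
stats = box_stats(group)
print(stats)
```

For this example the box spans 50 to 70, the mean and median both sit at 60, and the assumed whisker fences fall at 20 and 100.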
Level 2 Information Page
The initial page that opens upon clicking the "Visualize Data" button provides Level 2 query protein
information, including SeqAPASS ID, Query Species, Ortholog Count, Query Domain, and Query Accession
information. Links out to the NCBI database pages corresponding to the queried accession and
domain are available by clicking the Query Accession and Query Domain links, respectively. Information
on the visualization is provided in the "Visualization Info" text box. To view the data visualization
boxplots, click the BoxPlot icon. The BoxPlot will then generate below the Visualization Info box.
[Screenshot: "Level Two Visualization - Primary Report" information page for SeqAPASS ID 1042 (Query Accession NP_000116.2; Query Species: Homo sapiens; Ortholog Count: 305; Query Domain: cd06949, NR_LBD_ER, ligand binding domain of estrogen receptor), with the "BoxPlot" and "Info" icons and the same "Visualization Info" text that appears on the Level 1 information page.]
Level 1 and 2 BoxPlot Page - Controls
Upon clicking the "BoxPlot" icon on either the Level 1 or Level 2 Visualization Information page, a box for
the boxplot "Controls" and a box for the interactive boxplot will open.
[Screenshot: "Level One Visualization - Primary Report" page for SeqAPASS ID 1042, showing the "Controls" box ("Taxonomic Groups (x-axis labels)," "Select Species for Legend," "Species Legend Options," and "Optional Selections," with the "Download BoxPlot..." and "Open Size Controls..." buttons) above the interactive "Boxplot" box, whose x-axis is labeled "Taxon."]
Manipulating Taxonomic Groups on x-axis
The controls allow the user to edit the taxonomic groups that are displayed on the x-axis by clicking on
the "X" next to the Taxonomic Group name (e.g., Aves). This action removes the selected group from the
x-axis. To the right of the "Taxonomic Groups" controls box is a drop-down that allows the user to remove
taxonomic groups from, or add them back to, the x-axis of the boxplot graphic by deselecting or selecting
checkboxes in the drop-down. Similarly, unwanted taxonomic groups may be removed directly from the
BoxPlot by hovering the cursor over the taxonomic groups listed along the x-axis. The user will notice
that the selection arrow changes to a black arrow with a red 'x' next to it; clicking the taxonomic group
will then remove it from the BoxPlot and the "Taxonomic Groups" controls box. Additionally, that
taxonomic group will have the checkbox deselected in the "Taxonomic Groups" controls box drop-down
list.
[Screenshot: BoxPlot "Controls" box listing the taxonomic groups shown on the x-axis, above the boxplot (x-axis labeled "Taxon"), with a list reading Arctic lamprey, Sea lamprey, and Southern lamprey displayed beside the plot.]
Customize BoxPlot Legend
The user may customize the BoxPlot by adding a legend that will pinpoint species of interest on the
BoxPlot. Upon clicking the drop-down for "Select Species for Legend" in the controls box, the user may
search in the text box for specific species to display in the boxplot legend. Upon identifying a species
from the drop-down menu and selecting the checkbox, the species name will be placed in the boxplot
legend and a corresponding data point will be produced on the graph. The default settings display the
species' common name both in the "Select Species for Legend" drop-down and on the boxplot. However, if
the species' scientific name is desired, the user can select the radio button for "Scientific Name" in the
controls box for "Species Legend Options." This action will change the drop-down menu and the species in
the legend to display the species' scientific name.
Note: The database will take a brief moment to update the list upon changing between "Common Name"
and "Scientific Name."
[Screenshot: "Select Species for Legend" drop-down with a searchable checkbox list of common names (e.g., Aardvark, Abalones, Acorn worms, Adelie penguin, African clawed frog), and the resulting boxplot legend showing Abalones, American beaver, Anna's hummingbird, Ballan wrasse, Turkey vulture, Zebrafish, and Yesso scallop.]
Change Species Display on Plot
Multiple scientific names can be represented by only one common name (e.g., Common name: Teleost
fishes; corresponding scientific names: Spinibarbus denticulatus, Sinocyclocheilus rhinocerous,
Sinocyclocheilus grahami, Sinocyclocheilus anshuiensis, Gobiocypris rarus, Thamnaconus
septentrionalis). Therefore, if a species common name that represents multiple species was used to create
the legend, and the user decides to instead select "Scientific Name," by default the boxplot legend will
change to display the multiple scientific names that the individual common name represents, and each
scientific name will be represented by a unique color/shape point on the plot. However, if the user selects
the checkbox "Group by Common Name" in the "Species Legend Options" control box, then the
scientific names that are represented by one common name will all display the same color/shape point on
the plot.
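The difference between the default behavior and "Group by Common Name" can be sketched as a small marker-assignment routine. This is a hypothetical illustration, not SeqAPASS code: the function name assign_markers, the marker names, and the shortened species list are assumptions made for the example.

```python
from itertools import cycle

# Hypothetical legend entries: (scientific name, common name).
LEGEND_SPECIES = [
    ("Orycteropus afer afer", "Aardvark"),
    ("Spinibarbus denticulatus", "Teleost fishes"),
    ("Gobiocypris rarus", "Teleost fishes"),
    ("Thamnaconus septentrionalis", "Teleost fishes"),
]
MARKERS = ["circle", "square", "triangle", "diamond"]  # illustrative marker styles

def assign_markers(species, group_by_common_name):
    """Map each scientific name to a marker; share one marker per common name if grouping."""
    styles = cycle(MARKERS)
    marker_for_key = {}
    result = {}
    for scientific, common in species:
        key = common if group_by_common_name else scientific
        if key not in marker_for_key:
            marker_for_key[key] = next(styles)
        result[scientific] = marker_for_key[key]
    return result

ungrouped = assign_markers(LEGEND_SPECIES, group_by_common_name=False)
grouped = assign_markers(LEGEND_SPECIES, group_by_common_name=True)
print(ungrouped)
print(grouped)
```

Without grouping, every scientific name receives its own marker; with grouping, the three teleost fish entries share a single marker while the aardvark keeps a separate one.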
The user has the option of removing selected species from the legend either by removing them directly
from the "Select Species for Legend" drop-down box or by hovering the mouse directly over the species
name in the legend. The mouse will change to a black arrow with a red 'x' next to it. Clicking the name
while this arrow is displayed will remove the species from the legend and from the control box.
[Screenshot: controls box with "Scientific Name" selected under "Species Legend Options," and a boxplot legend listing Orycteropus afer afer, Haliotis diversicolor, Saccoglossus kowalevskii, and the six teleost fish species named above, the latter all drawn with the same marker.]
Customize the Legend to Display Species Groups of Interest
In the "Optional Selections" controls box, the user has the option of displaying "Ortholog Candidates,"
"Threatened Species," "Endangered Species," or "Common Model Organisms." Upon selecting one of
the checkboxes, red data points corresponding to those species will be displayed on the boxplot. When
the mouse hovers over a single red point, a pop-up box will appear with the corresponding species name,
taxonomic ID, query protein, and percent similarity.
Note: The user can choose to display either the species common name or scientific name in the hover-over
information box by selecting from the "Species Legend Options."
If the user selects either "Threatened Species" or "Endangered Species," clicking on an individual red dot
will open a new web browser tab and link to the corresponding species page on the US Fish and Wildlife
Service's Environmental Conservation Online System (USFWS, ECOS; e.g.,
https://ecos.fws.gov/ecp0/profile/speciesProfile?sId=1506).
[Screenshot: "Optional Selections" checkboxes (Ortholog Candidates, Threatened Species, Endangered Species, Common Model Organisms) with the "Download BoxPlot..." and "Open Size Controls..." buttons, above the boxplot.]
BoxPlot Controls Widget for Bar Width, Zoom and Pan
Clicking the "Open Size Controls" button opens a "BoxPlot Controls" widget that allows the user to
adjust the size of the bars on the boxplot by increasing or decreasing the "Bar Width" using the up and
down arrows. The minimum and maximum sizes for bars are 6 and 60, respectively. To reset the bar width
on the boxplot to the default size, click the "Reset" button to the right of the "Bar Width" adjustment box in
the "BoxPlot Controls" widget. The user can also zoom and pan the boxplot by toggling the on/off button
under the "Zoom" heading. The user can then zoom in or out by clicking the up or down arrows or
entering a number in the text box and pressing Enter. To reset the zoom on the boxplot to the default size,
click the "Reset" button to the right of the "Zoom" adjustment box in the "BoxPlot Controls" widget.
The pan option is available when the "Zoom and Pan" option is toggled to the "on" position, which allows
the user to click on the boxplot and drag the plot around the screen to reposition it. To reset all BoxPlot
Controls to default settings, click the "Reset All" button.
Note: Upon exiting the BoxPlot Controls widget, the Zoom and Pan options are automatically
turned off.
[Screenshot: "BoxPlot Controls" widget with "Bar Width" and "Zoom" adjustment boxes, each with a "Reset" button, and the "Zoom & Pan" toggle set to "on."]
Download BoxPlot Widget
To download the boxplot, click the "Download BoxPlot" button in the controls box. A "Download Boxplot"
widget will pop up. It will be necessary to specify which type of file (SVG, PNG, or JPG) to
download by clicking on the desired radio button for "Image Type." The user may customize the
resolution of the boxplot for PNG and JPG files prior to download by altering the "Width" and "Height"
of the BoxPlot. To change "Width" or "Height," enter the desired number in the text boxes. Click the
"Download Image" button to download the file. To close the "Download Boxplot" widget, click the "x"
on the top right of the widget.
[Screenshot: "Download Boxplot" widget with "Image Type" radio buttons (SVG, PNG, JPG), "Width" and "Height" text boxes, and the "Download Image" button.]
Hover-over Features in the BoxPlot
Hovering over a taxonomic group name on the x-axis of the boxplot opens an information box
listing the top three species in order of highest percent similarity. If only one or two
species are represented in the taxonomic group, then only those species will be displayed.
Hovering the mouse over any of the species in the legend will generate a pop-up box with the
corresponding species name, taxonomic ID, query protein, and percent similarity. The susceptibility
cut-off is displayed in a pop-up text box upon hovering over the dashed horizontal cut-off line.
Summary Table for Species in a Specific Taxonomic Group
Clicking on a box representing a taxonomic group in the boxplot opens a pop-up table providing
summary information for that particular group. The table header provides the Taxonomic Group name,
the number of species represented in the box, summary statistics (i.e., mean and median percent
similarity), and the overall susceptibility prediction for the selected taxonomic group. The data table
includes protein and species information along with the metrics for evaluating protein similarity and
predicting susceptibility. Also included in the table are columns indicating whether a species belongs to a
certain group of interest (e.g., Threatened Species, Endangered Species, Model Organism). The table can be
downloaded by clicking on the icon for an Excel or CSV file.
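The per-group values described above (species count, mean and median percent similarity, and the top three species shown on hover) can be sketched as follows. This is an illustrative sketch only: the function name summarize_group, the tuple layout, and the species names and values are hypothetical, and the overall susceptibility prediction is deliberately left out because the guide does not describe how it is derived for a group.

```python
import statistics

def summarize_group(hits):
    """Summarize (species, percent_similarity) pairs for one taxonomic group."""
    similarities = [pct for _, pct in hits]
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return {
        "species_count": len(hits),
        "mean": statistics.mean(similarities),
        "median": statistics.median(similarities),
        "top_three": [name for name, _ in ranked[:3]],  # fewer if the group is small
    }

# Hypothetical hits for one taxonomic group (names and values are made up).
group_hits = [("Species A", 62.5), ("Species B", 70.0), ("Species C", 55.0), ("Species D", 68.5)]
summary = summarize_group(group_hits)
print(summary)
```

Because the slice takes at most three entries, a group with only one or two species simply lists those species, matching the hover-over behavior described earlier.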
Interactive Visualization with Level 1 Data Page and Level 2 Data Page
The data visualization is programmed to update with changes made to the Level 1 Data page and Level 2
Data page, respectively. Therefore, if the user updates the Susceptibility Cut-off (see user guide sections
Susceptibility Cutoff Box for Level 1 and Susceptibility Cutoff Box for Level 2) to the "Second Local
Minimum" or "User Defined Cut-off," the previously opened data visualization boxplot tab will update
the cut-off accordingly. Similarly, if the user modifies the Primary Report Settings (see user guide sections
Level 1: Primary Report Settings and Level 2: Primary Report Settings), the data visualization will
update accordingly.
Note: If the user updates the "Primary Report Settings" for "Sorted by Taxonomic Group" the boxplot
will update to display the new taxonomic group selection that is present in the "Filtered Taxonomic
Group" column in the data table. The user should be aware that manipulating the "Sorted by Taxonomic
Group" to a different level in the taxonomic lineage (e.g., from class to order; from class to genus) adds a
larger number of taxonomic groups to the x-axis. Therefore, the plot may require greater user
manipulation using the BoxPlot Controls to view the data.
Level 3: Individual Amino Acid Residue Alignment
In the "View SeqAPASS Reports" tab, on the "Level 1 Query Protein Information" page, there is a
"Level 3" box for setting up the query for comparing individual amino acid residues to a template
sequence. It is anticipated that the choice of template sequence and residues that are selected to align will
be derived from the published literature in most cases. Publications evaluating homology models, protein
crystal structures, pesticide field resistance, or utilizing site-directed mutagenesis are a few examples of
the types of studies that may contain such information to guide a Level 3 SeqAPASS evaluation.
[Screenshot: "Level 1 Query Protein Information" page for SeqAPASS ID 581 (Query Accession NP_000116.2; Query Species: Homo sapiens; Query Protein: estrogen receptor isoform 1; Ortholog Count: 281), showing the "Level 3 Query Amino Acid Residue(s)" box with the "NCBI Protein Database" link, "Select Template Sequence," "Additional Comparisons (optional)," the "NCBI COBALT" link, "Enter Level 3 Run Name," the "NCBI Taxonomy Database" link, "Choose Taxonomic Group(s)," the "Request Residue Run" button, and the "View Level 3 Data" drop-down.]
In the "Level 3" box, there is a link out to the "NCBI Protein Database" for identifying the template
sequence of interest. Below this link is a text box where the user can enter an NCBI
Protein Accession with the version number (e.g., NP_000116.2) or a FASTA formatted sequence, e.g.:
>gi|62821794|ref|NP_000116.2| estrogen receptor isoform 1 [Homo sapiens]
MTMTLHTKASGMALLHQIQGNELEPLNRPQLKIPLERPLGEVYLDSSKPAVYNYPEGAAYEFNAAAAANA
QVYGQTGLPYGPGSEAAAFGSNGLGGFPPLNSVSPSPLMLLHPPPQLSPFLQPHGQQVPYYLENEPSGYT
VREAGPPAFYRPNSDNRRQGGRERLASTNDKGSMAMESAKETRYCAVCNDYASGYHYGVWSCEGCKAFFK
RSIQGHNDYMCPATNQCTIDKNRRKSCQACRLRKCYEVGMMKGGIRKDRRGGRMLKHKRQRDDGEGRGEV
GSAGDMRAANLWPSPLMIKRSKKNSLALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMGLLTNLA
DRELVHMINWAKRVPGFVDLTLHDQV
Upon clicking in the "Select Template Sequence" text box, a pop-up message will appear with
examples of the proper format for the Accessions or FASTA sequences to be entered. A link out to the NCBI
Protein Database is available above the template entry text box.
[Screenshot: "Select Template Sequence" text box with its pop-up message, "Enter NCBI Protein Accession OR FASTA Sequence," giving the examples "NP_000116.2" and a FASTA entry whose first line is ">Sequence description in first line."]
Additional sequences can also be incorporated into the Level 3 alignment using the
"Additional Comparisons (optional)" text box; filling in this field is optional. Upon clicking in the
"Additional Comparisons (optional)" text box, a pop-up message will appear with examples of the
proper format for the Accessions or FASTA sequences to be entered.
Note: In the "Additional Comparisons (optional)" text box, any NCBI Protein Accessions must
be entered before any FASTA sequence(s) if they are to be included in the Level 3 alignment.
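The ordering rule in the note (accessions first, then FASTA sequences) can be illustrated with a small parser. This is a hypothetical sketch, not the validation SeqAPASS performs: the regular expression is an assumed approximation of accession formats like the examples shown in the tool's pop-up message (NP_000116.2 and 1JLY_A), and the function name parse_comparisons is invented for the example.

```python
import re

# Assumed pattern for an accession with version (e.g., NP_000116.2) or a
# PDB-style chain identifier (e.g., 1JLY_A). Illustrative only.
ACCESSION = re.compile(r"^(?:[A-Z]{1,3}_?\d+\.\d+|[0-9][A-Z0-9]{3}_[A-Za-z0-9]+)$")

def parse_comparisons(text):
    """Split field text into (accessions, fasta_records); accessions must come first."""
    accessions, records = [], []
    for line in (raw.strip() for raw in text.splitlines()):
        if not line:
            continue
        if line.startswith(">"):
            # A new FASTA record begins with its description line.
            records.append({"description": line[1:].strip(), "sequence": ""})
        elif records:
            if ACCESSION.match(line):
                raise ValueError(f"accession {line!r} entered after a FASTA sequence")
            records[-1]["sequence"] += line
        elif ACCESSION.match(line):
            accessions.append(line)
        else:
            raise ValueError(f"unrecognized entry: {line!r}")
    return accessions, records

field = """NP_000116.2
1JLY_A
>Sequence description of first FASTA
MTMTLHTKASGMALLHQIQGNELEPLNRPQLKIPLERPLGEVYLDSSKPAVY
"""
accessions, records = parse_comparisons(field)
print(accessions, len(records))
```

Entering an accession after a FASTA record raises an error in this sketch, which mirrors the requirement that accessions precede FASTA sequences.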
[Screenshot: "Additional Comparisons (optional)" text box with its pop-up message, "Enter 0 or more NCBI Protein Accession(s) followed by 0 or more FASTA Sequence(s)," giving the example accessions "NP_000116.2" and "1JLY_A" followed by two FASTA entries.]
Below the text box where the user can add additional sequences for comparison is a link to
NCBI COBALT (Constraint-based Multiple Protein Alignment Tool). NCBI COBALT allows the
user to align multiple sequences and is the alignment tool that SeqAPASS algorithms utilize to set up the
query of individual amino acid residues across species.
Note: The user does not need to use the COBALT link to run a Level 3 evaluation; however, the link is
available in case the user chooses to further evaluate or compare multiple potential template sequences.
Under the text "Enter Level 3 Run Name," there is a text box where the user can enter a user-defined
name for the run. The user may only enter letters or integers as text for the name. The user-defined name
will appear in the "View Level 3 Data" drop-down upon completion of the Level 3 sequence alignment.
To complete the set-up for a Level 3 query, the user must select which sequences to compare to the
identified template sequence. Listed in the "Choose Taxonomic Group(s)" drop-down are all Taxonomic
Groups that were identified as hits in the "Level 1" primary amino acid sequence alignment data. Because
COBALT is used to align all sequences that are selected, it is recommended that the user selectively
identify sequences from the hit table below to align. For example, selecting sequences with low similarity
to the template sequence along with sequences sharing high similarity to the template sequence can skew
the alignment because COBALT is trying to align all of the sequences together. It is recommended that
the user select sequences by first selecting a taxonomic group from the "Choose Taxonomic Group(s)"
drop-down. The user can also use the NCBI taxonomy link to type in the name of a "Taxonomic
Group" found in the drop-down to look up which species fall in that group.
[Screenshot: "Choose Taxonomic Group(s)" drop-down below the NCBI Taxonomy Database link, listing
All Groups, Actinopteri, Amphibia, Anthozoa, Appendicularia, Arachnida, ...]
Note: The "Choose Taxonomic Group(s)" drop-down will display the level of the taxonomic hierarchy
shown in the "Filtered Taxonomic Group" column of the Level 1 Data table. For example, if the user
changes the default option from "class" to "order," then orders will be displayed in the drop-down.
[Screenshot: Level 1 page with "Sorted by Taxonomic Group" set to "order" in the Primary Report
Settings; the "Choose Taxonomic Group(s)" drop-down then lists orders, not classes]
By choosing a group from the drop-down menu, the Level 1 table below will be filtered by the selected
taxonomic group (see column "Taxonomic Group" in the Level 1 data table). When a taxonomic group is
selected from the drop-down, it can take up to a few seconds for the Level 1 data table to filter
completely, depending on the size of the table. The user can then examine each hit protein in the Level 1
table and select those that they would like to compare to the template sequence. To select
sequences/species from the filtered Level 1 data table, the user will select the check boxes in the first
column of the table. Although it is not typically recommended, the user may also select the header check
box in the first column to select all sequences/species in the filtered table.
Note: The user can also type the taxonomic group of interest in the text search box at the top of the
drop-down for quick filtering.
Below is an example where the user selected the taxonomic group Actinopteri from the drop-down
and then selected individual sequences/species to align with the template sequence. The number of
selected species will be shown in the text above the "Request Residue Run" button.
[Screenshot: Level 1 page with the taxonomic group Actinopteri chosen, the Level 3 Run Name
"Actinopteri" entered, and "4 species selected" shown above the "Request Residue Run" button]
(See "Search, View, and Download Data Tables" section of user guide for more information)
The user can align sequences/species from multiple taxonomic groups with the template sequence by
returning to the "Choose Taxonomic Group(s)" drop-down and selecting another group. This filters the
Level 1 table by the newly selected group, and the user can then select additional species from the
filtered table. As before, the number of selected species can be tracked in the text above the "Request
Residue Run" button that reads "X species selected."
When all sequences to align have been selected, the user clicks the "Request Residue Run" button.
Upon successful submission of a Level 3 query, the user will see the following pop-up message. If
submission is unsuccessful, a message will appear describing the reason for the unsuccessful submission.
[Screenshot: SeqAPASS Reports page header with the "Level 3 Run Requested" pop-up message]
To update the "Choose Query to View" drop-down menu under "View Level 3 Data" with the completed
Level 3 alignments, the user can click the "Level 1" button at the top left.
[Screenshot: "Main" and "Level 1" buttons at the top left of the SeqAPASS Reports page]
Additionally, the user can check the status of the Level 3 run by clicking the "SeqAPASS Run Status" tab
and the radio button for "Level 3 Status." Typically, Level 3 alignments complete in a few seconds. When
the Level 3 query completes and the Level 1 page has been updated, the user-defined Level 3 Run Name
will be available in the "Choose Query to View" drop-down menu. After selecting the desired Run Name
from the drop-down, click the "View Level 3 Data" button to view the aligned sequences and set up the
individual amino acid residue alignments with the selected sequences/species.
[Screenshot: "Choose Query to View" drop-down listing completed Level 3 Run Names (for example,
Actinopteri, Amphibia, Chondrichthyes), with Actinopteri selected]
Upon a successful Level 3 query submission, a pop-up message will be displayed in the upper
right-hand side of the screen as follows:
[Pop-up message "Level 3 Run Requested": Status queued]
View Level 3 Individual Amino Acid Query and Data Page
When the user clicks the "View Level 3 Data" button, the Level 3 data page opens. The "Level 3 Template
Protein Information" box contains the SeqAPASS Run ID, Query Accession (with link out to NCBI), Ortholog
Count (number of hits identified as ortholog candidates to the query species protein sequence), NCBI Data
(the date that NCBI databases and executables were downloaded and incorporated into SeqAPASS), Level 3
Run Name (defined by the user), Template Species (entered by the user in the Level 3 query), Template
Protein, and Query Residues (populated with residues upon selection and successful table update).
[Screenshot: Level 3 data page for run "Actinopteri" (Template Species: Homo sapiens; Template Protein:
NP_000116.2, estrogen receptor isoform 1; Ortholog Count: 305) before any residues are chosen; "Query
Residues" reads "No Residues Selected"]
The Level 3 data page includes the Data Version, NCBI Accession, Protein Count, taxonomic information,
Protein Name, and the date/time the Level 3 run completed. The data table remains ordered by percent
similarity to the template sequence, from the highest percent similarity at the top to the lowest at the
bottom. (See "Search, View, and Download Data Tables" section of user guide for more information).
For additional information on amino acid residues, including the single-letter abbreviation, the amino
acid residue name, the classification of the amino acid side chain, and the size of the amino acid residue
based on molecular weight, the user can click the "Show Amino Acid Info..." button. A pop-up table,
"Amino Acid info," will be displayed providing this information.
[Screenshot: Level 3 data page for run "Actinopteri" with Query Residues 351D, 353E, 362K, 364V, 394R,
and 524H selected and the "Amino Acid info" pop-up table displayed. The pop-up table reads as follows.]
ID  Name           Side Chain          Size
A   Alanine        Aliphatic           89.094
C   Cysteine       Sulfur-Containing   121.154
D   Aspartic Acid  Acidic              133.104
E   Glutamic Acid  Acidic              147.131
F   Phenylalanine  Aromatic            165.192
G   Glycine        Aliphatic           75.067
H   Histidine      Basic               155.156
I   Isoleucine     Aliphatic           131.175
K   Lysine         Basic               146.189
L   Leucine        Aliphatic           131.175
M   Methionine     Sulfur-Containing   149.208
N   Asparagine     Amidic              132.119
P   Proline        Aliphatic           115.132
Q   Glutamine      Amidic              146.146
R   Arginine       Basic               174.203
S   Serine         Hydroxylic          105.093
T   Threonine      Hydroxylic          119.119
V   Valine         Aliphatic           117.148
W   Tryptophan     Aromatic            204.228
Y   Tyrosine       Aromatic            181.191
To obtain individual amino acid residue alignment data in the Level 3 data table, the user must use the
shuttle in the "Level 3 Template Protein Information" box to select positions and amino acid residues from
the chosen template sequence to align with the sequences/species that were selected by taxonomic group.
Single-letter abbreviations are used for the amino acid residues:
G: Glycine A: Alanine S: Serine T: Threonine C: Cysteine V: Valine
L: Leucine I: Isoleucine M: Methionine P: Proline F: Phenylalanine
Y: Tyrosine W: Tryptophan D: Aspartic Acid E: Glutamic Acid
N: Asparagine Q: Glutamine H: Histidine K: Lysine R: Arginine
The user can select one residue at a time by clicking and highlighting the residue of interest and then
clicking the top right arrow shuttle button to move the residue to the right-hand box for inclusion in the
alignment. Each time a residue is added to the right-hand box, the left-hand box resets itself to the 1st
residue. Alternatively, the user can select multiple residues at the same time by holding the Ctrl key,
clicking on the residues, and then clicking the top right arrow shuttle button to move the residues to the
right-hand box. Selected residues can be removed by using the left arrow button to clear one at a time or
the double left arrow button to remove all selected residues at once. When the residues of interest (likely
defined from the literature as described above) have been selected, the user clicks the "Update Report"
button, which updates the Level 3 Data table with the individual residue alignment data.
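[Screenshot: "Select Amino Acid Residues" shuttle with the template residues (1M, 2T, 3M, 4T, 5L, 6H,
7T, 8K, 9A, ...) in the left-hand box, the arrow buttons, and the "Update Report" button]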
Alternatively, the user can enter the amino acid positions in the "Enter Amino Acid Residue Positions"
text box (e.g., 351,353,362) and click the "Copy to Residue List" button.
[Screenshot: "Enter Amino Acid Residue Positions" text box containing 351,353,362,364,394,524, with
the "Copy to Residue List" button]
Upon clicking "Copy to Residue List," the "Select Amino Acid Residues" shuttle box is populated with
the positions and residues typed. The user can then click the "Update Report" button to produce Level 3
results in the table below.
[Screenshot: shuttle with residues 351D, 353E, 362K, 364V, 394R, and 524H in the right-hand box after
clicking "Copy to Residue List"]
The individual amino acid residue alignment data will then be updated in the rightmost columns of the
Level 3 Data table. The user can submit a maximum of 50 individual amino acid residues from the
template sequence to compare to the other selected sequences. The individual amino acid residues will be
listed in numerical order, from the 1st position in the template sequence to the last position in the
template sequence.
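The residue entry rules above (a comma-separated list of positions, at most 50 residues, listed in
numerical order) can be illustrated with a short Python sketch. This is not SeqAPASS code: the function
name parse_residue_positions, the error messages, and the handling of duplicates and stray spaces are
assumptions made for illustration only.

```python
MAX_RESIDUES = 50  # limit stated in this guide


def parse_residue_positions(text, template_sequence):
    """Turn "351,353,362" into sorted shuttle-style labels such as "351D".

    Hypothetical helper: duplicate handling and error wording are assumptions,
    not documented SeqAPASS behavior.
    """
    positions = set()
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate a trailing comma or doubled commas
        if not token.isdigit():
            raise ValueError(f"not a residue position: {token!r}")
        position = int(token)
        if not 1 <= position <= len(template_sequence):
            raise ValueError(f"position {position} is outside the template sequence")
        positions.add(position)
    if len(positions) > MAX_RESIDUES:
        raise ValueError(f"at most {MAX_RESIDUES} residues can be compared")
    # Positions are 1-based; labels combine position and one-letter code.
    return [f"{p}{template_sequence[p - 1]}" for p in sorted(positions)]


# First nine residues of the template as displayed in the shuttle (1M, 2T, 3M, ...).
print(parse_residue_positions("8, 1,5", "MTMTLHTKA"))
```

Sorting the positions mirrors the numerical ordering described above; a full template sequence would be
needed to resolve positions such as 351 or 524.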
Level 3 Data - Primary Report
The default report is the "Primary Report," which can be recognized because the radio button for
"Primary Report" above the "Level 3 Data" table is selected.
The "Primary Report" columns for the alignment are titled "Similar Susceptibility as Template" ("Y"
or "N" for yes or no, respectively), followed by Position 1, Amino Acid 1, Total Match 1, Position 2,
Amino Acid 2, Total Match 2, Position 3, Amino Acid 3, Total Match 3, and so on. The template sequence
will always be in the top row of the Level 3 Data table, followed by the previously selected sequences.
Further, the residues selected in the shuttle will be displayed in the top row corresponding to the
template sequence. The Position and Amino Acid in each following row are those of the Protein Accession
identified in that row that align with the template residue. The "Total Match X" column describes
whether the amino acid residue matches the template based on side-chain classification and molecular
weight: "Y" for yes, or "N" for not a match to the template. The user can evaluate these data to
understand how well conserved an amino acid residue is across species or in a species of interest, adding
a line of evidence to support (or question) susceptibility predictions.
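One way to summarize how well each selected residue is conserved is to compute, for every template
residue, the fraction of hit rows whose Total Match is "Y." The Python sketch below is illustrative
only; the function name conservation_by_residue and the row layout (a list of position, amino acid,
total match tuples per row) are assumptions, not a SeqAPASS export format. The values are taken from
the example run shown in this guide.

```python
def conservation_by_residue(template_row, hit_rows):
    """Return {template residue label: fraction of hits with Total Match 'Y'}."""
    if not hit_rows:
        raise ValueError("at least one hit row is required")
    fractions = {}
    for index, (position, amino_acid, _total_match) in enumerate(template_row):
        matches = sum(1 for row in hit_rows if row[index][2] == "Y")
        fractions[f"{position}{amino_acid}"] = matches / len(hit_rows)
    return fractions


# (position, amino acid, total match) for the first two selected residues.
template = [(351, "D", "Y"), (353, "E", "Y")]  # human template row
hits = [
    [(320, "D", "Y"), (322, "E", "Y")],  # fathead minnow
    [(316, "D", "Y"), (318, "E", "Y")],  # Atlantic salmon
    [(355, "D", "Y"), (357, "E", "Y")],  # Japanese medaka
]
print(conservation_by_residue(template, hits))
```

A fraction below 1.0 for a residue would flag the species in which that position is not conserved and
may merit a closer look.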
[Screenshot: "Level 3 Data - Primary" table for run "Actinopteri"; selected columns are reproduced
below. "Analysis Completed" reads 2018 02 27 14:37:54 in every row.]
Common Name      Similar Susceptibility as Template  Position 1  Amino Acid 1  Total Match 1  Position 2  Amino Acid 2  Total Match 2
Human            Y                                   351         D             Y              353         E             Y
Fathead minnow   Y                                   320         D             Y              322         E             Y
Atlantic salmon  Y                                   316         D             Y              318         E             Y
Japanese medaka  Y                                   355         D             Y              357         E             Y
Rainbow trout    Y                                   319         D             Y              321         E             Y
Zebrafish        Y                                   319         D             Y              321         E             Y
Level 3 Data - Full Report
The user may choose to view the Full Report for Level 3 data by selecting the radio button above the
"Level 3 Data" table for "Full Report." The table below will automatically update to display all of the
alignment details.
The "Full Report" columns for the alignment are titled "Similar Susceptibility as Template" ("Y" or
"N" for yes or no), followed by Position 1, Amino Acid 1, Direct Match 1, Side Chain 1, Side Chain
Match 1, MW 1, MW Match 1, Total Match 1, Position 2, Amino Acid 2, Direct Match 2, Side Chain 2, Side
Chain Match 2, MW 2, MW Match 2, Total Match 2, and so on. The template sequence will always be in the
top row of the Level 3 Data table, followed by the previously selected sequences. Further, the residues
selected in the shuttle will be displayed in the top row corresponding to the template sequence. The
Position and Amino Acid in each following row are those of the Protein Accession identified in that row
that align with the template residue. The "Total Match X" column describes whether the amino acid
residue matches the template based on side-chain classification and molecular weight: "Y" for yes, or
"N" for not a match to the template. The user can evaluate these data to understand how well conserved
an amino acid residue is across species or in a species of interest, adding a line of evidence to support
(or question) susceptibility predictions.
[Screenshot: "Level 3 Data - Full" table for the same run. For the first selected residue, every row
shows Amino Acid 1 = D, Direct Match 1 = Y, Side Chain 1 = Acidic, Side Chain Match 1 = Y, MW 1 =
133.104, MW Match 1 = Y, and Total Match 1 = Y, at Position 1 values of 351, 320, 316, 355, 319, and 319.]
The "Direct Match X" column describes whether the hit amino acid is an exact match to the template
amino acid, providing a "Y" or "N" for yes or no, respectively. The "Side Chain X" column indicates the
side chain classification for the amino acid residue (click on "Show Amino Acid Info..." for more
information on classifications). The "Side Chain Match X" column indicates whether the hit side chain
has the same classification as the template amino acid, providing a "Y" or "N" for yes or no, respectively.
The "MW X" column indicates the molecular weight (g/mol) of the amino acid residue, and the "MW
Match X" column indicates whether the molecular weight of the hit amino acid is close to that of the
template amino acid: "N" if the two differ by 30 g/mol or more, and "Y" otherwise. "Total Match X" is
"Y" if at least one of "Side Chain Match X" and "MW Match X" is "Y"; it is "N" only if both are "N."
Ultimately, the Total Match 1, 2, 3, 4, ... columns are used to inform the "Similar Susceptibility as
Template" column. If there is one or more "N" for Total Match comparing any amino acid residue to the
template across a row for a given species, then the "Similar Susceptibility as Template" is "N" for no,
indicating that the hit species is predicted NOT to have the same susceptibility prediction as the
template sequence. However, if all "Total Match X" are "Y" for yes, then the "Similar Susceptibility as
Template" is "Y," indicating that the hit species is predicted to have the same susceptibility prediction
as the template sequence.
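The matching rules described above can be written out as a short Python sketch. It is an illustration
of the stated logic, not the SeqAPASS implementation: the names AMINO_ACID_INFO, compare_residue, and
similar_susceptibility are hypothetical, the side chain classes and molecular weights are copied from the
"Amino Acid info" table in this guide, and the treatment of a difference of exactly 30 g/mol as a
non-match follows the wording of this section. Alignment gaps are not handled.

```python
# Side chain class and molecular weight (g/mol) from the "Amino Acid info" table.
AMINO_ACID_INFO = {
    "A": ("Aliphatic", 89.094), "C": ("Sulfur-Containing", 121.154),
    "D": ("Acidic", 133.104), "E": ("Acidic", 147.131),
    "F": ("Aromatic", 165.192), "G": ("Aliphatic", 75.067),
    "H": ("Basic", 155.156), "I": ("Aliphatic", 131.175),
    "K": ("Basic", 146.189), "L": ("Aliphatic", 131.175),
    "M": ("Sulfur-Containing", 149.208), "N": ("Amidic", 132.119),
    "P": ("Aliphatic", 115.132), "Q": ("Amidic", 146.146),
    "R": ("Basic", 174.203), "S": ("Hydroxylic", 105.093),
    "T": ("Hydroxylic", 119.119), "V": ("Aliphatic", 117.148),
    "W": ("Aromatic", 204.228), "Y": ("Aromatic", 181.191),
}

MW_DIFFERENCE_LIMIT = 30.0  # g/mol; a difference at or above this is "N"


def compare_residue(template_aa, hit_aa):
    """Compare one aligned hit residue with the template residue."""
    template_class, template_mw = AMINO_ACID_INFO[template_aa]
    hit_class, hit_mw = AMINO_ACID_INFO[hit_aa]
    side_chain_match = template_class == hit_class
    mw_match = abs(template_mw - hit_mw) < MW_DIFFERENCE_LIMIT
    return {
        "direct_match": template_aa == hit_aa,
        "side_chain_match": side_chain_match,
        "mw_match": mw_match,
        # Total Match is "N" only when both individual matches are "N".
        "total_match": side_chain_match or mw_match,
    }


def similar_susceptibility(template_residues, hit_residues):
    """True only if every aligned residue has a Total Match of "Y"."""
    if len(template_residues) != len(hit_residues):
        raise ValueError("template and hit must have the same number of residues")
    return all(
        compare_residue(t, h)["total_match"]
        for t, h in zip(template_residues, hit_residues)
    )


print(compare_residue("D", "E"))           # same class, similar weight
print(similar_susceptibility("DE", "DW"))  # E versus W fails both checks
```

Because Total Match needs only one of the two checks to pass, a substitution such as alanine to leucine
(same side chain class, molecular weights more than 30 g/mol apart) still counts as a match under these
rules.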
Multiple Level 3 Runs Requiring the Same Amino Acid Residue Comparisons
Typically, Level 3 individual amino acid residue alignments are submitted repetitively, comparing species
from one taxonomic group at a time to the template amino acid residue(s).
[Screenshot: Level 1 page with the "Choose Query to View" drop-down listing Level 3 Run Names named
after taxonomic groups (Actinopteri, Amphibia, Lepidosauria, Mammalia, Testudines, ...)]
Therefore, to avoid re-entering the same residues for each Level 3 run, the user can take advantage of
the "Copy to Residue List" button. For the first alignment of amino acid residues, the user selects the
amino acid residues to align and clicks the "Update Report" button.
[Screenshot: shuttle with residues 351D, 353E, 362K, 364V, 394R, and 524H selected in the right-hand box
and the "Update Report" button]
By clicking "Update Report," the residues that were selected will be copied into the "Enter Amino Acid
Residue Positions" text box. When the user selects a new Level 3 Run Name (from the same Level 1 query
accession) to view by using the "View Level 3 Data" drop-down and clicking the "View Level 3 Data"
button on the Level 1 Query Protein Information page, the "Enter Amino Acid Residue Positions" text
box will be populated with the amino acid residues selected from the previous run.
[Screenshot: "Enter Amino Acid Residue Positions" text box pre-populated with 351,353,362,364,394,524,
the hint "Enter residue positions as a comma separated list," and the "Copy to Residue List" button]
The user can keep, add, or delete residue positions in this box and then click the "Copy to Residue List"
button. The amino acid residues will then be moved to the "Select Amino Acid Residues" shuttle, and the
user can click "Update Report" to view the data in the table below.
Moving Between Level 1, Level 2, and Level 3 Data Pages
When a user chooses to view Level 1, Level 2, or Level 3 data in the "View SeqAPASS Reports" tab, new
buttons become available that allow the user to move between Levels of an analysis, as shown in the
snapshot below.
[Screenshot: SeqAPASS Reports page showing the "Main," "Level 1," "Level 2," and "Level 3" buttons]
The user can use the "Main" button to return to the list of completed Level 1 runs and select a different
query accession to view. The "Level 1" button brings the user to the Level 1 data page, where the user can
set up queries for Level 2 and Level 3, as well as select the button to view Level 2 and Level 3 data pages.
Open Level 1, Level 2, and Level 3 pages remain open until the user selects a different run to view on the
"Main" page. Moving between tabs, such as "Home," "Request SeqAPASS Run," and "SeqAPASS Run
Status," does not close the Level 1, Level 2, or Level 3 pages that have been opened.
Note: If the user logs out of the SeqAPASS tool, upon logging back in, the data will reset to default
settings. Therefore, the View SeqAPASS Reports tab will not display the "Main," "Level 1," "Level 2,"
or "Level 3" buttons, until a query is chosen and Level 2 and Level 3 pages are opened.
Search, View, and Download Data Tables
The user can use the "Search" box to enter text to search the table. Further, the user can use the arrow
buttons and page numbers on the bottom of the screen to view all data and the drop-down to expand the
table to 10, 20, or 50 rows. There are also left and right scroll bars at the bottom of the tables to allow the
user to view all columns of the table.
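Search using text box on top of tables:
[Screenshot: "Search" box with the placeholder text "Enter keyword"]
Options for viewing data:
[Screenshot: page navigation arrows and page numbers, the rows-per-page drop-down, and the "Download
Table" icons]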
All data tables in the SeqAPASS tool can be downloaded as Excel (.xls) or .csv files. The icons for
downloading the files are present on the bottom right-hand side of all tables, next to the text "Download
Table." Click the icon to download data.
Upon selecting a .csv file, the user can choose to save or open the file. Each file is appropriately named
by Level of the SeqAPASS evaluation and report type.
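A downloaded .csv file can be read with any tool that understands comma-separated values. The
following Python sketch counts rows per taxonomic group using only the standard library. It assumes
that the header row of the file matches the on-screen column titles (for example, "Taxonomic Group"),
which this guide does not state; the function name count_hits_by_group is hypothetical, and the
in-memory sample stands in for a real download such as SeqAPASS_Level2_Primary_Report.csv.

```python
import csv
import io
from collections import Counter


def count_hits_by_group(csv_file, group_column="Taxonomic Group"):
    """Count table rows per taxonomic group in an open .csv file.

    The column name is an assumption based on the on-screen table headers.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or group_column not in reader.fieldnames:
        raise KeyError(f"column {group_column!r} not found in {reader.fieldnames}")
    return Counter(row[group_column] for row in reader)


# Abbreviated stand-in for a downloaded table (three illustrative rows).
sample = io.StringIO(
    "NCBI Accession,Taxonomic Group,Scientific Name\n"
    "NP_000116.2,Mammalia,Homo sapiens\n"
    "ABY64719.1,Mammalia,Hylobates lar\n"
    "AAU87498.1,Actinopteri,Pimephales promelas\n"
)
print(count_hits_by_group(sample))
```

For a saved file, the same function can be called on open("SeqAPASS_Level2_Primary_Report.csv",
newline=""), adjusting group_column if the actual header differs.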
[Screenshot: Firefox dialog "Opening SeqAPASS_Level2_Primary_Report.csv" (Microsoft Excel Comma
Separated Values File) offering "Open with Microsoft Excel (default)" and "Save File," shown over the
"Level 2 Data - Primary" table]
Upon selecting a .xls file, the user can choose to save or open the file. Each file is appropriately named by
Level of the SeqAPASS evaluation and report type.
[Screenshot: Firefox dialog "Opening SeqAPASS_Level2_Primary_Report.xls" (Microsoft Excel 97-2003
Worksheet) offering "Open with Microsoft Excel (default)" and "Save File," shown over the "Level 2 Data -
Primary" table]
Log out
The user can log out from any page in SeqAPASS at any time by clicking the "Log out" link on the upper
right-hand side of the page. If a user logs out and then logs back in, all settings will be set back to
default. Any successfully submitted queries that were requested prior to logging out will continue
running and, when completed, will be available to the user in the "View SeqAPASS Reports" tab.
[Screenshot: SeqAPASS page header with the "Log out" link in the upper right-hand corner]
Pop-up Messages
The Spinning Wheel pop-up alerts the user that an action is taking place in which the interface of the
SeqAPASS tool is contacting the backend database. For example, upon clicking the "SeqAPASS Run Status"
tab, "Refresh Data" button, "View Level 2 Data" button, or "View Level 3 Data" button, the Spinning
Wheel will pop up and then disappear from the screen. There are multiple other instances where the
Spinning Wheel is used as an indicator to the user that an action is occurring.
[Pop-up: Spinning Wheel with the text "Querying database ... Please wait"]
Pop-up messages are meant to guide the user to submit the correct information for a query, inform the
user of a successful or failed query submission, or otherwise inform the user of an error. All pop-up
messages will appear for 10 seconds on the upper right-hand side of the screen and then disappear. If the
user would like to close the message before the 10 seconds are up, click on the message and an "x" will
appear in the upper right-hand corner of the message box. Click the "x" to close the message.
In the "Request SeqAPASS Run" tab, Compare Primary Amino Acid Sequences "By Species" page, a
successful Level 1 query submission will display a pop-up message indicating either that the query has
been submitted to the run queue or, if "existing" appears, that the accession has been run previously by
a user and is available to view.
[Pop-up message "Success": Submitted NP_064393.2: submitted]
OR
[Pop-up message "Success": NP_000116.2: existing]
If the user does not select any query proteins in the "Request SeqAPASS Run" tab, Compare Primary Amino
Acid Sequences "By Species" or "By Accession" page, and clicks the "Request Run" button, one of the
following error messages will be displayed.
[Pop-up message "Error": Must select query proteins]
OR
[Pop-up message "Error": Must enter NCBI accession]
If the user enters nonsense text (or any text that is not an NCBI accession) into the "NCBI Protein
Accession" text box for submitting a Level 1 query in the "Request SeqAPASS Run" tab, Compare
Primary Amino Acid Sequences "By Accession" page, and clicks the "Request Run" button, the message
below will pop up indicating that the accession entered is not in the SeqAPASS database.
Success
fgafgaf. not in database
In the "View SeqAPASS Reports" tab, Level 1 page, if a user clicks "View Level 2 Data," a successful
Level 2 query submission will display a pop-up message indicating that the query has entered the run
queue.
Level 2 Run Requested
Status queued
In the "View SeqAPASS Reports" tab, Level 1 page, if a user selects a domain that has already been
submitted (but not completed) and clicks "Request Domain Run," a pop-up message will indicate that the
run has already been requested or could not be submitted.
Level 2 Run Requested
Status Already run or could not submit
In the "View SeqAPASS Reports" tab, Level 1 page, if a user clicks "View Level 2 Data" without
selecting a domain to view from the drop-down, the message below will pop-up to indicate that the user
must select a domain.
(x) Error
Must select domain from
drop-down
In the "View SeqAPASS Reports" tab, Level 1 page, a successful Level 3 query submission will display a
pop-up message indicating that the query has entered the run queue.
Level 3 Run Requested
Status queued
In the "View SeqAPASS Reports" tab, Level 1 page, if a user fails to type a user defined Level 3 Run
Name, the message below will pop-up to indicate that the user must do so.
(x) Error
You must specify a
Template Sequence and
Level 3 Run Name
In the "View SeqAPASS Reports" tab, Level 1 page, if a user fails to select species from the Level 1 Data
table to be compared with the template sequence, the message below will pop-up.
(x) Error
You must select
sequences from the
Level 1 Data table to
request a Level 3 Run
In the "View SeqAPASS Reports" tab, Level 1 page, if a user fails to select a Level 3 Run Name from the
Choose Query to View drop-down and clicks the "View Level 3 Data" button, the message below will
pop up.
(x) Error
Must select level 3 run
from drop-down
In the "View SeqAPASS Reports" tab, "Level 3 Template Protein Information" data page, if a user fails
to select amino acid residues using the "Select Amino Acid Residues" shuttle and clicks the "View Level
3 Data" button, the message below will pop up.
(H) No Residues Selected
User must select
residues
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) Documentation
Query Species: The selection of the query species for a SeqAPASS analysis depends on the
question the user is addressing. For example, the query species can be the target species (i.e., human or
companion animal in the case of drugs; or insect, plant, fungus, or pest in the case of pesticides) or,
depending on the application of the susceptibility prediction, the query species may be a species known or
hypothesized to be sensitive to a chemical acting on the protein molecular target of interest. There may be
instances where a protein for the species of interest has not been sequenced. In this case, it may serve the
user's purpose to identify another taxonomically related species from the same organism Class, Order,
Family, or Genus as a surrogate query species. In certain cases, when there is interest in the susceptibility
of a particular species (e.g., honey bee) and there are numerous potential target species
(e.g., neonicotinoids are intended to cause mortality in a number of pest insects), the species of particular
concern may serve as the query species.
Query Protein: SeqAPASS can be queried with any protein sequence available in the NCBI protein
GenBank database, by protein name or NCBI Accession. It is suggested that the user of SeqAPASS
examine the query protein and species in the NCBI protein database prior to submitting a run to
SeqAPASS (use NCBI link on query page). It is not uncommon for a protein of a specific species to be
represented by more than one sequence. In such cases there are some guiding principles for identification
of the best sequence available for the SeqAPASS run.
• General guidelines: These guidelines describe best practices for identifying the most useful
sequence for a species susceptibility prediction in SeqAPASS. In some cases, however, limited
sequence information is available and therefore less desirable sequences may be used. It is up to
the user of SeqAPASS to recognize the quality and limitations of the sequence chosen for the
SeqAPASS query. The information about a particular protein can be found on the Protein page in
the NCBI database.
http://www.ncbi.nlm.nih.gov/protein/
[Screenshot: NCBI Protein database home page with "androgen receptor, homo sapiens" entered in the search box.]
Search for a protein of interest using protein name and/or species of interest: For the example above,
multiple hit proteins were identified.
[Screenshot: NCBI Protein search results for "androgen receptor, homo sapiens" (540 results), listing entries such as P10275.2, AAA51772.1, AAA51780.1, and AAA51771.1.]
Select one of the proteins by clicking on the link shown above to see detailed information about the
protein.
[Screenshot: GenPept record for androgen receptor [Homo sapiens], GenBank AAA51771.1 (917 aa), showing the LOCUS, DEFINITION, ACCESSION, VERSION, DBSOURCE, KEYWORDS, SOURCE, REFERENCE, COMMENT, and FEATURES rows.]
Guiding principles: On the NCBI protein page, rows to examine include: "DEFINITION,"
"REFERENCES," "COMMENTS," and "FEATURES." The information provided in these rows can aid a
SeqAPASS user in the identification of an ideal query sequence for SeqAPASS.
It is desirable to:
a. Use accessions with the following prefix: NP_
b. Avoid protein sequences labeled "partial," "PREDICTED," "PROVISIONAL," "INFERRED,"
or "hypothetical."
c. Avoid sequences labeled "TPA" (Third Party Annotation); however, if TPA is all that is available,
"TPA: experimental" would be preferred over "TPA: inferential."
d. Look at the date associated with the protein in the "LOCUS" row of the detailed protein page. A more
recent date can indicate the most up-to-date annotation of the protein. Under the "DBSOURCE" row of the
detailed protein page, other accessions associated with past protein sequences can be viewed. Often,
if the "xrefs" row is heavily populated and the record has the most recent annotation update date, it is
likely to be the best sequence to use as a query sequence in SeqAPASS.
d. Short sequences should be avoided when possible as query sequences. Many times if one selects the
protein from the protein output derived from the NCBI protein database query, they will find that the
short sequence is actually a partial sequence described in the "DEFINITION" row of the Protein page.
e. Unless there is reason for doing so (based on the question the user is trying to address), splice variants
labeled in "FEATURES" rows of the Protein page as "alternatively spliced" are less desirable.
f. It is important to check the references associated with the selected query protein. In some cases, certain
sequences are associated with sensitivity to a given chemical. This can be particularly useful when
predicting susceptibility to pesticides, where certain strains of insects are produced to be readily sensitive
or insensitive to a chemical.
g. A secondary check of the sequence used in the SeqAPASS run is to look at the output
and see whether ortholog candidates were detected. Ideally, a preferable sequence would have more
ortholog candidates identified.
Important Note: To identify which query protein has the greatest number of Ortholog Candidates, the user
can choose to submit multiple sequences for the same species and protein. Once the Level 1 runs
for those similar proteins are complete, the user can select the "View SeqAPASS Reports" tab and
look at the table for "Ortholog Count"; the protein with the highest number is likely to be the most
appropriate query sequence for a SeqAPASS evaluation.
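The label-based guidelines above can be sketched as a simple screening helper. This is only an illustration of the guidance, not part of SeqAPASS or NCBI: the function name, the note wording, and the second example record (accession "XX_000000.1") are hypothetical, and a real decision should still rest on reading the full protein record.

```python
# Labels the guidelines suggest avoiding in a query sequence definition.
DISCOURAGED_LABELS = ("partial", "predicted", "provisional", "inferred", "hypothetical")


def screening_notes(accession: str, definition: str) -> list[str]:
    """Return reasons a sequence may be a less desirable query (empty if none)."""
    notes = []
    # Guideline a: accessions with the NP_ prefix are preferred.
    if not accession.startswith("NP_"):
        notes.append("accession lacks the preferred NP_ prefix")
    lowered = definition.lower()
    # Guideline b: flag discouraged labels found in the definition text.
    for label in DISCOURAGED_LABELS:
        if label in lowered:
            notes.append(f'definition is labeled "{label}"')
    # Guideline c: flag Third Party Annotation records.
    if lowered.startswith("tpa"):
        notes.append("Third Party Annotation (TPA) record")
    return notes


# NP_000116.2 is the estrogen receptor record shown elsewhere in this guide.
clean = screening_notes("NP_000116.2", "estrogen receptor isoform 1 [Homo sapiens]")
# Hypothetical record used only to exercise the checks.
flagged = screening_notes(
    "XX_000000.1", "PREDICTED: hypothetical protein, partial [Species A]"
)
```

An empty list means none of the label checks fired; it does not by itself mean the sequence is the best available query.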
Example: Androgen receptor, Homo sapiens
[Screenshot: annotated GenPept record for androgen receptor [Homo sapiens], GenBank AAA51771.1 (917 aa), highlighting the LOCUS, DEFINITION, DBSOURCE, REFERENCE, COMMENT, and FEATURES rows (including the annotated functional domain regions and binding sites), together with the NCBI sidebar links such as "Identical proteins" and "RefSeq protein isoforms."]
h. If multiple proteins appear to be the best query protein for SeqAPASS, the sequences can be aligned
using NCBI's COBALT. Enter the accessions (copy and paste from the NCBI protein search list) and align.
[Screenshot: NCBI COBALT (Constraint-based Multiple Alignment Tool) query page with seven androgen receptor accessions entered (P10275.2, AAA51772.1, AAA51780.1, AAA51771.1, AAA51729.1, AAD45921.1, AAA51886.1), followed by the resulting alignment page with "Conservation Setting" set to "2 Bits."]
To evaluate sequences, change settings for "Conservation Setting" from "2 Bits" to "Identity"
[Screenshot: COBALT alignment page for the seven accessions, showing the "Conservation Setting" drop-down and the aligned sequences.]
Look for differences in the sequence (e.g., conserved residues, gaps) and start by eliminating sequences
that have gaps.
i. If questions remain as to which sequence would be best to run in SeqAPASS after the suggested
evaluations of the proteins are performed, run all relevant sequences in SeqAPASS for the evaluation.
The individual residue differences between commonly named sequences will become most important
when evaluating residues known to be important for binding the chemical or activating the protein (Level
3 SeqAPASS analysis). After completing the SeqAPASS runs, select the data that have the greatest number
of ortholog candidates for your evaluation of conservation and further predictions of cross-species
susceptibility.
Depending on the protein of interest, multiple subunits may be associated with a protein. In this case, all
relevant subunits can be queried using SeqAPASS.
Level 1 Calculated Percent Similarity
The SeqAPASS algorithms submit the query to NCBI's standalone BLASTp (using default settings,
including BLOSUM-62 matrix), which aligns the query protein with all proteins available in the NCBI
protein database and provides a variety of metrics associated with each pairwise alignment between the
query and hit sequences. SeqAPASS selectively captures output from BLASTp, including one sequence
per species with the highest bit score.
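The "one sequence per species with the highest bit score" step can be sketched as follows. This is a minimal illustration under stated assumptions, not the SeqAPASS implementation: the `Hit` record, its field names, and the placeholder accessions and species are hypothetical, and ties are resolved by keeping the first hit seen.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    accession: str   # hit accession (placeholder values below)
    species: str     # species the hit sequence belongs to
    bitscore: float  # BLASTp bit score against the query


def best_hit_per_species(hits):
    """Keep, for each species, the hit with the highest bit score."""
    best = {}
    for hit in hits:
        current = best.get(hit.species)
        # Replace only on a strictly higher score, so the first hit wins ties.
        if current is None or hit.bitscore > current.bitscore:
            best[hit.species] = hit
    return best


hits = [
    Hit("ACC_A1", "Species A", 950.0),
    Hit("ACC_A2", "Species A", 1200.0),
    Hit("ACC_B1", "Species B", 640.0),
]
selected = best_hit_per_species(hits)
```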
Detailed descriptions of metrics derived from BLASTp (e.g., BLASTP Bitscore, Evalue, Positives,
Identity, Hit length) can be found in:
The NCBI Handbook: (http://www.ncbi.nlm.nih.gov/books/NBK21106/);
BLAST® Help: (http://www.ncbi.nlm.nih.gov/books/NBK62051/);
and
NCBI Glossary Field Guide: (http://www.ncbi.nlm.nih.gov/Class/FieldGuide/glossary.html)
The top row of the Level 1 data corresponds to the queried protein selected by the user. For each sequence
queried, the Level 1, top row query sequence is used to determine the maximum bitscore for the analysis,
which is derived from aligning the query sequence to itself using BLASTp. To calculate percent
similarity, the bitscore for each hit sequence is normalized to the maximum bit score and then multiplied
by 100.
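The normalization described above amounts to percent similarity = (hit bitscore / maximum bitscore) × 100, where the maximum bitscore comes from aligning the query to itself. A minimal sketch, with a hypothetical function name and made-up bit scores:

```python
def percent_similarity(hit_bitscore: float, max_bitscore: float) -> float:
    """Normalize a hit's bit score to the query self-alignment bit score."""
    if max_bitscore <= 0:
        raise ValueError("maximum bit score must be positive")
    return hit_bitscore / max_bitscore * 100.0


# Made-up values: the query aligned to itself scores 1900 bits.
self_score = percent_similarity(1900.0, 1900.0)  # the top row (query) is 100%
half_score = percent_similarity(950.0, 1900.0)
```

The same calculation applies to Level 2, using the bit scores from the domain-region alignments.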
Note: SeqAPASS v2.0 and newer parse the BLASTp query and hit accessions to identify all the
species/accessions from the identical proteins. Therefore, if a hit sequence represents multiple species, all
species with the identical sequence will be found in the data table for Level 1 and Level 2. To determine
which sequence/species was identified from BLASTp as a hit and which sequence/species was parsed
from the identical sequence, view the "Full Report" for Level 1 or Level 2, column "Identical Protein,"
where "N" indicates the original hit sequence and "Y" indicates the parsed sequence.
Common Domain Count:
Reverse Position-Specific BLAST (RPS-BLAST) is used to compare each query and hit sequence to
conserved domains defined in NCBI's Conserved Domain Database. A hit domain is considered in
common with the query domain if it contains the same domain accession as the query and it aligns with
the NCBI curated domain with the same or greater amino acid residue coverage than the query sequence.
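One way to read the common-domain rule is sketched below. It assumes each sequence's RPS-BLAST result has been reduced to a mapping from domain accession to the number of curated-domain residues covered; that representation, the function name, and the placeholder domain accessions are assumptions made for illustration and are not taken from SeqAPASS.

```python
def common_domain_count(query_domains: dict, hit_domains: dict) -> int:
    """Count domains the hit shares with the query.

    A domain counts as common when the hit carries the same domain accession
    and covers at least as many residues of the curated domain as the query.
    """
    count = 0
    for accession, query_coverage in query_domains.items():
        hit_coverage = hit_domains.get(accession)
        if hit_coverage is not None and hit_coverage >= query_coverage:
            count += 1
    return count


# Placeholder accessions and coverage values.
query = {"DOM_1": 148, "DOM_2": 246}
hit = {"DOM_1": 148, "DOM_2": 200, "DOM_3": 90}
shared = common_domain_count(query, hit)
```

In this example only DOM_1 counts: DOM_2 is present in the hit but with lower coverage than in the query, and DOM_3 is absent from the query.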
Ortholog Candidate Identification
Ortholog sequences are those that have diverged through a speciation event and therefore are more likely to
maintain similar function. SeqAPASS uses reciprocal best hit (RBH) BLAST for ortholog detection by
automatically comparing each hit protein to all protein sequences available for the query species. If the
original query protein or one of its identical protein matches is identified to be the best match to the hit, or
maintains the same bitscore, then the hit sequence is considered an ortholog candidate. Whether or not a
sequence is an ortholog candidate is indicated with a yes (Y) or no (N) in the "Ortholog Candidate" column.
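The reciprocal best hit check can be sketched as below. The sketch assumes the reciprocal search (hit protein against the query species) is available as a list of (accession, bitscore) pairs, and it reads "maintain the same bitscore" as tying the best reciprocal score; both the data layout and that reading are assumptions, and the accessions are placeholders.

```python
def is_ortholog_candidate(reciprocal_hits, query_accessions) -> bool:
    """Return True if the query (or an identical protein) is a best reciprocal match.

    reciprocal_hits: (accession, bitscore) pairs from searching the hit protein
        against all sequences of the query species.
    query_accessions: the original query accession plus its identical proteins.
    """
    if not reciprocal_hits:
        return False
    best_score = max(score for _, score in reciprocal_hits)
    # The query qualifies if it reaches the best score, including ties.
    return any(
        accession in query_accessions and score == best_score
        for accession, score in reciprocal_hits
    )


query_set = {"QUERY_1", "QUERY_1_IDENTICAL"}
candidate = is_ortholog_candidate([("QUERY_1", 880.0), ("OTHER_1", 410.0)], query_set)
tied = is_ortholog_candidate([("OTHER_2", 700.0), ("QUERY_1_IDENTICAL", 700.0)], query_set)
not_candidate = is_ortholog_candidate([("OTHER_3", 900.0), ("QUERY_1", 650.0)], query_set)
```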
Note: Many NCBI protein accessions represent multiple identical protein sequences in the BLASTp
output. This is due to BLASTp querying and presenting data from the non-redundant protein database.
Sometimes the identical sequences are from different species. This can be checked by following the link
for the top row "NCBI Accession" in the table to the NCBI protein page. Below the protein name
[species] title will be a link to "Identical Proteins."
Click the "Identical Proteins" link and look for a sequence in the list from the user defined query species.
[Screenshot: NCBI Protein page for estrogen receptor isoform 1 [Homo sapiens], NCBI Reference Sequence NP_000116.2, showing the "Identical Proteins" link below the title.]
Note: If the top hit from RBH BLAST is a Protein Data Bank (PDB) code (e.g., 1AHR_A), no ortholog
candidates will be identified. BLASTp, when run against all accessions for a given species, does not
return PDB codes. It is recommended that the user identify a similar/identical sequence to the PDB code
and use that sequence as the query sequence.
Susceptibility cut-off:
The susceptibility cut-offs listed on the "Level 1 (and Level 2) Susceptibility Cut-off" page are
determined by plotting the % similarity data from the "Primary Report" or "Full Report" and identifying
the local minima in the data. The default cut-off is determined by taking the 1st local minimum and
moving up in percent similarity until the next ortholog candidate is found. The susceptibility cut-off
displayed in the list is the percent similarity of the identified ortholog candidate.
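The "move up until the next ortholog candidate" step can be sketched as below. How the local minima are located in the plotted data is not detailed here, so the sketch takes the first local minimum as an input; the function name and the example values are made up for illustration.

```python
def default_cutoff(first_local_minimum: float, rows):
    """Return the percent similarity of the nearest ortholog candidate at or
    above the first local minimum, or None if there is no such candidate.

    rows: (percent_similarity, is_ortholog_candidate) pairs from the report.
    """
    candidates = [
        pct for pct, is_ortholog in rows
        if is_ortholog and pct >= first_local_minimum
    ]
    # Moving up from the minimum, the first candidate met is the lowest one.
    return min(candidates) if candidates else None


# Made-up report rows: (percent similarity, ortholog candidate?)
rows = [
    (100.0, True),
    (82.5, True),
    (61.0, False),
    (58.3, True),
    (40.2, False),
    (35.0, True),
]
cutoff = default_cutoff(55.0, rows)
```

With a first local minimum at 55.0, the sketch skips the candidate at 35.0 (below the minimum) and settles on 58.3.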
Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-Across:"
Yes):
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off, but identified as an Ortholog Candidate =Y, for
"yes," then the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes"
If the hit sequence is below the susceptibility cut-off, but belongs to any organism class found above the
susceptibility cut-off the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for
"yes". This criterion allows susceptibility predictions to be made across taxonomic groups based on the
likelihood that the sequences above the cut-off are better matches to the query.
If the hit sequence is below the susceptibility cut-off and not identified as an ortholog candidate (Ortholog
Candidate =N, for "no,") and does not belong to any organism class found above the susceptibility cut-
off, the hit is predicted to not be susceptible; therefore, Susceptibility Prediction = N for "no"
Note that the "Primary Report" may yield different Susceptibility Predictions than the "Full Report," as
the predictions are based on the data in the different reports. The Primary Report is filtered to only display
E-value < 0.01 and Common Domain Count > 1.
Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-Across:"
No):
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes".
If the hit sequence is below the susceptibility cut-off but is identified as an ortholog candidate (Ortholog
Candidate = Y, for "yes"), the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes".
If the hit sequence is below the susceptibility cut-off and is not identified as an ortholog candidate (Ortholog
Candidate = N, for "no"), the hit is predicted to not be susceptible; therefore, Susceptibility Prediction =
N for "no".
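The two sets of criteria above differ only in the organism-class rule, which can be seen in a short Python sketch. This is an illustration under stated assumptions, not the tool's code: the Hit fields and the predict_susceptibility function are hypothetical names, and a hit whose percent similarity equals the cut-off is treated as "above" it (the cut-off is itself the percent similarity of an ortholog candidate, so that hit is predicted susceptible either way).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical record for one hit sequence in a report."""
    organism_class: str
    percent_similarity: float
    ortholog_candidate: bool


def predict_susceptibility(hits: List[Hit], cutoff: float, read_across: bool) -> List[str]:
    """Return "Y" or "N" for each hit, following the criteria in the text.

    Assumption: percent_similarity >= cutoff counts as "above the cut-off".
    """
    # Organism classes represented by at least one hit above the cut-off.
    classes_above = {h.organism_class for h in hits if h.percent_similarity >= cutoff}

    predictions = []
    for h in hits:
        if h.percent_similarity >= cutoff:
            predictions.append("Y")  # above the cut-off
        elif h.ortholog_candidate:
            predictions.append("Y")  # below the cut-off, but an ortholog candidate
        elif read_across and h.organism_class in classes_above:
            predictions.append("Y")  # Species Read-Across: Yes only
        else:
            predictions.append("N")
    return predictions
```

With "Species Read-Across" set to No, the third branch is never taken, so a non-ortholog hit below the cut-off is always predicted "N" regardless of its organism class.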
Level 2 Calculated Percent Similarity
Data obtained from the Level 1 RPS BLAST evaluation are used to assign the sequence ranges that aligned
with a user-selected domain (from the NCBI CDD database) to each accession from the Level 1 Full
Report. BLASTp is then used to align the query domain range to each hit domain range. The percent
similarity is calculated from the bit scores of the BLASTp alignment of the domain regions. For
each sequence queried, the query species in the top row of the Level 2 data is used to determine the maximum
bit score for the analysis, which is derived from aligning the query sequence to itself using BLASTp. To calculate
percent similarity, the bit score for each hit sequence is normalized to the maximum bit score and then
multiplied by 100.
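The normalization described above is simple arithmetic; a minimal Python sketch follows. The function name and the example bit scores are hypothetical, chosen only to show the calculation.

```python
def percent_similarity(hit_bit_score: float, max_bit_score: float) -> float:
    """Normalize a hit's bit score to the query's self-alignment bit score
    and express the result as a percentage."""
    if max_bit_score <= 0:
        raise ValueError("max_bit_score must be positive")
    return hit_bit_score / max_bit_score * 100.0


# Illustrative values only: a query domain that scores 520.0 against itself
# and a hit domain that scores 390.0 against the query.
example = percent_similarity(390.0, 520.0)  # 75.0
```

By construction, the query aligned to itself always has a percent similarity of 100.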
Susceptibility cut-off (same method as used in Level 1):
The susceptibility cut-offs listed on the "Level 2 Susceptibility Cut-off" page are determined by plotting
the % similarity data from the "Primary Report" or "Full Report" and identifying the local minima in
the data. The default cut-off is determined by taking the first local minimum and moving up in percent
similarity until the next ortholog candidate is found. The susceptibility cut-off displayed in the list is the
percent similarity of the identified ortholog candidate.
Level 2 Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species Read-
Across:" Yes):
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes".
If the hit sequence is below the susceptibility cut-off but is identified as an ortholog candidate (Ortholog
Candidate = Y, for "yes"), the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes".
If the hit sequence is below the susceptibility cut-off but belongs to any organism class found above the
susceptibility cut-off, the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for
"yes". This criterion allows susceptibility predictions to be made across taxonomic groups based on the
likelihood that the sequences above the cut-off are better matches to the query.
If the hit sequence is below the susceptibility cut-off, is not identified as an ortholog candidate (Ortholog
Candidate = N, for "no"), and does not belong to any organism class found above the susceptibility
cut-off, the hit is predicted to not be susceptible; therefore, Susceptibility Prediction = N for "no".
Note that the "Primary Report" may yield different Susceptibility Predictions than the "Full Report," as
the predictions are based on the data in the different reports. The Primary Report is filtered to display
only hits with E-value < 0.01 and Common Domain Count > 1.
Level 2 Criteria for Susceptibility Prediction (when "Primary Report Settings" is set to "Species
Read-Across:" No):
All sequences identified above the susceptibility cut-off are predicted to be susceptible; therefore,
Susceptibility Prediction = Y for "yes".
If the hit sequence is below the susceptibility cut-off but is identified as an ortholog candidate (Ortholog
Candidate = Y, for "yes"), the hit is predicted to be susceptible; therefore, Susceptibility Prediction = Y for "yes".
If the hit sequence is below the susceptibility cut-off and is not identified as an ortholog candidate (Ortholog
Candidate = N, for "no"), the hit is predicted to not be susceptible; therefore, Susceptibility Prediction =
N for "no".
Level 3 Sequence Alignments:
COBALT is used to align all user-selected sequences (from Level 1 hits) with a user-defined template
sequence. Because COBALT algorithms align all sequences, it is recommended that the user align the
template sequence with sequences that are most similar to one another. To capture the most
similar sequences from the SeqAPASS data, it is recommended that the user filter the Level 1 data by
taxonomic group and step through the Level 1 data pages one by one while selecting sequences. It is
also recommended that the user look at the name of each sequence and exclude 'partial' sequences when
possible. Requesting a query for one taxonomic group at a time breaks the data down into manageable
alignments.
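For users who work from a downloaded Level 1 table rather than the web interface, the selection advice above can be sketched in Python. This is only an illustration of the recommendation: the tuple layout (accession, protein name, taxonomic group), the function name, and the simple test for the word "partial" in the protein name are assumptions, not features of SeqAPASS.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_full_length_hits(
    hits: Iterable[Tuple[str, str, str]]
) -> Dict[str, List[str]]:
    """Group accessions by taxonomic group, skipping 'partial' sequences.

    Each hit is assumed to be (accession, protein_name, taxonomic_group).
    The result gives one candidate Level 3 selection per taxonomic group.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for accession, protein_name, taxonomic_group in hits:
        # Assumption: partial sequences are flagged in the protein name.
        if "partial" in protein_name.lower():
            continue
        groups[taxonomic_group].append(accession)
    return dict(groups)
```

Each list in the result could then be submitted as its own alignment with the template sequence, keeping the alignments small and the sequences similar to one another.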
Selecting Amino Acid Residues to Align:
The user may select up to 50 amino acid residues to compare across selected species in Level 3.