GenRA Virtual Training Chat Questions and Answers

Below are responses to questions asked during the Generalized Read-Across (GenRA) Virtual Training hosted by the U.S. Environmental Protection Agency's Center for Computational Toxicology and Exposure (U.S. EPA CCTE) on May 23, 2023, presented by EPA's Dr. Grace Patlewicz, Dr. Esra Mutlu, and Dr. Imran Shah. Attendees submitted questions throughout the presentation. Although many questions were answered verbally during the presentation and in the Q&A box, there were some questions we were not able to answer during the training period. All remaining questions within the scope of the GenRA training are answered here. For more information on GenRA, visit the GenRA Resource Hub.

Contents
Chat Questions and Answers
TRAINING RELATED QUESTIONS
ABOUT GENRA
DATA SOURCES
FINGERPRINTS
SUBSTANCE TYPES
SIMILARITY CONTEXT
ANALOGS
DATA OUTPUTS
PREDICTIONS
APPLYING GENRA
OTHER
Appendix A: List of Acronyms

TRAINING RELATED QUESTIONS

Question 1: Will the slides be available with the recording?
Question 1A: Can we receive the recorded presentation later?
Question 1B: Will you be giving us the answers in writing to these questions?

EPA Response: We will contact registrants when the training materials are ready. The slides, recording, and breakout activity (with and without answers) will be available on the NAMs training web site: https://www.epa.gov/chemical-research/new-approach-methods-nams-training

Question 2: Do we get a certificate for this?

EPA Response: We will share the survey shortly that will allow you to get your certificate for this training: https://epa.gov1.qualtrics.com/jfe/form/SV_5AMRHbXKdyCDbZI. The survey will be open for a couple of weeks following the training.

Question 3: I registered only for the beginner session. Is it possible to get the worksheet for the intermediate/advanced session as well? It would be helpful to look at those exercises after completing the beginner exercises.

EPA Response: We only have one worksheet, with the same questions for all breakout rooms. The worksheet is the same for all sessions! We just matched folks who wanted more advanced guidance with our most experienced trainers.

ABOUT GENRA

Question 4: Is there a user manual? What are ATG, BSK, and NVS?

EPA Response: Yes, there is a user manual: https://www.epa.gov/chemical-research/generalized-read-across-genra-manual. The most recent functionality is best described in our manuscript https://doi.org/10.1016/j.comtox.2022.100258. ATG, BSK, and NVS refer to three of the ToxCast platforms: Attagene, BioSeek, and NovaScreen. More information on the assay platform sources can be found here: https://www.epa.gov/chemical-research/generating-toxcast-data-toxcast-assays

Question 5: Many thanks for your quick response. I went through the user manual very quickly, but the acronyms are not very specific. Is there a publication or reference I can look into for the acronyms?

EPA Response: We are working to update the manual to capture missing acronyms. Some of the common ones have been captured in the information icons that pop up under each panel in the application itself.

Question 6: Is it possible to run GenRA for proprietary data?

EPA Response: We don't recommend running GenRA on proprietary compounds. Please reach out to us if you wish to make use of a Docker image to instantiate GenRA behind your own firewall. Alternatively, the genra-py package will work on user-owned datasets. See https://academic.oup.com/bioinformatics/article/37/19/3380/6194561 for more details.
Question 7: Would it be possible to run molecules as a batch (e.g., running multiple molecules in SD file or .csv file format)?

EPA Response: Please use the genra-py package for batch analysis: https://academic.oup.com/bioinformatics/article/37/19/3380/6194561

Question 8: I have a really basic question. Read-across seems quite computational, and if there's a potency function, what's the difference between read-across and QSAR? Is there a context where one is better than the other?

EPA Response: Read-across and QSAR are part of the same continuum of relating some aspect of a chemical to an activity response. The real difference is that read-across tends to rely on a more limited pool of substances as part of an analogue or category approach.

Question 9: How, and with what chemicals, was GenRA validated?

EPA Response: This is described in more detail in our initial publication and the subsequent analyses that have followed. Our most recent publication https://doi.org/10.1016/j.comtox.2022.100258 provides a roadmap of how GenRA has evolved, with the relevant citations to all our previous studies.

Question 10: Does the website sit within a company's firewall? Can EPA see what structures are entered?

EPA Response: We don't advise entering confidential information into GenRA. GenRA could potentially be provided as a Docker image. Please reach out to the GenRA team via the web site to discuss further.

DATA SOURCES

Question 11: It looks for analogs in what database? CompTox?

EPA Response: Yes, DSSTox, which underpins the EPA CompTox Chemicals Dashboard.

Question 12: Is the analogue selection by GenRA restricted to the substances present in CompTox/ToxRefDB?

EPA Response: Yes, the analogues are limited to chemicals in the CompTox Dashboard/DSSTox database. However, the target can be any structure (one can input a new SMILES string or draw a new chemical using the Ketcher button).

Question 13: The Dashboard now includes genotoxicity data, which I welcome. Can GenRA be used for read-across for genotoxicity endpoints?

EPA Response: Thanks for the question. It is in the works.

Question 14: Are the physchem data for the analogs predicted data, experimental data, or a mix of both?

EPA Response: The physchem data are obtained from the OPERA tool: https://github.com/kmansouri/OPERA

Question 15: What technology is used for generating the interactive graphs?

EPA Response: Here is the free tool: https://github.com/vasturiano/force-graph

Question 16: Does ToxRef contain in vivo ecotoxicity endpoints?

EPA Response: No, ToxRefDB only contains human-health related endpoints. We are planning to include EcoTox endpoints in the future.

Question 17: Can the data matrix bring information on the effects of these molecules in the environment, or only on human health?

EPA Response: Currently GenRA makes predictions for in vitro assay outcomes and in vivo toxicity endpoints. We have not explored ecotoxicity predictions as yet.

Question 18: Regarding the physchem properties, it looks as though the properties are plotted relative to one another, but it would also be helpful if the chemicals were plotted relative to the target.

EPA Response: The physchem properties are obtained from OPERA: https://github.com/kmansouri/OPERA and https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0263-1

Question 19: ToxRefDB contains industry studies, correct? So, it will miss academic studies in the peer-reviewed literature?
EPA Response: More information on ToxRefDB can be found in the following article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6944327/

Question 20: Are you increasing the tox data in your database from other global databases?

EPA Response: Yes, our next priority is to include data from the Toxicity Values database, which is broader in coverage than ToxRefDB.

Question 21: Do you have a plan to incorporate more biological databases other than ToxCast and ToxRefDB?

EPA Response: Great question. Please contact us through the website to let us know which specific biological databases you were referring to. On the in vivo toxicity side, we are currently working to incorporate aggregate data from the Toxicity Values database that exists in the Dashboard (ToxValDB).

Question 22: Are there references available for the physchem data? Is there a way to link those? Where do the values come from?

EPA Response: The physchem data are based on OPERA predictions. The software and methods are described in the following paper: Mansouri, Kamel, Chris M. Grulke, Richard S. Judson, and Antony J. Williams. 2018. "OPERA Models for Predicting Physicochemical Properties and Environmental Fate Endpoints." Journal of Cheminformatics 10 (1): 1-19. https://doi.org/10.1186/s13321-018-0263-1

Question 23: What is the difference between ToxRef and ToxCast?

EPA Response: ToxRefDB is for in vivo studies and the ToxCast data filter is for in vitro studies.

Question 24: Can user data be incorporated?

EPA Response: Not currently, but this is something we are exploring as a future release.

Question 25: ToxRef is for in vivo data?

EPA Response: Yes, that's correct. Filtering analogues on ToxRef data only shows those analogues that have in vivo data for chemicals within the ToxRef database.

Question 26: Are you working with FDA on this database? I know they are not big fans of this approach.

EPA Response: No, though we have demonstrated the tool to colleagues at CFSAN.

Question 27: Are you able to see the studies associated with ToxRef data?

EPA Response: Not within the application. In a future version, we hope to provide a means of linking back to toxicity data presented elsewhere on the Dashboard.

Question 28: Brilliant presentation, thank you! Usually in Panel 4 we see in red some PODs related to a certain endpoint/effect. My question is: is there a way to retrieve the reference/study for this POD data?

EPA Response: Thank you, and thank you for your question. We hope to provide the source/study information from ToxRefDB as part of the download from Panel 4, the Data Matrix view.

Question 29: Any thoughts about incorporating artificial intelligence into making comparisons?

EPA Response: Yes, we plan to incorporate additional AI/ML-based approaches in GenRA. In fact, the similarity-weighted activity algorithm used by GenRA is based on k-nearest neighbour (kNN) prediction, which is the simplest form of ML approach.

FINGERPRINTS

Question 30: This is great work. I wonder if there are plans to 'marry' it with the OECD Toolbox to help inform custom fingerprints.

EPA Response: It would be helpful to understand what specifically this user has in mind. Happy to have a separate conversation about this.

Question 31: 1. Step 2: How do I select the fingerprints? Which one should be used when? Filter by ToxRef vs ToxCast? 2. Step 3: Group and by selection? When should I use the drop-down? 3. Step 4: genra-py vs GenraPred? 4. The most confusing step is that once I download the Excel template with a lot of data, I'm lost with 100+ endpoints; how can I derive a POD/NOAEL from this step?
EPA Response: Selection of fingerprints is up to the end user and requires some investigation. If the end user is interested in making binary predictions of in vivo toxicity, then using the filter by ToxRef, which is the default setting, is the approach to take. This will ensure that the source analogues returned are associated with in vivo data. In Panel 4, the recommendation is to use GenraPred, as this will ensure that some confidence is calculated for the predictions generated.

If ToxCast assay hitcall predictions are desired, then the end user needs to filter by ToxCast in Panel 1. This will ensure that source analogues with binary hitcalls from ToxCast data are returned in Panel 1. These will be used to make read-across predictions in Panel 4. Again, the GenraPred engine is recommended. The only exception to the use of GenraPred for binary predictions of in vivo or in vitro outcomes is if the hybrid fingerprints are used in Panel 1; Panel 4 will then default to only showing genra-py as the prediction engine.

If POD predictions from in vivo toxicity data are desired, then filter by ToxRef in Panel 1 and switch the Group by in Panel 3 to Tox Dosage Fingerprint. When Generate Data Matrix is pressed, a potency-based data matrix is presented in Panel 4. In this case, genra-py is the default prediction engine. From here the end user can filter the matrix to only predict specific study-toxicity effect PODs. Alternatively, the end user can predict across all study-toxicity effects and then sort by values to identify the most conservative predictions.
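To illustrate that last step, here is a minimal sketch (not GenRA code) of sorting a downloaded Panel 4 data matrix to surface the most conservative predicted PODs. The file name and column names are hypothetical placeholders; check the headers in your own download.

```python
# Hypothetical example: find the lowest (most conservative) predicted PODs
# in a Panel 4 download. The column names below are placeholders, not the
# actual GenRA export headers.
import pandas as pd

preds = pd.read_csv("genra_panel4_download.csv")     # hypothetical file name
pod_col = "predicted_POD_mg_kg_day"                  # hypothetical column name
lowest = preds.sort_values(pod_col).head(5)          # most conservative first
print(lowest[["study_toxicity_effect", pod_col]])    # hypothetical columns
```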
Question 32: I didn't get Bisphenol A as the first analogue with any of the fingerprinting methods. Which approach was used in the demo?

EPA Response: AIM fingerprints.

Question 33: The Jaccard similarity metric - which fingerprint is this method integrated with?

EPA Response: The Jaccard similarity is used with all the fingerprints.

Question 34: Which set of fingerprints correlates most with metabolism and toxicity?

EPA Response: Metabolism considerations are not currently implemented in GenRA, but this is functionality we are working to incorporate into GenRA. Assessments of the toxicity of analogues on which to base your read-across calculation are made via the comparisons in Panels 3 and 4, and depend on whether you wish to assess that with in vivo data (ToxRef) or in vitro data (ToxCast).

Question 35: How did you know the total number of features?

EPA Response: Not sure what this question refers to - the total number of possible features in a given fingerprint representation? We will endeavor to document this in the manual.

Question 36: Thank you. Is there a recommendation on which of the approaches for the similarity analysis is better than the others (for example, Chem: Torsion fingerprints or AIM, etc.)?

EPA Response: No, some interactivity and judgment is required to make decisions on the best choice of fingerprint based on the overall quality of the source analogs that are produced.

Question 37: In what cases will we need to use Morgan fingerprints? For standard risk assessment, what parameters do you recommend we select?

EPA Response: Read-across is an interactive process where it is difficult to recommend hard-and-fast rules that are generalizable across all use cases. Choice of fingerprint will largely depend on the kind of similarity that matters most for your use case. For example, if it is critical that you find analogues that are structurally similar to your target, then using structurally based fingerprints (e.g., Morgan, Torsion, AIM) for the Jaccard similarity calculation that populates the radial plot in Panel 1 will be important for your use case. If measured in vivo data for specific endpoints matter to you, then making sure to filter by ToxRef in Panel 1 is critical, and you may not mind if your analogues aren't as highly similar in terms of structure if they have a lot of available in vivo potency data across the analogue set for your endpoints of interest. It's likely that physicochemical parameters will also play a role in determining and filtering the best analogues for your use case. So, the process can be an iterative one where several sets of parameters are tested before the best compromise on a set of available analogues is settled on.

Question 38: Could you explain more about the color coding and how one determines the data quality? Thank you!

EPA Response: We will answer this in more detail later; however, it doesn't reflect data quality. Each fingerprint is a binary bit vector reflecting the presence/absence of features (e.g., ToxPrints comprise 729 features, whereas Morgan fingerprints comprise 2048 features). The color density is scaled by fingerprint type from light to dark and reflects a measure of 'data availability'. The number of data records is reflected in each cell.

Question 39: Is EPA thinking of developing AIM further from its beta version? It's another excellent tool.

EPA Response: Great question. Do you have specific suggestions on what you would like to see here?

Question 40: Step 2: Is there guidance that highlights when to use which fingerprints/hybrid, and the filter by ToxRef vs ToxCast vs all?

EPA Response: We have systematically evaluated the utility of different fingerprints for specific chemical clusters for inferring hazards. We hope to share this information with users in a manner that can suitably guide them on their usage.

Question 41: What are the differences/limitations/advantages of Morgan fingerprints vs Torsion vs ToxPrints vs AIM vs hybrid?

EPA Response: Thank you for your question. Each of these fingerprints consists of a bit vector recording the presence/absence of different structural moieties (such as in Morgan) versus structure and bond angle vectors (as in Torsion) versus other representations. Some of these fingerprints are based on structural data; others are based on the presence or absence of assay data, such as the biology-based fingerprints. It might be worth combining fingerprints in a hybrid format depending on exactly what you would like to base your similarity metric calculation on. If structure is your main consideration for initially generating source analogs, Morgan, AIM, Torsion, etc. would be best.

Question 42: What's the difference between Morgan fingerprints, Torsion fingerprints, ToxPrints, and AIM?

EPA Response: You can hover your mouse over the options in Panel 1 to get brief summary descriptions of each fingerprint and its descriptors.

Question 43: Please, can you explain the Morgan fingerprint basis again?

EPA Response: It is a presence/absence bit vector of different common structural moieties found within organic compounds.

Question 44: What do the torsion and Morgan fingerprints represent?

EPA Response: Morgan fingerprints represent the presence/absence of different structural moieties, whereas Torsion fingerprints capture structure and bond angle vectors.
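As an illustration of the fingerprint types discussed in the last few questions, the sketch below uses the open-source RDKit toolkit (not GenRA's internal code, whose parameters may differ) to generate Morgan and topological torsion bit vectors for two example structures and compare them with the Tanimoto (Jaccard) metric. The SMILES strings are arbitrary examples.

```python
# Illustrative only: generate Morgan and topological torsion fingerprints with
# RDKit and compare them using the Tanimoto (Jaccard) similarity.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, rdMolDescriptors

target = Chem.MolFromSmiles("CC(C)(c1ccc(O)cc1)c1ccc(O)cc1")        # bisphenol A
analog = Chem.MolFromSmiles("C(c1ccc(O)cc1)(c1ccc(O)cc1)c1ccccc1")  # arbitrary analogue

# Morgan fingerprint: presence/absence of circular atom environments (2048 bits)
fp_m1 = AllChem.GetMorganFingerprintAsBitVect(target, radius=2, nBits=2048)
fp_m2 = AllChem.GetMorganFingerprintAsBitVect(analog, radius=2, nBits=2048)

# Topological torsion fingerprint: hashed four-atom bonded paths
fp_t1 = rdMolDescriptors.GetHashedTopologicalTorsionFingerprintAsBitVect(target, nBits=2048)
fp_t2 = rdMolDescriptors.GetHashedTopologicalTorsionFingerprintAsBitVect(analog, nBits=2048)

print("Morgan Tanimoto: ", DataStructs.TanimotoSimilarity(fp_m1, fp_m2))
print("Torsion Tanimoto:", DataStructs.TanimotoSimilarity(fp_t1, fp_t2))
```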
Question 45: Can the AIM fingerprints be explained in more detail? How do they differ from the results generated in the AIM program?

EPA Response: The AIM fingerprints are as faithful a reimplementation as possible of the same fragments that exist in the AIM standalone tool. See https://doi.org/10.1016/j.comtox.2022.100256 for a more detailed description of the work conducted to create these fingerprint representations. The results generated by the AIM program are likely to be different, since the AIM tool relies on an internal AIM database of analogues tagged by data availability, whereas GenRA relies on the CompTox Chemicals Dashboard database to identify analogues. Although the vast majority of AIM's database of analogues overlaps with the Dashboard chemicals, our filter by in vivo data relies on ToxRefDB, which is likely to be more limiting than a more general tag for data availability. We are working on extending the coverage of toxicity databases.

Question 46: Are there descriptions of the options that can be used to sort the neighbors? E.g., what's a torsion fingerprint versus a Morgan fingerprint?

EPA Response: Thanks for your question. You can hover your mouse over each fingerprint option in Panel 1 to get a brief summary description of the basis set of descriptors that each fingerprint consists of. Morgan fingerprints are based solely on the presence/absence of structural moieties. Torsion fingerprints contain information on the bond torsion angles as well as structural moieties. Each of the fingerprints uses a different number of descriptors at varying levels of granularity to define the fingerprint. Hence, some interactivity and judgment are required to make decisions on the best choice of fingerprint based on the overall quality of the source analogs that are produced (based on the information Grace has shown in Panels 1-4).

Question 47: Is there a reference document for us to understand the different fingerprints?
Question 47A: How do AIM fingerprints differ from the other fingerprints available?

EPA Response: AIM fingerprints are explained in more detail in https://doi.org/10.1016/j.comtox.2022.100256. ToxPrints are described in more detail in https://pubs.acs.org/doi/full/10.1021/ci500667v. Morgan fingerprints are described in https://pubs.acs.org/doi/10.1021/ci100050t and torsion fingerprints are described here: https://pubs.acs.org/doi/10.1021/ci00054a008

Question 48: Does the dataset include ecotoxicology data?

EPA Response: Currently, it only includes human health endpoints from ToxRefDB. We are planning to incorporate ecotox endpoints in the future.

Question 49: Is there an article that describes the AIM fingerprints?

EPA Response: AIM fingerprint manuscript: https://doi.org/10.1016/j.comtox.2022.100256

SUBSTANCE TYPES

Question 50: Does GenRA include much information on / is it useful for metal-containing substances (industrial catalyst type substances and so on)?

EPA Response: Currently, the chemical fingerprints capture the organic portion of substances. We are exploring approaches to develop fingerprints that cover metal-containing substances. The bioactivity fingerprints, on the other hand, represent substances based on assay results and may cover metal-containing substances.

Question 51: If the first step is the structure, how does GenRA work for metals?

EPA Response: Metals aren't currently supported.

Question 52: Does GenRA work for polymers?
EPA Response: This is a good question. We do not currently treat polymers using any special structure notation to capture monomeric units, etc.

Question 53: Does GenRA allow you to search analogues by substructure? For example, if my target compound is a nitrosamine and I am only interested in other nitrosamines as potential analogues?

EPA Response: I can see how this would be a very useful feature. Right now, the analogues are only identified by overall similarity, and we have not implemented a "substructural moiety" filter on the neighbourhood. Thanks for bringing this up.

Question 54: We found some mixtures and salts (two or three structures) reported as individual neighbors and some problems of similarity (molecules containing fragments or structures very dissimilar to the target). Were these datasets curated before selection by fingerprints? Is the search done directly in ToxCast and ToxRefDB?

EPA Response: Please reach out to us directly and/or share the specific substances so that we can address this issue. We use QSAR-ready structures from the CompTox Dashboard/DSSTox to build chemical structure fingerprints, so the analogues may have some limitations when the substance is a salt.

Question 55: It seems GenRA pulls up similar substances based primarily on structural similarity. What about sorting the structures based on other aspects (e.g., physchem, structural alerts, metabolic similarity, etc.)?

EPA Response: We are working on multiple contexts of similarity. Since most chemicals have structure data, we started with chemical fingerprints. We have investigated bioactivity, physchem properties, gene expression/transcriptomics, and phenotypic profiling, and these are being introduced in GenRA. We are also actively researching how to incorporate metabolic similarity in identifying analogues.

Question 56: Regarding discrete organic chemicals in GenRA - does this mean stereoisomers, for example, are more difficult? I would just like a little clarification on "discrete" in this sense. Thank you!

EPA Response: Unfortunately, the similarity search does not consider stereo information in the chemical structures. Let us know if this answers your question.

Question 57: Does EPA or others have visibility of what structures users enter? I'm thinking about whether confidential structures can be analysed using the tool.

EPA Response: We don't advise entering confidential information into GenRA. GenRA could potentially be provided as a Docker image. Please reach out to the GenRA team via the web site to discuss further.

Question 58: Thanks, Imran. We tried from simple and small structures (i.e., 3-aminophenol) to larger and more complex structures (a fluconazole-related compound). If you have mixtures and salts, the similarity and fingerprint calculations will be problematic, confusing halogens with fragments, and mixtures containing unrelated fragments. The interface and workflow are very interesting, but I would suggest curation steps for the data (normalization, canonicalization, mixture exclusion) and data prepared by different fingerprints and descriptors (ECFP, MACCS, Morgan, etc.), endpoint, and so on. The workflow is very logical and science-based; with a straightforward data preparation and structuring workflow it will probably be a very powerful tool.

EPA Response: Thank you for the feedback! We developed the current GenRA workflow based on the use case of conducting read-across for a single chemical.
The complex mixture use case is certainly very interesting, and we'd be happy to talk about it further.

SIMILARITY CONTEXT

Question 59: Does the GenRA tool "just" use chemical similarity to identify read-across target substances? Could this element of the tool be explained at a high level?

EPA Response: As you're going to hear from Grace, GenRA is designed to consider multiple contexts of similarity: chemical structure, bioactivity, and more coming soon.

Question 60: Any idea why the similar compounds feature in the Dashboard gives only a handful of analogs for chlorofluorocarbons? It seems like Tanimoto similarity doesn't work very well for these compounds.

EPA Response: It is more likely that the fingerprint representations are not customised for such substances. We have been developing new PFAS-specific ToxPrints (see https://pubs.acs.org/doi/10.1021/acs.chemrestox.2c00403), which we are considering adding in a subsequent version of GenRA.

Question 61: Is a 0.39 maximum similarity worth pursuing?

EPA Response: (Answered live) Thanks for your question. The Jaccard similarity based on structure is only one metric to consider in read-across. Fingerprint choice can impact the magnitude of that similarity, depending on which structural aspects you wish to base similarity on. As Grace mentioned, we can also look at physicochemical properties and their similarities as another means of isolating "good" versus "bad" analogs, as well as the presence or absence of relevant toxicological endpoint potency data for the read-across metrics of interest between the target chemical and the available source analogs, as we will see in future slides.

Question 62: How confident can we feel if we select an analogue that has a chemical similarity of only 0.2 or 0.3? I think we should select only those with a >0.8 similarity score.

EPA Response: This is a very difficult question to answer in general. The suitable Jaccard index will vary from one group of chemicals to another. This is why we enable users to explore the potential analogues using different contexts (chemical, bioactivity) and evaluate the similarity in toxicity endpoints.

Question 63: What would be an ideal Jaccard score for selecting an analogue?

EPA Response: The Jaccard score is only one consideration when selecting an analogue. Consideration of the analogue toxicity data (concordance and consistency) and physicochemical similarity are other factors that should be brought to bear in making a selection.

Question 64: What is the Jaccard similarity metric?

EPA Response: It is the same as the Tanimoto similarity metric.

Question 65: If no filter is applied, how is the Jaccard similarity based on the weight of evidence? I saw that changing the filter changed the ranking completely, with a higher similarity score for a different chemical.

EPA Response: The purpose of the filters is to restrict analogues to those with toxicity (ToxRefDB) or in vitro bioactivity (ToxCast) data. Therefore, selecting a filter will change the number of analogues and their level of similarity to the target.

Question 66: When using the Jaccard similarity metric, is there an ideal cutoff?

EPA Response: It is difficult to define a single Jaccard similarity threshold that will be ideal for all chemicals. When using chemical structure fingerprints, it is important to visually inspect the analogues and use expert judgement to compare them with the target to determine suitability for read-across.

Question 67: Is the structural similarity always based on Tanimoto? Can other methods be selected?
EPA Response: We have compared Dice, Euclidean, and a couple of other similarity metrics without a substantial improvement in performance. If data suggest a particular metric would be more advantageous, we would be open to considering it.

Question 68: What is a good similarity value?

EPA Response: See the earlier answer.

Question 69: When all analogues have low similarity, do we say read-across cannot be done?

EPA Response: (Answered live) Thanks for your question. The decision whether to use different source analogs for a target should not be based solely on the fingerprint similarity metric. Depending on the toxicological endpoint of interest, one should consider the data availability of the source analogs (Panels 2-4), the similarity of physchem properties (Panel 1), as well as similarities based on structure (Panel 1). Ultimately, it comes down to expert judgment whether the analogs produced by each fingerprint are overall "good" for read-across based on all of these factors together. This can be an iterative process where different fingerprints and their resultant source analogs are chosen and compared on the aforementioned criteria, whether individually or as a hybrid.

Question 70: Can you please review how the similarity score is computed?

EPA Response: In simple terms, the Tanimoto similarity between two chemical fingerprints is calculated by dividing the number of elements that are in common between the fingerprints by the total number of elements in the two fingerprints. For example, if chemical 1 has a fingerprint FP1 = {f1, f2, f3, f10, f11, f12, f20} and chemical 2 has a fingerprint FP2 = {f10, f11, f12, f20, f192, f243, f567}, where {f1, f2, f3, ..., f567} are elements like structural features, then the Tanimoto similarity metric = |FP1 ∩ FP2| / |FP1 ∪ FP2| = 4/10 = 0.4. On a separate note, the Tanimoto similarity and the Jaccard index are equivalent when the fingerprints are binary vectors.
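The worked example above can be reproduced with a few lines of Python, treating each fingerprint as a set of features (a minimal sketch for illustration only):

```python
# Tanimoto/Jaccard similarity of the two example fingerprints above
fp1 = {"f1", "f2", "f3", "f10", "f11", "f12", "f20"}
fp2 = {"f10", "f11", "f12", "f20", "f192", "f243", "f567"}

similarity = len(fp1 & fp2) / len(fp1 | fp2)  # 4 shared features / 10 distinct features
print(similarity)  # 0.4
```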
Question 71: What is the similarity weight?

EPA Response: GenRA uses the similarity-weighted activity to predict an endpoint for a target using analogues. The "weight" in this case is the Tanimoto similarity.

Question 72: What similarity index is considered good enough or more reliable?

EPA Response: The similarity metric is just one consideration in evaluating and selecting analogues.

Question 73: Is there a typical similarity threshold cut-off?

EPA Response: See the earlier answer.

Question 74: Will in vitro data most often provide higher similarity scores?

EPA Response: See the earlier answer.

Question 75: Quite interesting to see that with a data-rich compound like the one used in this demo session, the best analogue has a chemical similarity of 0.39. I wonder how good the uncertainty score for this prediction is.

EPA Response: Please remember this is using neighbors by Morgan fingerprints; depending on the end user and the outcome, you can always look at alternative fingerprints or custom fingerprints.

ANALOGS

Question 76: How can we best choose the "Neighbors by" option?

EPA Response: Currently, this requires interactive exploration, starting with the default options (Morgan fingerprints and filtered by ToxRef data).

Question 77: If we have potential analogues from other sources, how can we analyze those analogues through GenRA?
Question 78: I know GenRA lets you deselect chemicals, but can it allow you to select the chemical analogues that the assessor desires?

EPA Response: This is something we are actively working on now: allowing end users to define their own neighborhoods or select analogues from the network exploration tool.

Question 79: Is there a best practice recommendation for evaluating the choice of "similar" analogues, i.e., looking at multiple methods for finding nearest neighbors?

EPA Response: Currently, this requires interactive exploration, starting with the default options (Morgan fingerprints and filtered by ToxRef data). We have systematically evaluated the utility of different fingerprints for specific chemical clusters for inferring hazards. We hope to share this information with users in a manner that can suitably guide them on their usage.

Question 80: If an identified analogue doesn't fit (not a good analogue despite a high Tanimoto score), can it be excluded?

EPA Response: Yes, analogues can be excluded. We will show that soon. See the checkmark in Panel 4 next to the pairwise similarity.

Question 81: Is there a way to know whether the analogues collected are actually good enough for the read-across? I guess sometimes a new structure can be so different that there is no good analogue. In this case, will the system give some "analogues" anyway?

EPA Response: Indeed, this is a value judgement by the end user. GenRA will return the most similar analogues with data, which the end user can review and evaluate for relevance based on the predicted physicochemical properties and available toxicity data.

Question 82: Are the read-across predictions taken directly from analogues, or are they weighted ensemble values?

EPA Response: The read-across prediction is a similarity-weighted activity outcome, derived by calculating the pairwise similarities of the analogues multiplied by their activity outcomes, divided by the sum of the pairwise similarities.
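In other words, for analogues with similarities s_i and activity outcomes a_i, the prediction is ACT = sum(s_i x a_i) / sum(s_i). Below is a minimal sketch using made-up similarity scores and binary activities for three analogues; analogues lacking data for the endpoint are simply left out of the sums, consistent with the response to Question 86 below.

```python
# Similarity-weighted activity for a target, illustrated with made-up numbers.
similarities = [0.39, 0.35, 0.28]  # pairwise Jaccard similarities to the target
activities = [1, 1, 0]             # binary outcomes for the endpoint of interest

act = sum(s * a for s, a in zip(similarities, activities)) / sum(similarities)
print(round(act, 2))  # 0.73 - a value between 0 and 1 (see PREDICTIONS below)
```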
Question 83: If you have more than one source analogue for data, is the most conservative POD chosen?

EPA Response: The focus within GenRA should be that the set of analogues kept after filtering are those that meet the criteria that suit your use case best in terms of structural similarity, type (in vivo/in vitro) and quantity of data available for your endpoints of interest, physicochemical properties, etc. Given that the read-across calculations that GenRA performs occur across the set of analogues kept after filtering, a few good representative analogues (e.g., possessing the endpoint data you desire, structural similarity, chemical similarity, etc.) would be ideal to achieve trustworthy read-across results.

Question 84: Is the analogue selection restricted to the substances present in CompTox?

EPA Response: Yes, though you can introduce any chemical of interest using Ketcher.

Question 85: Can we run read-across for more than 10 analogues?

EPA Response: Yes, you can select up to 20 analogues in Panel 1.

Question 86: Does the presence of analogues without data affect the predictions (i.e., weaken the prediction)?

EPA Response: No, since these are not taken into account when making the prediction.

Question 87: I did not see how to use the physchem properties to delete specific analogs.

EPA Response: Expand the properties in Panel 4 and use them to guide whether an analogue ought to be deselected from consideration.

Question 88: Do you see GenRA having the option to input your own batch of chemicals to perform read-across? For example, you have identified your own class of chemicals that you want to use to inform a data-poor chemical.

EPA Response: Yes, this is something we are working on right now - how to allow the user to identify their own analogues. You may be better off using genra-py for batch processing. If you are interested in batch processing, please contact genra.support@epa.gov for more details - we also have an API that you can use to run predictions.

DATA OUTPUTS

Question 89: Is metabolism considered in the predictions?

EPA Response: Metabolism is not currently considered in the predictions; however, we are looking into this actively.

Question 89A: Does GenRA consider the similarity of metabolic/clearance pathways as part of the read-across?

EPA Response: Not currently, but this is something we are actively working on.

Question 90: I don't see the Bio options online. Does anyone else?

Audience Response: Bio options are only available for some chemicals.

Question 91: What are the numbers and colors in Panel 2?

EPA Response: The color density represents a measure of 'data availability' for the target, from light to dark. The number of data records is reflected in the box itself.

Question 92: With this tool, can you access the specific tox effect (increase/decrease in body weight or liver enzymes)?

EPA Response: Panel 4 can be filtered by study type-toxicity effect, e.g., CHR (chronic)-body weight.

Question 93: Is there a way to focus the GenRA results so that only in vivo data are included in the output matrix view?

EPA Response: Yes, in Panel 3, select the "ToxRef Group" to see only the in vivo endpoints. We will go through this in the breakout session.

Question 94: Is it possible to filter out substances in the initial step which do not have any relevant data?

EPA Response: You can deselect analogues based on the endpoints shown in Panel 4.

Question 95: Is there a way to sort by lack of data?

EPA Response: In Panel 4, searching by observations will provide a view of which endpoints and source analogues are most data poor.

Question 96: Q1: After clicking on "Run Read-Across", would it be possible to save the generated table (with red and blue boxes)?

EPA Response: It is possible to download the information presented in Panel 4 as an Excel file or CSV file, but this provides the numeric data available, not the heatmap view of red and blue coloured cells.

Question 97: Are only in vivo data shown in the data matrix (Panel 4)?

EPA Response: Either in vivo or in vitro data can be shown in Panel 4. One can choose to see the in vitro bioactivity data in Panel 4 by selecting "Group: ToxCast" in Panel 3. We will go over this in the breakout session.

Question 98: How much data (in vivo and/or in vitro), or what minimal data, are required to make a read-across that is highly probable?

EPA Response: That's a really difficult question to answer generally for all chemicals. We have explored this systematically in our publications and have been able to find "sweet spots" for optimal performance in many cases. It continues to be a research problem.

Question 99: I think it was mentioned that the predictions can be exported to Excel. How do we do that?

EPA Response: Predictions can be exported in Panel 4 by clicking on the Download option. Most useful is to download the predictions in Excel format.

Question 100: Can you filter by route of exposure of the studies in ToxRefDB?

EPA Response: There are a limited number of inhalation exposure guideline studies in ToxRefDB, which is why we have mostly oral exposures in GenRA.
As our sources of toxicity data grow and we have additional information, we plan to include functionality for filtering by routes of exposure.

PREDICTIONS

Question 101: What does ACT stand for and what is its use?

EPA Response: ACT is the similarity-weighted activity based on the analogues. AIM was used in the demo as the fingerprint option.

Question 102: How do we get the AUC and p-values?

EPA Response: AUC and p-values only appear after a prediction is run, and typically only show up when there is a minimum of 2 positive and 2 negative chemicals.

Question 103: In case there is residual uncertainty in the read-across prediction, would you consider using an assessment factor to account for this when setting an acceptable limit?

EPA Response: That is for the end user to determine relative to the decision context they are interested in.

Question 104: Results: ACT=1, positive effect - high likelihood of effect? How should we interpret a negative result with ACT=0.32, AUC=0.75, p=0.13?

EPA Response: There are several factors to consider in using GenRA predictions:

The number of analogues: The more similar and the greater the number of analogues that are available, the more confident you can be in the GenRA prediction. This is because GenRA uses a statistical method called "nearest neighbors" to make its predictions. The more analogues there are, the more likely it is that GenRA will find a close match to the target chemical.

ACT: The similarity-weighted activity (ACT) is a value between 0 and 1 that tells you how likely it is that the target chemical will have the same activity as the analogues. A value of 0 means that the target chemical is very unlikely to have the same activity as the analogues, while a value of 1 means that the target chemical is very likely to have the same activity as the analogues.

AUC: The area under the receiver operating characteristic (ROC) curve (AUC) is a measure of how well GenRA can distinguish between active and inactive chemicals. An AUC of 0.5 means that GenRA is no better than chance at making this distinction, while an AUC of 1 means that GenRA can perfectly distinguish between active and inactive chemicals. An AUC of 0.7 or higher is generally considered to be good.

p-value: The p-value is a measure of the statistical significance of the GenRA prediction. A low p-value means that the prediction is statistically significant, which means that it is unlikely to have occurred by chance. If p=1, then the ACT and AUC are unreliable.

It is important to consider all four of these factors when interpreting the results of GenRA predictions. The more analogues that are available, the more confident you can be in the prediction. The ACT and AUC values can give you an idea of how likely it is that the target chemical will have the same activity as the analogues. The p-value can tell you how statistically significant the prediction is.

It is also important to remember that GenRA is a statistical method, and no statistical method is perfect. There will always be some uncertainty in any prediction. However, by considering all four of these factors, you can make more informed decisions about the reliability of GenRA predictions.

Interpretation examples:
I) ACT=1: if AUC>0.5 and p<0.1, there is a high likelihood of an effect.
II) ACT=0.32, AUC=0.75, p=0.13: there is a high likelihood that there is no effect, as ACT<0.5, AUC>0.5 and p~0.1.
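The rules of thumb above can be summarized in a small decision sketch (illustrative only; the 0.5 and 0.1 thresholds are taken from the interpretation examples above and are not hard cut-offs):

```python
# Illustrative summary of the interpretation guidance above - not part of GenRA.
def interpret(act: float, auc: float, p: float) -> str:
    if auc <= 0.5 or p >= 1.0:
        return "unreliable (AUC no better than chance, or p = 1)"
    call = "effect likely" if act >= 0.5 else "no effect likely"
    strength = "statistically significant" if p < 0.1 else "weaker statistical support"
    return f"{call} (ACT={act}, AUC={auc}, p={p}; {strength})"

print(interpret(1.0, 0.8, 0.05))    # example I above: high likelihood of an effect
print(interpret(0.32, 0.75, 0.13))  # example II above: high likelihood of no effect
```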
Question 105: Is there any cutoff value for AUC or p-value above which the prediction is reliable?

EPA Response: See the detailed explanation above.

APPLYING GENRA

Question 106: Is this tool accepted by regulatory agencies?

EPA Response: Policy determinations by EPA or other agencies are beyond the scope of this training.

Question 107: Are there any criteria to exclude chemicals from getting a read-across in GenRA?

EPA Response: This is currently a judgement by the end user based on the information that is presented in Panel 4.

OTHER

Question 108: Is GenRA considering the number of scientific publications/articles per endpoint? Because having a list of analogues without data is not very useful for read-across.

EPA Response: Good question. There are a number of factors to consider while using literature information in read-across predictions. We are exploring a variety of text-mining approaches to extract information about chemical effects from the literature. This feature may be included in future versions.

Question 109: Is read-across from read-across advisable, though (i.e., including source analogues of source analogues of source analogues in the viewer)?

EPA Response: Recursive read-across? This is an interesting research idea. Please feel free to reach out to us to discuss further.

Question 110: In what cases do we need to use ToxCast vs ToxRef data?

EPA Response: This depends on the end user and what outcomes they are interested in evaluating, as well as on the target and analogues. If there are not enough endpoints from ToxRef, they can evaluate ToxCast.

Question 111: Why do you discard the benzoic acid if it has a value of 3?

EPA Response: We discarded benzoic acid because its logKow value was less than 2, even though it was initially kept based on its physchem properties when we were filtering on melting temperature, since its melting temperature helped characterize that it was a solid.

Question 112: By selecting Chem: AIM and ToxCast data, we may get a high similarity score; however, the analogue often lacks in vivo data. Is that a good approach?

EPA Response: If you just want to pick analogues, then perhaps. If you need toxicity data to infer a hazard/POD, then you will probably need to include ToxRef.

Appendix A: List of Acronyms

CPDat - Chemical and Products Database
EC50 - 50 percent effect concentration
ECHA - European Chemicals Agency
ECOSAR - Ecological Structure Activity Relationships
ECOTOX - ECOTOXicology Knowledgebase
ENVIROFATE - Environmental Fate Database
EPI Suite - Estimation Program Interface Suite
GLP - Good Laboratory Practice
MEC - Measured Environment Concentration
NAMs - New Approach Methodologies
OECD - Organization for Economic Co-operation and Development
OPERA - OPEn structure-activity Relationship App
PCA/PLS - principal components analysis/partial least squares regression
QSAR - quantitative structure-activity relationship
REACH - Registration, Evaluation, Authorisation and Restriction of Chemicals
SAR - structure-activity relationship
SETAC - Society of Environmental Toxicology and Chemistry