Evidence Mapping for Engineering & Exposure: Literature Search, Prioritization and Pre-Screening Strategy

Ariel Hou, Yadi Lopez, Nerija Qrentas, Chantel Nicolas, Katherine Phillips, Yvette Selby-Mohamadu
U.S. EPA, OCSPP/OPPT/RAD, Washington, D.C.
Ariel Hou | Hou.ariel@epa.gov | 202-564-5591

General Workflow for Engineering & Exposure Evidence Mapping

The Engineering & Exposure Evidence Mapping Workflow starts with a comprehensive search of peer-reviewed literature databases using chemical names (incl. synonyms) to identify the literature pool for systematic review. The search results are deduplicated using EPA's HERO database. After deduplication, the literature pool is prioritized using SWIFT-Review to narrow it down to a smaller set of references likely to be relevant for Exposure before they undergo Title/Abstract Screening in SWIFT-Active Screener.

Scope of Engineering & Exposure under TSCA Systematic Review:
Engineering
• Occupational exposure
• Environmental releases
Exposure
• Environmental exposure
• General population exposure
• Consumer exposure

Databases searched for Next 20 High-Priority Chemicals:
• Agricola
• Dissertation abstracts
• PubMed (National Library of Medicine)
• Science Direct
• TOXNET
• ECOTOX UNIFY
• Web of Science (Thomson Reuters)

Step 1: Collecting Positive and Negative Seed References for Reference Prioritization

SWIFT-Review is a literature review classification software used by EPA for reference prioritization. The software requires both positive and negative seeds to "rank" the literature pool. References whose titles and abstracts most closely resemble the positive seed articles are ranked higher in the prioritization process.
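As a rough illustration of seed-based ranking, the sketch below scores each title/abstract by its average bag-of-words cosine similarity to the positive seeds minus its average similarity to the negative seeds. The poster does not describe the software's internal model, so this simple similarity measure is only a stand-in, and the function names and example titles are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a title/abstract and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_similarity(vec, seed_vecs):
    """Average similarity of one reference to a list of seed vectors."""
    return sum(cosine(vec, s) for s in seed_vecs) / len(seed_vecs)


def rank_references(pool, positive_seeds, negative_seeds):
    """Return (score, reference) pairs sorted from most to least seed-like.

    Assumption: score = mean similarity to positive seeds minus mean
    similarity to negative seeds. This is a toy stand-in, not the
    prioritization model used by the actual software.
    """
    pos = [Counter(tokenize(s)) for s in positive_seeds]
    neg = [Counter(tokenize(s)) for s in negative_seeds]
    scored = []
    for ref in pool:
        vec = Counter(tokenize(ref))
        scored.append((mean_similarity(vec, pos) - mean_similarity(vec, neg), ref))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical titles, invented for illustration only.
positive_seeds = [
    "Occupational inhalation exposure to solvent vapors in degreasing facilities",
    "Worker exposure monitoring during solvent spray cleaning",
]
negative_seeds = [
    "Crystal structure of a brominated organic compound",
    "Quantum chemical calculation of solvent molecular orbitals",
]
pool = [
    "Crystal structure and molecular orbitals of a chlorinated compound",
    "Inhalation exposure of workers to solvent vapors during degreasing",
]

ranked = rank_references(pool, positive_seeds, negative_seeds)
```

In this toy pool the worker-exposure title shares most of its vocabulary with the positive seeds and therefore sorts ahead of the crystal-structure title.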
• Positive Seeds are the titles and abstracts of references known to contain relevant information for the discipline of interest
• Negative Seeds are the titles and abstracts of references known to NOT contain relevant information

To identify Positive Seeds, EPA used the exposure literature pool for the first 10 TSCA Risk Evaluations. The positive seed references were those that supported technical aspects of the exposure assessment for the 1-bromopropane, cyclic aliphatic bromide cluster (HBCD), methylene chloride, N-methylpyrrolidone (NMP), perchloroethylene, trichloroethylene, and asbestos draft TSCA Risk Evaluations.

Table 2. Number of Positive Seeds by Data Element Used for Reference Prioritization for the Engineering and Exposure Disciplines

Number of Positive Seeds by Chemical
| Chemical | Engineering | Exposure |
| 1-Bromopropane | 7 | 9 |
| Asbestos | 8 | 7 |
| Cyclic aliphatic bromide cluster | 6 | 378 |
| Methylene chloride | 9 | 8 |
| N-Methylpyrrolidone | 5 | 0 |
| Trichloroethylene | 2 | 6 |
| Perchloroethylene | 6 | 59 |
| Other (covers multiple chemicals) | 7 | 16 |
| Total | 50 | 483 |

Number of Positive Seeds by Data Type
| Data Type | Engineering | Exposure |
| General Facility Estimate | 1 | n.a. |
| Occupational Exposure | 40 | n.a. |
| Environmental Release | 4 | n.a. |
| Multiple | 5 | 27 |
| Consumer | n.a. | 75 |
| Dietary | n.a. | 24 |
| Environmental Exposure | n.a. | 311 |
| Human Biomonitoring | n.a. | 46 |
| Total | 50 | 483 |

Note: Engineering covers Occupational Exposure and Environmental Release; Exposure covers Environmental, General Population, and Consumer Exposure.

Negative Seeds were selected using the following method for Engineering and Exposure:

Engineering:
• 50 negative seeds for each set of references to be prioritized
• Same number as positive seeds, for optimal prioritization
• Manually selected, based on review of title/abstract, as the references least relevant to the data element of interest

Step 2: Assessing the Performance of Reference Prioritization Method

To assess the performance of the Reference Prioritization Method, validation test runs and/or analyses were performed to ensure that the positive seeds (and negative seeds) are capable of capturing relevant information.

For Engineering (occupational exposure & environmental release), a total of 5 validation test runs were performed using the selected positive seeds to score a known set of literature references in SWIFT-Review. Specifically:
o Positive seeds were used to numerically score references tagged for the draft 1,4-dioxane, HBCD, 1-BP, NMP, and methylene chloride Risk Evaluations in SWIFT-Review
o Scores were reviewed to make sure that the Engineering integrated references (i.e., those that supported technical engineering aspects of the draft Risk Evaluation) received a higher score relative to other references that were not used or were not integrated
o Generally, the validation test runs show that all integrated references from the known datasets scored at the 80th percentile or higher
o From these results, EPA determined the 80th percentile score as the "cut-off score". Prioritized references that score above this cut-off will move forward to Title/Abstract Screening

For Exposure (environmental, general population, and consumer exposure), 5-fold cross validation was performed.
The positive and negative seeds were split into five folds, and SWIFT-Review scoring was carried out five times; each time, the scoring was trained on four of the five groups of seeds and the held-out group was scored:
o Positive and negative seeds were reviewed to ensure they were properly scored (positive seeds had high scores while negative seeds had low scores). The lowest positive seed score was 0.7; the highest negative seed score was 0.37
o This cross-fold validation exercise shows that SWIFT-Review can discern between the selected positive and negative exposure seeds
o The "cut-off score" for deciding if a reference should be carried forward to SWIFT-Active Screener was determined by subtracting two standard deviations of the distribution of positive seed scores from the minimum positive seed score

Figure 1. Cumulative Frequency of SWIFT-Review Scores from the 1-BP Validation Test Run (Reference Dataset from the draft 1-BP TSCA Risk Evaluation). Legend: Integrated Sources; 50th percentile (median score); 80th percentile.

Figure 2. Distributions of SWIFT-Review scores for positive seeds split by different exposure scenarios and the scores for the negative exposure seeds. The dotted grey line shows the cutoff that can be applied to determine if a scored reference would be sent on to SWIFT-Active Screener.

Negative Seeds for Exposure:
• 473 negative seeds, selected from six of the Next 20 compounds (one from each compound group)
• Roughly the same total number of negative seeds as positive seeds
• Manually selected, based on review of title/abstract, as irrelevant to exposure

Step 3: Screening References in SWIFT-Active Screener (Active Machine Learning)

After Step 2, the prioritized references undergo Title and Abstract Screening in SWIFT-Active Screener. SWIFT-Active Screener is a web-based, collaborative systematic review software application that EPA adopted for the TSCA Systematic Review for the Next 20 High-Priority Chemicals.
The software uses an active machine learning algorithm: as screeners include or exclude references, it periodically computes which, and how many, of the remaining unscreened references are most likely to be relevant. Using this software allows EPA to manually screen only a portion of the prioritized references, focusing its resources on those that are most likely to be relevant to TSCA Risk Evaluations. Each reference is reviewed by two screeners against a chemical-agnostic Receptor, Exposure, Setting/Scenario, and Outcome (RESO) statement, and conflicts are resolved by a third, independent screener.

The views expressed in this poster are those of the authors and do not necessarily reflect the views or policies of the U.S. EPA.

U.S. Environmental Protection Agency
Office of Research and Development
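The two cut-off rules described under Step 2 reduce to simple arithmetic, sketched below. The percentile interpolation method and the use of the sample standard deviation are assumptions (the poster specifies neither), the function names are hypothetical, and all scores are made-up values for illustration.

```python
import statistics


def engineering_cutoff(scores):
    """80th percentile of a set of scores (the fourth quintile cut point)."""
    # Assumption: inclusive linear interpolation; the poster names no method.
    return statistics.quantiles(scores, n=5, method="inclusive")[3]


def exposure_cutoff(positive_seed_scores):
    """Minimum positive seed score minus two standard deviations."""
    # Assumption: sample standard deviation of the positive seed scores.
    spread = statistics.stdev(positive_seed_scores)
    return min(positive_seed_scores) - 2 * spread


def moves_to_screening(score, cutoff):
    """A reference is carried forward when it scores above the cut-off."""
    return score > cutoff


# Hypothetical scores, not taken from the poster's datasets.
pool_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
positive_scores = [0.70, 0.80, 0.90]

eng_cut = engineering_cutoff(pool_scores)
exp_cut = exposure_cutoff(positive_scores)
carried_forward = [s for s in pool_scores if moves_to_screening(s, eng_cut)]
```

With these made-up positive seed scores the standard deviation is 0.1, so the Exposure cut-off lands at 0.7 - 2 × 0.1 = 0.5; a tighter or wider spread of positive seed scores would move the cut-off accordingly.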