Methods for Automated Generation of Field Scenarios and Postprocessing of Pesticide Water Calculator Output U.S. Environmental Protection Agency Office of Pesticide Programs 12/31/2019 ------- Table of Contents Methods for Automated Field Scenario Generation 1 1. Introduction 3 2. Field Scenario Generation 3 2.1. Step 1: Spatial Data Overlay 4 2.2. Step 2: Population of Scenario Parameters 4 3. Step 3: PWC Output Postprocessing 6 4. References 7 2 ------- 1. Introduction As a part of the requirements for pesticide registration and registration review, the U.S. Environmental Protection Agency, Office of Pesticide Programs (OPP) conducts aquatic exposure assessments to determine whether pesticides that are applied according to label directions may result in concentrations in water that may adversely impact human health or aquatic organisms. OPP estimates pesticide concentrations in water with models that simulate soil, weather, hydrology, and management/use conditions that are expected to influence the potential for pesticides to move into water. These models - the Pesticide in Water Calculator (PWC) (Young, 2019), the Pesticide Root Zone Model (PRZM5) (Young and Fry, 2016), which is a component of the PWC, and the Spatial Aquatic Model (SAM) (USEPA OPP, 2015) - use scenarios to represent field, watershed, and waterbody properties that are important in pesticide fate and transport. Field scenarios refer to the field and watershed (environmental) inputs used in the models. USEPA OPP (2019) describes field scenario inputs used in the models. This document describes the code used to generate field scenarios and to compile the inputs for the aquatic models. 2. Field Scenario Generation Scenarios generation was automated through the execution of a series of scripts developed in the Python programming language (Figure 1). These scripts are managed and hosted in the GitHub version control system, in a public repository within EPA's GitHub account (https://github.com/usepa/opp- efed). The code was written as simply and cleanly as possible for clarity and brief execution times, and the GitHub repository and the code itself contain markup and documentation to explain the operations in the scripts. There are two steps in the automated generation of scenarios: 1. Spatial overlay of GIS layers to generate spatial index 2. Tabular join of input datasets to spatial index and collation of scenario groups A third and final step, Processing of PWC output files and scenario selection by estimated exposure, is explained in Section 3. 3 ------- Dataset Process(Python) Process (Other) Parameter Tables include: • Soil parameters (SSURGO) • Parameters linked to land cover • Crop plant/harvest dates • Curve numbers • Irrigation parameters • Parameters linked to weather grid • Parameters linked to watershed (NHD Plus) read_summary_files.py Figure 1. Aquatic model input generation flowchart 2.1. Step 1: Spatial Data Overlay The script overlay_raster.py generates a spatial index from the overlay of 3 raster (gridded) GIS datasets: (1) SSURGO map unit ID (USDA NRCS SSS, 2018), (2) Cropland Data Layer (CDL) crop ID (USDA NASS, 2014-2018), and (3) weather grid ID (Fry et al., 2016). All spatial data except SSURGO had a 30 m cell size and a national (contiguous 48 states) extent. The raster SSURGO datasets were disseminated at the state extent and a 10 m cell size; thus, the script performs a spatial merge and resampling on these layers with ArcGIS prior to the overlay. The script utilizes the ArcPy module (ArcGIS Pro 2.3) to perform a Combine operation on the input raster datasets. The result is a raster dataset, and corresponding attribute table, containing three identifiers (referred to hereafter as index fields) for each pixel in the contiguous United States: soil map unit ID (mukey), CDL crop ID (cdl), and weather grid ID (weather_grid), and an area field containing the spatial extent of each unique combination. The attribute table, named the combinations table in the code, constitutes the spatial index from which the scenarios are built. 2.2. Step 2: Population of Scenario Parameters The script scenarios_and_recipes.py generates scenarios from the spatial index. The script first aggregates combinations for all years by combining rows with common index field values. This aggregation also combines the values of the area field for grouped combinations. To account for double- cropped classes in the CDL (e.g., CDL class 26 - Winter Wheat/Soybeans), the script duplicates combinations with a double-cropped CDL crop ID and assigns each of the duplicated combinations a cdl_alias using the CDL crop ID for each of the constituent crops. An example of the use of this alias is where a single combination with cdl = 26 would be duplicated, with one of the duplicates assigned cdl_alias of 24 (winter wheat) and the other a cdl_alias of 5 (soybeans). Both rows would retain the Raster GIS Layers To Spatial Aquatic Model (SAM) Recipes Process Weather Station Grid Scenarios Weather Data (NCEP/CPC) Parameter Tables1+ NHD Plus (USGS/EPA) CDL Land Use (USDA-NASS) SSURGO Soils (USDA) Weather Array PWC Combinations overlay_raster.py scenarios_and_recipes.py 4 ------- original cdl value of 26. This cdl_alias is used as an indexing field for most crop-linked parameters. Another index field, state, is created from a one-to-one relationship with soil map unit ID. At this point, the combinations table has 5 index fields cdl, weather_grid, mukey, cdl_alias, and state. The script creates scenarios by joining several tabular datasets indexed by one or more index fields to the combinations table. These tables and the fields contained within are listed here (index fields in bold, full parameter descriptions in Estimating Field and Watershed Parameters Used in USEPA's Office of Pesticide Programs Aquatic Exposure Models (USEPA OPP, 2019): • cdl_params.csv o cdl, cdl_alias, gen_class, label_group, crop_intercept, max_cover, amxdr, usle_c_fal, usle_c_cov, cultivated • crop_dates.csv o cdl', cdl_alias, state, bloom_begin, bloom_end, plant_begin, plant_begin_active, plant_end_active, plant_end, harvest_begin, harvest_begin_active, harvest_end_active, harvest_end • curve_numbers.csv (indexed to gen_class parameter from crop_params.csv) ° gen_class, cn_fal_A, cn_fal_B, cn_fal_C, cn_fal_D, cn_cov_A, cn_cov_B, cn_cov_C, cn_cov_D • irrigation.csv o cdl', state, irrigation_flag, irrigation_type, irrigation_rate, depletion_allowed, leaching_fraction • met_params.csv o weather_grid, anetd, ireg The script takes parameters linked to soil directly from the SSURGO dataset. There were three tables within the state-level SSURGO data and one national-level table containing parameters used for scenario generation (index fields in bold, SSURGO field headers in parentheses, field descriptions in (USEPA OPP, 2019)): • muaggatt - Properties indexed to soil map unit o mukey, hydro_group_dominant (hydgrpdcd) • component - Properties indexed to soil component o mukey, cokey, component_pct (comppct_r), hydro_group (hydgrp), major_component (majcompflag), slope (slope_r), slope_length (slopelenusle_r) • chorizon - Properties indexed to soil horizon o cokey, bd (dbthirdbar_r), fc (wthirdbar_r), wp (wfifteenbar_r), orgC (om_r), sand (sandtotal_r), clay (claytotal_r), horizon_top (hzdept_r), horizon_bottom (hzdepb_r), horizon_letter (desgmaster), pH (phltolh2o_r), usle_k (kwfact) • valul - National-level data indexed to soil map unit o mukey, root_zone_max (rootznemc) In SSURGO, a map unit describes soils and other components that have unique properties, interpretations, and productivity. A single map unit may contain multiple components, one or more of which is designated as the 'major component' for the map unit. For scenario generation, the script chooses the major component comprising the largest area of the map unit (component_pct) to 5 ------- represent the entire map unit. Soil horizon data are linked to the component table. Quality control checks are applied to the soils data to identify missing or out-of-range values in soil horizons data. If an invalid or missing value is encountered in a soil horizon (e.g, horizon_bottom = 0), that horizon and all horizons below it in the map unit are deemed invalid, and the soil profile is truncated at the deepest valid horizon. The adjusted soil profile depth is applied as a maximum value to the root_zone_max parameter. Additional parameters are derived from SSURGO parameters (field descriptions in (USEPA OPP, 2019)): thickness, root_depth, evaporation_depth, n_horizons, usle_k, usle_ls, and usle_p. Additional parameters are also computed from combinations of other parameters based on the results of the join (field descriptions in (USEPA OPP, 2019)): • plant_date, emergence_date, maxcover_date, and harvest_date are derived from other crop date fields (e.g., plant_date_begin, plant_date_end) • cn_cov and cn_fal are selected from curve number fields (e.g., cn_fal_A, cn_cov_D) based upon the combination of CDL class (cdl_alias) and hydrologic soil group (hydro_group) Once the scenarios are fully populated with all required parameters, the script performs a quality control check to identify and remove scenarios with invalid or missing values. The expected values for the inputs were based primarily on documentation for the source data as described in USEPA OPP (2019). The table fields_and_qc.csv is provided as an input to the script and containing the name of each field, along with a set of QC parameters: range_min/max, range_flag, and blank_flag. These parameters are used to generate flags based on whether the values in each field are missing, fall out of range, or fall out of a more conservative 'general' range. The value of the flag represents the severity of the error. The//og parameters may have a value of 0, 1, or 2, where 0 represents no error, 1 represents an unusual value that does not invalidate the scenario, and 2 represents an error which renders the scenario invalid. Scenarios where any field has a flag of 2 are removed from the scenarios table. An output file (r[reg/on]_qc.csv) is generated that provides the flags for each scenario and field, and another (r[region]_report.csv) is generated that provides the number of scenarios marked for removal by a flag of 2 or higher for each field. The scenarios table with all valid scenarios is written to file (r[reg/on]_parent.csv). This table is then collated and sampled to produce a random sample of scenarios for each region and crop or crop group. 3. Step 3: PWC Output Postprocessing The final scenarios table for each region and crop (r[region]_parent.csv) was batch processed in PWC to simulate estimated drinking water concentrations (EDWCs). PWC generates a summary file (BatchOutputVVWM.txt) containing modeled pesticide concentrations at different exposure durations (e.g., acute, chronic and cancer), indexed by scenario ID. A post-processing script (read_summary_files.py) was used to read this output file, rank the scenarios by concentration and assign percentiles, and return a selection of scenarios corresponding to a selection percentile. This script also generates plots of the distribution of modeled concentrations. 6 ------- The script first reads the PWC output file, then performs a tabular join with the original scenarios file, retaining scenario ID, area, and hydrologic soil group fields. This table is then used to calculate percentiles for each scenario. To produce the rankings, the script sorts the estimated acute, chronic, and cancer EDWCs from the field scenarios for each HUC2-region/Koc combination simulated by PWC from highest to lowest and produces percentiles for the ranked field scenarios according to the following equation (NIST 2019, Wikipedia 2019) v n A-—- Zii=i/1( 2 Pn = yN ~A Where p„ is the percentile ranking at a given scenario n, £f=1 A[ — ^ is the cumulative area of one half of scenario n and all scenarios ranked below scenario n, and YJi=i 's the total area of all scenarios. The interpretation of the 90th percentile would be that 90% of the cropped area, as represented by all field scenarios ranked below the 90th percentile, will result in lower estimated concentrations. The percentile values are appended to the combined scenario/PWC output table. After assignment of percentile values, the complete table is written to file ([run /c/]_summary.csv). The scripts performs scenario selection for each selection percentile, which is the target percentile for the representative scenario. The scenarios with percentile values closest to the selection percentile are selected and written to file ([run /deselection.csv). The script generates plots written to files (plot_[c/i/rat/on].png) for the distributions of concentration and percentiles for each exposure duration (i.e., acute, chronic, cancer). 4. References Fry, M.M., G. Rothman, D.F. Young, and N. Thurman. 2016. Daily gridded weather for exposure modeling. Environmental Modelling & Software, 82, 167-173, doi.org/10.1016/j.envsoft.2016.04.008 NIST/SEMATECH e-Handbook of Statistical Methods, 2019. https://www.itl.nist.gov/div898/handbook/prc/section2/prc262.htm USDA National Agricultural Statistics Service Cropland Data Layer. 2014-2018. Published crop-specific data layer [Online], Available at https://nassgeodata.gmu.edu/CropScape/ (accessed Feb 2019). USDA- NASS, Washington, DC. USDA Natural Resources Conservation Service Soil Survey Staff (USDA NRCS SSS). 2018. Gridded Soil Survey Geographic (gSSURGO) Database. United States Department of Agriculture, Natural Resources Conservation Service. Available online at http://datagateway.nrcs.usda.gov/. October 11-12, 2018 (FY2018 official release). U.S. Environmental Protection Agency Office of Pesticide Programs (USEPA OPP). 2015. Development of a Spatial Aquatic Model (SAM) for Pesticide Risk Assessments. Presented to the FIFRA Scientific Advisory Panel, September 15-17, 2015. Available in the public e-docket, Docket No. EPA-HQ-OPP-2015-0424, accessible through the docket portal: http://wv.rw.regulations.gov. U.S. Environmental Protection Agency Office of Pesticide Programs (USEPA OPP). 2019. Estimating Field and Watershed Parameters Used in USEPA's Office of Pesticide Programs Aquatic Exposure Models - 7 ------- The Pesticide Water Calculator (PWC)/Pesticide Root Zone Model (PRZM) and Spatial Aquatic Model (SAM). Draft. Young, D.F. 2019. The USEPA Model for Estimating Pesticides in Surface Water, in Pesticides in Surface Water: Monitoring, Modeling, Risk Assessment, and Management. American Chemical Society, editors Goh, Kean, and Young, American Chemical Society, Washington DC. Young, D.F. and Fry, M.M. 2016. PRZM5: A Model for Predicting Pesticide in Runoff, Erosion, and Leachate, Revision A. USEPA/OPP 734S16001, U.S. Environmental Protection Agency, Washington, DC. Available for download with the Pesticide in Water Calculator from the USEPA OPP Water Models web site at https://www.epa.gov/pesticide~science~and~assessing~pesticide~risks/models~pesticide~risk~ assessment#aquatic Wikipedia contributors. Percentile. Wikipedia, The Free Encyclopedia. 2019. https://en.wikipedia.Org/wiki/Percentile#The weighted percentile method 8 ------- |