STORET
Tips and Tricks for Setting Up STORET Data

U.S. Environmental Protection Agency
March 2004

Introduction

One of the most important features of STORET as a data management system is its ability to maintain data of documented quality. Properly documenting your data by providing rich metadata (i.e., data about data) can ensure that monitoring results will be used appropriately. In STORET, much of this metadata needs to be set up in the system up front, before you begin to enter monitoring results. STORET was designed to be very flexible so that varied ways of documenting and describing data can be accommodated.

Because of the many ways STORET can be set up, the task can seem daunting to first-time users. To provide useful insight to new users, we conducted a survey of current users asking how they implemented their initial setup and entered metadata. Thirteen users from different types of monitoring organizations responded to the survey.

The purpose of this document is to provide helpful ideas for setting up various aspects of STORET. This information was compiled from the survey responses and is designed to help you customize STORET for efficient data import, retrieval, and display.

We asked the respondents to comment on various types of STORET data, including water chemistry, sediment chemistry, tissue chemistry, physical measurements, habitat, biological information (both raw and metrics data), and data loggers. Although the survey respondents enter data in all of these categories, most submitted input on water chemistry, physical measurements, and raw biological data.

Before you enter information into STORET, it is important to take the time to analyze your data and decide how you want to enter it. You will also want to consider what data you want to get out of STORET. Determining how to organize and enter data before you start may save you many hours of reformatting and processing the data.
While it is useful to determine how to organize your data before entering them into STORET, it is also important to be flexible, since you may have different types of data to put in and get out of STORET. In general, it is helpful to structure the data you plan to enter into STORET in a fashion similar to your current data analysis system; this minimizes the time needed to rearrange or process data. You can use the information in this document to help you find a data situation similar to your own and determine how someone else approached the problems and set up their data for STORET. It is up to you to define a system that is useful and efficient for your situation.

This document is divided into sections according to the main information categories in STORET: projects, stations, station visits, trips, activities, characteristic groups, lab/field analytical procedures and equipment, and sample collection procedures. Each of these information categories provides important metadata for results you will enter into STORET. The next sections cover each of these categories, describe options for defining them in STORET based on the experiences of the survey respondents, and list advantages and disadvantages of the various approaches. For more details, see Appendix A, which contains the complete text of all responses to the survey.

Projects

Project descriptions contain essential information about purpose, procedures, standards, and methods related to the project. The advantage of using projects in STORET is that they allow you to retrieve data sets that are specific to a project. For example, by associating sampling stations with a lake monitoring project, a staff member can go into the database and retrieve all sampling data associated with the specific project for analysis.
A project in STORET must include the following elements at a minimum (note that other metadata can be stored as well):
  - Unique Identifier (up to 8 characters long),
  - Start Date,
  - Duration (enter "ongoing" for continuous monitoring),
  - Name (e.g., "Alameda Creek Volunteer Monitoring"), and
  - Purpose (e.g., "Monitor effects of urban development on water quality of Alameda Creek").

There are many ways to set up projects in STORET, each with advantages and disadvantages. The survey respondents reported setting up anywhere from 2 to 1,000 projects, based on how their projects are defined. Table 1 shows the different ways the survey respondents have set up projects for STORET.

Table 1. Options for Defining Projects in STORET

Options for Defining Projects:
  - Program Name or Study [1*, 2, 4, 9, 13]
  - Year [3]
  - Program Purpose [5, 8, 10]
  - Program Name and Year [6]
  - Hydrologic unit code (HUC) or Watershed [3, 7, 12]
  - Collection Entity
  - Location and Date [11]

Advantages:
  - Unambiguous [1]
  - Easy to retrieve data from STORET [1, 13]
  - Less initial formatting [2]
  - Data owners can define data organization [4]
  - Quick to retrieve project-specific data [4]
  - Matches USGS data "water year" books [3]
  - Easy to retrieve data from STORET [3]
  - Can track separately [5]
  - Easy to retrieve data from STORET [6]
  - Easy to see any changes to stations [6]
  - Forces a QC check of HUCs [7]
  - Works well when multiple entities are involved in one project [9]
  - Tracks environmental problems by location and time [11]
  - Data can be sorted different ways [11]

Disadvantages:
  - May limit retrieval options [9]
  - Difficult to deal with multiple years [3]
  - Difficult to categorize data that fulfill more than one purpose [5]
  - Does not take advantage of STORET metadata capabilities [7]
  - Requires more cooperation between counties [12]

* Numbers in the tables correspond to the number assigned to each survey respondent and identify the respondent(s) associated with an option or response. The full text of the specific comments can be found in related sections of Appendix A.
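The minimum project elements listed above can be checked before any data entry begins. The following is a minimal sketch in Python, not part of STORET or its import tools; the `Project` record, the `validate_project` helper, the "ACVM" identifier, and the start date are all hypothetical and exist only for illustration. The only rules taken from this document are the required elements and the 8-character limit on the identifier.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Project:
    """Hypothetical record holding the minimum project elements."""
    project_id: str   # unique identifier, up to 8 characters
    start_date: date
    duration: str     # e.g., "ongoing" for continuous monitoring
    name: str
    purpose: str


def validate_project(project, existing_ids):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not project.project_id:
        problems.append("project ID is missing")
    elif len(project.project_id) > 8:
        problems.append("project ID is longer than 8 characters")
    if project.project_id in existing_ids:
        problems.append("project ID is not unique")
    for field_name in ("duration", "name", "purpose"):
        if not getattr(project, field_name).strip():
            problems.append(f"{field_name} is missing")
    return problems


# Illustrative values only; the ID and start date are made up.
acvm = Project(
    project_id="ACVM",
    start_date=date(2004, 3, 1),
    duration="ongoing",
    name="Alameda Creek Volunteer Monitoring",
    purpose="Monitor effects of urban development on water quality of Alameda Creek",
)
problems = validate_project(acvm, existing_ids={"LAKE01"})
```

Running a check like this over a spreadsheet of planned projects may catch overlong or duplicate identifiers before they reach STORET.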
Stations

Stations, also referred to as sites, identify or describe the physical location at which monitoring occurs. All data collected in the field are linked to a specific location, or site, at which the field work was conducted. Recording station data links water quality measurements to the place they represent. Precise location definition is very important to environmental analysis, and EPA data standards for locational data are strictly followed in STORET. All applicable federal standards (e.g., Federal Information Processing Standard [FIPS], National Institute of Standards and Technology [NIST], and others) are adhered to wherever possible.

Stations may be part of external reference schemes, and may carry a multitude of identifiers from each of these schemes. For example, a station in STORET might have a National Pollutant Discharge Elimination System (NPDES) number or a state regulatory program code.

Each monitoring station must include the following elements at a minimum (note that other metadata can be stored as well):
  - ID Code (up to 15 characters; e.g., for Alameda Creek Volunteer Monitors, stations could be labeled "AC-001," "AC-002," etc.),
  - Station or Waterbody Type (e.g., Stream, Lake, Well, Estuary),
  - Latitude and Longitude (does not have to be accurate to the nearest square inch, but try to be as precise as possible),
  - Geopositioning Method (e.g., GPS, map interpolation),
  - Datum (e.g., North American Datum 1983), and
  - State and County.

The number of stations reported by survey respondents ranges from 100 to over 18,000 depending on how the stations are set up. Table 2 shows different ways survey respondents have set up naming conventions for STORET stations and the advantages and disadvantages users described for each option. Please note that, in the examples, site IDs and station numbers are synonymous.

Table 2. Options for Defining Stations in STORET

Options for Defining Stations:
  - Each individual program decides how to input data in STORET [1, 10]
  - Program and Site ID [2]
  - Reach IDs from Legacy STORET [3]
  - County and site [4]
  - Ecoregion [5]
  - Project, county, sequential number [6]
  - HUC, waterbody ID, number [7, 10]
  - Stream names [8]
  - ID from state Laboratory Information Management System (LIMS) [9]
  - Project ID, waterbody ID, site ID [11]
  - Sequential numbers [12]
  - Waterbody ID, site ID [13]

Advantages:
  - Easy to import into STORET [2]
  - Consistent with Legacy STORET [3]
  - If site IDs are set, easy for data collectors [4]
  - Helps with biocriteria development [5]
  - Can track by watershed [7]
  - Can easily link to source data [9]
  - Easy to understand [11]
  - Can track by waterbody [11]
  - Easy to assign numbers to new stations [12]
  - Easy to determine location [13]

Disadvantages:
  - Difficult to locate information [1]
  - No consistency [10]
  - Some state agencies prefer direction [10]
  - Difficult to calculate and retrieve [3]
  - Easy to have duplicate IDs for streams [4]
  - Difficult to separate combined biological, chemical, and habitat data from chemical-only data [5]
  - Easy to have duplicate IDs [6]
  - Must verify and QC each HUC [7]
  - Easy to have duplicate IDs [7]
  - Too long for STORET [8]
  - Not useful for querying based on topics [9]
  - Difficult to track multiple waterbodies [11]
  - Not always possible to assign sequentially, so one waterbody may have multiple numbers [12]

Trips

A field trip is a method of grouping actual visits to monitoring stations. One trip could involve a single visit to a single station or multiple visits to several different stations. Trips also provide a framework for storing "blank" samples and other QC activities.

Trip information includes the following at a minimum (note that other metadata can be stored as well):
  - An ID code for the trip,
  - Date the trip began, and
  - List of projects supported by the data collected on the trip.

It may be useful to define trips by a combination of geography and time.
For example, the Nevada Division of Environmental Protection (NDEP) monitors throughout the State of Nevada. Nevada is divided into seven major river basins. A trip is defined as all visits to the stations located within one basin for an entire year. For example, the trip labeled "Carson 1999" included all the monitoring stations visited in the Carson River Basin for the year 1999. Table 3 shows different ways survey respondents have set up trips for STORET.

Table 3. Options for Defining Trips in STORET

Options for Defining Trips:
  - One year of sampling for a particular program [1, 10, 12]
  - "T" plus the station number [2]
  - Year [3, 4, 7, 8]
  - Field crew, year, and week [5]
  - Project and date [6]
  - One trip per day, month, or year [9, 11, 13]

Advantages:
  - Easy to understand [1]
  - Users often want to view one year of data [3]
  - Limits the number of trips for easy data management [3]
  - Helps organize and load data [7]
  - Easy to see if data is already in STORET [7]
  - Easy to find date by year and field office [5]
  - Easy to find data for corrections [6]
  - Accurate representation of actual trips [9]
  - Can be auto-generated [11]
  - Easy to find data from a particular time period [13]

Disadvantages:
  - May need to reformat data [2]
  - IDs can be too long [5]
  - Time consuming to look up the week a trip started [5]

Station Visits

Station visits are the events that occur when a particular site or station is visited to conduct monitoring activities. Any number of activities can be done during a single visit. For example, during a site visit, one field observation activity could include measurements of water temperature, dissolved oxygen, and pH. Another field observation activity could involve measuring vegetation cover as part of a habitat assessment. Sampling activities could include collection of a water sample or the collection of fish for tissue analysis. Station visits in STORET allow the user to track the frequency of visits to sampling stations.
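The basin-and-year trip labels described above, together with the visit-frequency tracking that station visits support, can be sketched in a few lines of Python. This is an illustration only and is not part of STORET; the `trip_id` and `visits_per_station` helpers, the station IDs, the dates, and the "OtherBasin" name are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical visit records: (basin, station ID, visit date). Not real data.
visits = [
    ("Carson", "CR-01", date(1999, 4, 12)),
    ("Carson", "CR-01", date(1999, 8, 3)),
    ("Carson", "CR-02", date(1999, 4, 12)),
    ("OtherBasin", "OB-01", date(1999, 5, 20)),
]


def trip_id(basin, visit_date):
    """Label a trip by basin and calendar year, e.g. 'Carson 1999'."""
    return f"{basin} {visit_date.year}"


def visits_per_station(records):
    """Count visits for each (trip ID, station ID) pair."""
    return Counter(
        (trip_id(basin, day), station) for basin, station, day in records
    )


counts = visits_per_station(visits)
for (trip, station), n in sorted(counts.items()):
    print(f"{trip}: station {station} visited {n} time(s)")
```

A tally like this, built from your own field records, may help you decide whether a one-trip-per-year scheme keeps the number of trips manageable.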
Visits consist of the following information at a minimum (note that other metadata can be stored as well):
  - Date and time of the visit,
  - Station being visited, and
  - Visit number.

A station can be visited any number of times during a single trip. However, the visit number must be different for each visit to a station. Table 4 shows different ways survey respondents have set up station visits for STORET and the advantages and disadvantages users mentioned for each choice.

Table 4. Options for Defining Station Visits in STORET

Options for Defining Station Visits:
  - One station visit per day [1, 4, 10, 13]
  - Sequential number [2, 12]
  - Allow STORET Import Module (SIM) to assign [3, 6, 11]
  - Date and station ID [5]
  - Assign number based on visit spanning several days [7, 8]
  - Station, date, time [9]

Advantages:
  - Easy to understand [1]
  - Creates a unique ID when combined with date and time [2]
  - Can assign by sampling date and time [3]
  - Easy to understand [5]
  - Good for tracking multiday visits [7]
  - Accurate tracking [9]

Disadvantages:
  - May be duplicates later [2]
  - Resampling will create duplicates [3]
  - Various activities on different days may not appear to match [7]

Activities

Activities define a task accomplished during a visit to a monitoring station. Activities include collecting samples, taking field measurements (including habitat assessments), and making field observations. Activities also document information about the sampling process, including collection methods, sample preservation procedures, and personnel performing the activities. Activities can be associated with specific monitoring projects.
Activity information should include the following at a minimum (note that other metadata can be stored as well):
  - ID code of the activity (for a sample, it may be helpful to make the ID the same as the sample code),
  - Activity type (e.g., sample, field observation/measurement [includes habitat assessment], or automatic data logger results),
  - Medium (e.g., air, water, soil, biological),
  - Date and time of the activity,
  - Activity category (e.g., other sample information, such as routine sample, composite, or replicate),
  - Activity location (monitoring station where activities occur), and
  - Collection procedure (for samples only).

Table 5 shows different ways survey respondents have set up activities for STORET.

Table 5. Options for Defining Activities in STORET

Options for Defining Activities:
  - Combine field measurements and lab analyses and assign numbers [1]
  - Month, day, and year (field and lab results are separate) along with parameter code [2]
  - Field and lab results combined for U.S. Geological Survey (USGS) data and separate for other data [3]
  - Lakes: Sample location, depth, field or sample (F or S), replicate (R); Streams: Lab sample number and descriptive code [4]
  - Biological (B), chemical (C), or field (F) plus a unique number [5, 11]
  - Three different activity IDs: water conditions, atmospheric conditions, and water samples [6]
  - Unique activity ID: year, trip ID, sequential number, and a suffix representing a medium code or sample type [7]
  - Field measurements separate from lab analyses [8, 9]
  - One activity for date, time, depth, activity type, and category [10]
  - Assign field measurements the lab ID for a station with an up [12, 13]

Advantages:
  - Efficient for data entry [1]
  - Keeps activities separate and unique
  - Easy to determine which data were analyzed in the field or the lab, and whether a result was created from a sample or a field measurement [2]
  - Separate results helps track and match field and lab data [3]
  - Streams: system works well [4]
  - Clear [5]
  - Easy to group field or chemical data [11]
  - Can help find data in STORET [6]
  - Works well [7]
  - Can track field and lab data separately [9]
  - Easy to determine what field conditions existed when samples were taken [13]

Disadvantages:
  - Difficult to track parameters in STORET [1]
  - Must know if trips and visits are new or existing for entry into STORET [3]
  - Lakes: depth difficult to make unique [4]
  - ID may be too long for the activity field [4]
  - Field and chemical samples are separate in STORET, but must be considered together during analysis [5]
  - More complicated to keep separate [9]

Characteristic Groups

Characteristics are things that are actually measured and analyzed, for example, water temperature, pH, arsenic, lead, DDT, total nitrogen, etc. You can set up characteristic groups to help group together the characteristics you use frequently. Characteristics can be grouped by medium, activity type, or any other useful category.
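Grouping characteristics by medium and activity type, as described above, can be tried out on paper or in a short script before anything is set up in STORET. The sketch below is written in Python and is an illustration only; the `build_groups` helper and the sample list of characteristics are hypothetical and do not reflect how STORET stores groups internally.

```python
from collections import defaultdict

# Hypothetical characteristics: (name, medium, activity type).
characteristics = [
    ("Water temperature", "Water", "Field observation"),
    ("pH", "Water", "Field observation"),
    ("Total nitrogen", "Water", "Sample"),
    ("Arsenic", "Water", "Sample"),
    ("Arsenic", "Sediment", "Sample"),
]


def build_groups(items):
    """Collect characteristic names under a (medium, activity type) key."""
    groups = defaultdict(list)
    for name, medium, activity_type in items:
        groups[(medium, activity_type)].append(name)
    return dict(groups)


groups = build_groups(characteristics)
for (medium, activity_type), names in groups.items():
    print(f"{medium} / {activity_type}: {', '.join(names)}")
```

In this made-up list, arsenic lands in two groups because it is recorded for two media. Swapping the grouping key for another category (for example, the lab that performs the analysis) only requires changing the key tuple.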
Using characteristic groups allows you to enter data with similar metadata as a group rather than providing the metadata for each piece of information. Characteristic groups can be most helpful when using the batch entry function of STORET. In addition, you can use characteristic groups to assign different metadata to a characteristic that is collected in two different ways. For example, if dissolved oxygen is analyzed by two different procedures, you can use a characteristic group to assign different metadata to each set of dissolved oxygen results.

The information associated with each characteristic is dependent on the type of characteristic, but generally the following are needed at a minimum to set up a characteristic group (note that other metadata can be stored as well):
  - Group ID,
  - Group name,
  - Medium (e.g., water, biological, habitat assessment),
  - Activity type the characteristic will be associated with (e.g., sample, field observation, automatic data logger),
  - Characteristic name (select from STORET),
  - Units of measurement for the characteristic (e.g., mg/L, count, percentage), and
  - Analytical method used with the characteristic.

One way to determine how characteristic groups may be helpful to you is to list the names of all characteristics you will be analyzing, and then group the names into logical categories. Categories could include pesticides, field measurements, or biological measurements (e.g., taxonomic abundance). Note that if the same characteristic is analyzed in more than one medium, it is necessary to set up a separate group for each medium. This is common with toxics, which may be measured in water, sediment, and tissue.

In general, it is helpful not to be too restrictive when setting up characteristic groups.
For example, if you only measure a few characteristics, you may find it easier to set up a single characteristic group, even though it might combine characteristics from a field observation (e.g., water temperature) with those from a sample (e.g., total nitrogen in water). STORET will prevent illogical groups, such as combining characteristics for taxonomic abundance with those for automatic data loggers.

Table 6 shows different ways survey respondents have set up characteristic groups for STORET and the advantages and disadvantages of each option. One survey respondent suggests using old STORET parameter codes as row IDs since other systems, like the Permit Compliance System (PCS) and USGS/National Water Information System (NWIS), use them. Some respondents do not recommend using characteristic groups because it complicates getting data back out of STORET.

Table 6. Options for Defining Characteristic Groups in STORET

Options for Defining Characteristic Groups:
  - Activity type [1]
  - Standard Analysis Code (SAC) [2]
  - Biological: by order/family; Chemical: by lab (except for USGS data); Fish: all in one group [3]
  - One sample group (use lab codes as row IDs) and a few groups for field measurements [4]
  - Medium; activity type (biological, chemical or field) [5, 7]
  - Medium, activity type, and analysis groups [6]
  - Collection, sample preservation/transport, field or lab [8]
  - One group for lab results, one group for field analysis [10, 13]
  - Do not use characteristic groups [11]

Advantages:
  - Easy to separate types of measurements [1]
  - Easy to set up database [1]
  - If already associated with the data, easy to set up for STORET [2]
  - Can associate characteristic groups with specific samples through database links [4]
  - Easily corresponds to monitoring data [5]
  - Can set defaults for methods and units [7]
  - Quick to input data [13]
  - Can associate metadata with each activity using other methods [11]
  - Do not have to set up groups [11]

Disadvantages:
  - SACs may overlap [2]
  - Creates too many groups [3]
  - Characteristic groups are less flexible than attributes [4]
  - Using characteristic groups from outside sources can compromise defaults [7]

Lab/Field Analytical Procedures and Equipment

Lab/field procedures and equipment provide information on how each piece of data was analyzed or measured. For example, for samples analyzed for total suspended solids (TSS), a lab may use EPA method 160.2, "Non-Filterable Residue - TSS." The lab or labs you use should have this information readily available. Although it is a good idea to have an analytical procedure for each analyte, it is not required for everything. This is often the case for items measured in the field, such as water temperature, dissolved oxygen, and pH. Lab and field procedures are connected to results in STORET. Table 7 shows the way survey respondents have set up analytical procedures and equipment for STORET.

Table 7. Options for Defining Lab/Field Analytical Procedures and Equipment in STORET

Options for Defining Analytical Procedures and Equipment:
  - National procedures and some state-developed methods [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

Advantages:
  - Creating state-specific methods allows more lab-specific methods to be defined [1, 3, 4, 13]
  - Can use SIM translation feature to map analytical procedures [9]

Disadvantages:
  - Some missing methods in the national list [2]
  - Finding the correct national procedure can be time consuming [7]

Sample Collection Procedures

Sample collection procedures describe how samples were collected, including information on sampling gear, gear configuration, sample preservation, and storage. For measurements made at the station, such as water temperature, this does not apply. You can create as many procedures as necessary.
Collection procedures should include the following information at a minimum (note that other metadata can be stored as well):
  - Sample collection procedure ID (up to 12 characters),
  - Name of the procedure, and
  - Type of sampling gear used (e.g., water sampler, electroshock, net).

Gear/equipment configurations describe the types of field measurement or sampling gear that are used. Once you select gear in STORET, you can enter the gear configuration. This can include serial numbers, size, manufacturer, and any other information.

Sample preservation, transport, and storage describe how a sample was preserved and transported to the lab for analysis. This basically consists of descriptions of containers (e.g., glass bottles) and preservation, if any (e.g., dry ice, 864, etc.).

Table 8 shows different ways survey respondents have set up sample collection procedures for STORET.

Table 8. Options for Defining Sample Collection Procedures in STORET

Options for Defining Sample Collection Procedures:
  - For gear configuration, describe tools and methods [1]
  - For sample preservation, describe bottle used [1]
  - Based on type of bottle used [2, 8]
  - National procedures and some state-developed procedures [3]
  - Five collection procedures and six gear configurations that cover all cases [4]
  - Biological or grab sampling [5]
  - Procedures unique to each project
  - Designate collection procedure but not gear [10]
  - Do not define sample collection procedures [7]

Advantages:
  - Methods are linked to data [1]
  - Easy to separate data [2]
  - Easy to determine whether sample is a grab or spatial composite [3]
  - Captures necessary level of detail [4]
  - Easy to track [5]
  - Quicker to set up STORET [7]

Disadvantages:
  - May be difficult to ascertain what bottle type was used [2]
  - Less descriptive metadata [7]

More Information

If you would like more information about setting up STORET, contact the EPA STORET assistance hotline at 1-800-424-9067 or by e-mail at STORET@epa.gov.
Appendix A
Tips and Tricks for Setting up STORET Data Survey

This appendix contains the survey questions sent to current STORET users and the full text of their responses. The first section, submitter information, includes the name and agency of each respondent, and some other general information. The remaining responses are organized by STORET data categories (e.g., projects, stations, etc.). In each section, the questions are listed first with an assigned letter. The responses are numbered to correspond to the submitter information in the first section and then further identified by the question letter.

A. Survey Instructions

The purpose of this survey is to gather helpful ideas for setting up various aspects of STORET. This information will be summarized, compiled and made available to the STORET community to assist others in making decisions on how to customize STORET for efficient data import, retrieval, and display. All setup information is helpful - even if it didn't work well. Setting up STORET can be daunting and a document summarizing how others have done it may save much time and frustration for new users as well as provide great ideas for experienced ones. Please fill out this survey and send it to: wilson.eric@epamail.epa.gov. Thank you in advance for taking the time to provide this valuable information and help others avoid "reinventing the wheel."

Submitter Information

1. Name: Geoffrey Smith
Agency: Delaware River Basin Commission
Email Address:
What type of data are you putting into STORET? Check all that apply.
  [x] Water Chemistry        [ ] Habitat Data                [ ] Other
  [ ] Sediment Chemistry     [ ] Biological Data - Raw
  [ ] Tissue Chemistry       [ ] Biological Data - Metrics
  [x] Physical Measurements  [ ] Data Loggers
Approximately how many projects and stations do you have in STORET now? 2 Projects, ~100 Stations

2. Name: Carrie Wengert
Agency: Pennsylvania Department of Environmental Protection
Email Address: cwengert@state.pa.us
What type of data are you putting into STORET? Check all that apply.
  [x] Water Chemistry        [ ] Habitat Data                [ ] Other
  [ ] Sediment Chemistry     [x] Biological Data - Raw
  [x] Tissue Chemistry       [ ] Biological Data - Metrics
  [ ] Physical Measurements  [ ] Data Loggers
Approximately how many projects and stations do you have in STORET now? 3-soon 4 Projects, 570 Stations

3. Name: Paul Morton
Agency: New Jersey Dept of Environmental Protection
Email Address: paul.morton@dep.state.nj.us
What type of data are you putting into STORET? Check all that apply.
  [ ] Water Chemistry        [x] Habitat Data                [ ] Other
  [x] Sediment Chemistry     [x] Biological Data - Raw
  [ ] Tissue Chemistry       [ ] Biological Data - Metrics
  [x] Physical Measurements  [ ] Data Loggers
Approximately how many projects and stations do you have in STORET now? 37 Projects, 6000 Stations

4. Name: Jim Porter
Agency: Minnesota Pollution Control Agency
Email Address: jim.porter@pca.state.mn.us
What type of data are you putting into STORET? Check all that apply.
  [x] Water Chemistry        [ ] Habitat Data                [ ] Other
  [ ] Sediment Chemistry     [ ] Biological Data - Raw
  [ ] Tissue Chemistry       [ ] Biological Data - Metrics
  [x] Physical Measurements  [ ] Data Loggers
Approximately how many projects and stations do you have in STORET now? 33 Projects, 4421 Stations (Many stations are transfers from Legacy with no new data. 1836 stations have visits in the new system.)

5. Name: Tavis C. Eddy
Agency: Wyoming Department of Environmental Quality / Water Quality Division
Email Address: teddy@state.wy.us
What type of data are you putting into STORET? Check all that apply.
  [x] Water Chemistry        [ ] Habitat Data                [ ] Other
  [ ] Sediment Chemistry     [x] Biological Data - Raw
  [ ] Tissue Chemistry       [ ] Biological Data - Metrics
  [ ] Physical Measurements  [ ] Data Loggers
Approximately how many projects and stations do you have in STORET now? 2 Projects, Stations

6. Name: Rick Langel
Agency: Iowa Geological Survey (Iowa Department of Natural Resources)
Email Address: rlangel@igsb.uiowa.edu
What type of data are you putting into STORET? Check all that apply.
  [x] Water Chemistry        [ ] Habitat Data                [ ] Other
  [ ] Sediment Chemistry     [ ] Biological Data - Raw
  [ ] Tissue Chemistry       [ ] Biological Data - Metrics
  [x] Physical Measurements  [ ] Data Loggers
Approximately how many projects and stations do you have in STORET now? 45 Projects, 326 Stations

7. Name: Deb Borland
Agency: MT Department of Environmental Quality
Email Address: ddorland@state.mt.us
What type of data are you putting into STORET? Check all that apply.
  [x] Water Chemistry        [x] Habitat Data                [ ] Other
  [x] Sediment Chemistry     [x] Biological Data - Raw
  [ ] Tissue Chemistry       [ ] Biological Data - Metrics
  [x] Physical Measurements  [x] Data Loggers
Approximately how many projects and stations do you have in STORET now? ~200 Projects, 4000 Stations

8. Name: James Adkins
Agency: Div. Water & Waste Management, WV Dept. Environmental Protection
Email Address: jradkins@mail.dep.state.wv.us
What type of data are you putting into STORET? Check all that apply. (P = Present, F = Future)
  [P] Water Chemistry        [F] Habitat Data                [ ] Other
  [P] Sediment Chemistry     [F] Biological Data - Raw
  [ ] Tissue Chemistry       [F] Biological Data - Metrics
  [P] Physical Measurements  [F] Data Loggers
Approximately how many projects and stations do you have in STORET now? 10 Projects, 3500 Stations

9. Name: Dave Wilcox
Agency: Gold Systems
Email Address: Dwilcox@GoldSystems.com
What type of data are you putting into STORET? Check all that apply.
  [x] Water Chemistry        [x] Habitat Data                [ ] Other
  [x] Sediment Chemistry     [x] Biological Data - Raw
  [x] Tissue Chemistry       [ ] Biological Data - Metrics
  [x] Physical Measurements  [ ] Data Loggers
Approximately how many projects and stations do you have in STORET now? Projects, Stations

10. Name: Julia Utter
Agency: Florida Department Environmental Protection
Email Address:
What type of data are you putting into STORET? Check all that apply.
  [x] Water Chemistry        [ ] Habitat Data                [ ] Other
  [ ] Sediment Chemistry     [x] Biological Data - Raw
  [ ] Tissue Chemistry       [ ] Biological Data - Metrics
  [x] Physical Measurements  [ ] Data Loggers
Approximately how many projects and stations do you have in STORET now? 932 Projects, 18515 Stations

11. Name: Rich Hanson
Agency: South Dakota Dept. Env. & Nat. Res.
Email Address: Rich.hanson@state.sd.us
What type of data are you putting into STORET? Check all that apply.
  [x] Water Chemistry        [ ] Habitat Data                [ ] Other
  [x] Sediment Chemistry     [x] Biological Data - Raw
  [ ] Tissue Chemistry       [ ] Biological Data - Metrics
  [x] Physical Measurements  [ ] Data Loggers
Approximately how many projects and stations do you have in STORET now? 66 Projects, 1105 Stations

12. Name: Joe Gross
Agency: North Dakota Department of Health
Email Address:
What type of data are you putting into STORET? Check all that apply.
  [x] Water Chemistry        [x] Habitat Data                [ ] Other
  [ ] Sediment Chemistry     [x] Biological Data - Raw
  [ ] Tissue Chemistry       [ ] Biological Data - Metrics
  [ ] Physical Measurements  [ ] Data Loggers
Approximately how many projects and stations do you have in STORET now? 110 Projects, 1275 Stations

13. Name: [respondent left blank]
Agency: [respondent left blank]
Email Address: [respondent left blank]
What type of data are you putting into STORET? Check all that apply.
  [x] Water Chemistry        [ ] Habitat Data                [ ] Other
  [ ] Sediment Chemistry     [ ] Biological Data - Raw
  [ ] Tissue Chemistry       [ ] Biological Data - Metrics
  [x] Physical Measurements  [ ] Data Loggers
Approximately how many projects and stations do you have in STORET now? 3 Projects, 383 Stations

Projects

Questions:

A. How did you define your projects?
(By type (water treatment plants, volunteer river monitoring etc.), by calendar year (all data for a specific type broken out by calendar year), by length of applicable QAPP (data for type is added as long as under the same QAPP), by site or facility (i.e., one project = one facility's data), etc.) B. What were the advantages and disadvantages to setting up projects this way? A-4 ------- Tips and Tricks for Setting up STORET Data Survey Appendix A Answers: 1 A. Project Name was defined by the program that the data is collected under. B. Advantage - Unambiguous results, ease of sorting once out of STORET. 2A. I identified my projects based on the programs already designated. For example, one of our projects is WQN, which is all data associated with Pennsylvania's Fixed Water Quality Network program. Similarly, WQF is associated with our Fish Tissue sampling program and GWN is associated with our Groundwater Network. B. The major advantage was that the data for these programs were already divided based on the program itself and that was one less step that I needed to perform when formatting the data. I have not found any disadvantages of having my data broken down into these projects. 3 A. Mix: USGS by "Water Year," multi-year projects by Watershed, Bio data by calendar year. B. USGS data matches "Water Year" books published by USGS, bio data matches reports, 303(d) data easy to pull out by trip, but doesn't match QA Plans (which were by year). 4A. A project may be a defined agency program, a more general ongoing monitoring effort, or a specific data provider, such as a local project that sends us its data. Examples: North Shore Load Project, Lake Trend Monitoring, Pipestone Creek TMDL Project. If staff want to be able to query data for a particular monitoring effort, we suggest they set up a STORET project for it. B. Data owners (monitoring and related staff) define how the data are organized, so it seems intuitive to them. 
It makes it easy to query data for a particular program or data provider.

5A. Our program is specific enough in its use of STORET; for watershed sampling we have two projects: our REFERENCE Project, which entails healthy systems that are used in our biocriteria analyses for determining stream condition in general. Our BURP (Beneficial Use Recon. Project) Project is for the ambient monitoring of surface water across the state to determine if designated uses are being met.
B. Some sites, or sampling events, are used for both projects, and we also have cases where a given sampling event (such as a water quality complaint) does not seem to fit into either category. The BURP project ends up being pretty all-encompassing. The advantage is that these projects are also (somewhat) aligned with budget history of the same name.

6A. Our projects are a combination of the water-quality project and the water year in which the data is collected. For example, data collected as part of the Sny Magill 319 Monitoring project during Water Year 1999 would be assigned a project of SNY1999. Our ambient data collected in Water Year 2002 would be assigned a project of AMB2002.
B. With our web-based retrieval, we can quickly retrieve project-specific data (for people only interested in one project) and see any changes to our stations (for example, if any stations have been added or removed).

7A. Historic data migrated into the modernized STORET were migrated using projects categorized by type of project. The current data being generated and input into STORET is largely data collected for the TMDL program. Although this could be considered one project with common DQO and data collection methods, the effort includes roughly 100 HUCs across Montana. For this reason, the projects were broken down by HUC or Watershed. (We wanted "Project" to represent a long-term, ongoing grouping for results).
B.
One advantage was that the project is always "known" to data management staff. Project corresponds to HUC, a required field for station establishment. (QC on the site location must be performed up front to determine HUC and Project, so it forces a QC check early on site locations). A disadvantage is that it does not fully utilize the available metadata associated with the project designation. Note that the use of watershed-based project groupings does not preclude the creation of projects that are more consistent with STORET metadata capabilities, but did serve to simplify the STORET start-up for our TMDL program.

8A. Usually, by purpose. Have Total Maximum Daily Load, Regular Ambient, Intensive Survey, etc., projects. Duration is ongoing for most of our projects.
B. [respondent left blank]

9A. We have managed projects in one of three ways:
1. One project for the entire organization.
2. Some organizations have pre-defined projects (studies, etc). In this case we simply use these projects.
3. For some organizations that have several different entities entering data into one org, we have assigned one or more projects to each entity.
B.
1. The obvious advantage of a single project is simplicity. The disadvantage is that you limit your retrieval options.
2. Using existing projects is probably the best option for most programs as it is easy to manage and provides a valuable way to retrieve their data.
3. This final option works well for programs managing data from many different entities. A good example of this would be a state hosting several volunteer monitoring groups. If it is determined that all of the volunteer monitors will share a single org, then project becomes a good way to segregate their data.

10A.
1. The type of monitoring plan the agency is using often defines the project.
2. An abbreviation of the existing project name is often used.
B. No apparent disadvantages.

11A.
A project is a specific sampling effort delineated by location(s) and sampling date(s). We created an ACCESS table containing project IDs as well as project duration, start date and a brief description of the project. This table is linked to other tables containing station IDs, sampling results, etc. Our project IDs are somewhat descriptive of the project, with the last character identifying the type of project. Example: The Lake Alvin Assessment Project is ALVINZZ1. The number "1" designates it as an assessment. An implementation project will be designated by a "2" (Example - the Lake Alvin Implementation project ID would be ALVINZZ2). Dredging projects end in "D." All following assessments on the same project end in a different odd number; all following implementations will end in a different even number. If we have something unique that doesn't fit the three typical projects we plan to add a different letter as the 8th character.
B. Most environmental problems, assessment projects, or control efforts are usually defined by where and when they occur, so grouping data by project appears to be the most practical way to group data. ID Set-up: This way we can sort data different ways. All Alvin data can be found by truncating the last letter off the ID. Or all assessment projects can be found by looking for the odd numbers; all implementation projects can be queried by searching for even numbers.

12A. For the most part they are defined by regionality (i.e., Watersheds, Counties, etc.).
B. A watershed may extend into multiple counties, and if defined by county, the project sponsor (319 sponsors) may not be willing to implement the project across county lines. Takes more cooperation between counties/sponsors.

13A. A project is often a study; so for example, one project may be a group of WQM stations along a certain stream. It should be noted that a station might be in more than one project.
B. Assigning data in this manner can be useful when pulling data from STORET.
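
As an illustration only, the project ID convention described in answer 11 (a seven-character base such as ALVINZZ plus a final type character) could be decoded with a short script like the one below. This is a sketch written for this document, not something supplied by the respondent: the function name classify_project_id, the returned field names, and the assumption that every ID is exactly eight characters long are hypothetical.

```python
def classify_project_id(project_id):
    """Split a project ID into its base name and project type.

    Assumes (hypothetically) that every ID has exactly 8 characters,
    with the 8th character encoding the project type as described
    in answer 11: odd digit = assessment, even digit = implementation,
    "D" = dredging, any other character = something else.
    """
    if len(project_id) != 8:
        raise ValueError("expected an 8-character project ID: %r" % project_id)

    base = project_id[:-1]      # e.g. "ALVINZZ"
    suffix = project_id[-1]     # e.g. "1"

    if suffix in "13579":
        kind = "assessment"
    elif suffix in "2468":
        kind = "implementation"
    elif suffix.upper() == "D":
        kind = "dredging"
    else:
        kind = "other"

    return {"base": base, "suffix": suffix, "type": kind}


# Group a few example IDs by project type.
for pid in ["ALVINZZ1", "ALVINZZ2", "ALVINZZD"]:
    info = classify_project_id(pid)
    print(pid, info["base"], info["type"])
```

Dropping the last character gives the shared base name, which is the "truncate the last letter" query the respondent describes for finding all data for one waterbody.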
Stations

Questions:
A. What type of labeling scheme did you develop for your station IDs?
B. What were the advantages and disadvantages to setting up station IDs this way?

Answers:

1A. It is dependent on who is program manager. We have everything from full numeric station IDs to station IDs that have a state abbreviation and the stream's position in the watershed longitudinally, to collection organization and river mile of that site.
B. Extensive research to determine exactly where site is.

2A. As stated above, our programs are defined with defined goals, sites, etc. The WQN program has stations set up simply by the letters WQNO### and the three digit number of the WQN site. The fish tissue stations are set up a little differently. They begin with WQF indicating the type of station, followed by the five-digit stream code followed by the river mile where sampling began. So, an example of a fish tissue station would be WQF00002-17.8. This indicates a Fish Tissue station on the Delaware River starting at 17.8 RMI. Finally, the Groundwater Network stations were named in the following manner. They too begin with a three-letter prefix, in this case GWN. The next identifier is a letter indicating the drainage, D=Delaware, S=Susquehanna, etc. The next grouping of numbers/letters indicates the sub-basin (based off of PA's State Water Plan) where the station is located (i.e., 02C = Lehigh sub-basin). Finally, the last digits indicate a numerical sequence of stations assigned in that drainage. So an example of a GWN station is GWND02C023, which tells us the station is a groundwater station located in the Delaware Drainage, Lehigh Subbasin, and it is the 23rd site identified in this region.
B. In all cases, the advantage was that the sites were very similar and easy to format so that they could all be imported into STORET following the same template. Again, I haven't found any disadvantages using this system.
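
The fish tissue and groundwater station ID formats described in answer 2 are regular enough to be split apart mechanically. The Python sketch below shows one way this might be done. It is an illustration only: the function name parse_station_id, the returned field names, and the exact field widths (a two-digit-plus-letter sub-basin code and a three-digit sequence number, inferred from the single example GWND02C023) are assumptions, and only the two drainage letters named in the answer are mapped.

```python
import re

# Drainage letters named in answer 2; any others are left as the raw letter.
DRAINAGES = {"D": "Delaware", "S": "Susquehanna"}

# WQF + five-digit stream code + hyphen + river mile, e.g. WQF00002-17.8
WQF_PATTERN = re.compile(r"WQF(?P<stream_code>\d{5})-(?P<river_mile>\d+(?:\.\d+)?)")

# GWN + drainage letter + sub-basin (assumed 2 digits + 1 letter)
# + sequence number (assumed 3 digits), e.g. GWND02C023
GWN_PATTERN = re.compile(
    r"GWN(?P<drainage>[A-Z])(?P<sub_basin>\d{2}[A-Z])(?P<sequence>\d{3})"
)


def parse_station_id(station_id):
    """Return the parts encoded in a WQF or GWN station ID as a dict."""
    match = WQF_PATTERN.fullmatch(station_id)
    if match:
        return {
            "program": "WQF",
            "stream_code": match.group("stream_code"),
            "river_mile": float(match.group("river_mile")),
        }

    match = GWN_PATTERN.fullmatch(station_id)
    if match:
        letter = match.group("drainage")
        return {
            "program": "GWN",
            "drainage": DRAINAGES.get(letter, letter),
            "sub_basin": match.group("sub_basin"),
            "sequence": int(match.group("sequence")),
        }

    raise ValueError("unrecognized station ID: %r" % station_id)


print(parse_station_id("WQF00002-17.8"))
print(parse_station_id("GWND02C023"))
```

A check like this could be run on a spreadsheet of station IDs before import, so that IDs that do not follow the template are caught early.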
3A. Used IDs from Legacy STORET, based on old River Reach system.
B. Consistent with LDC (for as long as it's up) and stations come out in downstream order. Disadvantages: Numbers are a pain to calculate, because they depend on miles up a segment and I don't have a route built in my GIS coverage. I can't retrieve data on project-specific station IDs (USGS, bio) as we could with old STORET secondary IDs. Project-specific station IDs are used in 305(b) and 303(d) reporting, so lists on EPA Web sites don't match what can be searched in STORET.

4A. For lakes, we use commonly accepted IDs developed by the Minnesota Department of Natural Resources. Each ID consists of a county code and a serial number, e.g. 27-0016. To track specific monitoring sites on lakes, we include separate site codes on activity IDs. The site coordinates are tracked outside of STORET, although we hope to add them to STORET as "actual activity locations" eventually. For streams, we use a serial number in the form S000-000. We record aliases for stations, including Legacy STORET primary and secondary codes, as external reference scheme station IDs. The goal is for each monitored point to have a single unique ID with separate aliases used by different data providers.
B. Lakes: Almost all data collectors in the state use the same IDs. Because a station is defined as an area and not a point, it simplifies things and reduces the number of station establishments. Tracking the sites (actual activity locations) separately means maintaining another system and complicating our activity ID assignments. Streams: We do not currently have a good system in place to make sure that a monitored point gets established only once. It can be time-consuming to determine whether stations are co-located, and if so, how close is close enough to call it the same station.
The IDs are meaningless, which is good data practice but less intuitive for monitoring staff who use the IDs. We used to use a stream name abbreviation and a mile indicator, such as CD-0.5. Increasingly detailed stream GIS coverages have rendered the mile values inaccurate. Assigning codes for new stations relative to old, inaccurate station codes got complicated.

5A. Ordinal by ecoregion, e.g., MRW26 means the 26th station for the Middle Rockies West ecoregion.
B. This is helpful in that it is easy for us to organize data by ecoregion, which is critical in our biocriteria development. Our stations have traditionally been assigned for those locations where we have collected biological (benthic macroinvertebrates), chemical, and habitat data, as a full triad. This means that when we collect only chemical data (which is frequently) we have added the WQ (for water quality) onto the front of it, e.g., WQMRW05. This has been cumbersome. Another issue with our station IDs is for those early stations; sometimes a MRW5 and a MRW05 both pertain to the same 5th station established for that ecoregion. This discrepancy has appeared and makes organization difficult.

6A. We use an 8-digit numeric code. The first 2 digits are for project identification. The next 2 digits are the county number. The remaining 4 digits are a sequential number.
B. Since stations can be used on multiple projects, we have to take extra time to double check our current stations to make sure we do not duplicate stations with multiple STORET numbers.

7A. Our station ID convention is HUC based, again. A three-character code designates the HUC. This is followed by a five-character acronym for waterbody, and then a two-digit numeric for a total of 10 characters.
B. Disadvantage: The cross-reference to HUC code must be known prior to assigning a station ID, and the HUC must have been verified as correct.
Another disadvantage is that the station ID must be verified as unique in STORET prior to use, to avoid assigning duplicate identifiers. Advantage is that all the stations in a single watershed group together in the STORET user interface and Report module, and the stations on a single waterbody group together within a given HUC if consistent acronyms have been used. Also people recognize their station IDs.

8A. Most of our streams have alphanumeric labels. Many labels are longer than the 15 characters allowed by STORET. So for a site with an ID of WVABC-1-X-123-P-4-ZZ, the Station ID could be ABC-001-0123, so that it fits in the Station ID box.
B. Need for STORET to allow for more than 15 characters for Station IDs. Then we can better link STORET to other databases in our agency. Now we only use EPA Key Identifiers for State Labels, which can exceed the 16 characters allowed for Station ID for quite a number of our streams and waters.

9A. In most cases, the stations already have some type of ID from their use in the local LEVIS system. As there are many better ways to query a station than by ID, we typically use this existing ID or a simple numbering mechanism.
B. The advantage is to keep it simple and to be able to tie it back to the source of the data. The disadvantage would be to not have a logical key (i.e., with extra info coded into the ID itself) that could be used for querying. In a robust database like STORET, however, you should simply be able to query a station by any of the many station fields and should not feel compelled to encrypt this information into the ID.

10A.
1. The station ids are based on delineation from a HUC code map used in the 1970s.
2. Each agency has developed their own system for labeling station ids.
B.
1. The disadvantages are that the map is no longer in use and is not in an electronic format.
It's hard to find and if you have gaps in your number set you're stuck having to create a new numbering system that doesn't fit with the old system.
2. Some agencies within Florida would like FDEP to mandate a consistent method for creating station_ids (i.e., lat/longs as the station_id). This would allow any user to understand the specific labeling system during retrievals no matter which agency they are interested in.

11A. The first seven letters are the first seven letters of the Project ID. Subsequent letters or numeric characters (the last 8) refer to type of waterbody and site ID. Example: ALVINZZLA01 might refer to the Lake Alvin Project (ALVIN), a lake sample (LA) and site 01 (a specific sampling location in the lake). ALVINZZLAT01 might refer to a tributary site (T) at a specific sampling location (01) on a tributary. These sites can then be used for different types of projects for Lake Alvin; for example, an implementation project can have the same site ID as the assessment project. If more than one project was done at Lake Alvin then a numeric character could be inserted following the project name. So ALVIN1LA01 might refer to an in-lake site from a Lake Alvin Assessment Project and an ALVIN2LA01 might refer to the same site used during an implementation project.
B. Simple logic and relatively easy to understand what the station ID refers to. This allows for quick recognition of which waterbody is involved and it is often a quick way to discern which project it is and what kind of project it is. It should be noted, however, that some projects involve more than one waterbody and so a project name may not always refer to a specific waterbody. In those cases, one would need more information (location) to figure out the project name.

12A. A 6-digit number beginning with the number 38. For example, our next station will be designated the number 385275.
B.
Advantage: Ease of assigning ID's to newly created stations. Disadvantage: Sequential numbers are not always possible when ids are assigned at different times. For instance, Lake Isabel may have 2 ids associated with it, 384207 and 384208. If a new id is needed, it would be designated the next available number (i.e. 385275). This is not a huge problem. It would be less difficult for the field personnel sampling to remember the code if they were sequential.

13A. A 6-digit number/letter combination is assigned to each station.
B. The letter portion of an ID is set up as an abbreviation of a specific waterbody or area of the state. This makes it easy to tell at a glance in which region of the state a station is located. The number portion is unique for each station.

Station Visits

Questions:
A. How did you set up or define your station visits?
B. What are the advantages and disadvantages to setting up the station visits this way?

Answers:

1A. One station visit per day.
B. It tends to idiot proof the system... and you know that we need that.

2A. The main identifier with Station Visits was our sequence number that was associated with the original sample. This along with the date and time creates a unique station visit.
B. The advantages to doing things this way is that you do not need to create a field or re-create existing data. The disadvantage, and I have not run into this yet, is that I am not sure what the program will do if a sequence is ever repeated.

3A. Sequentially or allow SIM to define.
B. Allowing SIM caused problems if we had to go back to a site a second time for the same "round" of sampling. Allowing SIM to assign keeps us from going crazy trying to keep it straight.

4A. One numeric ID per day per trip, usually assigned automatically by SIM. The only exception is for one large volunteer monitoring project, in which we code visit IDs to keep each volunteer's data together.
This project uses three-character alphanumeric codes.
B. This system seems to work well.

5A. Chronological by Station ID; most other cells left blank within the Station Visit Menu.
B. To be honest, we use this rarely to find/organize data; it does allow us to know how many times we have visited a site, but this ends up being easy to discern in many ways.

6A. We let SIM automatically assign station visits into STORET when we upload data. For most of our projects, we will have only one visit per day. So, SIM is set up to assign station visits by sampling date. But, some projects will have multiple visits on the same day. So, SIM is set up to assign visits on sampling date and time.
B. We do not use Station Visits with our web-based retrieval program.

7A. For our TMDL program, a station visit may span several days for practical purposes. Trips to distant parts of the state may involve many hours of driving time. Monitoring staff may visit a number of sites repeatedly to complete assessment and sample collection tasks. A visit spanning several days allows us to group related activities on a single trip into one visit "event."
B. One advantage is the ease of tracking the visit, and matching the visit when loading results for various activities for a given station on a given trip. Also the activities are grouped together. One disadvantage is that it is not always intuitive that various activities performed on different days involve the same visit. From the monitor's perspective, there may have been two or three distinct site visits during a trip to a given region of the state.

8A. For a given trip, we may visit many of our sites several times during a "trip."
B. [respondent left blank]

9A. For most programs, we have created a single station visit per station and date that the sampling occurred. If the program identifies that they may visit a station several times in one day, we will set it up as one station visit per station per date and time.
B. Again, keep it simple if the program is not collecting this data. Of course, as soon as a program starts tracking their actual trips and station visits, these trips and visits should be imported into STORET to provide the most accurate tracking of this information possible.

10A. Station visits are defined using the one visit per day default in SIM.
B. None yet discernable. It is not an often-queried data layer.

11A. Station visits are auto-generated. If we need to query some data, we can query by Project/Station/Date and Time.
B. Advantage - One less number to try to come up with. It might be easier to query with a more descriptive station ID; however, with how we organize and use our data, our system works for us.

12A. By the visit number for that year. Station 384321 was visited in May. That would be visit number 1. Then visited again in June, that would be station visit number 2, and so on.
B. Works well and is easy to understand.

13A. A station visit is any sampling performed on one day.
B. It seems the logical way to set up station visits.

Trips

Questions:
A. How did you set up (define) your trips (i.e. by day, month, year etc.)?
B. What were the advantages and disadvantages to setting up trips this way?

Answers:

1A. We define trip as one year of sampling under that specific program.
B. Advantage - Easy to parse out each sampling season. Disadvantage - Is a rather broad "trip" designation.

2A. Currently the trips are identified by T and then the station number. Activities are set up using date, month, year, etc.
B. Several times, the system rejected a trip ID for some reason unknown to me and I had to reformat the data to "make it work." I haven't found any other advantages or disadvantages, and I'm still confused as to the significance of the trip.

3A. Usually by year.
B.
People usually want to see a whole year's worth of data and I don't have a billion trips in my list as I would if I used matrix or season as the grouping item.

4A. By project year. For example, the Milestone project's 2002 water year data is under the trip MILE-2002WY. The 2002 calendar year Lake Trend data is under the trip LAKETRND-2002CY. Project staff determine whether they want the data stored by water year or calendar year.
B. This system seems to work well. The only advantage we see to using trips in the literal sense is to track trip blanks. We decided it was not worth the hassle of creating so many trips for that one function.

5A. We created Trips for the sole purpose of STORET. They are specified by the field crew, year and week; for example: LA980831 pertains to the trip taken by the Lander Crew, in 1998, on the week starting 8/31.
B. Advantage: we can isolate by year and field office. Disadvantage: the ids get long, and for previous years it is time consuming to look up the beginning of the week that a trip went out.

6A. Trips are a combination of project and date (dependent on the number of samples received in a month). For projects that we expect few samples each month, the trip would be a combination of the project and the year that the samples were collected. For projects that we expect many samples each month, the trip would be a combination of the project, month, and year.
B. We use trips to find data in STORET that may need corrections. Otherwise, we do not use trips for data retrieval.

7A. Trips were defined by year. Generally we use a project year, except in the case of TMDL program where an entire major basin or region of the state is combined into a single Trip. For monitoring projects conducted external to our agency, a separate trip helps distinguish the data and allows us to designate a "trip leader" without maintaining external personnel in our Organization personnel list.
B.
The Trip by Year concept helps greatly in organizing and loading current and historic data, and ensuring completeness without duplication. If data from a given data source belongs in a designated Trip, we know right where to look to see if the data exists in STORET. Related data also tends to be grouped together, rather than a statewide grouping based on all samples collected in June, for example.

8A. If for a specific watershed during a given year, we consider a trip for 12 months for that watershed for one Project, like TMDL. Next year we will be sampling another watershed or body for 12 months for TMDL. For our Ambient Sites, we use quarter of a year in which all our sites are visited just once.
B. [respondent left blank]

9A. For programs tracking trips, we simply import what they provide. For all others we have created either one trip per day, month, or year.
B. For legacy data, one trip per month or year works well to organize their data without an enormous number of trips. The more years of data they have, the better the one trip per year option gets. For programs that do monthly monitoring, one trip per month works well as this provides a pretty accurate representation of their actual trips.

10A. Trips are defined by using the one trip per year default in SIM.
B. The advantages are that the STORET interface can handle browsing the number of trips generated by the one-per-year default. Other than that it is not an often-queried data layer.

11A. Trips are auto-generated by day. One trip per day per project.
B. Advantage - Easy for us to generate trip ID. Disadvantage - None really for us.

12A. By the year.
B. Have not discovered any major advantages or disadvantages with this method of setup.

13A. Each month's worth of WQM data is considered a trip. All data for all stations is included in that trip.
B.
Assigning data to monthly "trips" makes it easy to find data from a particular period of time.

Activities

Questions:
A. How do you set up (define) activities (i.e., keep field measurements separate from laboratory analyses or combine together, etc.)?
B. What were the advantages and disadvantages to setting up activities this way?

Answers:

1A. We combine all together (I just give them arbitrary numbers) and the designation lies in the Activity type.
B. Advantage - More time efficient to number this way for data entry. Disadvantage - When you open screens in STORET you can't really tell what you are looking at in terms of parameters.

2A. Activities are defined by month, day and year. As far as specific activities, field and lab results are separated using our parameter codes assigned to the sample.
B. The advantage to this is that it keeps all activities separate and unique and allows an end user to clearly see which data was analyzed in the field and which data was analyzed in a lab.

3A. Mix again. Data from USGS is combined; data we do internally is separate.
B. Separate forces me to keep track of Activity IDs so the field and lab stuff matches up, and also is unique. Also a pain to keep separate because I need to know if the Trip/Visit is new or existing to tell SIM to create a new one or use an existing one.

4A. Activities were more difficult to define than most other STORET concepts. Because field and lab data had to be separated by type into different activities, we had to devise a system to keep sampling events together. An explanation of our ID scheme follows. It might offer some insight into how we did this.
Lakes: Activity IDs consist of a three- to five-digit site code indicating the sample location on the lake, plus a hyphen, plus a two-digit depth to the nearest meter, plus F for field msr/obs or S for sample, plus R if a replicate, plus a digit if necessary for uniqueness.
For 2-meter-integrated samples, the suffix following the site code is -I2S. Examples: 201-00F1, 102-03S, 102-03SR, 401-I2S.
Streams: Activity IDs are usually the lab sample number followed by one or more descriptive codes. The field data corresponding to the lab data also uses the same sample number for the core of the ID. If a lab sample number is unavailable, we use a separate serial number in its place. The suffix codes for regular samples are F for field msr/obs, S for sample, plus R if it's a replicate. For QC samples, the suffix code is Q, plus either E for equipment blank or R for reagent blank. Examples: 200215678F, 200215678S, 200215678SR, 200215678QE.
B. Lakes: We set it up this way to track site codes and to keep profile data organized. There are some significant disadvantages, however, due to the rounded-off depths. Say a person uses a Hydrolab or YSI to collect profile data at site 103, resulting in readings at 3.7m, 4.0m, and 4.4m, among others. Under our scheme, each would have an activity ID of 103-04F. I have to add a digit to each to make it unique: 103-04F1, 103-04F2, and 103-04F3. The activity ID field is not long enough for us to use more decimal places and still accommodate the rest of our coding scheme.
Streams: This system seems to work well.

5A. Our activities are as follows: we conduct biological, chemical and field data collection. Field pertains to that which was directly observed in the field, such as pH, Dissolved Oxygen, etc. So an activity ID of B67 refers to a biological activity, and the same station and date would have a C67 for the chemical data.
B. There is clarity to our approach, although the chronological ID does not aid us in any major way, and by splitting field and chemical samples we treat them separately in database organization, while we need to consider them together in analyses.

6A. We keep activities separated.
Currently, we use 3 different activity IDs. Two are for field measurements (one for measurements related to water conditions, the other for atmospheric conditions that are recorded) and the final one for water samples that are collected.
B. Activities are only used to help us find data in STORET. Like trips, activities are not used in our web-based program for data retrieval.

7A. We ordered sheets of pre-printed, color-coded labels from (Shamrock). Each sheet may have, say, 10 identical labels. These labels are used to identify the related activities for a given site visit. To ensure uniqueness for each activity ID, a suffix is added that represents a medium-code or sample type. For example, "M" for macroinvertebrate, "W" for water, "S" for sediment, "F" for Field Msr/Obs, etc. Typically, one label is used in the field book, and/or on the site form. This gives all activities related to a site-visit similar activity ID(s), and all group together. The activity ID has a two-digit prefix for year, an alpha character corresponding to a given trip, a three-digit numeric (sequential), then a hyphen and room for the medium code suffix. Example: 02-L127-M. Our activity ID is typically 9 characters, though the suffix for sample type, or the sequential numeric could be longer if necessary.
B. This convention has many advantages, and is working well for us.

8A. Field measurements are separated from laboratory analysis. Activities, like Nutrients, are kept separated from Metals, etc.
B. [respondent left blank]

9A. In almost all cases we have taken the effort to break out the field measurements from the lab analyses.
B. This takes a considerable amount of work, compared to leaving them as a single activity, but the data is ultimately modeled much better.
If this is not completed, then users find themselves being forced to add a sample collection procedure for an activity that may have only had field measurements. Keeping field and lab results in a single activity also causes the problem that you lose an easy way to track if the result was created from a sample or a field measurement. A similar situation exists for medium. Often Air temperature is taken with a water activity. As the characteristic actually includes the word 'Air' in the characteristic description, we will sometimes include this with the water activity just to keep things simple.

10A. Activities are defined as a composite set; one activity for each sample date, time, depth, activity type and category.
B. It seems that this is the only way to set up an activity short of ignoring the activity type data layer and defaulting everything to Sample or Field Msr/Obs. It would help if the Activity ID field were longer than 12 characters so that it would be easier to construct an ID given that there are so many pieces of data that have to be considered in one activity id.

11A. The activity ID for lab analysis is the same as it is for Measurements/Observations, except measurements and observation activities have an "F" after the Activity ID to designate it as a field measurement. We use separate ACCESS tables/forms to group field data, chemical data, biological data, etc. The ID for contaminant data has a "C" after the lab ID, Elutriate data has an "E" after the ID, the ID for the Algae data has an "A" after the ID, and so forth.
B. As an advantage we can group our field and chemical data easily by comparing similar activity IDs. As STORET dictates, each kind of data (i.e. chemical, biological, field, etc.) has certain attributes, and this uniqueness of each data type led us to group the data in our ACCESS table by sheet.
In our ACCESS data entry forms we can then keep unique biological sampling or chemical sampling and link them to the field data sheet in the same form.

12A. The lab assigns a Chem-Log number for every sample (e.g., 03-R0001). This log number is used as the activity ID. To keep the field measurements associated with that sample separated, we place an "F" at the end of the log number to differentiate the field measurement results (e.g., 03-R0001F) from the lab results.
B. No major advantages or disadvantages.

13A. We assign all field measurements the ID number that the laboratory assigned the laboratory data for the station. We add an "F" to the end of that lab number.
B. Using the lab ID number for field data makes it easy for us to tell that certain field conditions existed when certain samples for lab analysis were taken. Nevertheless, because field data has an "F" at the end of the number, STORET does not actually combine the lab and field data. Therefore, we hope to eliminate any confusion about what analysis was performed in the field and what analysis was performed in the lab.

Characteristic Groups

Questions:
A. How do you set up characteristic groups? Do you group characteristics by medium or activity type (i.e., Field Measurements vs. Sample), or do you find it more convenient to combine different types of characteristics into a few groups or even one group?
B. What were the advantages and disadvantages to setting up characteristic groups this way?

Answers:

1A. We tend to group by activity type.
B. Advantage - It allows us to separate out the different types of measurements quickly and eases database setup when using SIM. Disadvantage - When in STORET App. it is not possible to see all at once when done this way.

2A. Our characteristic groups are defined by the SAC (standard analysis code) used in analyzing the specific samples at the lab.
So, for example, if a sample is sent to the lab labeled SAC 10, SAC 10 may consist of testing for Metals. So, under characteristic group SAC, all of the metals would appear as Characteristics.
B. As is the case with many of the things I've used, this already existed and was associated with the data, so I did not need to create a new way to group the samples. The disadvantage was that the SACs overlap as far as characteristics, so it was time-consuming to set up initially.

3A. Started when we were limited to 100 Row IDs. Bio by Order/Family. Chem by lab except for USGS, which is by groupings (field, routine, metals, organics, etc.). Field/Lab separate. Fish all in one group (created after the 100 Row ID limit).
B. Creates too many groups, hard to keep track of. Worried that a few big groups would slow SIM.

4A. We set up one large sample-type characteristic group for our main lab using the lab's analysis codes as row IDs. We also set up a few field-type characteristic groups. I had planned to set up one characteristic group per lab, but lately I have been moving away from characteristic groups. Instead, I created something like a characteristic group table in a separate database, and I import data through SIM using the characteristic-by-attributes method. I find it gives me more flexibility to handle special cases that may not exactly fit STORET's characteristic group templates.
B. Setting up sample-type characteristic groups by lab organizes procedures logically. Characteristic groups in general are somewhat less flexible than storing characteristics by attribute through SIM.

5A. We break them out by medium: biological, chemical, and field parameters.
B. This has worked well, and philosophically corresponds to our monitoring agenda. One initial problem, which presently needs to be fixed, is that at the time we developed our CGs only 99 characteristics were allowed in each group.
Since our taxa list for invertebrates is over 400 taxa, we had to make 5 different biological CGs. Now that we can have more characteristics per group, I have to decide whether to retrofix the older data, or choose a year from which to move forward with a single new CG to cover all bug data.

6A. We set up our characteristics on a combination of medium, activity type, and analysis groups. For example, we have one category for atmospheric field measurements (e.g., air temperature, cloud cover, etc.). We have a category for water field measurements (e.g., water temperature, pH, etc.). For water samples that we have analyzed in a lab, we have characteristic groups for nutrients, bacteria, pesticides, semi-volatiles, etc.
B. [respondent left blank]

7A. Our characteristics are grouped by medium, then activity type.
B. It is convenient for us to use characteristic groups for work where agency staff and set procedures ensure that defaults like units of measure and methods will be consistent. We try to avoid the use of characteristic groups in receiving data from projects or data sources external to our agency, as we are concerned about having a number of defaults as "understood." A change in methodology or units cannot be easily accommodated in these cases.

8A. Group characteristics by collection, sample preservation/transport, or whether they are field or lab types.
B. [respondent left blank]

9A. We have worked with an even mix of programs that have set up characteristic groups and those that just pass all of the characteristic information directly to STORET. When using characteristic groups and rows, the most important thing is to set them up in a way that can easily be mapped from the source data. You do not want to have to work backward and add the group and row to the source data set.
B. [respondent left blank]

10A. Characteristic groups are set up with activity type as the dependency.
All characteristics that are analyzed in the lab are in one group; all that are analyzed in the field are in another.
B. Aside from not being able to default the characteristic group value when creating a batch file, nothing.

11A. We do not set up characteristic groups. There are also physical limitations to the ACCESS data entry forms that predispose us to split the data into different data types. There is only so much information that can fit in a single data entry form. So our contaminant data, while considered chemical data, is on a different sheet. Special chemical parameters that are not sampled on a regular basis are on a different sheet. Even though they are on separate sheets, all water chemical samples will have the same ID with just a different letter at the end of the ID. In addition, our chemical data typically comes to us from the laboratory separate from the biological data and sometimes separate from all of the field data. So it also makes sense to separate the data types simply because we get the data separately anyway. We do not use characteristic groups; we use our own built sample collection procedures, and all metadata is added to each activity ID via an ACCESS process.
B. Advantages - Didn't have to set up Characteristic groups. More flexible.

12A. [respondent left blank]
B. [respondent left blank]

13A. We set up characteristic groups by laboratory (one for each lab) and by field data.
B. The advantage to doing it this way is that when you enter data through SIM, all data from one lab can go in with one characteristic group. It makes data input go faster.

Lab/Field Analytical Procedures and Equipment

Questions:
A. How do you set up lab/field analytical procedures and equipment? Do you rely on the "National Procedures" provided by STORET or do you create your own?
B. What were the advantages and disadvantages to setting up lab/field analytical procedures and equipment groups this way?

Answers:

1A. National Procedures where they apply. For gear and collection we designate our own.
B. Analytical - very time consuming to search for and designate manually, not being familiar with methodologies. Gear/Collection - designating our own allowed us to get more specific on our actual sample collection procedures.

2A. Our lab has given me a listing of all of the methods they use in analyzing the samples, and I've in turn relied mostly on the National Procedures. However, DEP has several methods that they've developed or adopted that were not on these lists, so I also created my own.
B. I think the National Procedures are a good idea, but they need to be kept current with new releases of publications. The advantage to using them is that it took a lot less time to adopt an already existing procedure, but the disadvantage was that there were some methods missing.

3A. Use national if they match, otherwise created my own (especially for USGS Denver lab methods). Set up both field and lab methods.
B. Creating my own lets users know where we deviated from national procedures and add procedures not supplied by EPA-HQ. National procedures save me time, especially with Citations.

4A. We started out by creating our own procedures, and we still do for our main lab. Increasingly we rely on the national procedures as we collect data from a myriad of data providers and labs.
B. Using the national procedures is less hassle. Creating your own lets you reference lab-specific methods that may deviate from the national procedures.

5A. Most of this was done in communication with our lab manager. We mainly used the National Procedures.
B. It was very organized, and all the relevant material is in one place.

6A. We rely heavily on the national procedures.
However, we will create the method only if the parameter does not have an established national method.
B. Most parameters that we test for have established methods, so it is easiest to use the national procedures that STORET provides.

7A. Whenever possible we try to reference National Procedures in use.
B. Tracking down the correct National Procedure in use can be time consuming and frustrating, especially if STORET makes a distinction between methods that does not seem to correspond to method numbers in a laboratory's method documentation. Nevertheless, we strive for a high level of data comparability, and approved national methods are an element of that.

8A. Mostly, we rely on National Procedures. A few we have created for visual inspection, in-stream flow measurement, etc., where we could not find any in National Procedures.
B. [respondent left blank]

9A. Most programs have some field in their source data that can be mapped to the analytical procedure. We use the SIM translation feature to do this mapping. Some programs have utilized several of the National procedures, but just about everyone we have worked with has had to add some of their own as well. Very few programs have done much with Sample collection procedures or Equipment. In almost all cases, we have created a single sample collection procedure that does not require equipment and loaded all samples with this procedure.
B. The big thing here is that we simply had to make do with the information available in the source data. By defaulting to a single Sample collection procedure that does not require equipment, we have solved the problem of not having the necessary data. In some cases, the program did provide multiple sample collection procedures and equipment, which is nice if you have the data.

10A. National procedures are used when possible; otherwise they are created as org-specific metadata.
B. None.

11A. We have an Analytical Codes ACCESS table that contains information about a specific parameter, its STORET character, fraction analyzed, units, analytical procedure, and detection limits. The analytical procedures follow national procedures and use the same code numbers. All of this information is listed for each laboratory because different laboratories may analyze parameters differently and even have different detection limits for the same analytical procedure. We try to use only recognized methods. If we can't find a method that works, we develop our own and enter it into STORET, and then it gets added through our ACCESS automated process.
B. Simple logic.

12A. Both. If our lab procedure deviates from the National Procedure in any way, it is noted by defining a new state-defined analytical procedure.
B. [respondent left blank]

13A. We do it both ways.
B. You can customize.

Sample Collection Procedures

Questions:
A. How do you set up your sample collection procedures, i.e., sampling gear, gear configuration, sample preservation and storage?
B. What were the advantages and disadvantages to setting up sample collection procedures this way?

Answers:

1A. Gear Config. - Just described the tools we used for collection and the method of sampling (Bottles/Bridge). Sample pres/storage - The bottle each sample was in (acid500ml, iced1L, etc.).
B. Advantage - Do not have to go back into STORET to look at our methods; it's ingrained in the data request.

2A. The sample collection procedures were set up based on what type of bottle is used to collect a certain type of sample. For example, CONT-5 is the bottle and method used to collect all metals, CONT10-Phosphorus, Ammonia and TOC, etc.
B. The advantages to this were that everything stayed separate based on the type of sample being collected.
The disadvantage was that you needed to know what type of equipment was used when collecting the sample, and that info might not be easily obtained.

3A. Creating my own lets users know where we deviated from national procedures and add procedures not supplied by EPA-HQ. National procedures save me time, especially with Citations.
B. People basically want to know if it was a grab or a spatial composite; they are not as concerned about the exact technique. Since we use a certified lab, they're not as concerned about the preservation, transport and storage.

4A. We have five collection procedures that seem to cover all cases: composite (flow-weighted with auto-sampler), composite (multiple locations), grab, lake depth point, lake surface 2-meter integrated. We have six gear configurations. We set up one catch-all sample preservation and storage default, which we plan to refine further later.
B. It captures the level of detail we want without being too onerous.

5A. We broke it out as being either biological sampling or grab sampling, and the sampling procedures were described in the boxes provided.
B. This works well for us, having a basic list. However, when there are changes in our approach, we will need to be able to change this section easily.

6A. Sampling collection procedures are unique for each project and are based on the project's QAPP or standard operating procedures.
B. [respondent left blank]

7A. To speed start-up, we chose to bypass some of the metadata that was not required here, such as detailed sample preservation descriptions. We did not associate gear with sample types (as for grab samples), and defaulted other gear types according to well-established sampling protocols.
B. This was quicker, and placed less burden on monitors and data providers. Disadvantage is less descriptive metadata.

8A. Most of our samples are grab.
Then the waters are put into different containers: one for nutrients, another for total metals, dissolved metals, according to sample preservation/storage and transport.
B. Need to have STORET indicate the date and time that samples are delivered to labs.

9A. See above.
B. [respondent left blank]

10A. Usually there is just a simple sample collection procedure with no associated gear. The dependency is activity type.
B. Don't know yet.

11A. We use an ACCESS table containing media, projectID, waterbody type, activity class, sample collection procedure, configuration name, and gear ID. Sample preservation and storage is not addressed, other than being implied through the AnalyticalCodes table and the analytical procedures code number.
B. [respondent left blank]

12A. [respondent left blank]
B. [respondent left blank]

13A. Very general.
B. [respondent left blank]

One Final Question...

Questions:
Are there any other aspects of Modernized STORET that you set up one way originally but now wish you had done it differently or could start over?

Answers:

1. Yes and no. I started out setting up my data entry in a SIM-ready format, which conflicts with the data analysis format, so I have to crosstab in Access for analysis. What I've done lately is wait until all my data is in STORET so I can pull it out in a single format and then get it ready for analysis. It makes entry EASY but it compounds analysis... quite a little paradox.

2. Yes. Originally, I couldn't figure out the Activity ID aspect of things. It had to be a unique number for every sample. I talked with several different folks who assigned a random number for the activity ID. This is what I did also initially. I didn't like the fact that you could not determine which activities were what when downloading information from the internet, so I actually did start over and now give my samples an activity ID based on the parameter code and a random number.

3. Don't get me started. I would have dumped the migrated stations from LDC and imported new stations off of GIS coverages, since I later found the coordinates were off. We did the former before SIM, when it took forever to create new stations. I would have created one Characteristic Group per lab and one for Field Measurements/Observations, one for fish and one for bugs "macroinvertebrates." I would have been consistent with Projects (match QA Plan) and Trips. I would have used STORET parameter codes for Row IDs since other systems (PCS, USGS) still use them. I would not have gotten so hung up about filling in every field I could on stations. I would have "broken" ties with LDC and established multiple Orgs, one for each physical Bureau in NJDEP, so they could administer their own data and I wouldn't have to look at their stations or Characteristic Groups when searching for information (and get faster response times).

4. Only as noted above, under advantages and disadvantages. Just as important as defining these concepts and IDs was developing our "pre-STORET" Access database to create SIM files based on them.

5. I wish I could start over with the establishment of the Char. Groups for the Benthic Macroinvertebrates, now that the ITIS list has grown and the C. Group can take a larger number of characteristics. But looking back, I would not have done anything differently, since I worked with what I had at the time. I wish we had utilized personnel names more frequently in our spreadsheet days. I have gotten confused with some of our data that was designated non-detect while other data was labeled present below quantification limit; I wish I had labeled it all non-detect, since now I have to verify which is which, and confront what this difference truly means for when we serve the data.

6. [respondent left blank]

7. Not really.
We are doing the best we can to implement the system with the resources and staff we have. As time goes on we hope to improve not only our business process, but also enhance the metadata we are providing with the data, and more fully utilize the potential of the STORET system. The "trade-offs" we have made to simplify use are seen as necessary to get us going on the system.

8. [respondent left blank]

9. Each migration gets a bit easier. We have loaded data many different ways depending on the needs of the program and the quality of their data. My big advice would be that you need to approach each project based on its merits, and not try to force a one-size-fits-all approach.

10. I would not necessarily use characteristic groups because querying data from STORET becomes more complicated, and errors are occasionally introduced into the database by the use of characteristic groups.

11. [respondent left blank]

12. None as of yet. There may be things discovered in the future, as we get into this a little more.

13. No.
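
Several respondents in this appendix build activity IDs from meaningful parts rather than random numbers. As an illustration only, the sketch below follows the convention reported by respondent 7 in the Activities section (two-digit year, a trip letter, a three-digit sequential number, a hyphen, and a medium-code suffix, as in 02-L127-M) and checks the 12-character Activity ID limit mentioned by respondent 10. The function names, the regular expression, and the dictionary of medium codes are assumptions made for this example; they are not part of STORET or SIM.

```python
import re

# Activity ID length limit mentioned by respondent 10.
MAX_ACTIVITY_ID_LENGTH = 12

# Medium codes quoted by respondent 7; an organization may define others.
MEDIUM_CODES = {
    "M": "macroinvertebrate",
    "W": "water",
    "S": "sediment",
    "F": "Field Msr/Obs",
}

# Hypothetical pattern for IDs such as 02-L127-M:
# year (2 digits), hyphen, trip letter, sequence (3+ digits), hyphen, suffix.
ACTIVITY_ID_PATTERN = re.compile(r"^(\d{2})-([A-Z])(\d{3,})-([A-Z]+)$")


def build_activity_id(year, trip_letter, sequence, medium_code):
    """Assemble an activity ID and reject IDs longer than the field allows."""
    activity_id = (
        f"{year % 100:02d}-{trip_letter.upper()}{sequence:03d}-{medium_code.upper()}"
    )
    if len(activity_id) > MAX_ACTIVITY_ID_LENGTH:
        raise ValueError(f"{activity_id!r} exceeds {MAX_ACTIVITY_ID_LENGTH} characters")
    return activity_id


def parse_activity_id(activity_id):
    """Split an activity ID back into its parts."""
    match = ACTIVITY_ID_PATTERN.match(activity_id)
    if match is None:
        raise ValueError(f"{activity_id!r} does not follow the assumed convention")
    year, trip_letter, sequence, medium_code = match.groups()
    return {
        "year": int(year),
        "trip": trip_letter,
        "sequence": int(sequence),
        "medium_code": medium_code,
    }


def site_visit_key(activity_id):
    """Return the part shared by all activities from one site visit."""
    return activity_id.rsplit("-", 1)[0]


if __name__ == "__main__":
    water_id = build_activity_id(2002, "L", 127, "W")
    bug_id = build_activity_id(2002, "L", 127, "M")
    print(water_id, bug_id)
    print(site_visit_key(water_id) == site_visit_key(bug_id))
```

Because the suffix is the only part that differs, sorting or grouping on the shared prefix keeps the water, sediment, and field activities from one site visit together, which is the advantage respondent 7 describes.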