Investigation of Current SLT QA/QC Practices for Facility Inventories

1 Introduction

The goal of this research project is to understand the types of Quality Assurance (QA)/Quality Control (QC) procedures that State, Local, and Tribal authorities (SLTs) implement when collecting facility inventory (FI) data to meet their air emissions program requirements, so that the Combined Air Emissions Reporting System (CAERS) can contain a complete set of QA/QC procedures. To comply with the Air Emissions Reporting Rule (AERR), which supports the creation of the National Emissions Inventory (NEI), SLTs must report emissions from their facilities (point sources) to the US EPA through the Emissions Inventory System (EIS). This system applies QA/QC checks to incoming data. EIS QA checks and procedures are thorough and comprehensive in order to maintain NEI data integrity and validity. In addition, SLTs set their own QA/QC procedures, beyond those of EIS, to guarantee the data quality of their emissions inventories. Those SLT-specific QA procedures are based on SLT rules and requirements for data fields included in NEI and for SLT-specific data fields not required for NEI by the federal program. Like EIS QA checks, SLT-specific QA checks notify users of errors, which prevent them from saving the data, and of warnings, which allow them to continue working but indicate a potential issue. The SLT emissions inventory reporting systems may also restrict user access to certain data fields to prevent reporting errors. By understanding the QA/QC procedures that SLTs apply beyond those required by the NEI, CAERS can be enhanced to include the same SLT-specific procedures or procedures that accomplish the same outcome in terms of data quality.
Because CAERS must have a standardized set of QA/QC procedures that meet EPA and SLT requirements, this report provides guidelines on the types of SLT-specific QA checks that CAERS should adopt and suggests that at least the most prevalent checks be incorporated into CAERS for SLTs. Including the most complete set of QA checks in CAERS will ensure that they are performed as early in the reporting process as possible: at the point where the facility is reporting the data, rather than when the data is sent to EIS months later. When CAERS helps the facility reduce or eliminate reporting errors early in the process, SLT and EPA staff can spend less time performing QA/QC procedures themselves, finding and correcting errors months after industry has sent in its reports (sometimes after the data has already been submitted to EPA), and potentially sending reports back to industry for rework. Instead, both EPA and SLT staff can redirect their time to more advanced QA and analysis of the data, and industry reporting time becomes more productive.

The CAER Product Design Team (PDT) has started conducting research on specific State, Local, and Tribal (SLT) authority requirements for facility inventory (FI) data. The Facility Research and Development (R&D) Team conducted a survey on the FI creation and maintenance practices that SLTs follow for air emissions inventory reporting. Results and analysis of the survey data are being presented in multiple parts.

In the first part, the team analyzed and reported on the data sources and data flows that SLTs use for obtaining and updating FI information (facility site, emission units, site controls, site control paths, processes, release points, and release point apportionment) for their respective emission inventories. Recommendations made for CAERS were documented in the report titled "Investigation of current State, Local, and Tribal (SLT) data for facility inventories".
Appendices A through D contain detailed information about the survey and response data.

This report represents the second part of the Facility R&D Team's work. Here, the team has focused on the QA/QC checks that SLTs need to perform on FI data, to ensure that CAERS has a complete set of SLT-specific QA checks in addition to those required by NEI.

2 Background and Previous Work

Previous work on QA/QC procedures has been conducted by the PDT and for CAERS as follows:

• Starting in January 2017, the CAER PDT conducted a study on QA/QC procedures for emissions inventory reporting. The conclusions of that study were documented in the final report on CAER QA/QC, which can be found on the CAERS PDT website. One aspect that study focused on was which of the QA/QC checks performed by SLTs were automated, given the value of automated QA checks in reporting systems. That project provided a suggested list of QA/QC checks for CAERS and raised the possibility of sharing these checks with SLTs for use with their own systems via a shared service.
• The CAERS Minimum Viable Product (MVP), the first version of CAERS, was released in 2020. It contained many of the suggested SLT QA checks from that study, as well as all EIS point source QA checks that were feasible to add, meaning those that do not require a "call and response" between CAERS and EIS, a workflow that has not been built out yet but is planned for future releases. Georgia Department of Natural Resources (GADNR) staff assisted by providing feedback on necessary QA checks for their industry, as GA piloted the MVP with EPA. Since the release of the CAERS MVP, QA checks have been added to CAERS through the Agile process in response to feedback from industry and SLT users and to preventable errors that surfaced during reporting.
In addition, every year when a new version of CAERS is released, it contains any new QA checks that EIS may have added, where feasible to implement. See Appendix E for a detailed description of the EIS data submission process and Appendix F for a detailed analysis of EIS QA checks.
• CAERS currently applies FI QA/QC checks as follows (see Appendix G for details):
o point source checks applied by EIS on incoming SLT-reported data, so long as the check does not require a "call and response" between CAERS and EIS, a functionality that is still a work in progress,
o additional QA checks requested by SLTs who participated in the PDT QA/QC R&D Team described above, where feasible,
o additional checks requested by current CAERS SLT users, including custom checks requested by some SLTs but not desired by others, and
o QA checks that have been developed as use of CAERS has revealed their need.

3 Method

In April 2022, the facility inventory team conducted a questionnaire survey among SLTs (see details in the document and appendices of the "Investigation of current State, Local, and Tribal (SLT) data for facility inventories" report). The survey contained only two questions related to QA/QC procedures. SLTs were asked:
• if there are any SLT-specific QA/QC procedures or restrictions (in addition to EIS QA/QC checks) for facility inventory components,
• if SLTs have encountered issues and problems with their data in preparing facility inventories.
Respondents who answered yes to either question were asked to explain further. The team collected a suite of SLT-specific QA procedures through the survey and communications with SLT emissions inventory (EI) staff. Using the information from the responses, the team's tasks included:
1. analyzing the QA/QC procedures and practices that SLTs use for facility inventory data,
2. comparing the current types of quality checks that SLTs apply with those already in CAERS, so that CAERS may adopt any additional SLT checks that could further improve the accuracy and quality of the data.

4 Analysis of Current SLT QA/QC Procedures

4.1 SLT QA/QC procedures beyond EIS from Survey Responses

Fifty-four jurisdictions responded to the survey (Appendices A-D). Thirty-four of the 54 respondents indicated that they conduct SLT-specific QA/QC procedures or apply data restrictions in addition to the QA checks applied by EIS. Figure 1 shows the number and type of jurisdictions that have QA/QC procedures in addition to those applied by EIS: 26 of 37 (70%) State respondents, 7 of 16 (44%) local respondents, and the single tribal authority respondent all apply QA checks in addition to EIS.

Figure 1. Number and Type of Jurisdictions with QA/QC Procedures beyond EIS Checks

Since the questionnaire was not explicitly designed to ask detailed questions about SLT QA/QC practices, the results provide only general information from SLTs who gave additional explanation of their procedures. The following information about the types of procedures SLTs use to QA their data was gathered from their comments in the survey and from follow-up conversations. CAERS should be able to assist SLTs in applying customized QA procedures so that an SLT does not lose these capabilities when it uses CAERS:

1. Restrict edits for certain data elements to prevent erroneous reporting from facilities.
a. Do not allow facilities to change their facility site IDs and emission unit IDs, to maintain the integrity between the EI data and permitting data and the consistency of IDs across years. For example, Idaho does not allow facilities to enter any unit agency (IDDEQ) ID. The system notifies EI staff every time a unit has been added. At that point, staff assign an IDDEQ ID to the new emissions unit.
Once the ID is assigned, the state never changes it. However, Idaho's system has a field for the facility to add its own IDs, and these are allowed to change. While many SLTs that have QA/QC procedures beyond EIS procedures have this restriction, there are different restrictions for other data elements related to identification of data components.
Currently, CAERS does not assign IDs automatically. However, facility reporters are not allowed to change agency IDs. If they were to do so, EIS would receive these units and consider them new, because they would not have an agency ID that EIS recognizes. The result would be duplicate units in that facility's inventory. CAERS does, however, flag new units: it issues the facility a warning asking the user to check with their SLT whether a naming convention is required and, if so, which one. The same warning alerts the SLT that a new unit has been created.
b. Do not allow facilities to change their process IDs, control IDs, and release point IDs.
• For example, Massachusetts and Illinois use this restriction.
• In Michigan, facilities can update the names for these data elements but not the IDs.
• Wyoming's system generates non-editable IDs for all elements of the facility tree (facility, emission units, processes, controls, etc.), while a separate data field exists for the operator to enter their internal company IDs for each element.
• Montana does not allow facilities to update emission unit and process IDs.
As with units, CAERS currently can issue QA warnings if a new component is created. It does not allow reporters to delete a previously reported component. To remove it, it must be marked shut down.
c. Do not allow facilities to change all or part of their facility-level information.
• For example, OK, Forsyth County in North Carolina, Missouri, and Rhode Island do not allow facilities to change their facility name/address.
• South Carolina does not allow facilities to make facility-level changes (i.e., name, address, EIS category, location, contacts, etc.) in the reporting.
Currently, CAERS does not allow reporters to alter the facility-level information. The SLT can edit most of those fields, except those that EIS will not allow changes to (for example, the facility coordinates). CAERS can also be customized further; for example, GADNR does not allow industry to edit the facility NAICS.
d. Do not allow facilities to change any process-level information, excluding those in the EIS Point submittal, within EI systems. In Maine, a facility needs to call EI staff for changes because these details are in the issued facility license and changes trigger a licensing update.
e. Do not allow facilities to change process SCCs. Delaware, NC, and Texas are examples of this.
For these two items, CAERS would be customized further to restrict specific process-level information, and/or SLTs can allow the facilities to edit these fields but receive warnings, so they can ensure that industry has made any edits correctly.
2. Set allowances or conventions for editable data elements, particularly for entering a new component.
• For example, in NC, facilities can enter a new emission unit or control device, but the EI system adds a U- to the IDs of both to flag them as new and not in their air permit.
• Minnesota's online emission reporting system allows facilities to add new emission units, processes, controls, and release points, and automatically creates IDs in sequence with the existing ones.
• In ID, process IDs are automated and are numbered 1, 2, 3, etc.
• The restrictions or conventions in numbers 1 and 2 are controlled and enforced by the SLT emissions reporting systems. For example, Kansas uses a customized version of Windsor's State and Local Emissions Information System (SLEIS) that restricts what information facilities can change.
Other SLTs using SLEIS (such as DE, SC, and Hawaii) or their own EI reporting systems (such as MN and WY) take similar approaches to restricting edits, although the restricted data elements may vary from SLT to SLT.
• If facilities want to make changes to the restricted data elements, they must get approval from the EI staff or go through permitting amendments/revisions.
More detailed or customized restrictions or QA checks of this nature would have to be built out in the SLT CAERS module.
3. Require additional data elements to make sure information for critical data elements is properly reported. For example, Kansas allows facilities to change operation status (such as shutdown of a process), but changes must be qualified with dates to be reported properly. CAERS will be able to allow SLT-specific data fields and/or codes in future as part of the SLT's module.
4. EI staff have full control of the EI data submitted to NEI.
• For example, in Southwest Clean Air Agency, Washington, all data submitted by facilities are on forms provided by the agency, and all data is independently verified by the agency staff. Facilities are not allowed to alter the forms or the provided information (e.g., stack info, EU info, etc.).
• In Jefferson County, Alabama, point sources (major sources) are required to submit/update all emission unit/stack information that generates emissions to the county's Permitting Section during permit applications/renewals to operate within the county. The Permitting Section verifies and documents all information reported. The EI group obtains this information by checking the Permitting database/spreadsheet for EI purposes.
For data tied into an SLT's permitting database, see the facility inventory workflows discussed in this team's previous report, "Investigation of current State, Local, and Tribal (SLT) data for facility inventories".
For the forms that the SLT might require, CAERS can take in the data that the SLT allows the facility to edit and can prevent edits of other data, per the required customization. CAERS may also generate reports in an SLT-required format if needed.
5. Analyze and run QA checks on data before the EI cycle begins and after EI reporting is complete.
• For example, before collecting inventories from its permitted facilities, OK does an analysis to determine which facilities need to report to the state for the upcoming cycle. Many QA checks are run during this analysis. OK also runs queries created in-house on SLEIS data to identify other errors, such as in operating status, after the reporting season is complete.
• Wisconsin runs SQL queries to QA the data and has several QA checks built into a QA report.
CAERS uploads the previous year's report into the new report as a starting point for the reporter. At that point, any new QA checks available in CAERS can be run so the reporter knows of any errors they may have. In addition, CAERS does not allow reporters to submit their reports until all critical errors have been resolved. All warnings can be observed by the SLT reviewer as well, so that if there is an issue, the reviewer can send the report back to the facility before sending the data to EIS. SLT-specific QA checks are included in the global checks both before and after reporting.
Also, an opt in/opt out questionnaire was created in CAERS as a customization for the state of GA. With it, facilities can determine whether they will report for a specific inventory year, and GADNR requires those that opt out to attach an analysis demonstrating that opting out is appropriate for the facility. Such a process could be further customized for new SLTs desiring a similar approach.
6. Conduct special QA/QC activities for geographic data. In Illinois, records group staff conduct QA/QC for uniqueness at the address level.
In addition, GIS staff are used to locate facility coordinates and check for potential duplicate entities.
CAERS does not have a "call and answer" capability with EIS at this time to verify duplicates; a future functionality would allow this type of QA check to be applied before the report is certified and submitted. At this time, however, latitudes and longitudes of facilities are locked, so that an SLT must request an unlock to change them. The SLT can also set the latitude/longitude of a new facility when it is entered in CAERS.
7. Build conditional QA checks in the EI system.
• In Kentucky, all processes must have a numeric value reported for annual throughputs (cannot be blank).
CAERS could issue a QA check for an SLT that does not want blank values.

4.1.1 QA/QC from PDT Call Discussions

Connecticut has many complex QA/QC procedures that it did not specify in the survey, and other SLTs might also have additional QA checks not captured by the survey. Therefore, several conversations were held with PDT members to discuss SLT QA procedures further. These PDT call discussions provided additional QA measures as well as more details on the SLT QA procedures identified in the survey. This section presents the information collected from those discussions. Besides the EIS QA checks built into the SLT emission reporting systems, other techniques are also used in QA procedures. The analysis here focuses on these additional QA measures and techniques.

1. Calculate release point parameters by SLTs. In CERS V2.0, the units of measure for release point operation parameters must be specific Imperial units. For example, release point stack height and diameter must be in feet, exit gas temperature must be in degrees Fahrenheit, exit gas velocity must be in feet per minute or feet per second, and exit gas flow rate must be in actual cubic feet per minute or actual cubic feet per second. Facilities might have trouble with unit conversions.
ID calculates the values for facilities. ID also performs the calculation of diameters for non-circular stacks.
In future, additional conversions could be added so that more conversions are possible in CAERS. Also, SLTs may indicate that they would like to verify release point parameters and then have them locked so reporters may not edit them.
2. Special efforts on geographic coordinates. Geographic coordinates use the intersection of a line of latitude and a line of longitude to determine the geographical point of a facility site or a release point, for example, a latitude of 46.992611 and a longitude of -93.604936. Accurate identification of geographic coordinates is critical for using emission data in risk assessment and air quality planning. SLTs take different approaches to QA/QC these data. Examples are shown below:
• TX locks facilities out of editing geographic coordinates. If facilities want to change coordinates, they must map the coordinates and make the coordinates more accurate. Therefore, facilities spend a lot of time correcting geographic coordinates. If facilities use the bulk upload, they will get an undone message for the coordinates, and EI staff will have a chance to look at the coordinates.
• NC sets the reference point for geographic coordinates at the entrance point (front door) of a facility. The front door is the main office building, which could be a substantial distance from the street address, for example, at a facility with a long entrance drive. NC does not use street addresses to determine geographic coordinates. In NC, 3 facilities have the same street address but separate office buildings.
As described above for other data, customizations would allow more data fields (coordinates for different location points of the facility), and these could be made non-editable by the reporter as needed by the SLT.
3. Check the consistency between the facility inventory in the SLT EI system and the data in the permitting system.
For example, if a permit lists controls, then ID EI staff check to ensure they are listed along with emission units and that everything has been submitted to the emission inventory.
In CAERS, SLTs may enter data for a new facility and thus enter the data as it is shown in the permit. As described above, facility reporters could be prevented from editing certain data fields, or QA checks could be issued, per the SLT's preference, so the SLT may verify that any edits by the facility reporter are correct and align with the permit.
4. SLT EI systems auto-fill missing information.
• MN's emission inventory rule requires facilities that have certain types of state registration permits to report facility total emissions. Permitting data in the master database contain only facility-level information and no sub-facility-level information, such as emission units, processes, and release points. The state EI system collects SCC-level emissions from facilities through online reporting and automatically generates the sub-facility-level information based on SCCs.
• NC and ID systems automatically assign an ID to a new process, where the ID number is assigned sequentially to each additional unit/process.
A capability that automatically assigns a new ID to a facility component, following an SLT naming convention, could be part of an SLT's module in future. Another option is to have QA checks that verify that the naming convention has been followed, or warnings that allow the SLT to see whether a new ID provided by industry has been assigned correctly.
5. Restrict facilities' ability to delete certain facility inventory records. ID does not let facilities delete controls, release points, emission units, or processes, unless facilities mark them as permanently shut down or they are newly added emission units.
CAERS has the same process as ID and does not allow facilities to simply delete previously reported sub-facility components.
These must be either temporarily or permanently shut down.
6. Use dropdown lists to enforce valid and acceptable reporting information. Many SLTs have their own codes for facilities to report dates in the EI systems. MN provides dropdown code lists for SCCs, pollutants, and site control types in the EI reporting system.
CAERS has the same process: a code cannot be entered directly but must be selected from a dropdown, both in the user interface and in the bulk upload template. If the SLT were to enter an incorrect code in the bulk upload template, CAERS would not allow that data to be uploaded. CAERS will also not allow an outdated code (such as a retired SCC) to be used and will force the user to choose a valid code.
7. SLT EI systems perform QA checks with start and end dates to ensure the validity of components. Oklahoma is using SLEIS to program the QA checks with start and end dates. When a component comes into play, e.g., a new emission unit, a start date must be entered for it. When a component is retired, an end date must be entered, and users cannot use these units past their end dates.
This functionality is not available in CAERS yet, but SLTs could add their own start and end dates as SLT-required data fields associated with each sub-facility component if they wish to do so in their module.
8. Get notifications when facilities alter the existing facility inventory. ID EI staff receive notifications when new units are entered, so EI staff can be sure to review what the facility did before the facility submits.
CAERS already has this functionality and is applying it for reports from ID and ME.
9. Send an announcement at the start of the current emission inventory reporting, with notes on common issues observed in the last emission inventory. For example, facilities may use the 'End Date' for a process as the end date for the emission inventory.
It should only be entered if no emissions are being reported for the process and the process will no longer be used. MN sent this note with the announcement of collecting the 2021 emission inventory.
CAERS allows SLTs to send email notifications to industry reporters through CAERS itself. As reporting progresses throughout the year, QA checks can be added to prevent prevalent and previously unanticipated errors from propagating further. Finally, annual trainings address aspects of CAERS that may have been confusing or where reporting errors were observed, to prevent them from happening in future.
10. Contact facilities when problems are observed that cannot be explained by the existing in-house information. For example, in MN, information for site controls comes from the state master database. However, some information could be incorrect, such as the end operation date. To confirm the correct information, EI staff need to contact facilities about the operation status to make sure the control efficiencies from the site control can be included in the emission calculation.
SLTs using CAERS are still able to contact individual reporters to clarify questions.

4.2 Discussion and Recommendations

The previous section mentioned the CAERS capabilities that were available at the time of this study. In this section, we summarize functionality that would still be desirable in future versions of CAERS.

4.2.1 QA Checks from SLT Survey

1. Do not allow facilities to change process SCCs.
While it is currently possible to edit the SCC of a process in CAERS, many do not want this feature. The CAERS team should therefore explore preventing SCC edits for SLTs that want to avoid SCC changes, and/or issuing warnings when an industry reporter has modified an SCC, so that the SLT is made aware of and agrees with that change.
2. Allow reporters to change their own facility component IDs without altering the agency IDs that the SLT and EIS recognize.
In future, given feedback provided by current SLT reporters as well as previous and current PDT SLT members, the CAER team would like to explore the ID option to allow the facility to relabel its components without this action affecting the Agency and associated EIS IDs. This would give the facility reporter flexibility while allowing the SLT to maintain a degree of control over the desired Agency IDs.
3. Set allowances or conventions for editable data elements, particularly for entering a new component.
Currently, CAERS does not automatically create an ID for a specific SLT. However, given that the SLT may see warnings if a new component has been created, the SLT may require the facility to enter the data using its naming convention, and return the report if the convention has been violated. Following EIS convention, CAERS will also not allow a sub-facility component ID to be reused if one already exists. For example, a unit ID may not be used more than once, so a new unit that is accidentally labeled with an ID already in use by another unit will generate a critical error.
In future, the CAERS team would like to explore customization of IDs for SLTs. While it may prove difficult to build functionality that automatically creates each SLT's specific type of ID, SLT-specific QA checks may be the quickest and easiest way to incorporate such customizations. For example, a check could alert the user that a specific number has already been used, by comparing the IDs of specific components with those marked PS (permanently shut down) in previous year reports, or that an expected prefix or suffix has not been added.
4. Require additional data elements to make sure information for critical data elements is properly reported.
In future, CAERS will allow SLTs to collect additional data they would like to enter for facilities, or that they require from facilities for their specific programs.
5. EI staff have full control of the EI data, and facilities cannot access the EI system.
As mentioned above, while CAERS already restricts changes to facility-level data, the previous document from this team describes more sophisticated workflows of facility inventory data that would have to be built to allow EI staff more control if desired.
6. Conduct special QA/QC activities for geographic data.
A future "call and answer" workflow between CAERS and EIS is desired so that items such as potential duplicates and previously used IDs can be checked before the reporter submits their report, and the report does not contain errors that could trigger errors from EIS. SLT-specific geographic information and QA checks would have to be added as functionality in CAERS as well.
7. Build conditional QA checks in the EI system.
As described above, customized QA checks are possible in CAERS. Conditional QA checks can be applied in CAERS, as can QA checks stricter than those of the federal program, so long as they do not contravene federal program requirements and the SLT has the legal authority to impose any additional restrictions or require additional information from industry.
8. Calculate release point parameters by SLTs.
CAERS allows for conversions in the system but does not include all units of measure (UOMs). However, it will soon be possible for SLTs to retrieve their data automatically, at which point the necessary conversions can be performed. In future, CAERS could also issue certain reports to SLTs in specific UOMs for SLTs who prefer metric system UOMs. SLTs may customize CAERS and add data fields or codes to CAERS for their own uses. For example, in future, they may be able to collect additional data on release points as needed.
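As a rough illustration of the release point conversions discussed in item 8, the following Python sketch converts metric stack parameters into the Imperial units that this report lists for CERS V2.0 (feet, degrees Fahrenheit, feet per second, actual cubic feet per second). It is not CAERS code: all function names are hypothetical. The report also does not say how Idaho computes diameters for non-circular stacks, so the area-equivalent diameter used here is an assumption, chosen as one common convention.

```python
import math

METERS_PER_FOOT = 0.3048  # exact by definition of the international foot


def meters_to_feet(meters: float) -> float:
    """Stack height or diameter: meters -> feet."""
    return meters / METERS_PER_FOOT


def celsius_to_fahrenheit(celsius: float) -> float:
    """Exit gas temperature: degrees Celsius -> degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def mps_to_fps(meters_per_second: float) -> float:
    """Exit gas velocity: meters per second -> feet per second."""
    return meters_per_second / METERS_PER_FOOT


def m3s_to_acfs(cubic_meters_per_second: float) -> float:
    """Exit gas flow rate: actual m^3/s -> actual ft^3/s."""
    return cubic_meters_per_second / METERS_PER_FOOT ** 3


def area_equivalent_diameter(length: float, width: float) -> float:
    """Diameter of the circle with the same area as a rectangular stack.

    Assumed method (not taken from the report); the result is in the
    same unit as the inputs.
    """
    return math.sqrt(4.0 * length * width / math.pi)
```

A reporter entering a 30 m stack at 150 degrees Celsius would, under these assumptions, call `meters_to_feet(30.0)` and `celsius_to_fahrenheit(150.0)` before the values are stored in the required units.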

4.2.2 QA/QC from PDT Call Discussions

1. Allow SLTs to calculate release point parameters.
An SLT may indicate that it would like to verify and then lock the release point parameters. This customization could be built in CAERS.
2. Check the consistency between the facility inventory in the SLT EI system and the data in the permitting system.
In future, CAERS will allow SLTs to include the permit numbers and types for their facilities so that they may easily reference the permits as needed. For more sophisticated workflows between SLT FI and CAERS, see this team's previous report as described in the Background and Previous Work section of this document.
3. SLT EI systems auto-fill missing information.
An SLT wishing to have additional auto-calculated data may not necessarily need to generate it in CAERS, but may generate the needed information when it pulls CAERS data back into its own system. For example, if certain facilities are not going to report emissions inventories, the SLT may be able to generate the sub-facility data in its own system by SCC. Conversely, the SLT may be able to autogenerate the relevant information at the SCC level in its own system for facilities that only report totals, and have that data sent to CAERS, from where it can be sent to EIS with its other information.
4. SLT EI systems perform QA checks with start and end dates to ensure the validity of components.
In future, CAERS would need to track start and end dates for QA checks. More discussion is needed as to which QA checks should be versioned and which checks should be applied retroactively to previous year reports even if the QA check is new.

5 Future Work on Facility Inventory

Beyond the survey questions covered in this part of the research, the team identified future work that is needed to allow SLTs to have their facility inventories in CAERS. Such research should include:

1. Handling emissions for a parent facility with multiple child sites.
• For example, a nonmetallic facility in MN has only one facility ID but multiple operation sites. The MN emission inventory rule only requires those facilities to report total emissions and pay emission fees based on the facility ID, not on the individual operation sites. MN cannot handle this situation in its current EI system and therefore takes hard copies from those facilities and manually enters total emissions for the facility IDs into the EI system.
• Alaska has a similar situation and uses the same approach as MN for nonmetallic facilities.
• Portable facilities also present challenges: sources such as asphalt plants and rock crushers can move all over the state and cannot be assigned a specific borough/census area.
2. Handling one facility where different parts of the facility are owned by different companies. In WY and CT, one location has two identically named facilities. For example, at oil and gas facilities in WY, the well is owned by one company but the natural gas dehydration unit(s) at the well site are owned and operated by another company. The dehydration facility owner gives it the same name as the well site facility.

Appendix A. Original Letter and Questionnaire
See Appendix A. Original Letter and Questionnaire.docx

Appendix B. EIS facility staging table requirements
See Appendix B. Facility Staging Requirements.xlsx

Appendix C. Responses to the Questionnaire
See Appendix C. All Original Responses.pdf

Appendix D. Data analysis
See Appendix D. SLT Facility Inventory Research.xlsx

Appendix E. EIS Data Submission

EIS Data Submission
Data submitted by SLTs for NEI reports via EIS must undergo QA checks. The EIS Quality Assurance (QA) Environment is used as a preliminary quality assurance step prior to making an official submission to the Production Environment. Users are encouraged to use the QA Environment as frequently as necessary to help ensure that the production submission is of the highest quality.
The QA Environment applies checks to submitted data that ensure file integrity for submission purposes, as well as checks that may reference data stored in the EIS Production Environment. Most importantly, this is the QA stage that gives users advance notice that certain data will be rejected if they are submitted to the Production Environment. EIS issues:
• critical errors, which must be corrected for the report to be accepted, and
• warnings, which alert the SLT that the submission, while technically correct, may still contain issues that the SLT may want to review.
Any errors in the data are noted in feedback reports that list the errors that need to be corrected (for example, missing data, inconsistencies in data, invalid files) and indicate how the submitted data would be integrated into the EIS Facility Inventory. After correcting all errors in the data, users can make the official submission to the EIS Production Environment.
In the EIS Production Environment, the same checks as those used in the QA Environment are run during the batch submission process. The results of these checks are logged in EIS. Users again receive a feedback report that indicates critical errors and potential issues upon submitting their XML file to the Production Environment. Users must correct the problems with the data content or the XML document structure listed in the report and resubmit the file to ensure all data submitted is checked.
In the Production Environment, when users make additions, deletions, or edits on a limited data set, QA checks are run only on the data associated with or related to the data that have been changed or added. Users will immediately see the impact that minor additions may have. In addition, EIS may prevent certain data from coming in entirely.
For example, the geographic coordinate information for facility sites is protected and has been verified for accuracy using Google Earth, so that information cannot be overwritten or edited with an EIS submission. For data of that kind, the SLT must reach out to EPA, and EPA may then unprotect the data to allow the submission to overwrite it.
When users make single-record additions or edits to the EIS Facility Inventory data on the EIS screen, EIS runs only the checks associated with the single-record data that were changed or added by the online transaction.
Besides schema validation checks and file validation checks, the EIS Gateway will also prevent the following cardinality errors:
• Duplication of XML Elements within a Complex Type
• Duplication of Complex Data Types
• Duplication of Major Data Blocks

QA checks in EIS
As of 11/02/2022, EIS performs 896 QA checks for data content and format during SLT data submissions (see Appendix A). These checks are automatically performed when data are uploaded to EIS by SLT submittals as well as by EPA loads. There are two levels of checks, critical and warning, across a variety of check types. Table E1 shows the number of EIS QA checks for each check type under each check level.

Table E1. Statistics of EIS QA Checks
Check Type     Critical   Warning   Grand Total
Calculation    5          2         7
Cardinality    18         6         24
Code           92         6         98
Comparison     9          2         11
Conditional    164        22        186
Duplication    3          92        95
Format         150        45        195
Present        140        8         148
Range          124        8         132
Grand Total    705        191       896

The Facility Inventory in EIS is the permanent, continually maintained inventory of large stationary sources and voluntarily reported smaller sources, which serves as the basis for all point emissions reported to the EIS. It contains information about facility sites and their physical location, emissions units, emissions processes, release points, controls, control paths, and regulations.
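As a quick cross-check of the arithmetic in Table E1, the per-type counts can be summed in a few lines. This is only an illustrative sketch; the counts are copied from the table, and the variable and function names are not part of EIS.

```python
# (critical, warning) counts per check type, copied from Table E1.
TABLE_E1 = {
    "Calculation": (5, 2),
    "Cardinality": (18, 6),
    "Code": (92, 6),
    "Comparison": (9, 2),
    "Conditional": (164, 22),
    "Duplication": (3, 92),
    "Format": (150, 45),
    "Present": (140, 8),
    "Range": (124, 8),
}


def totals(table):
    """Return (critical total, warning total, grand total) for the table."""
    critical = sum(c for c, _ in table.values())
    warning = sum(w for _, w in table.values())
    return critical, warning, critical + warning


critical_total, warning_total, grand_total = totals(TABLE_E1)
print(critical_total, warning_total, grand_total)  # 705 191 896
```

The column sums reproduce the Grand Total row of the table (705 critical, 191 warning, 896 overall).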
While many QA checks are run to verify the facility inventory data quality, about 416 out of 896 QA checks are for point source emissions and sources other than point sources (nonpoint sources, mobile sources, and events). An analysis based on best judgement shows that there are 480 QA checks applied to facility inventory data. These are listed in Table E2 and broken down by the simplified data components used in this study for the investigation of current State, Local, and Tribal (SLT) data for facility inventories. The 34 QA checks in the first row are for the document header table in the facility inventory data submission or for all data components; examples include check 620 (FIPS County ID Code must match value in code list), check 621 (FIPS State ID Code must match value in code list), and check 874 (Program System Code must match value in list of registered codes).

Table E2. Number of EIS QA Checks for Each Sub-facility Component
Component                               Number of QA Checks   Included QA Checks Applicable to Other Component   Other Applicable Component
All and Document Header Staging Table   34
Facility Site                           116
Emission Unit                           47
Site Control                            59                    1                                                  Control Path
Control Path                            73
Process                                 43                    8                                                  Emission Unit
Release Point                           85
Release Point Apportionment             23
Not for Facility Inventory              416

While most QA checks are unique to one data component, certain QA checks are common to two data components. For example, 8 QA checks for the regulation staging table are applicable to both the emission unit and the process data components because regulations could be at either the emission-unit level or the process level. Check 838 (PM percent control measure reduction efficiency dependency) requires that the PM2.5 percent control measure reduction efficiency not be larger than the PM10 percent control measure reduction efficiency in either the site control data component or the site control path data component.
In those cases, the common QA checks are included in only one data component, with additional information in the third and fourth columns of Table E2, to avoid double counting the number of QA checks.

After Data are Loaded to EIS
EPA staff and contractors take additional steps to review the data used in the NEI. The QA/QC work performed after data are loaded to the NEI is mainly for point source emissions, that is, emissions at facilities with specific latitude/longitude locations. The NEI is a composite of SLT-submitted data and EPA-generated data used when SLT data are not available, mainly for hazardous air pollutants (HAPs). The reason is that HAP emissions are reported voluntarily in many SLTs, so some states and pollutants are not reported to EIS. Nevertheless, NEI data are largely compiled from data submitted by SLTs. Prior to release, EPA staff generate maps for review and run many comparisons of the data against other data: for example, they compare inventories across versions and years, calculate and compare VOC versus summed VOC-HAP totals by county and SCC, and identify outliers. EPA provides feedback to SLTs during the compilation of the data on critical issues (such as potential outliers and missing data) and requests assistance in reviewing and editing as needed. EPA also builds an online Data Completeness Report that shows the number of point sources required to report and the number of facilities reported, a percentage completion, a percentage completion metric for the expected HAPs, and an indicator for which facilities have "outliers" (either high or low values or missing altogether).
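The VOC versus summed VOC-HAP comparison and the completeness percentage described above can be illustrated with a short sketch. It does not reproduce EPA's actual procedure: the record layout (`county`, `scc`, `pollutant`, `tons`, `is_voc_hap`), the placeholder county, SCC, and facility identifiers, and the rule that a group is flagged when its summed VOC-HAP emissions exceed its reported VOC total are all assumptions made for illustration.

```python
from collections import defaultdict


def flag_voc_hap_outliers(records):
    """Sum VOC and VOC-HAP emissions by (county, SCC) and return the groups
    whose summed VOC-HAP emissions exceed the reported VOC total."""
    voc = defaultdict(float)
    hap = defaultdict(float)
    for r in records:
        key = (r["county"], r["scc"])
        if r["pollutant"] == "VOC":
            voc[key] += r["tons"]
        elif r.get("is_voc_hap", False):
            hap[key] += r["tons"]
    return sorted(k for k in hap if hap[k] > voc.get(k, 0.0))


def percent_complete(expected_ids, reported_ids):
    """Percentage of facilities expected to report that actually reported."""
    expected = set(expected_ids)
    if not expected:
        return 100.0
    return 100.0 * len(expected & set(reported_ids)) / len(expected)


# Hypothetical records; identifiers and values are placeholders.
records = [
    {"county": "A", "scc": "S1", "pollutant": "VOC", "tons": 10.0},
    {"county": "A", "scc": "S1", "pollutant": "Benzene", "tons": 4.0, "is_voc_hap": True},
    {"county": "A", "scc": "S1", "pollutant": "Toluene", "tons": 3.0, "is_voc_hap": True},
    {"county": "B", "scc": "S1", "pollutant": "VOC", "tons": 2.0},
    {"county": "B", "scc": "S1", "pollutant": "Benzene", "tons": 5.0, "is_voc_hap": True},
]

print(flag_voc_hap_outliers(records))
print(percent_complete(["F1", "F2", "F3", "F4"], ["F1", "F3", "F9"]))
```

In this example only the group for county "B" is flagged, because its reported benzene alone exceeds its reported VOC total, and two of the four expected facilities reported, so completeness is 50 percent.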
In the development of EPA's environmental justice mapping and screening tool, EPA asks SLTs to perform a point source review. This review serves as an opportunity for additional review of hazardous air pollutant (HAP) emissions in conjunction with draft modeled risk results from initial modeling of a draft inventory, and to identify corrections that would better estimate risks in communities near these facilities when they are modeled for AirToxScreen. Although the above QA/QC activities focus on emissions, issues related to facility inventories could also be identified, such as facilities missing from reporting and mistakes in release point parameters and apportionments, site controls, and site control paths. SLTs may correct the issues and problems identified by all of these EPA QA/QC activities in their respective SLT facility inventories.

Appendix F. EIS QA Checks and Analysis
See Appendix F. EIS QA Checks and Analysis.xlsx

Appendix G. CAERS QA Checks
See Appendix G. CAERS QA Checks.xlsx