Monitoring Insights Substitute data in EPA CAMD's Power Sector Emissions Data February 16th, 2021 Substitute data represents a fraction of a percent of total emissions data. Under the Clean Air Act, most fossil fuel-fired power plants must continuously monitor and report their emissions of carbon dioxide (C02), nitrogen oxides (N0X), and sulfur dioxide (S02) to EPA.1 Emissions are monitored by continuous emission monitoring systems (CEMS) or equivalent that power plants install and maintain. The power plants collect and report the hourly emissions data to EPA every calendar quarter and EPA publishes those data on its website. While the CEMS measure continuously, there may be operating hours when a CEMS does not provide valid data due critical system malfunctions, missed or failed quality assurance tests, routine maintenance, or other problems. When data is missing or invalid, EPA's regulation2 specifies how to estimate and substitute the data. The longer or more frequent the missing or invalid data, the more conservative (i.e., likely to overestimate emissions) the substitute data algorithm becomes. Because the emission data are used to assess compliance with several cap-and-trade programs, affected power plants do not want to overreport emissions. Therefore, they have an incentive to minimize the amount of missing or invalid emission data. 1 Referto 40 CFR part 75—Continuous Emission Monitoringfor more information about the monitoring and reporting requirements 2 Refer to 40 CFR part 75 Subpart D. SS75.30-37 for more information about the missing and substitute data provisions oEPA United States Environmental Protection Agency ------- Monitoring Insights Substitute data are a minor portion of total emission data Measured and substituted data- all parameters | Measured Substituted 1 square = 0.1% As shown in the figure above, in years 2015-2019, 0.6% of operating hours reported to EPA had missing or invalid data for the following parameters. • C02 concentration • NOx concentration • 02 concentration • S02 concentration • stack gas flow 99.4% of data for all parameters was measured (2015-2019) |—|^ jy United States Environmental Protection m m Agency 2 ------- Monitoring Insights Substitute data are a minor portion of total emission data Percent of substitute data by parameter Parameter Substituted C02 concentration 0.8% N0X concentration 0.4% 02 concentration 0.2% S02 concentration 0.9% Stack gas flow 1.1% Refer to the table above to review the percent, of substitute data by parameter from years 2015-2019. |—|^ jy United States Environmental Protection Agency Substitute data range from 0.2 to 1.1 percent of total operating hours ------- Monitoring Insights Substitute data have different effects on emission estimates The data substitution algorithms vary in their likelihood to overestimate emissions. A power plant must apply the appropriate algorithm based on the duration and frequency of missing or invalid data. For example: ¦ When the duration and frequency of missing or invalid data is low, the power plant averages the valid hours before and after the missing data period and applies that value to the missing data period. This approach is unlikely to significantly overestimate emissions. ¦ When the duration and/or frequency of missing or invalid data is long, the power plant may have to report maximum potential concentration or flow rate, regardless of operating level. This approach is likely to overestimate emissions. Substitute data can be categorized into three tiers of estimation to indicate the likelihood that emissions are overestimated. Tier 1: Low likelihood of overestimation Tier 2: Moderate likelihood of overestimation Tier 3: High likelihood of overestimation A small portion of substitute data hours cannot easily be categorized because they are reviewed on a case-by-case basis and do not fall into the traditional algorithm. These are listed as "other". JV United States Environmental Protection ^^1— I Agency 4 ------- Monitoring Insights Substitute data varies by parameter This figure illustrates the percent of total operating hours using substitute data and separates the data into the three tiers of estimation by parameter. • For C02 concentration and S02 concentration tier 3 (highest likelihood for overestimation) substitution accounts for 0.2% or less of total data and represents the smallest percent of substitute data for those parameters. • For N0X concentration, 02 concentration, and stack gas flow tier 3 substitution accounts for 0.5% or less of total data and represents the largest percent of substitute data. Although tier 3 represents the majority of the data, 0.5% is an insignificant percentage of the total data. Substitute data as a percent of total data C02 Concentration NOx Concentration 0.5- 0.4- 0.3- 0.2- 0.1- _ o.o- o 0.5- 0.4- CL 0.3- 0.2- 0.1- o.o- 02 Concentration I S02 Concentration Stack gas flow Tier | 1: Low likelihood of over estimation | 2: Moderate likelihood of over estimation | 3: High likelihood of over estimation I 4: other Tier JV United States Environmental Protection ^^1— I Agency 5 ------- Monitoring Insights For more information about the data or this analysis... EPA's part 75 monitoring and reporting program ¦ 40 CFR part 75—Continuous Emission Monitoring ¦ Plain English Guide to Part 75 (PDF) Power Sector Emissions Data ¦ CAMD's Power Sector Emission Data ¦ CAMD's Power Sector Emissions Data Guide (PDF) Contact information Stacey Zintgraff EPA's Clean Air Markets Division zintgraff.stacev@epa.gov JV United States Environmental Protection ^^1— I Agency 6 ------- Monitoring Insights Categorizing method of determination codes Every hourly measurement includes a method of determination code (MODC) to inform how the value was measured or calculated. The MODCs can be categorized into measured and calculated. The calculated MODCs can be further categorized by the likelihood that the associated algorithm will overestimate emissions. For more information about MODCs, refer to the part 75 reporting instructions. Categories of MODC values Category MODC Measured 1-5, 14, 16-17, 21-22, 26, and 40 Tier 1—Low likelihood of overestimation 6-7 and 11 Tier 2—Moderate likelihood 8-9 of overestimation Tier 3—High likelihood of overestimation 10, 12-13, 15, 18-20, 23-25, and 46-48 Other 53-55 JV United States Environmental Protection ^^1— I Agency 7 ------- Monitoring Insights Analytical methodology This analysis was completed in RStudio. If you would like to review the code or the source data, contact Stacev Zintgraff to make the request. To complete this analysis, we ... Summarized steps 1. Created a data frame consisting of all operating hours, including measured and substituted data by parameter. 2. Calculated percent of operating hours measured versus substituted by parameter. 3. Categorized substituted data into the tiers of estimation and calculated percent of operating hours in each tier by parameter. By the numbers (All Parameters) ¦ Power plant combustion units: o 3,963 units ¦ Hours of operation: o 181,189,654 hours ¦ Measured hours of operation: o 180,084,949 hours ¦ Tier 1 substituted data hours of operation: o 388,837 hours ¦ Tier 2 substituted data hours of operation: o 246,793 hours ¦ Tier 3 substituted data hours of operation: o 468,781 hours ¦ Other substituted data hours of operation: o 294 hours JV United States Environmental Protection ^^1— I Agency ------- |