United States Environmental Protection Agency
National Risk Management Research Laboratory
Research Triangle Park, NC 27711

Research and Development
EPA/600/SR-97/005
July 1997

EPA Project Summary

Improving Emissions Estimates with Computational Intelligence, Database Expansion, and Comprehensive Validation

J.G. Cleland, V.E. McCormick, H.L. Waters, J.R. Youngberg, and J.A. Zak

The EPA is investigating techniques to improve methods for estimating volatile organic compound (VOC) emissions from area sources. Using the automobile refinishing industry for a detailed area source case study, an emission estimation method is being developed that uses advanced computational techniques and updated, comprehensive, emissions-related information. New computational techniques contributing to the estimation method are fuzzy logic, neural networks, and genetic algorithms. This method development requires a thorough characterization of the area sources, an analysis of current emission estimation methods, the development and execution of a nationwide industry activity survey, and a compilation and analysis of the survey results and other explanatory variables. Results will be captured in the personal-computer-based emissions estimation system, VOCEES (VOC Emissions Estimation System). VOCEES has been developed as a dual-use tool that prepares VOC emissions inventories and analyzes the impact of numerous factors on emissions. This methodology can easily be extended to other area sources.

This Project Summary was developed by EPA's National Risk Management Research Laboratory's Air Pollution Prevention and Control Division, Research Triangle Park, NC, to announce key findings of the research project that is fully documented in a separate report of the same title (see Project Report ordering information at back).
Introduction

EPA's emission inventory identifies the types of emission sources in a geographic area, the amount of each pollutant emitted by each type of source, and any emission control devices being employed. In developing these inventories, state agencies may use either emission estimation methods endorsed and provided by EPA or their own methods. New methods of estimation are sought which will be more accurate, efficient, cost effective, dynamic, and robust.

Stationary sources of pollutant emissions are designated as either point sources or area sources. While point sources are inventoried on an individual basis, area sources are processes, activities, or businesses that are too small or too numerous to be practically tracked as individual emission sources.

The required components of an emission estimation methodology are (1) calculation of the emission estimates, (2) temporal and spatial allocation of the emissions, (3) validation of the emission estimates, and (4) speciation of the emissions. This project concentrates on the first three components.

Improving existing emission estimation methods addresses such issues as
• Accurate emission estimates require area source-specific estimation methods.
• Current methods use data that are 2 to 5 years old and that may also miss significant segments of the area source industry or have disclosure restrictions.
• Current methods do not consider such dynamic factors as local economics and consumption patterns, changing technology, and regulatory influence.

This project investigates advanced, inference-based computational intelligence techniques for emissions estimation, augmenting or supplanting traditional mathematical models. Also, since all emissions have a spatial distribution, emissions information is captured and represented using a geographic information system (GIS).
Procedure

The research concentrates on a case study of a single area source, automobile refinishing, in order to demonstrate the feasibility of developing an improved method for deriving VOC emission estimates from area sources. The project plan incorporates several new or expanded features for emissions estimation, listed in Table 1.

Table 1. Project Initiatives for New Method Development

Initiative: Description

Annual updating: The method will use information and logic that is continuously reviewed and revised.

Improved validation: New validation activities include (a) preparation of the most thorough nationwide survey of the automobile refinishing industry (i.e., auto refinishing shops) to date. The survey will obtain industry answers about activity levels (e.g., number and types of employees or repair jobs), solvent usage, and emission control. In addition, extensive product distribution data have been requested from all major automotive paint manufacturers; (b) selective sampling and chemical analysis of local sources, designed to establish actual mass rates of shop emissions; and (c) literature review and consulting with experts to provide new insight into major influences on emissions.

Intensive area source characterization: Characterization includes main industry variables influencing emission levels. In addition to the national survey, the research team has met with shop managers and industry consultants, attended national conferences, contacted associations, and analyzed the literature.

More extensive data: Data analysis will extend down to the county level. Data from federal and state agencies, trade associations, and industry sources have significantly increased the EPA information base related to this area source.

Improved data recovery and structuring: One result is a systematic and efficient approach for using information, emphasizing data sources which are regularly updated (e.g., annually) and which are readily accessible at minimal cost to EPA and state agencies.

Improved data correlations: This involves applying statistics for prescreening of the best data relationships.

Application of imprecise information and expert opinion: Such "artificial intelligence" techniques as fuzzy logic, genetic algorithms, and rule-based expert systems can augment the use of pertinent information which had been neglected to this point.

Regulatory policy consideration: The method incorporates regulatory influence on emissions at local, state, and national levels.

Alternative methods review: Methods (used by states), other than the standard EPA guidelines and emission factors, have been examined.

The new area source emissions estimation method development process involves
1) area source characterization, examining materials balances, process operations and controls, and economic influences,
2) review of current emissions estimation methods,
3) database development, including retrieval and screening of the best accessible data correlating with area source characteristics and emissions,
4) selection of analytical methods (the "tool set") for calculating or inferring final emissions levels and their distributions,
5) system configuration for data input, computation, and reporting, and
6) validation of results (for this case study, emphasizing a national survey of automobile refinishing shops).

For the case study, steps 1 and 2 are complete, steps 3 to 5 are near completion, and a plan for step 6 is complete.

Results and Discussion

Area Source Characterization

A detailed characterization of the automobile refinishing industry was conducted. The number of U.S. cars increases each year, but the volume of auto refinishing solvent use has stagnated because of reduced numbers of accidents, more corrosion-resistant paints, and more efficient paint spray guns. According to the National Paint and Coatings Association, over 36 million gallons of coatings were sold in the U.S. in 1989. That number has remained basically constant over the past ten years.

In addition to more than 60,000 licensed auto refinishing shops, unlicensed "backyard" shops may make up 25% to 40% of the industry. These are potentially a large source of VOC emissions, since their control technology is more likely to be primitive and they are more likely to disregard laws and regulations.

Solvent-based coatings (lacquers, enamels, and urethanes) are by far the most common in automobile refinishing. Waterborne coatings are only now becoming more popular in the refinishing industry. Typical VOC contents range from 4.8 lb/gal (575 kg/m3) for acrylic urethanes to 6.8 lb/gal (815 kg/m3) for acrylic lacquers.

Auto refinishing accounts for 3% of VOC emissions in the U.S. The three main approaches to control are 1) use of lower-VOC coatings, 2) use of enclosed cleaning devices, and 3) increase of paint-to-surface transfer efficiency. EPA is considering implementation of a national rule that would limit VOC contents in the various automotive refinishing coatings, just as some states have done.

Review of Current Estimation Methods

Most EPA-endorsed methods for estimating solvent emissions from area sources have been derived from a methodology developed as part of the National Emissions Data System (NEDS).
The basic approach in estimating emissions is derived from a simple calculation that requires an estimate of an activity level, an emissions factor relating emissions to activities, and estimates of effectiveness of environmental controls and regulation:

Emissions = Activity level × Emission factor × [1 - (CE × RE × RP)]

where:
CE = Control efficiency percent/100
RE = Rule effectiveness percent/100
RP = Rule penetration percent/100

CE is related to control technology (e.g., carbon filtration of VOCs from auto refinishing paint booths). RE is an adjustment to reflect that air pollution rules are not able to ensure full compliance with regulatory requirements at all times. RP is a measure of the extent to which a rule applies to a source category.

Per capita and per employee activity levels (e.g., gallons of solvent per capita) are provided to the state pollution control agencies by the EPA in State Implementation Plan (SIP) guidance documents.

Figure 1 shows typical discrepancies between different methods and factors. A national value for VOC emissions from auto refinishing has been calculated from information provided by each of the seven different sources. The activity factor used in each case is based on population. The two state studies are most closely based on actual contacts with body shops within specific regions of the country. Without further validation, it is not possible to recommend one method over another.

[Figure 1. National VOC estimations by different methods. Horizontal axis: VOC, 1000 tons per year in U.S. (1 ton = 907 kg), scaled from 100 to 300. Bases compared: California (1990); New York (1990); Paint Market Analysis (1990); AP-42 (1991); Auto Refinishing (1991); EPA Control Techniques Guideline (1991); EPA Control Technology Center (1988).]

Database Development

The database that has been assembled has both geographic "spatial" components (e.g., nation, state, county, or city) and temporal components (e.g., year or month).
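To make the spatial and temporal keys concrete, the following minimal sketch stores activity records by (region, year) and applies the basic calculation from the Review of Current Estimation Methods section, Emissions = Activity level × Emission factor × [1 - (CE × RE × RP)], to each record. The record layout, the field names, and every number are illustrative assumptions; none of them comes from the project database or from VOCEES.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityRecord:
    """One row of a hypothetical area-source table keyed by place and time."""
    region: str              # spatial key, e.g. a state or county (assumed)
    year: int                # temporal key
    activity_level: float    # e.g. resident population (illustrative)
    emission_factor: float   # emissions per unit of activity (illustrative)
    ce_percent: float = 0.0  # control efficiency, percent
    re_percent: float = 0.0  # rule effectiveness, percent
    rp_percent: float = 0.0  # rule penetration, percent


def estimate_emissions(rec: ActivityRecord) -> float:
    """Emissions = Activity level x Emission factor x [1 - (CE x RE x RP)]."""
    ce = rec.ce_percent / 100.0
    re = rec.re_percent / 100.0
    rp = rec.rp_percent / 100.0
    return rec.activity_level * rec.emission_factor * (1.0 - ce * re * rp)


# Made-up records: the same region in two years, before and after a rule.
records = [
    ActivityRecord("State A", 1990, 1_000_000, 2.0),
    ActivityRecord("State A", 1991, 1_000_000, 2.0, 90.0, 80.0, 50.0),
]

# Index the estimates by their (spatial, temporal) key.
emissions_by_key = {(r.region, r.year): estimate_emissions(r) for r in records}
```

With no controls in place the bracketed term equals 1, so the estimate reduces to activity level times emission factor. In the second record the product CE × RE × RP is 0.36, which lowers the estimate by 36%.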
The current data set assembled for this study includes
• 60 variables over 12 years for the U.S.,
• 25 variables over 12 years for 51 states (including DC),
• 11 variables over 12 years for 3,126 counties, and
• 10 variables for 1993 for 64,524 automobile refinishing establishments.

Variable selection and final arrangement of the desired databases are necessary to tailor a true analysis and estimation system for optimal emission predictions. The current information remains surrogate data until validation is provided by comparing predicted emissions to actual solvent use information. Nevertheless, some interesting preliminary results and selections have come from the evaluations to date.

The surrogate data sources include
1) census population data,
2) automobile sales and accident data,
3) general economics data,
4) statistical data from the Statistical Abstract of the United States, Bureau of Economic Analysis, the Bureau of the Census, and the Bureau of Labor Statistics,
5) annual income and employment information covering 1969 to 1990 for states, counties, and metropolitan areas,
6) Regional Economic Information System (REIS),
7) direct marketing information,
8) state regulations for VOC emissions from automobile refinishing, and
9) Occupational Safety and Health Administration regulations.

Data subsets were first selected based on availability, frequency of revision, and expert opinion on relation to auto refinishing emissions.

To provide working surrogate variables and data for the emission estimation method, several data subsets were further screened using statistical regression analysis. Annual data collected at the state level have proven to be the most useful for analyzing trends and regional variations.

Most variables were considered as independent, or explanatory.
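The kind of screening described above can be sketched as ranking candidate explanatory variables by how strongly they track a measure of area source activity. The sketch below uses a simple Pearson correlation as a stand-in for the regression analysis; it is not the project's actual procedure. The variable names and all values are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_explanatory(candidates, activity):
    """Sort candidate variables by |r| against the activity measure."""
    scores = {name: pearson_r(values, activity)
              for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up annual values for five hypothetical states.
activity_measure = [10.0, 20.0, 30.0, 40.0, 50.0]
candidates = {
    "licensed_drivers": [1.1, 2.0, 2.9, 4.2, 5.0],
    "resident_population": [2.0, 3.0, 7.0, 6.0, 9.0],
    "annual_rainfall": [5.0, 1.0, 4.0, 2.0, 3.0],
}

ranking = rank_explanatory(candidates, activity_measure)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

Ranking by the absolute value of r keeps strongly negative relationships in view as well. A full regression would additionally give fitted coefficients and a measure of fit, which a screening pass like this one does not.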
A few variables (i.e., number of auto refinishing employees, receipts for the auto refinishing Standard Industrial Classification (SIC), and sales for the paints and allied products SIC) were selected as most closely representing area source activity. Using these as surrogates for solvent use, the independent variables were correlated. The best explanatory variables for deriving state-level emission estimates are total civilian labor force, licensed drivers, employed civilian labor force, and resident population. The best correlations with paint sales were for urban vehicle miles traveled, licensed drivers, and civilian labor force.

Analytical Tool Set Selection

The current tool set includes computational intelligence (CI) tools, statistical tools, and a geographic information system. CI is a term adopted by the Institute of Electrical and Electronics Engineers (IEEE) for innovative computational techniques like expert systems, artificial neural networks, fuzzy logic, and genetic algorithms. CI is being used to
• Select the best initial data matrix to become the VOCEES resident datafile,
• Continuously update the best data by a training function,
• Verify and/or improve statistical correlations,
• Quantitatively interpret qualitative responses from a nationwide survey,
• Interpolate to fill in data gaps,
• Develop rules and produce if/then scenarios using expert opinion,
• Assess geographic and time-dependent influences to establish trends,
• Provide a structured technique for adding and evaluating new factors, and
• Support optimal information displays and information transmittal.

An expert system is a computer-based system that contains human expertise or reasoning capabilities. A new expert system for area source emissions must evolve if one is to be applied. The rules derived from industry and emissions experts thus far have been, at best, very general.
Since "crisp-logic," rule-based expert systems do not handle "approximate reasoning" well, a fuzzy logic expert system is being developed. Fuzzy logic is an approximate reasoning technique used in processing inexact information. While a typical expert system may be thought of as defining "true or false" conditions, fuzzy systems allow for varying degrees of truth, or "shades of gray," more like human reasoning. Fuzzy logic will supply a set of secondary emission factors (V1, V2, V3, ..., Vn) based on qualitative or uncertain influences to augment the best quantitative data correlations between emissions and independent variables. These would contribute by such a relationship as

Total Emission Factor = EF_quantitation × V1 × V2 × V3 × ...

where a specific example might be

if {county = suburban} and {winter = average} then {VOC emissions factor V3 = moderate}

The linguistic result is then "defuzzified" to provide a numerical value for V3.

An artificial neural network is an analysis tool that is modeled after the massively parallel structure of the brain. The neural network does not have to be programmed, but learns from example. A neural network's ability to generalize will prove beneficial in data interpolation. Once a network has been trained, it provides an instantaneous output, without iterating, for each set of inputs.

Genetic algorithms select optimum rules and data by an evolutionary process analogous to a "survival of the fittest" approach.

A personal-computer (PC)-based VOC Emission Estimation System (VOCEES) automates the data management, computation, and displays/reports under the new method. The main components of VOCEES are 1) a variable database screened by neural nets and/or genetic algorithms, 2) basic computational algorithms, 3) the supplemental fuzzy logic expert system, and 4) the GIS-based user interface, which presents data in both map and graph displays.
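How a supplemental fuzzy logic component might feed the basic computation can be sketched as follows, using the suburban/average-winter rule as the starting point. This is a minimal illustration under stated assumptions, not the VOCEES implementation: the triangular membership functions, their breakpoints, the second ("urban") rule, the numeric values attached to "moderate" and "high," and the weighted-average defuzzification are all hypothetical choices made for the example.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0 to 1) of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets; every breakpoint is an illustrative assumption.
def suburban(density_per_km2):
    return triangular(density_per_km2, 100.0, 500.0, 1500.0)


def urban(density_per_km2):
    return triangular(density_per_km2, 500.0, 1500.0, 2500.0)


def average_winter(mean_temp_c):
    return triangular(mean_temp_c, -5.0, 2.0, 9.0)


# Assumed numeric multipliers standing in for the linguistic outputs.
V3_LEVELS = {"moderate": 1.0, "high": 1.1}


def infer_v3(density_per_km2, mean_temp_c):
    """Fire the rules, then defuzzify V3 by a weighted average."""
    # Each rule is (firing strength, output label); AND is modelled as min().
    rules = [
        (min(suburban(density_per_km2), average_winter(mean_temp_c)), "moderate"),
        (min(urban(density_per_km2), average_winter(mean_temp_c)), "high"),
    ]
    total_strength = sum(strength for strength, _ in rules)
    if total_strength == 0.0:
        return 1.0  # no rule fires: leave the base factor unchanged
    weighted = sum(strength * V3_LEVELS[label] for strength, label in rules)
    return weighted / total_strength


def total_emission_factor(ef_base, secondary_factors):
    """Multiply the quantitatively derived factor by V1, V2, V3, ..."""
    result = ef_base
    for factor in secondary_factors:
        result *= factor
    return result


v3 = infer_v3(density_per_km2=1000.0, mean_temp_c=2.0)
ef_total = total_emission_factor(2.0, [1.0, 1.0, v3])
```

A county with a density of 1000 per km2 belongs half to "suburban" and half to "urban" under these assumed sets, so V3 falls midway between the two output values. That is the "shades of gray" behavior that a crisp true-or-false rule cannot express.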
Users have a choice of examining counties, nonattainment areas, states, or EPA regions. Also, since VOCEES can be used to examine emissions for different years, temporal changes can be observed.

Extended Method Development

Five areas of new method development have been started under this case study but not completed. These five areas should be extended to completion in the future, in suggested order of importance:

1) Nationwide Automobile Refinishing Solvent Use Survey (ARSUS) for Validation of Emissions and Verified Correlations with Explanatory Variables: this activity is essential to the project's success, confirming solvent use and variable correlations. ARSUS is almost certainly the most significant area source emissions estimation validation undertaken to date. Important features of ARSUS are
a) results will be statistically defensible, based on random probability sets and represented by accuracy estimations and confidence levels,
b) proper survey techniques will be applied to assure a high percentage response,
c) results are expected to increase available validation data by three orders of magnitude and improve accuracy by 20% to 200%, and
d) the survey is designed based on a detailed knowledge of the industry.

2) Application of Computational Intelligence (Fuzzy Logic Expert System, Neural Nets, and Genetic Algorithms) to Emissions Estimation Using Validated Data: these tools allow new kinds of information to be used, accelerate the data selection process, and provide more accurate estimates.

3) Potential Negotiation of the Use of VOC-Containing Product Manufacturers Data for Validation and Database Updates: important to provide an accurate estimate of geographic distribution of product categories, to compare with survey data, and to extend survey results to an annual validation update.
4) Sampling and Chemical Analysis of Selected VOC Species at a Limited Number of Area Source Sites: important to improve the credibility of emissions predictions and the VOCEES extrapolation of product use data to actual annual volume of emissions of VOC.

5) Graphical Interpretation of Past Emission Estimation Data: important for comparison with VOCEES estimates, to illustrate needs for estimating improvement, and to provide a baseline perspective of techniques to date.

A method for applying the techniques is shown in Figure 2.

[Figure 2. New method for estimating emissions.]

Results Summary

The results of the case study cannot be complete until validation data become available and emissions estimates can be better substantiated. However, results of the study to date include
• A comprehensive information search and development of a database;
• Assessment of current emission estimation methods related to the area source and their limitations;
• A PC Windows- and GIS-graphics-based system with computational techniques that provide reasonable examples of system function and output;
• Demonstration of the VOCEES system;
• An examination of computational intelligence and recommendations for incorporation into the overall estimation method;
• Preliminary screening of explanatory variables using statistical regressions;
• Preparation of a survey (questionnaire, sampling population, and data storage and analysis system), and completion of a presurvey; and
• Development of preliminary source sampling recommendations.

Conclusions and Recommendations

• Automobile refinishing is similar to other major area sources associated with VOC use in terms of establishment size, customer interaction, environmental compliance attitudes, materials suppliers, and demographic influences.
• The auto refinishing industry expects to be using lower-VOC materials and to expand the use of high transfer efficiency equipment like high-volume, low-pressure (HVLP) paint guns.
• Difficulties exist in obtaining product distribution data from automotive paint manufacturers because of confidentiality issues. A system for tracking solvent distribution is needed. Manufacturers and original equipment manufacturers (OEMs) control the product mix. For auto refinishing, sales records of about seven large companies could characterize more than 90% of the VOC distribution.
• Discrepancies in current emission estimation methods exist for auto refinishing because of unsubstantiated activity factors and fundamental inaccuracies in activity factor data (e.g., numbers of individual shop employees).
• PC-based analysis and GIS computer graphics displays are a good combination for an accurate, easy-to-use, low-cost emissions estimation system.
• No expert approach exists upon which to base an expert system (i.e., no agency is consistently providing reliable, accessible, and continuously available emission estimates). Therefore, an expert system is needed, one that is built through new expertise and then captured in software.
• Once validation data are obtained for VOC use by an area source, genetic algorithms and neural networks should be efficient for completing selection and weighting of the best explanatory variables, and for training the system to optimally integrate new information.
• Fuzzy logic is appropriate for manipulating rules to apply inferential estimates in augmenting the correlation of VOC usage variables.
• State and county databases should be used as activity factor data wherever possible. The best ones are readily accessible and typically updated annually. EPA would continuously refine the techniques and tools for applying these databases and could be responsible for centralized validation of the method.
• Preliminary indications show that licensed drivers and registered vehicles are better explanatory variables for auto refinishing emissions than population or number of shop employees.
• Data related to the area source appear to be best for emissions estimation purposes, including data related to materials volumes and product users' levels of activity (e.g., registered vehicles).
• An estimate of the impact of regulations and standards, and of their level of enforcement, requires more accurate emissions estimation and prediction.
• The current VOCEES design could evolve into an expert system. Validation will confirm an ever-improving technique that will result in a highly accurate system of rules.
• Validation of data and variable relationships using industry responses is essential to completion of a new estimation method. The best validation is through a national survey of the users of the pertinent VOC-containing materials and/or a distribution of the materials obtained from solvent product manufacturers.

J.G. Cleland, V.E. McCormick, H.L. Waters, J.R. Youngberg, and J.A. Zak are with Research Triangle Institute, Research Triangle Park, NC 27709.
P. Jeff Chappell is the EPA Project Officer (see below).

The complete report, entitled "Improving Emissions Estimates with Computational Intelligence, Database Expansion, and Comprehensive Validation," (Order No. PB97-152565; Cost: $31.00, subject to change) will be available only from:
National Technical Information Service
5285 Port Royal Road
Springfield, VA 22161
Telephone: 703-487-4650

The EPA Project Officer can be contacted at:
Air Pollution Prevention and Control Division
National Risk Management Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, NC 27711