United States Environmental Protection Agency
National Risk Management Research Laboratory
Research Triangle Park, NC 27711

Research and Development
EPA/600/SR-97/005 July 1997

EPA Project Summary

Improving Emissions
Estimates with Computational
Intelligence, Database
Expansion, and Comprehensive
Validation

J.G. Cleland, V.E. McCormick, H.L. Waters, J.R. Youngberg, and J.A. Zak

The EPA is investigating techniques
to improve methods for estimating vola-
tile organic compound (VOC) emissions
from area sources. Using the automo-
bile refinishing industry for a detailed
area source case study, an emission
estimation method is being developed
that uses advanced computational tech-
niques and updated, comprehensive,
emissions-related information. New
computational techniques contributing
to the estimation method are fuzzy
logic, neural networks, and genetic al-
gorithms. This method development re-
quires a thorough characterization of
the area sources, an analysis of cur-
rent emission estimation methods, the
development and execution of a na-
tionwide industry activity survey, and
a compilation and analysis of the sur-
vey results and other explanatory vari-
ables. Results will be captured in the
personal-computer-based emissions es-
timation system, VOCEES (VOC Emis-
sions Estimation System). VOCEES has
been developed as a dual-use tool that
prepares VOC emissions inventories
and analyzes the impact of numerous
factors on emissions. This methodol-
ogy can easily be extended to other
area sources.

This Project Summary was developed
by EPA's National Risk Management
Research Laboratory's Air Pollution
Prevention and Control Division, Re-
search Triangle Park, NC, to announce
key findings of the research project
that is fully documented in a separate
report of the same title (see Project
Report ordering information at back).

Introduction

EPA's emission inventory identifies the
types of emission sources in a geographic
area, the amount of each pollutant emit-
ted by each type of source, and any emis-
sion control devices being employed. In
developing these inventories, state agen-
cies may use either emission estimation
methods endorsed and provided by EPA
or their own methods. New estimation
methods are sought that will be more
accurate, efficient, cost-effective, dynamic,
and robust.

Stationary sources of pollutant emissions
are designated as either point sources or
area sources. While point sources are in-
ventoried on an individual basis, area
sources are processes, activities, or busi-
nesses that are too small or too numer-
ous to be practically tracked as individual
emission sources.

The required components of an emis-
sion estimation methodology are (1) cal-
culation of the emission estimates, (2) tem-
poral and spatial allocation of the emis-
sions, (3) validation of the emission esti-
mates, and (4) speciation of the emis-
sions. This project concentrates on the
first three components. Improving existing
emission estimation methods addresses
such issues as

• Accurate emission estimates require
area source-specific estimation meth-
ods.

•	Current methods use data that are 2 to
5 years old and that may also miss significant
segments of the area source
industry or have disclosure restric-
tions.

•	Current methods do not consider such
dynamic factors as local economics
and consumption patterns, changing
technology, and regulatory influence.

This project investigates advanced, in-
ference-based computational intelligence
techniques for emissions estimation, aug-
menting or supplanting traditional math-
ematical models. Also, since all emissions
have a spatial distribution, emissions in-
formation is captured and represented us-
ing a geographic information system (GIS).

Procedure

The research concentrates on a case
study of a single area source, automobile
refinishing, in order to demonstrate the
feasibility of developing an improved
method for deriving VOC emission esti-
mates from area sources. The project plan
incorporates several new or expanded features
for emissions estimation, listed in
Table 1.

The new area source emissions esti-
mation method development process in-
volves

1)	area source characterization, ex-
amining materials balances, pro-
cess operations and controls, and
economic influences,

2)	review of current emissions esti-
mation methods,

3)	database development, including re-
trieval and screening of the best
accessible data correlating with area
source characteristics and emis-
sions,

4)	selection of analytical methods (the
"tool set") for calculating or infer-
ring final emissions levels and their
distributions,

5)	system configuration for data input,
computation, and reporting, and

6) validation of results (for this case
study, emphasizing a national sur-
vey of automobile refinishing
shops).

For the case study, steps 1 and 2 are
complete, steps 3 to 5 are near comple-
tion, and a plan for step 6 is complete.

Results and Discussion

Area Source Characterization

A detailed characterization of the auto-
mobile refinishing industry was conducted.
The number of U.S. cars increases each
year, but the volume of auto refinishing
solvent use has stagnated because of re-
duced numbers of accidents, more corro-
sion resistant paints, and more efficient
paint spray guns. According to the Na-
tional Paint and Coatings Association, over
36 million gallons of coatings were sold in
the U.S. in 1989. That number has remained
basically constant over the past ten years.

Table 1. Project Initiatives for New Method Development

Initiative	Description
Annual updating	The method will use information and logic that is continuously reviewed and revised.
Improved validation	New validation activities include: (a) preparation of the most thorough nationwide survey of the automobile refinishing industry (i.e., auto refinishing shops) to date. The survey will obtain industry answers about activity levels (e.g., number and types of employees or repair jobs), solvent usage, and emission control. In addition, extensive product distribution data have been requested from all major automotive paint manufacturers; (b) selective sampling and chemical analysis of local sources has been designed to establish actual mass rates of shop emissions; and (c) literature review and consulting with experts to provide new insight into major influences on emissions.
Intensive area source characterization	Characterization includes main industry variables influencing emission levels. In addition to the national survey, the research team has met with shop managers and industry consultants, attended national conferences, contacted associations, and analyzed the literature.
More extensive data	Data analysis will extend down to the county level. Data from federal and state agencies, trade associations, and industry sources have significantly increased the EPA information base related to this area source.
Improved data recovery and structuring	One result is a systematic and efficient approach for using information, emphasizing data sources which are regularly updated (e.g., annually) and which are readily accessible at minimal cost to EPA and state agencies.
Improved data correlations	This involves applying statistics for prescreening of the best data relationships.
Application of imprecise information and expert opinion	Such "artificial intelligence" techniques as fuzzy logic, genetic algorithms, and rule-based expert systems can augment the use of pertinent information which had been neglected to this point.
Regulatory policy consideration	The method incorporates regulatory influence on emissions at local, state, and national levels.
Alternative methods review	Methods (used by states), other than the standard EPA guidelines and emission factors, have been examined.

In addition to more than 60,000 licensed
auto refinishing shops, unlicensed "backyard"
shops may make up 25% to 40% of
the industry. These are potentially a large
source of VOC emissions, since their con-
trol technology is more likely to be primi-
tive and they are more likely to disregard
laws and regulations.

Solvent-based coatings (lacquers,
enamels, and urethanes) are by far the
most common in automobile refinishing.
Waterborne coatings are only now be-
coming more popular in the refinishing
industry. Typical VOC contents range from
4.8 lb/gal (575 kg/m3) for acrylic urethanes
to 6.8 lb/gal (815 kg/m3) for acrylic lac-
quers.
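
The metric equivalents quoted above follow from a direct unit conversion. The short Python sketch below reproduces them, assuming U.S. liquid gallons and avoirdupois pounds; the function name is illustrative and does not come from the report.

```python
# Conversion constants (both are exact by definition).
KG_PER_LB = 0.45359237          # kilograms per avoirdupois pound
M3_PER_US_GAL = 0.003785411784  # cubic meters per U.S. liquid gallon


def lb_per_gal_to_kg_per_m3(voc_lb_per_gal: float) -> float:
    """Convert a coating VOC content from lb/gal to kg/m3."""
    return voc_lb_per_gal * KG_PER_LB / M3_PER_US_GAL


if __name__ == "__main__":
    # VOC contents quoted in the text for the two ends of the typical range.
    for coating, value in [("acrylic urethane", 4.8), ("acrylic lacquer", 6.8)]:
        metric = lb_per_gal_to_kg_per_m3(value)
        print(f"{coating}: {value} lb/gal = {metric:.0f} kg/m3")
```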

Auto refinishing accounts for 3% of VOC
emissions in the U.S. The three main approaches
to control are 1) use of lower-VOC
coatings, 2) use of enclosed cleaning
devices, and 3) increase of paint-to-surface
transfer efficiency. EPA is considering
implementation of a national rule
that would limit VOC contents in the various
automotive refinishing coatings, just
as some states have done.

Review of Current Estimation
Methods

Most EPA-endorsed methods for esti-
mating solvent emissions from area
sources have been derived from a meth-
odology developed as part of the National
Emissions Data System (NEDS). The ba-
sic approach in estimating emissions is
derived from a simple calculation that re-
quires an estimate of an activity level, an
emissions factor relating emissions to ac-
tivities, and estimates of effectiveness of
environmental controls and regulation:

Emissions = Activity level × Emission factor × [1 − (CE × RE × RP)]

where:	CE = Control efficiency percent/100
	RE = Rule effectiveness percent/100
	RP = Rule penetration percent/100

CE is related to control technology (e.g.,
carbon filtration of VOCs from auto refin-
ishing paint booths). RE is an adjustment
to reflect that air pollution rules are not
able to ensure full compliance with regu-
latory requirements at all times. RP is a
measure of the extent to which a rule
applies to a source category.

Per capita and per employee activity
levels (e.g., gallons of solvent per capita)
are provided to the state pollution control
agencies by the EPA in State Implemen-
tation Plan (SIP) guidance documents.
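
The calculation described above can be sketched in a few lines of Python. This is only an illustration of the formula as stated in the text: the function and parameter names are assumptions, and the numbers in the usage example (population, gallons per capita, emission factor, and the three percentages) are hypothetical placeholders, not values from EPA guidance.

```python
def _fraction(percent, name):
    """Convert a percentage to a fraction, rejecting values outside 0 to 100."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"{name} must be between 0 and 100, got {percent}")
    return percent / 100.0


def estimate_emissions(activity_level, emission_factor,
                       control_eff_pct=0.0, rule_eff_pct=100.0,
                       rule_pen_pct=100.0):
    """Emissions = activity level x emission factor x [1 - (CE x RE x RP)]."""
    ce = _fraction(control_eff_pct, "control_eff_pct")
    rule_eff = _fraction(rule_eff_pct, "rule_eff_pct")
    rule_pen = _fraction(rule_pen_pct, "rule_pen_pct")
    return activity_level * emission_factor * (1.0 - ce * rule_eff * rule_pen)


if __name__ == "__main__":
    # Hypothetical inputs for a per capita activity level.
    population = 1_000_000
    gallons_per_capita = 0.5        # assumed solvent use per person per year
    lb_voc_per_gallon = 5.0         # assumed emission factor
    activity = population * gallons_per_capita
    pounds = estimate_emissions(activity, lb_voc_per_gallon,
                                control_eff_pct=80.0,
                                rule_eff_pct=80.0,
                                rule_pen_pct=50.0)
    print(f"Estimated emissions: {pounds:,.0f} lb VOC per year")
```

With no control in place (CE = 0), the bracketed term equals 1 and the estimate reduces to activity level times emission factor.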

Figure 1 shows typical discrepancies
between different methods and factors. A
national value for VOC emissions from
auto refinishing has been calculated from
information provided by each of seven
different sources. The activity factor used in
each case is based on population. The
two state studies are most closely based
on actual contacts with body shops within
specific regions of the country. Without
further validation, it is not possible to
favor one method over another.

Figure 1. National VOC estimations by different methods. (Bar chart not reproduced. Horizontal axis: VOC, 1000 tons per year in U.S., 1 ton = 907 kg, with ticks at 100, 200, and 300. Bases compared: California (1990); New York (1990); Paint Market Analysis (1990); AP-42 (1991); Auto Refinishing (1991); EPA Control Techniques Guideline (1991); EPA Control Technology Center (1988).)

Database Development

The database that has been assembled
has both geographic "spatial" components
(e.g., nation, state, county, or city) and
temporal components (e.g., year or month).

The current data set assembled for
this study includes

•	60 variables over 12 years for
the U.S.,

•	25 variables over 12 years for 51
states (including DC),

•	11 variables over 12 years for
3,126 counties, and

•	10 variables for 1993 for 64,524
automobile refinishing establish-
ments.
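
One possible way to organize records that carry both a spatial and a temporal component is sketched below in Python. The class names, field names, and the sample values are hypothetical; the report does not describe the actual storage format used for the study database.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ObservationKey:
    """Identifies one value by geography, time, and variable name."""
    level: str     # "nation", "state", "county", or "establishment"
    area_id: str   # hypothetical identifier, e.g. a state abbreviation
    year: int
    variable: str


class SurrogateStore:
    """In-memory store for surrogate variables (illustrative only)."""

    def __init__(self) -> None:
        self._values: Dict[ObservationKey, float] = {}

    def put(self, level, area_id, year, variable, value) -> None:
        key = ObservationKey(level, area_id, year, variable)
        self._values[key] = float(value)

    def series(self, level, area_id, variable) -> Dict[int, float]:
        """Return {year: value} for one area and variable, ordered by year."""
        points = {
            key.year: value
            for key, value in self._values.items()
            if (key.level, key.area_id, key.variable) == (level, area_id, variable)
        }
        return dict(sorted(points.items()))


if __name__ == "__main__":
    store = SurrogateStore()
    # Placeholder numbers, not data from the study.
    store.put("state", "NC", 1990, "licensed_drivers", 4.6e6)
    store.put("state", "NC", 1989, "licensed_drivers", 4.5e6)
    print(store.series("state", "NC", "licensed_drivers"))
```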

Variable selection and final arrangement
of the desired databases are necessary to
tailor a true analysis and estimation system
for optimal emission predictions. The current
information remains surrogate data until vali-
dation is provided by comparing predicted
emissions to actual solvent use information.
Nevertheless, some interesting preliminary re-
sults and selections have come from the evalu-
ations to date.

The surrogate data sources include 1)
census population data, 2) automobile
sales and accident data, 3) general eco-
nomics data, 4) statistical data from the
Statistical Abstract of the United States,
Bureau of Economic Analysis, the Bureau
of the Census, and the Bureau of Labor
Statistics, 5) annual income and employ-
ment information covering 1969 to 1990
for states, counties, and metropolitan ar-
eas, 6) Regional Economic Information
System (REIS), 7) direct marketing infor-
mation, 8) state regulations for VOC emis-
sions from automobile refinishing, and 9)
Occupational Safety and Health Adminis-
tration regulations. Data subsets were first
selected based on availability, frequency
of revision, and expert opinion on their relation
to auto refinishing emissions.

To provide working surrogate variables
and data for the emission estimation
method, several data subsets were fur-
ther screened using statistical regression
analysis. Annual data collected at the state
level have proven to be the most useful
for analyzing trends and regional varia-
tions.

Most variables were considered as in-
dependent, or explanatory. A few vari-
ables (i.e., number of auto refinishing em-
ployees, receipts for the auto refinishing
Standard Industrial Classification (SIC),
and sales for the paints and allied products
SIC) were selected as most closely
representing area source activity. These
were used as surrogates for solvent use, and
the independent variables were correlated
against them.

The best explanatory variables for
deriving state-level emission
estimates are total civilian labor force, licensed
drivers, employed civilian labor
force, and resident population. The best
correlations with paint sales were for ur-
ban vehicle miles traveled, licensed driv-
ers, and civilian labor force.
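
The screening step described in this section can be illustrated with a simple correlation ranking. The Python sketch below ranks candidate explanatory variables by the absolute value of their Pearson correlation with a surrogate activity variable. It is a simplified stand-in for the regression analysis mentioned in the text, and every number in the example is made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rank_explanatory(candidates, surrogate):
    """Return (name, r) pairs sorted from strongest to weakest |r|."""
    scores = [(name, pearson_r(values, surrogate))
              for name, values in candidates.items()]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical state-level values; not data from the study.
    paint_sales = [10.0, 20.0, 30.0, 40.0, 50.0]
    candidates = {
        "licensed_drivers": [1.0, 2.0, 3.0, 4.0, 5.0],
        "resident_population": [2.0, 1.0, 4.0, 3.0, 5.0],
    }
    for name, r in rank_explanatory(candidates, paint_sales):
        print(f"{name}: r = {r:.2f}")
```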

Analytical Tool Set Selection

The current tool set includes computa-
tional intelligence (CI) tools, statistical tools,
and a geographic information system. CI
is a term adopted by the Institute of Electrical
and Electronics Engineers (IEEE)
for innovative computational techniques
like expert systems, artificial neural net-
works, fuzzy logic, and genetic algorithms.
CI is being used to

•	Select the best initial data matrix
to become the VOCEES resident
datafile,

•	Continuously update the best
data by a training function,

•	Verify and/or improve statistical
correlations,

•	Quantitatively interpret qualitative
responses from a nationwide
survey,

•	Interpolate to fill in data gaps,

•	Develop rules and produce if/then sce-
narios using expert opinion,

•	Assess geographic and time depen-
dent influences to establish trends,

•	Provide a structured technique for
adding and evaluating new factors,
and

•	Support optimal information displays
and information transmittal.

An expert system is a computer-based
system that contains human expertise or
reasoning capabilities. A new expert sys-
tem for area source emissions must evolve
if one is to be applied.

The rules derived from industry and
emissions experts thus far have been, at
best, very general. Since "crisp-logic," rule-
based expert systems do not handle "ap-
proximate reasoning" well, a fuzzy logic
expert system is being developed. Fuzzy
logic is an approximate reasoning tech-
nique used in processing inexact informa-
tion. While a typical expert system may
be thought of as defining "true or false"
conditions, fuzzy systems allow for varying
degrees of truth, or "shades of gray,"
more like human reasoning. Fuzzy logic
will supply a set of secondary emission
factors (V1, V2, V3,...Vn) based on quali-
tative or uncertain influences to augment
the best quantitative data correlations be-
tween emissions and independent vari-
ables. These would contribute by such a
relationship as

Total Emission Factor = EF_quantitation × V1 × V2 × V3 × ... × Vn

where a specific example might be

if {county = suburban} and {winter = average}
then {VOC emissions factor V3 = moderate}

The linguistic result is then "defuzzified" to
provide a numerical value for V3.
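
To make the rule and the "defuzzification" step concrete, the Python sketch below evaluates the example rule with triangular membership functions, uses the minimum operator for "and," and converts the result to a number with a centroid calculation. This is one common textbook approach, not necessarily the one used in VOCEES. All breakpoints (population density, winter temperature, and the V3 multiplier ranges) and the second rule are hypothetical assumptions added for illustration.

```python
def triangle(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def ramp_up(x, start, end):
    """Membership rising linearly from 0 at start to 1 at end."""
    if x <= start:
        return 0.0
    if x >= end:
        return 1.0
    return (x - start) / (end - start)


# Hypothetical output sets for the secondary factor V3 (a multiplier).
V3_SETS = {
    "moderate": (0.9, 1.0, 1.1),
    "high": (1.0, 1.1, 1.2),
}


def secondary_factor_v3(density, winter_temp_f, steps=400):
    """Return a crisp V3 value from two fuzzy rules (illustrative only)."""
    # Fuzzify the inputs; every breakpoint here is an assumption.
    suburban = triangle(density, 200.0, 1500.0, 4000.0)
    urban = ramp_up(density, 2000.0, 6000.0)
    average_winter = triangle(winter_temp_f, 20.0, 35.0, 50.0)

    # Rule 1 (from the text): suburban AND average winter -> V3 moderate.
    # Rule 2 (hypothetical): urban -> V3 high.
    strengths = {
        "moderate": min(suburban, average_winter),
        "high": urban,
    }

    # Centroid defuzzification over a sampled V3 axis.
    axis_min, axis_max = 0.8, 1.2
    weighted_sum = 0.0
    total = 0.0
    for i in range(steps + 1):
        v = axis_min + (axis_max - axis_min) * i / steps
        membership = max(
            min(strength, triangle(v, *V3_SETS[name]))
            for name, strength in strengths.items()
        )
        weighted_sum += v * membership
        total += membership
    if total == 0.0:
        return 1.0  # no rule fired: neutral multiplier (an assumption)
    return weighted_sum / total


if __name__ == "__main__":
    print(secondary_factor_v3(density=3000.0, winter_temp_f=35.0))
```

When only the first rule fires, the clipped "moderate" set is symmetric and the centroid falls at its peak; when both rules fire, the result is pulled toward the "high" set in proportion to the rule strengths.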

An artificial neural network is an analy-
sis tool that is modeled after the mas-
sively parallel structure of the brain. The
neural network does not have to be pro-
grammed, but learns from example. A neu-
ral network's ability to generalize will prove
beneficial in data interpolation. Once a
network has been trained, it provides an
instantaneous output, without iterating, for
each set of inputs.
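
The idea of learning from examples and then interpolating with a single forward pass can be shown with a very small network. The Python sketch below trains a one-input, one-output network with a single tanh hidden layer by stochastic gradient descent. The architecture, learning rate, and training data are assumptions chosen for brevity; the report does not specify the network design used in the study.

```python
import math
import random


class TinyNetwork:
    """One-input, one-output network with a single tanh hidden layer."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def predict(self, x):
        """Single forward pass; no iteration is needed once trained."""
        hidden = [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def loss(self, samples):
        """Mean squared error over (x, y) pairs."""
        return sum((self.predict(x) - y) ** 2 for x, y in samples) / len(samples)

    def train(self, samples, epochs=2000, rate=0.05):
        """Plain stochastic gradient descent on the squared error."""
        for _ in range(epochs):
            for x, y in samples:
                hidden = [math.tanh(w * x + b)
                          for w, b in zip(self.w1, self.b1)]
                out = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
                err = out - y
                for j, h in enumerate(hidden):
                    # Gradient through the tanh unit, using the old w2[j].
                    grad_hidden = err * self.w2[j] * (1.0 - h * h)
                    self.w2[j] -= rate * err * h
                    self.w1[j] -= rate * grad_hidden * x
                    self.b1[j] -= rate * grad_hidden
                self.b2 -= rate * err


if __name__ == "__main__":
    # Hypothetical scaled data with a gap at x = 0.75.
    samples = [(0.0, 0.20), (0.25, 0.30), (0.5, 0.45), (1.0, 0.80)]
    net = TinyNetwork()
    net.train(samples)
    print("training error:", net.loss(samples))
    print("interpolated value at 0.75:", net.predict(0.75))
```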

Genetic algorithms select optimum rules
and data by an evolutionary process analo-
gous to a "survival of the fittest" approach.
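
A minimal Python sketch of this evolutionary selection follows. Each candidate solution is a yes/no mask over the explanatory variables, and the fittest masks survive and recombine. The fitness function (a relevance score per variable minus a fixed penalty for each variable kept), the relevance scores, and all tuning parameters are hypothetical; in practice the fitness would come from how well the selected variables reproduce validated emissions.

```python
import random


def select_variables(relevance, penalty=0.3, pop_size=20, generations=40,
                     mutation_rate=0.1, seed=0):
    """Pick a variable subset with a simple elitist genetic algorithm.

    Returns (selected_names, best_fitness_per_generation).
    """
    names = list(relevance)
    n = len(names)
    if n < 2 or pop_size < 3:
        raise ValueError("need at least 2 variables and a population of 3")
    rng = random.Random(seed)

    def fitness(mask):
        return sum(relevance[name] - penalty
                   for name, keep in zip(names, mask) if keep)

    population = [[rng.random() < 0.5 for _ in range(n)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: the best mask always survives
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation.
            child = [(not gene) if rng.random() < mutation_rate else gene
                     for gene in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return [name for name, keep in zip(names, best) if keep], history


if __name__ == "__main__":
    # Hypothetical relevance scores, not results from the study.
    scores = {
        "licensed_drivers": 0.9,
        "civilian_labor_force": 0.8,
        "resident_population": 0.6,
        "urban_vehicle_miles": 0.7,
        "median_income": 0.2,
        "rainfall": 0.1,
    }
    chosen, history = select_variables(scores)
    print("selected:", chosen)
```

Because the best mask is copied unchanged into each new generation, the best fitness can never decrease from one generation to the next.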

A personal-computer (PC)-based VOC
Emission Estimation System (VOCEES)
automates the data management, compu-
tation, and displays/reports under the new
method. The main components of VOCEES
are 1) a variable database screened by neu-
ral nets and/or genetic algorithms, 2) ba-
sic computational algorithms, 3) the
supplemental fuzzy logic expert system,
and 4) the GIS-based user interface, which
presents data in both map and graph dis-
plays. Users have a choice of examining
counties, nonattainment areas, states, or
EPA regions. Also, since VOCEES can
be used to examine emissions for differ-
ent years, temporal changes can be ob-
served.
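
The choice of geographic level and year described above amounts to rolling county estimates up to larger areas and comparing totals across years. The Python sketch below shows that aggregation under simple assumptions; the function names, county labels, area assignments, and tonnages are all hypothetical and are not taken from VOCEES.

```python
from collections import defaultdict


def roll_up(county_emissions, county_to_area):
    """Sum county estimates into larger areas, keeping years separate.

    county_emissions maps (county, year) to tons of VOC;
    county_to_area maps each county to a state, region, or other area.
    """
    totals = defaultdict(float)
    for (county, year), tons in county_emissions.items():
        totals[(county_to_area[county], year)] += tons
    return dict(totals)


def change_between(totals, area, start_year, end_year):
    """Change in an area's total between two years (end minus start)."""
    return totals[(area, end_year)] - totals[(area, start_year)]


if __name__ == "__main__":
    # Placeholder values for two hypothetical counties in one state.
    emissions = {
        ("County A", 1990): 10.0,
        ("County B", 1990): 5.0,
        ("County A", 1991): 12.0,
        ("County B", 1991): 4.0,
    }
    areas = {"County A": "State X", "County B": "State X"}
    state_totals = roll_up(emissions, areas)
    print(state_totals)
    print("change 1990 to 1991:",
          change_between(state_totals, "State X", 1990, 1991))
```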

Extended Method Development

Five areas of new method development
have been started under this case study but
not completed. These five areas should be
carried to completion in the future, in
suggested order of importance:

1)	Nationwide Automobile Refinishing
Solvent Use Survey (ARSUS) for
Validation of Emissions and Verified
Correlations with Explanatory
Variables: this activity is essential
to the project's success, confirming
solvent use and variable correlations.
ARSUS is almost certainly
the most significant area source
emissions estimation validation
undertaken to date. Important features
of ARSUS are a) results will be
statistically defensible, based on
random probability sets and represented
by accuracy estimations and
confidence levels, b) proper survey
techniques will be applied to assure
a high percentage response,
c) results are expected to increase
available validation data by three
orders of magnitude and improve
accuracy by 20% to 200%, and d)
the survey is designed based on a
detailed knowledge of the industry.

2)	Application of Computational Intelligence
(Fuzzy Logic Expert System,
Neural Nets, and Genetic Algorithms)
to Emissions Estimation
Using Validated Data: these tools
allow new kinds of information to
be used, accelerate the data selection
process, and provide more accurate
estimates.

3)	Potential Negotiation of the Use of
VOC-Containing Product Manufacturers
Data for Validation and Database
Updates: important to provide
an accurate estimate of geographic
distribution of product categories,
to compare with survey
data, and to extend survey results
to an annual validation update.

4)	Sampling and Chemical Analysis of
Selected VOC Species at a Limited
Number of Area Source Sites:
important to improve the credibility
of emissions predictions and the
VOCEES extrapolation of product
use data to actual annual volume
of emissions of VOC.

5)	Graphical Interpretation of Past
Emission Estimation Data: important
for comparison with
VOCEES estimates, to illustrate
needs for estimating improvement,
and to provide a baseline perspective
of techniques to date.
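
The statement that survey results will be reported with accuracy estimates and confidence levels can be illustrated with a basic confidence interval for a sample mean. The Python sketch below uses a normal approximation, which is an assumption; for small samples a t-based interval would be more appropriate, and a real survey design would also account for stratification and nonresponse. The sample values are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, mean, high) for the mean of a simple random sample.

    Uses the normal approximation: mean +/- z * (s / sqrt(n)).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * std_err, mean, mean + z * std_err


if __name__ == "__main__":
    # Hypothetical annual solvent use (gallons) reported by five shops.
    gallons = [100.0, 120.0, 90.0, 110.0, 130.0]
    low, mean, high = mean_confidence_interval(gallons)
    print(f"mean = {mean:.1f} gal, 95% interval = ({low:.1f}, {high:.1f})")
```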

A method for applying the techniques is
shown in Figure 2.

Figure 2. New method for estimating emissions. (Diagram not reproduced; surviving labels: "Dual Product" and "VOCEES.")

Results Summary

The results of the case study cannot be
complete until validation data become
available and emissions estimates can be
better substantiated. However, results of
the study to date include

•	A comprehensive information search
and development of a database;

•	Assessment of current emission estimation
methods related to the area
source and their limitations;

•	A PC Windows- and GIS-graphics-based
system with computational
techniques that provide reasonable
examples of system function and output;

•	Demonstration of the VOCEES system;

•	An examination of computational intelligence
and recommendations for
incorporation into the overall estimation
method;

•	Preliminary screening of explanatory
variables using statistical regressions;

•	Preparation of a survey (questionnaire,
sampling population, and data
storage and analysis system), and
completion of a presurvey; and

•	Development of preliminary source
sampling recommendations.

Conclusions and
Recommendations

•	Automobile refinishing is similar to
other major area sources associated
with VOC use in terms of establish-
ment size, customer interaction, envi-
ronmental compliance attitudes, ma-
terials suppliers, and demographic in-
fluences.

•	The auto refinishing industry expects
to be using lower-VOC materials and
to expand the use of high transfer
efficiency equipment like high-volume,
low-pressure (HVLP) paint guns.

•	Difficulties exist for obtaining product
distribution data from automotive paint
manufacturers because of confidenti-
ality issues. A system for tracking sol-
vent distribution is needed. Manufac-
turers and original equipment manu-
facturers (OEMs) control the product
mix. For auto refinishing, sales records
of about seven large companies could
characterize more than 90% of the
VOC distribution.

•	Discrepancies in current emission estimation
methods exist for auto refinishing
because of unsubstantiated
activity factors and fundamental inaccuracies
in activity factor data (e.g.,
numbers of individual shop employees).

•	PC-based analysis and GIS computer
graphics displays are a good combi-
nation for an accurate, easy-to-use,
low-cost emissions estimation system.

•	No expert approach exists upon which
to base an expert system (i.e., no
agency is consistently providing reli-
able, accessible, and continuously
available emission estimates). There-
fore, an expert system is needed, one
that is built through new expertise
and then captured in software.

•	Once validation data are obtained for
VOC use by an area source, genetic
algorithms and neural networks should
be efficient for completing selection
and weighting of the best explanatory
variables, and for training the system
to optimally integrate new informa-
tion.

•	Fuzzy logic is appropriate for manipu-
lating rules to apply inferential esti-
mates in augmenting the correlation
of VOC usage variables.

•	State and county databases should
be used as activity factor data wher-
ever possible. The best ones are
readily accessible and typically up-
dated annually. EPA would continu-
ously refine the techniques and tools
for applying these databases and
could be responsible for centralized
validation of the method.

•	Preliminary indications show that li-
censed drivers and registered vehicles
are better explanatory variables for
auto refinishing emissions than popu-
lation or number of shop employees.

•	Data related to the area source ap-
pear to be best for emissions estima-
tion purposes, including data related
to materials volumes and product us-
ers' levels of activity (e.g., registered
vehicles).

•	An estimate of the impact of regula-
tions and standards, and of their level
of enforcement, requires more accu-
rate emissions estimation and predic-
tion.

•	The current VOCEES design could
evolve into an expert system. Valida-
tion will confirm an ever-improving
technique that will result in a highly
accurate system of rules.

•	Validation of data and variable rela-
tionships using industry responses is
essential to completion of a new esti-
mation method. The best validation is
through a national survey of the us-
ers of the pertinent VOC-containing
materials and/or a distribution of the
materials obtained from solvent prod-
uct manufacturers.

J.G. Cleland, V.E. McCormick, H.L. Waters, J.R. Youngberg, and J.A. Zak are with
Research Triangle Institute, Research Triangle Park, NC 27709.

P. Jeff Chappell is the EPA Project Officer (see below).

The complete report, entitled "Improving Emissions Estimates with Computational
Intelligence, Database Expansion, and Comprehensive Validation," (Order No.
PB97-152565; Cost: $31.00, subject to change) will be available only from:
National Technical Information Service
5285 Port Royal Road
Springfield, VA 22161
Telephone: 703-487-4650

The EPA Project Officer can be contacted at:

Air Pollution Prevention and Control Division
National Risk Management Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, NC 27711

United States

Environmental Protection Agency

Center for Environmental Research Information

Cincinnati, OH 45268
