EPA/600/A-97/101
Development of the EIIP Emission Inventory Data Model for Purposes of
Facilitating Data Transfer Between the States and U.S. EPA
John Slade
Pennsylvania Department of Environmental Resources,
P.O. Box 8468, 400 Market Street, Harrisburg, PA 17105-8468
William Benjey*
Atmospheric Sciences Modeling Division, Air Resources Laboratory,
National Oceanic and Atmospheric Administration, Research Triangle Park, NC 27711
Andrew Blackard and Michael Pring
Eastern Research Group, Inc.,
1600 Perimeter Park, Suite 300, Morrisville, NC 27560
ABSTRACT
The Emission Inventory Improvement Program (EIIP) was formed with the cooperation of the
State and Territorial Air Pollution Program Administrators/Association of Local Air Pollution Control
Officials (STAPPA/ALAPCO) as a cooperative effort between State and local agencies, industry, and
EPA to improve emission estimates and establish a National Standard for Emission Data Exchange.
Emission data are needed for a variety of purposes, including regional air quality modeling, air
quality planning, trends analysis, and for informing the public. The Data Management Committee
(DMC) of the EIIP was charged with the goal of developing the data exchange standard, and
undertook the task of developing a data model on which to base the data exchange format. The
purpose of developing the data model was to establish a standardized set of data relationships which
reflect physical reality and ensure accurate and consistent reporting of emissions and emission related
data.
INTRODUCTION
The Clean Air Act Amendments of 1990 (CAAA) require that many states collect emission
inventory data as a basis for planning and demonstrating attainment with National Ambient Air
Quality Standards (NAAQS). Emission inventories are a basic component of air quality modeling
and other air pollution management analyses that are used to demonstrate that proposed air pollution
control strategies are sufficient to attain the state and federal air quality standards.
Sharing of emission data is essential for urban and regional air quality modeling exercises that
cross state boundaries, for regional and national regulatory impact and other policy analyses, for
scientific research efforts, and for informing the public. In order to more easily share data among
the emission inventory community, common formats for data sharing must be established. Such
common formats can be based on an underlying data model which establishes the relationships of the
data elements reflected within the common format.
The Emission Inventory Improvement Program (EIIP) Data Management Committee (DMC)
has been formed through STAPPA/ALAPCO as a joint effort of state and local air agencies, industry,
and the U.S. Environmental Protection Agency (EPA) to improve the way data/information is
transferred and shared among users. The EIIP DMC is currently working to develop and coordinate
recommendations for data transfer at the facility, state/local agency, and EPA levels. In addition, the
DMC will develop recommended mechanisms for sharing emission inventory data and related
information among all users.
*On assignment to the National Exposure Research Laboratory, U.S. Environmental Protection Agency

-------
The first step in achieving these goals was to develop the EIIP Phase I Data Model.1 The
focus of the Phase I Data Model is on the data needed for regional air quality modeling. Phases II
and III are intended to add information needed for the air quality permitting process and other related
emission inventory data needs of industry for reporting to state and local air pollution control
agencies.
DATA MODEL DEVELOPMENT PROCESS
The draft EIIP Phase I Data Model was developed from an emission inventory viewpoint by
individuals knowledgeable in the origin and use of emission inventory data. The types of data
needed by emission inventory users for modeling and developing emission inventories were identified
from several sources, and then grouped and organized to construct the data model.
Initially, the EIIP Point, Area, Mobile, and Biogenic Sources Committees (source committees)
submitted independent draft data models used to identify data elements needed in the EIIP Phase I
Data Model as well as to show how the data elements related to each other. In addition to the
source committee data models, a technical EPA memorandum (Seitz, 1995)2 was used to identify the
emission inventory data elements needed to support regional modeling. At the direction of the EIIP
Steering Committee, these data elements were to be the focus of the EIIP Phase I Data Model.
Using these data, a preliminary data model was compiled, distributed back to the source
committees for review, and discussed during subsequent teleconferences with each committee. The
draft data model was also mapped against several existing state databases to identify any missing
data elements and/or incorrect relationships among the data. This review process culminated in a
workshop held in Research Triangle Park (RTP), North Carolina, in April 1996, and was attended by
DMC members and state representatives involved in the mapping exercise. At this workshop, the
DMC decided that it would be desirable to document the data model using a separate diagram for
each source type (i.e., point, area and non-road mobile, on-road mobile, and biogenic). Each diagram
would then represent the viewpoint of the user of that data (e.g., point sources) using familiar
terminology.
The data model is currently being used to begin development of the EIIP data transfer format
as well as an Oracle database at EPA for use in the EIIP Data Transfer Prototype project.
Application of the data model in the context of the data transfer format and prototype database
offered a final round of review and precipitated additional changes in the data model to ensure
compatibility between the data model, data transfer format, and prototype database. Further
information on these two applications may be found in the paper "Prototype Demonstration of
Application of the EIIP Data Transfer Format" which is also being presented at this conference.
FINAL DATA MODEL
A data model is a representation of data on paper. It is used to logically organize data and
show how the data relate to each other. Categorizing each individual piece of information requires
specifying two organizational concepts: the entity and the attribute. The entity is the item (person,
place, or thing) that the information describes. The attribute is a specific characteristic or description
of an entity.
A data model is visually presented in the form of an entity-relationship diagram (ERD). The
ERD shows each entity in the model and how they relate to one another.
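As a rough illustration of the entity/attribute distinction, the sketch below expresses one entity as a Python dataclass whose fields are its attributes. The attribute names follow a few entries from the Site attribute listing (Figure 6); the class layout, the helper `attribute_names`, and the example values are hypothetical and are not part of the EIIP data model.

```python
from dataclasses import dataclass, fields


@dataclass
class Site:
    """An entity: the item (here, a facility) that the information describes."""
    site_uid: str             # attribute: Site Unique Identifier Code
    site_name: str            # attribute: name of the facility
    sic: str                  # attribute: Standard Industrial Classification code
    number_of_employees: int  # attribute: number of employees


def attribute_names(entity_cls):
    """Return the attribute (field) names declared on an entity class."""
    return [f.name for f in fields(entity_cls)]


# Hypothetical instance; all values are made up for illustration.
example_site = Site(site_uid="S-001", site_name="Example Plant",
                    sic="4911", number_of_employees=120)
```

Each instance of the class corresponds to one occurrence of the entity, and each field holds one attribute value for that occurrence.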
For purposes of this paper, only the point source view of the EIIP Phase I Data Model is
discussed in detail. Please refer to the final document "EIIP Phase I Data Model" for a
comprehensive description of the complete data model including the data element dictionary, data
model coding schemes, examples, and further information on conventions of use for the data model.
Figure 1 presents an ERD showing the chief entities that provide the hierarchical framework
for applying the EIIP Phase I Data Model to point sources. Figure 2 shows the complete ERD for
point sources. This includes additional entities organized around the hierarchy of the "backbone"
entities shown in Figure 1. Each of the "boxes" presented in the figure is an entity. A brief
discussion of the intended use of each of these entities in the point source data model is provided
below.
For the reader's benefit, Figures 3, 4, and 5 show the interpretation of the
EIIP Phase I Data Model for Area and Non-Road Mobile, On-Road Mobile, and Biogenic Sources,
respectively. The primary "backbone" entities are discussed in hierarchical order as they appear on
Figure 1, with the remaining "supporting" entities discussed in alphabetical order.
Site
The Site entity identifies the facility or plant where the emissions are created and is at the top
of the data model organizational hierarchy. The name, address, and contact information for the
facility or plant is contained here. This entity also includes information such as the Standard
Industrial Classification (SIC) Code for the facility, the number of employees, and the federal
ID [e.g., Aerometric Information Retrieval System (AIRS) plant ID] for the facility.
An abbreviated attribute listing for this entity is presented in Figure 6.
A Site contains one or more Emission Units, as indicated in Figure 2.
Emission Unit
The Emission Unit entity contains information on the physical unit that creates the emissions.
For instance, this could be a boiler or an incinerator. An abbreviated attribute listing for this entity
is presented in Figure 7.
An Emission Unit has one or more Emission Processes as indicated in Figure 2.
Emission Process
The Emission Process entity identifies the process(es) occurring at the Emission Unit that
produce the emissions. For example, combustion of more than one type of fuel may occur in a
single boiler. Each Emission Process is assigned a specific Source Classification Code (SCC). An
abbreviated attribute listing for this entity is presented in Figure 8.
An Emission Process has one or more measures of activity in the Activity entity, as indicated
in Figure 2.
Activity
The Activity entity is used to represent activity information for a specific Emission Process.
This would include information such as throughput or process rate. This information is divided into
individual time periods with a measure of activity for each specific time period.
Each time period in the Activity entity will be linked to the emission of one or more
pollutants in the Emissions entity. An abbreviated attribute listing for this entity is presented in
Figure 9.
Emissions
The Emissions entity represents the emission estimate. A separate instance of the Emissions
entity is used to represent each pollutant. Therefore, each emission estimate is unique for a specific
combination of Site, Emission Unit, Emission Process, Activity time period, and pollutant. The
emission estimate may be designated as controlled or uncontrolled, and under a variety of operating
scenarios (e.g., maximum, potential) and time periods (annual, daily, etc.). An abbreviated attribute
listing for this entity is presented in Figure 10.
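The hierarchy described above (Site, Emission Unit, Emission Process, Activity, Emissions) can be sketched as a small relational schema. The example below uses Python's built-in sqlite3 module purely for illustration; it is not the Oracle prototype database, and the table names, column names, and sample values are assumptions loosely based on Figures 6 through 10. The composite primary key on the emissions table mirrors the statement that each estimate is unique for a combination of Site, Emission Unit, Emission Process, Activity time period, and pollutant.

```python
import sqlite3

# Hypothetical table and column names, loosely based on Figures 6 through 10.
SCHEMA = """
CREATE TABLE site (
    site_uid  TEXT PRIMARY KEY,
    site_name TEXT
);
CREATE TABLE emission_unit (
    site_uid    TEXT NOT NULL REFERENCES site (site_uid),
    unit_uid    TEXT NOT NULL,
    description TEXT,
    PRIMARY KEY (site_uid, unit_uid)
);
CREATE TABLE emission_process (
    site_uid    TEXT NOT NULL,
    unit_uid    TEXT NOT NULL,
    process_uid TEXT NOT NULL,
    scc         TEXT,
    PRIMARY KEY (site_uid, unit_uid, process_uid),
    FOREIGN KEY (site_uid, unit_uid)
        REFERENCES emission_unit (site_uid, unit_uid)
);
CREATE TABLE activity (
    site_uid    TEXT NOT NULL,
    unit_uid    TEXT NOT NULL,
    process_uid TEXT NOT NULL,
    start_time  TEXT NOT NULL,
    end_time    TEXT NOT NULL,
    throughput  REAL,
    PRIMARY KEY (site_uid, unit_uid, process_uid, start_time, end_time),
    FOREIGN KEY (site_uid, unit_uid, process_uid)
        REFERENCES emission_process (site_uid, unit_uid, process_uid)
);
CREATE TABLE emissions (
    site_uid       TEXT NOT NULL,
    unit_uid       TEXT NOT NULL,
    process_uid    TEXT NOT NULL,
    start_time     TEXT NOT NULL,
    end_time       TEXT NOT NULL,
    pollutant_code TEXT NOT NULL,
    numeric_value  REAL,
    PRIMARY KEY (site_uid, unit_uid, process_uid,
                 start_time, end_time, pollutant_code),
    FOREIGN KEY (site_uid, unit_uid, process_uid, start_time, end_time)
        REFERENCES activity (site_uid, unit_uid, process_uid,
                             start_time, end_time)
);
"""


def build_inventory_db():
    """Create an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def load_example(conn):
    """Insert one made-up boiler with a single annual activity period."""
    period = ("1996-01-01", "1996-12-31")
    conn.execute("INSERT INTO site VALUES ('S1', 'Example Plant')")
    conn.execute("INSERT INTO emission_unit VALUES ('S1', 'U1', 'Boiler')")
    conn.execute(
        "INSERT INTO emission_process VALUES ('S1', 'U1', 'P1', NULL)")
    conn.execute(
        "INSERT INTO activity VALUES ('S1', 'U1', 'P1', ?, ?, 1000.0)",
        period)
    for pollutant, value in (("VOC", 1.5), ("PM", 0.4)):
        conn.execute(
            "INSERT INTO emissions VALUES ('S1', 'U1', 'P1', ?, ?, ?, ?)",
            period + (pollutant, value))
    return conn
```

With this layout, a second VOC estimate for the same process and time period is rejected by the primary key, and an emissions row whose activity period does not exist is rejected by the foreign key.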
Aggregate Controls as Applied
The Aggregate Controls as Applied entity represents the combined, overall effect of the
Control Strategy for an emissions estimate. This information is pollutant-specific and includes total
capture efficiency, control efficiency, as well as rule effectiveness information. Also, when control
data are provided, the corresponding emission estimate reflects the control efficiency indicated in the
Aggregate Controls as Applied entity.
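The paper does not give a formula for combining these quantities. As a minimal sketch, the example below assumes one common convention in which the overall reduction is the product of capture efficiency, control efficiency, and rule effectiveness, each stored as a percentage. The function names and the numbers used are hypothetical.

```python
def overall_reduction_fraction(capture_eff_pct, control_eff_pct,
                               rule_eff_pct=100.0):
    """Combine percentages into one reduction fraction (assumed convention)."""
    for name, pct in (("capture", capture_eff_pct),
                      ("control", control_eff_pct),
                      ("rule effectiveness", rule_eff_pct)):
        if not 0.0 <= pct <= 100.0:
            raise ValueError(f"{name} percentage out of range: {pct}")
    return ((capture_eff_pct / 100.0)
            * (control_eff_pct / 100.0)
            * (rule_eff_pct / 100.0))


def controlled_emissions(uncontrolled, capture_eff_pct, control_eff_pct,
                         rule_eff_pct=100.0):
    """Apply the assumed overall reduction to an uncontrolled estimate."""
    reduction = overall_reduction_fraction(
        capture_eff_pct, control_eff_pct, rule_eff_pct)
    return uncontrolled * (1.0 - reduction)
```

Under this assumption, 100 tons of uncontrolled emissions with 100 percent capture, 90 percent control efficiency, and 80 percent rule effectiveness would be reported as roughly 28 tons of controlled emissions.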
Control Strategy
This entity identifies the complete collection of control measures that are applied to the linked
Geographic Location, Site, Emission Unit, or Emission Process. The various rules, regulations, and
corresponding emission limits that apply are named here.
Control Equipment
The Control Equipment entity contains information for a specific control device. More than
one control device may be linked to a single instance of the Aggregate Controls as Applied entity.
The description and capacity of each individual control device is contained here. The connectivity of
control devices to Emission Units, Emission Processes, stacks, or other control devices is indicated in
the Path entity.
Control Equipment Characteristics
This entity is related to the Control Equipment entity. The Control Equipment Characteristics
entity is used to store the pollutant-specific attributes (percent capture efficiency, percent control
efficiency) associated with the control device described in the Control Equipment entity.
Defined Areas
A given Site may belong to more than one specially defined enforcement or study area. The
Defined Areas entity is intended to identify the organizational group that an area belongs to. The
AIRS coding system is used for identifying specific nonattainment areas.
The Area Name attribute of the Defined Areas entity will be a free-form text attribute so that
other specially defined areas may be specified if needed.
Emission Factors
The Emission Factors entity is used to identify the specific emission factor used to calculate
the emission estimate contained in the corresponding Emissions entity.
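The usual relationship between activity data and an emission factor can be sketched as a simple product. The paper does not spell out the calculation, so the example below is an assumption: each Activity time period's throughput is multiplied by a single emission factor expressed in matching units. The function names, periods, and values are hypothetical.

```python
def estimate_emissions(throughput, emission_factor):
    """Uncontrolled estimate = activity (throughput) x emission factor."""
    return throughput * emission_factor


def estimate_by_period(activity_periods, emission_factor):
    """Map each (start, end) period to its estimate.

    activity_periods is an iterable of (start, end, throughput) tuples.
    """
    return {
        (start, end): estimate_emissions(throughput, emission_factor)
        for start, end, throughput in activity_periods
    }


# Made-up example: throughput in tons of fuel, factor in pounds per ton.
EXAMPLE_PERIODS = [
    ("1996-01-01", "1996-06-30", 400.0),
    ("1996-07-01", "1996-12-31", 600.0),
]
EXAMPLE_ESTIMATES = estimate_by_period(EXAMPLE_PERIODS, emission_factor=2.5)
```

Keeping the factor separate from the estimate, as the data model does, lets a recipient see which factor produced a reported value.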
Emission Release Point
This entity is used to identify the stack (or release point, such as a roof vent) the emissions
are released from as well as to contain the stack's federal ID code for reporting. The entity is linked
to the Path entity to indicate its connectivity to the Emission Process for which emissions are being
reported. Alternatively, the connectivity of stacks to emission units or control devices may be
indicated in the Path entity.
Geographic Coordinates
The discrete coordinates for site locations, stack locations, and grids are located here.
Appendix D of the EIIP Phase I Data Model document presents additional details on the use of this
entity.
Geographic Location
This entity contains geographic location information. This includes the following chief fields:
country, state/province/territory, and county/parish/reservation. Other supplementary information
such as air basin, municipality, air quality control region, and study grids are also identified here.
The discrete coordinates for points and grids are located in the Geographic Coordinates entity.
The location information in the Geographic Location entity is linked with other entities using
a geographic unique identifier (UID).
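The linkage through a geographic UID amounts to a key lookup. The sketch below is a hypothetical illustration: the dictionary layout, the UID values, and the helper `location_of` are assumptions, with only the idea of a Site Geographic UID pointing at a Geographic Location record taken from the data model.

```python
# Hypothetical Geographic Location records keyed by geographic UID.
GEOGRAPHIC_LOCATION = {
    "G-042": {"country": "US", "state": "PA", "county": "Dauphin"},
}

# Hypothetical Site records; each carries a Site Geographic UID.
SITES = [
    {"site_uid": "S1", "site_name": "Example Plant",
     "site_geographic_uid": "G-042"},
    {"site_uid": "S2", "site_name": "Unmapped Plant",
     "site_geographic_uid": "G-999"},
]


def location_of(site, locations=GEOGRAPHIC_LOCATION):
    """Return the linked Geographic Location record, or None if unmatched."""
    return locations.get(site["site_geographic_uid"])
```

A lookup that returns None flags a record whose geographic UID has no matching location, which is the kind of inconsistency a receiving agency would want to catch on transfer.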
Meteorology
This entity contains the meteorological data that were used to derive the supplied emission
estimates. These data are linked to the Activity entity. Wind speed and temperature data are
included in this entity.
Path
The Path entity is used to show the relationships existing between any configuration of
Emission Units, Emission Processes, Control Devices, and Emission Release Points (stacks). It may
be applied to show the connectivity between one or more pairs of these physical pieces of
equipment. The connectivity of Emission Units to Control Devices and Emission Release Points is
indicated in Figure 2 by use of the Path entity. A more comprehensive discussion of application of
the Path entity is provided in the EIIP Phase I Data Model document.
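One way to picture the Path entity is as a list of directed pairs, each connecting an upstream item to a downstream item. The sketch below is an assumption about how such pairs might be stored and traversed; the identifiers, the "kind:id" naming scheme, and the function names are hypothetical and do not come from the EIIP Phase I Data Model document.

```python
from collections import deque

# Each row of the hypothetical Path table: (upstream_item, downstream_item).
PATH_ROWS = [
    ("process:P1", "control:C1"),
    ("control:C1", "control:C2"),
    ("control:C2", "stack:ST1"),
    ("process:P2", "stack:ST1"),
    ("process:P3", "stack:ST2"),
]


def build_graph(path_rows):
    """Group downstream items under each upstream item."""
    graph = {}
    for upstream, downstream in path_rows:
        graph.setdefault(upstream, []).append(downstream)
    return graph


def downstream_items(graph, start):
    """Breadth-first walk returning every item reachable downstream of start."""
    seen = {start}
    queue = deque([start])
    reached = []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                reached.append(nxt)
                queue.append(nxt)
    return reached


def release_points_for(graph, process):
    """Return the stacks that a given process ultimately vents through."""
    return sorted(item for item in downstream_items(graph, process)
                  if item.startswith("stack:"))
```

In this made-up configuration, process P1 passes through two control devices in series before reaching stack ST1, while P2 vents to the same stack directly.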
Process Growth Factors
Annual growth factors for a given Emission Process are contained in the Process Growth
Factor entity. This is typically the expected annual growth of the SCC assigned in the Emission
Process entity. The Process Growth Factor entity includes the initial (base) year, a projected year,
and a growth factor. The initial year corresponds to the year of the measure of emissions reported in
the linked Emissions entity. The Process Growth Factor may be applied to this measure to estimate
emissions for the projected year.
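Applying a growth factor can be sketched as follows. The example assumes the factor is a simple multiplier on base-year emissions, which the paper implies but does not state explicitly; the class name, function name, and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessGrowthFactor:
    base_year: int        # initial year, matching the reported emissions
    projected_year: int   # year for which emissions are to be estimated
    growth_factor: float  # assumed multiplier from base to projected year


def project_emissions(base_emissions, emissions_year, factor):
    """Scale base-year emissions to the projected year.

    Raises ValueError when the emissions year does not match the
    factor's base year, since the factor would not apply.
    """
    if emissions_year != factor.base_year:
        raise ValueError(
            f"factor base year {factor.base_year} does not match "
            f"emissions year {emissions_year}")
    return base_emissions * factor.growth_factor
```

For instance, a factor of 1.25 from 1995 to 2005 applied to 100 tons reported for 1995 would give an estimate of 125 tons for 2005.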
Stack Physical Parameters
The type of stack is identified here. This entity also contains the physical dimensions of the
stack and stack gas properties including temperature, velocity, and flow rate. The Geographic
Coordinates entity contains stack location information and is linked through the Stack Geographic
UID.
Transmittal Information
The Transmittal Information entity is not a part of the hierarchy of entities that form the
"backbone" of the EIIP Data Model. Instead it contains information about a particular data
transmittal. This information includes the year of the inventory, the inventory's approval status,
contact personnel, and other descriptive attributes.
SUMMARY
The EIIP Data Model was developed through the collaboration of personnel from the EPA,
state and local air pollution control agencies, industry, and support contractors. The intent of the
data model is to define and organize the data elements (and relationships between data elements)
which are needed in air pollution studies, with particular emphasis on air quality modeling
applications.
To date, the EIIP Data Model has been used in several applications to support the overall
goals of the DMC. The primary application was to provide a structural basis for developing the
standard EIIP Data Transfer format. The data model has also provided the basic blueprint for
developing an Oracle database at EPA designed for use in the EIIP Data Transfer Prototype
Demonstration, and for storing the EPA's National Emission Trends (NET) Inventory. In addition to
the immediate goals of the DMC, the data model may also be used by States and local agencies
considering revision or initial development of a relational database used for management of air
pollution information.
As stated previously, the next step in the evolution of the EIIP data model is expansion to
include the information needed for permitting and compliance activities, as well as detailed facility
level information required of industry by State and local agencies. These future phases are not
expected to necessitate a redesign of the data model, but rather to require additions to
the current data model structure.
ACKNOWLEDGEMENTS
This paper, and the Phase I Data Model Report upon which the paper is based, are the result
of almost three years of conscientious and diligent effort by State and Federal members of the EIIP
DMC, as well as the support contractors. Many thanks are also due to numerous other external
reviewers and those who helped "map" the data model against State emission data models.
DISCLAIMER
The information in this document has been funded wholly or in part by the United States
Environmental Protection Agency. It has been subjected to Agency review and approved for
publication. Mention of trade names or commercial products does not constitute endorsement or
recommendation for use.
REFERENCES
1.	Emission Inventory Improvement Program, Data Management Committee, EIIP Phase I Data
Model, Final Report, EIIP Document Series, Volume 7, U.S. Environmental Protection Agency,
Research Triangle Park, NC, September 1997; EPA-454/R-97-004g
(http://www.epa.gov/oar/oaqps/eiip/dmchaps.html)
2.	Seitz, J.S. 1995. U.S. Environmental Protection Agency, Research Triangle Park, NC, letter to
State Air Directors.

-------
Figure 1. Point Source Hierarchical Entities
[Diagram not reproduced. Entities shown: Site (Plant), Emission Unit, Emission Process, Activity, and
Emissions, connected by 1 to many relationships.]

-------
[Diagram not reproduced; caption not recovered. By position and content this appears to be Figure 2,
the complete ERD for point sources. Entities shown: Site (Plant), Emission Unit, Emission Process,
Activity, Emissions, Aggregate Controls as Applied, Control Equipment, Control Equipment
Characteristics, Control Strategy, Defined Areas, Emission Factors, Emission Release Point,
Geographic Coordinates, Geographic Location, Meteorology, Process Growth Factors, Stack Physical
Parameters, and Transmittal Information. Legend: entity; 1 to 1 relationship (mandatory); 1 to 1
relationship (optional); 1 to many relationship; entity repeated; mutually exclusive. Relationship
labels: may feed; may be fed by.]
-------
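[Diagram not reproduced; caption not recovered. By position and content this appears to be Figure 3,
the data model for Area and Non-Road Mobile Sources. Entities shown: Source, Physical Unit,
Emission Process, Activity, Emission Factors, Emissions, Aggregate Controls as Applied, Control
Equipment, Control Equipment Characteristics, Control Strategy, Defined Areas, Geographic
Location, Process Growth Factors, and Transmittal Information. Legend: 1 to many relationship.]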

-------
[Diagram not reproduced; caption not recovered. By position and content this appears to be Figure 4,
the data model for On-Road Mobile Sources. Entities shown: Physical Unit, Schedule (Activity),
Emission Process, Emissions, Aggregate Controls as Applied, Control (remainder of label
illegible), Defined Areas, Geographic Coordinates, Geographic Location, Meteorology, Process
Growth Factors, and Transmittal Information. Legend: entity; 1 to 1 relationship (mandatory);
1 to 1 relationship (optional); 1 to many relationship; mutually exclusive.]

-------
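[Diagram not reproduced; caption not recovered. By position and content this appears to be Figure 5,
the data model for Biogenic Sources. Entities shown: Source, Physical Unit, Activity, Emissions,
Meteorology, Dynamic Grids, Geographic Coordinates, Geographic Location, Speciation Profiles,
and Transmittal Information. Legend: entity; 1 to 1 relationship (mandatory); 1 to 1 relationship
(optional); 1 to many relationship.]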

-------
Figure 6. Site Attribute Table
Data Attribute                  Notes
Site UID                        Site Unique Identifier Code
Site Geographic UID             Site Geographic Unique Identifier Code
Transaction ID                  Data Transaction Identifier
Site Name
Physical Street Address
Physical City
Physical State/Province
Physical Country
Physical Zip Code
Mailing Street Address
Mailing City
Mailing State/Province
Mailing Country
Mailing Zip Code
SIC                             FTC Standard Industrial Classification Code
Number of Employees
Description
Federal ID Code                 AFS Plant ID
Federal ID Code #2              EPA Key Identifier Code
ORIS Code                       The Office of Regulatory Systems (ORIS) code (Department of Energy)
Jurisdiction
State/Local Site ID
Confidentiality Indicator
Jurisdiction (Secondary)
Dun & Bradstreet Number

-------
Figure 7. Emission Unit Attribute Table
Data Attribute                  Notes
Unit UID
Site UID
Physical Unit Geographic UID
Federal ID Code                 AFS Point ID
Description
Emission Unit Type Code
Number of Units
Confidentiality Indicator
Date Installed/Modified
Status
Rule Applicability
Design Capacity
Design Capacity Units
State/Local Emission Unit ID
Status

-------
Figure 8. Emission Process Attribute Table
Data Attribute                  Notes
Process UID
Site UID
Unit UID
Start Date/Time
End Date/Time
Description
SCC
Process SIC
Federal ID Code                 AFS Segment ID
AMS Code
Winter Throughput Percentage
Spring Throughput Percentage
Summer Throughput Percentage
Fall Throughput Percentage
Annual Average Hours Per Day
Annual Average Days Per Week
Annual Average Hours Per Year
Annual Average Weeks Per Year
Heat Content
Sulfur Content
Ash Content
Confidentiality Indicator
Material
Material Description
State/Local Process ID

-------
Figure 9. Activity Attribute Table
Data Attribute                  Notes
Site UID
Unit UID
Process UID
Start Date/Time
Process Rate/Throughput
Confidentiality Indicator
Maximum Actual Throughput
End Date/Time
Throughput Method Code
Reliability Indicator           Data Attribute Rating System (DARS) score
Unit of Measure Text
Unit of Measure Expression
Period Average Hours Per Day
Period Average Days Per Week
Period Average Hours Per Year
Period Average Weeks Per Year

-------
Figure 10. Emissions Attribute Table
Data Attribute                  Notes
Unit UID
Process UID
Start Date/Time
End Date/Time
Pollutant Code                  Such as VOC, PM, NOx
Emission Type                   Such as Actual, Potential, Maximum
Numeric Value
Unit of Measure Text
Unit of Measure Expression
Confidentiality Indicator
Control Status                  Controlled or Uncontrolled
Emission Estimation Method Code Such as Stack Test, Emission Factor
Reliability Indicator           DARS score

-------
TECHNICAL REPORT DATA
1. REPORT NO.
EPA/600/A-97/101
2.
PB98-116288
4. TITLE AND SUBTITLE
Development of the EIIP Emission Inventory Data
Model for purposes of facilitating data transfer
between the states and the U.S. EPA
5.REPORT DATE
6.PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
John Slade(1), William Benjey(2), Andrew Blackard(3), and
Michael Pring(3)
8.PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANIZATION NAME AND ADDRESS
(1) Pennsylvania Department of Environmental Resources,
P.O. Box 8468, 400 Market St., Harrisburg, PA 17105
(2) Atmospheric Sciences Modeling Division, ARL, NOAA,
Research Triangle Park, NC (on assignment to
National Exposure Research Laboratory, US EPA,
Research Triangle Park, NC)
(3) Eastern Research Group, Inc., 1600 Perimeter Park,
Suite 300, Morrisville, NC 27560
10.PROGRAM ELEMENT NO.
11. CONTRACT/GRANT NO.
12. SPONSORING AGENCY NAME AND ADDRESS
NATIONAL EXPOSURE RESEARCH LABORATORY
OFFICE OF RESEARCH AND DEVELOPMENT
U.S. ENVIRONMENTAL PROTECTION AGENCY
RESEARCH TRIANGLE PARK, NC 27711
13.TYPE OF REPORT AND PERIOD COVERED
14. SPONSORING AGENCY CODE
15. SUPPLEMENTARY NOTES

-------