Model for Information
Integration
A Preview of the
EPA's Target
Architecture [EIA]
Components of
Information
OFFICE OF
ENVIRONMENTAL
INFORMATION
July
2002
-------
Table of Contents
Preface
Purpose of the Document 1
Intended Audience 1
Scope of the Document 1
Relationship to Other IT Planning & Design Documents 2
Structure of the Document 3
Acknowledgment 4
Chapter 1: Introduction
1.1 Integration Defined 5
1.2 Why an Environmental Information Architecture 5
1.3 Progress 7
1.4 Overview 9
1.5 Benefits 12
Chapter 2: EIA: Connect & Exchange
2.1 Background 15
2.2 Critical Features and Operations 16
2.3 User Registration Services 16
2.4 Data Collection and Exchange Network Services 16
2.5 Access Services 18
2.6 Benefits of the Enterprise Portal 20
2.7 Issues and Next Steps 21
Chapter 3: EIA: Process & Stage
3.1 Background 23
3.2 Critical Features and Operations 24
3.3 Benefits of Integrating Program and Regional Systems 27
3.4 Issues and Next Steps 27
Chapter 4: EIA: Store for Use
4.1 Background 29
4.2 Critical Features and Operations 30
4.3 Data Warehouse Services 30
4.4 Registry Services 31
4.5 Geospatial Data Services 34
4.6 Metadata Services 35
4.7 Benefits of the Enterprise Repository 36
4.8 Issues and Next Steps 37
Chapter 5: EIA: Use
5.1 Background 40
5.2 Critical Features and Operations 40
5.3 Benefits of Decision Support Tools 42
5.4 Issues and Next Steps 43
Model for Information Integration — Preface
-------
Table of Contents (continued)
Chapter 6: EIA: Foundation
6.1 Background 45
6.2 Critical Features and Operations 46
6.3 Benefits of the Foundation (Architecture, Standards, and Policy) 50
6.4 Issues and Next Steps 51
Chapter 7: Summary of Issues and Next Steps
7.1 Overview 53
7.2 Management Actions 54
Appendix-A: Glossary of Terms
Appendix-B: References
Appendix-C: Other Registries
Appendix-D: EPA's "Reinventing Environmental Information (REI)" Systems
List of Figures and Tables
Figure 1: EPA Enterprise Architecture Business Domains 1
Figure 2: Relationship of Architecture and Exchange Network Documents 2
Figure 3: June 26, 2001 figure depicting components of EPA integration architecture 8
Figure 4: Information Lifecycle 9
Figure 5: The Model for Information Integration 10
Figure 6: Enterprise Portal Manages Flows in and out of Agency 14
Figure 7: Operational Database Prepare Data for Storage and Use 23
Figure 8: Enterprise Repository Decision Support Components 29
Figure 9: Envirofacts Information Warehouse 31
Figure 10: Decision Support Tools Use Data in the Enterprise Repository 40
Figure 11: EPA Enterprise Architecture Framework 47
Figure 12: EPA Enterprise Architecture Conceptual Framework (Expanded) 48
Figure 13: Data Standards Setting Process 49
Table 1: Alignment of Information Quality Lifecycle with Model for Information Integration 9
Table 2: EIA Core Components & Functions 11
Table 3: EPA Enterprise Portal User Sets 17
Table 4: Exchange Network Partner Requirements & Supporting Projects 18
Table 5: Current and Anticipated Use of CDX 19
Table 6: EPA's Registry Systems 33
Table 7: Foundation Components and Activities 46
-------
PREFACE
PURPOSE OF THE DOCUMENT
This document is to be used as a high-level architecture guideline for integration at the Environmental Protection
Agency (EPA). It proposes a series of "core components," i.e., technologies, policies, plans, and services that will
enable more efficient and effective management of information resources and will support EPA's participation in the
National Environmental Information Exchange Network. This document will also serve as a basis for business case
analysis, user requirements analysis, and other decision-making needed to complete the target Environmental
Information Architecture (EIA).
INTENDED AUDIENCE
The intended audience of this document includes: (1) members of the Quality and Information Council; (2) managers
and staff who oversee Program system development and operations; and (3) Agency staff who will develop the EIA.
This document requires that the reader have some basic knowledge of Information Technology (IT) management and
enterprise architecture development.
SCOPE OF THE DOCUMENT
EPA is in the initial stages of developing an enterprise architecture, i.e., a definition of its business and a description of
the processes, data, applications, and technology that support it (Spewak, 1992). This document addresses core
components in the EIA, one of the three business domains of EPA's overall enterprise architecture (Figure-1). When
complete, the EIA will define the processes, data, applications, and technology that support environmental management.
As depicted in Figure 1, the EIA is segmented further to reflect the Agency's major business functions. The EIA will
serve as a blueprint for future design and implementation of an integrated infrastructure.
Figure 1: EPA Enterprise Architecture Business Domains
(Domains shown: Environmental Information Architecture (EIA); Administrative Architecture; Research & Development
Architecture. Segments shown: Ambient Monitoring; Substance Risk & Hazards; Pollution Sources.)
-------
RELATIONSHIP TO OTHER IT PLANNING AND DESIGN DOCUMENTS
Over the last two years, a number of key planning documents encompassing the Network, EPA's infrastructure, and
individual projects have been generated under the aegis of the IIP. Figure-2 arrays these documents to show their
relationship. The rest of this section briefly describes the purpose of each document and its relationship to the Model
for Information Integration.
Figure 2: Relationships of Architecture & Exchange Network Documents
Vision: Information Agenda
Management & Oversight: Information Integration Program Management Plan
Activities: Architecture (Defining); Construction (Design & Implementation)
EPA Architecture (Internal): Model for Information Integration; Geospatial Baseline Assessment; EPA Enterprise
Architecture (EIA, Administrative, Research and Development (R&D)); Architecture Sequencing Plan; Individual
Component Systems Design, Implementation, or Modernization plans
National Environmental Information Exchange Network (External): Network Blueprint; Network Implementation Plan
Information Agenda (EPA(10), 2002) sets a cross-Agency vision for how information and technology will be
managed to support EPA's mission and overall strategy. The Enterprise Architecture provides the blueprint for achieving
this vision.
Information Integration Program Management Plan (EPA(11), 2002) provides an umbrella structure for
tracking and linking the design and implementation of EPA's enterprise architecture with the National Environmental
Information Exchange Network.
EPA Enterprise Architecture (EPA(6), 2001). EPA's enterprise architecture will define the business, information,
technologies, and transitional processes necessary to support the Agency's mission and strategy and to respond to changing
needs. The Enterprise Architecture is broken into three business domains: Environmental Information; Administrative;
and Research and Development. This document defines the core functions and components of the target EIA.
Geospatial Baseline Assessment (EPA(9), 2001). This document, a product of EPA's National Geospatial
Program, contains an overview of geospatial data, tools, and technologies used throughout EPA and of how geospatial
resources are used to implement the Agency's mission. As such, it is a direct input to the baseline EIA. The Geospatial
Blueprint, due in Spring 2002, will help link the three architectures together because geography (i.e., place) is a
variable common to all three of the EPA enterprise architecture business domains.
-------
EPA Project Documentation. This model uses existing blueprints, plans, and documentation for EPA systems,
Programs like the National Geospatial Program, and EPA's Data and Information Quality Strategic Plan (EPA(4),
2002). For example, it highlights the Central Data Exchange (CDX) design to illustrate EPA's participation in the
Network and to inspire the design of a set of access services that complement and build upon the existing CDX
services.
Blueprint for a National Environmental Information Exchange Network (State/EPA IMWG, 2000)
expresses an initial conceptual model of the Exchange Network. It is an analog to this document. The two documents are
linked by way of the descriptions of EPA's node responsibilities as a partner on the Exchange Network.
National Environmental Information Exchange Network Implementation Plan (State/EPA INSG, 2002)
provides a roadmap for building the Exchange Network. It contains a business case and an implementation strategy, and
identifies key projects, objectives, and milestones for tracking progress.
STRUCTURE OF THE DOCUMENT
This document begins with an introductory chapter that provides an overview of EPA's IT environment. The first
chapter also cites drivers for integration, gives an overview of the conceptual model for integration, and illustrates the
alignment with EPA's Information Quality Lifecycle. The remaining chapters are organized by IT function and introduce
the core components for integration. These chapters introduce the function; background on how this functionality is
currently carried out; critical features and operations; and existing EPA projects and efforts that either directly or
potentially play a role. Most importantly, these chapters highlight key technical and policy issues that must be analyzed,
debated, and decided upon as part of the development of the target EIA. The last chapter presents key technical and
policy issues associated with defining and implementing the EIA in the next few years. A glossary (Appendix A), a list
of references (Appendix B), other registries (Appendix C), and a list of EPA national systems (Appendix D) are
provided for the interested reader.
ACKNOWLEDGMENT
The components proposed in this model evolved from two key preliminary architecture documents: (1) the FY2001
Information Integration Initiative Management Plan (EPA, 2000); and (2) March 2001 EPA Enterprise Architecture
submission to the Office of Management and Budget (EPA(6), 2001).
The development of this document began with a June 26, 2001 IIP Team retreat with over thirty EPA staff. In this
meeting a proposed target EIA (Sullivan, et al., 2001) was presented and discussed. The conceptual model presented in
this document is based on a number of interviews and meetings with project managers who oversee many of the
projects described in this document.
This document builds upon the Blueprint for EPA Environmental Information Integration deliverable produced under
Contract 68-W-99-038, Work Assignment #046.
-------
The following EPA staff contributed to this document:
Primary Authors
Heather Anne Case
Linphord Darlington
Eugene Durman
John Sullivan
Bewanda Alexander
Jennifer Cranford
Evangeline Cummings
Diane Esanu
Patrick Garvey
Sara Hisel-McCoy
Barbara Jarvis
Matthew Leopard
Debra Villari
Jeffrey Wells
Jeffrey Worthington
Key Contributors
John Armstead
Wendy Blake-Coleman
Tim Crawford
Ron Decesare
Larry Fitzwater
Debra Forman
Steven Goranson
Bill Grabsch
Steve Hufford
Rashmi Lai
Bill Muldrow
Ling Wan
Dave Wolf
Steve Young
-------
1.0 INTRODUCTION
This document proposes a set of core component technologies, policies, plans, and services that will enable
integration at EPA. It shows how these components support four high-level IT functions. This model is input to the
target Environmental Information Architecture (EIA). It is intended to be a vehicle for discussion and development over
the 2002 fiscal year.
1.1 INTEGRATION DEFINED
Integration has a variety of meanings at EPA. In the context of systems, it is used synonymously with
"interoperability," i.e., the ability of a system to use the parts of another system. For processes, integration is
synonymous with "consolidation," i.e., the process of uniting.
In the context of data and information it has special meanings:
Graphical Integration - Data elements co-located on a map, table, or statistical graph.
Data Integration/Reconciliation - Linking a single data element across multiple programs or media (facility, chemical,
or substance) to reconcile differences in how EPA Programs/States collect the same data.
Analytical Integration - Summation of multiple indicators into a single index of performance, compliance, or health.
Database Integration - Combining data elements from multiple Program systems or databases into a single
collaborative data schema for cross-media, cross-program reporting.
Merriam Webster defines integration as the ability to "form, coordinate, or blend into a functioning or unified whole"
(Merriam Webster, 2002). For the purposes of this document, integration is broadly defined as the unification of
processes, data, applications, or technology. Integration can be physical, virtual, or some combination of the two.
Data and databases need not be centralized to be integrated. For example, distributed data may be virtually
integrated through the use of multi-system queries. The overall purpose is to increase efficiency and enable
more effective use of data and technology resources.
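As a rough illustration of virtual integration through a multi-system query, the sketch below leaves two data sets in separate databases and combines their answers only at query time. It is a minimal sketch under stated assumptions, not a description of any EPA system: the system names (air_system, water_system), the releases table, and the facility identifiers are all hypothetical.

```python
import sqlite3

def make_program_db(rows):
    """Create an in-memory database standing in for one (hypothetical) Program system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE releases (facility_id TEXT, chemical TEXT, pounds REAL)")
    conn.executemany("INSERT INTO releases VALUES (?, ?, ?)", rows)
    return conn

def multi_system_query(systems, facility_id):
    """Run the same query against every system and merge the answers.

    The data stay where they are; only the query results are combined.
    """
    combined = []
    for name, conn in systems.items():
        cursor = conn.execute(
            "SELECT chemical, pounds FROM releases "
            "WHERE facility_id = ? ORDER BY chemical",
            (facility_id,),
        )
        for chemical, pounds in cursor:
            combined.append({"system": name, "chemical": chemical, "pounds": pounds})
    return combined

# Two independent, distributed data stores (hypothetical contents).
systems = {
    "air_system": make_program_db([("F001", "benzene", 120.0), ("F002", "toluene", 40.0)]),
    "water_system": make_program_db([("F001", "lead", 3.5)]),
}

facility_view = multi_system_query(systems, "F001")
for row in facility_view:
    print(row)
```

The sketch only works because both systems happen to share a facility identifier, which is the role the document later assigns to data standards and registries.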
1.2 RATIONALE FOR AN ENVIRONMENTAL INFORMATION ARCHITECTURE (EIA)
The target EIA will be derived from a thorough analysis of existing Agency goals, business processes, data,
applications, and technology associated with environmental management. Once complete, it will enable an enterprise view, i.e.,
the ability to look across EPA's collection of information and technology resources, to better align these resources to
support Agency goals and operations.
HISTORY
To date, an enterprise view has been hampered by the decentralized nature of EPA's Information Technology/Information
Management (IT/IM) environment. This arose largely from the manner in which the EPA Programs were
created. EPA's enabling statutes are generally single-media in scope, which has led to independent program
management and systems development. EPA is not the single entity many outside its walls may perceive, but a disparate set of
environmental and administrative programs.
Model for Information Integration — 1.0 Introduction
-------
These circumstances are common to private and public sector organizations. Historically, the falling prices of
technology and the growing ease with which technology could be implemented promoted decentralized IT management, typically
along organizational boundaries (Cook, 1996). While this supports flexibility, it can also lead to redundant processes, a
limited enterprise view, and use of incompatible technologies. These issues were recently illustrated at a national level
when U.S. government agencies scrambled to coordinate efforts to respond to the September 11, 2001 terrorist
attacks. Since then many officials recognize and share a serious concern about the "lack of intelligence sharing by the
government" (The Economist(1), 2001).
The effects are felt by EPA stakeholders, the public, and EPA employees. States and the regulated community have
to "feed" multiple Program systems, using a variety of transmission formats, to fulfill environmental reporting obligations.
The public, seeking information about their safety, receive data that mirrors EPA's organizational structure or falls short
of providing a complete picture of environmental conditions of interest. And finally, these circumstances have slowed
EPA from pursuing the cross-media environmental protection strategies necessary to address emerging environmental
problems. Seemingly simple queries to support, for example, a cross-agency chemical initiative are burdensome
tasks for EPA.
The IIP mission is to transform the Agency's business and operations through the use of information technology and
policy. This will require evolution from our current environment to a distributed environment marked by a cohesive,
interdependent set of processes, data, applications, and technology. A distributed IT/IM environment is highly desirable
as it maximizes the flexibility of decentralized computing and the coordination advantages of a centralized approach
(Cook, 1996).
The challenges that EPA faces are much like those the U.S. government must overcome to fuse different levels of
government together to produce a sound, cohesive base of intelligence. The problems we face in filling the gaps and
making the linkages, whether it be across government or across EPA, are caused by budget distribution, lack of
coordination, and political will (The Economist(1), 2002).
In order to achieve a distributed environment, the Agency must adopt data and technology standards, promote the
use of some centrally managed services and systems, and approach IT investment and decision-making with an "All
For One - One For All" mindset. Only when organizations within an enterprise agree that data and systems belong not to
their own area but to the enterprise as shared resources will the benefits of distributed computing be realized
(Carr, 2000). The EIA is the mechanism for planning and managing this change.
FEDERAL REQUIREMENTS
Many government agencies have undertaken enterprise architecture efforts to assess their IT resources and the extent
to which they support organizational goals. At the federal level, enterprise architecture efforts are conducted not only as
a good business practice, but also to fulfill the requirements of the Clinger-Cohen Act, which requires that agency Chief
Information Officers maintain and implement a sound integrated architecture (Clinger-Cohen Act, 1996).
While this requirement has been on the books for over six years, many agencies have been slow to develop and
incorporate an enterprise architecture into their planning and investment processes. Because of this lag, the Office of
Management and Budget (OMB) has begun to monitor Agency IT investment budgets with a close eye towards redundant
technology investments and how an Agency manages those investments to accomplish its mission (Petruccelli,
2001).
-------
E-GOVERNMENT
Enterprise architecture has also received considerable attention in light of the Bush Administration's focus on
"e-government," defined as "The use of digital technologies to transform government operations in order to improve
effectiveness, efficiency, and service delivery" (Forman, 2001). The focus on e-government has been further expressed
as the "Quicksilver Initiative," managed out of OMB. Governed by the President's Management Council, the Quicksilver
Initiative rallies Federal agencies to "simplify and unify" processes and technology to deliver better government
services to stakeholders and the public. E-government funds have been allocated to federal agencies that apply
technology to enable: (1) intergovernmental exchange; (2) government-to-citizen services; (3) government-to-business services;
and (4) internal efficiency and effectiveness.
This last aspect of the Administration's e-gov strategy, "internal efficiency and effectiveness," is receiving particular
attention. This is demonstrated in part by OMB's growing scrutiny of federal IT investments, in which perceived
duplicative funding is being questioned and in some cases rejected. It is also likely that this scrutiny will continue as IT
budgets shrink given the re-distribution of federal dollars to support emerging military and homeland security priorities.
The development and management of an enterprise architecture is the means by which EPA will achieve "internal
efficiency and effectiveness." EPA's baseline enterprise architecture will be completed early in FY02 (EPA(5), 2001)
and will define existing business processes, data, applications, and technology. This "as is" inventory, analyzed within
the context of the Agency's mission, is the basis for identifying redundancies and proposing IT configurations that will enable
the Agency to be more effective in achieving its mission.
Efficiency, defined as "doing the job right," and effectiveness, defined as "doing the right job" (Drucker, 2001), must
both be goals of EPA's target EIA. It is important to note that streamlining redundant data collections, applications,
and processes will improve efficiency; however, it does not ensure that the Agency is "doing the right jobs" to be most
effective in carrying out its mission. For example, the QIC has endorsed the Central Data Exchange as the single portal
for all incoming environmental data flows (EPA(1), 2001). In some cases, Programs are even coupling the use of CDX
with consolidated reporting requirements (EPA(3), 2001). This generates efficiency gains in two areas: business
process and data collection. However, this decision does not ensure that the data collected, or the manner in which they are
collected, improves EPA's ability to carry out its mission. This situation underscores the need to examine architectural
options within the context of Administration priorities, governmental trends, and emerging scientific research so that
technology is used not just to "pave the cow paths" for efficiency's sake, but to transform the business
of environmental protection.
1.3 PROGRESS
An initial vision for the EIA was described in the Enterprise Architecture submission to OMB, dated March 2001
(EPA(6), 2001), and presented at the June 26, 2001 IIP Team retreat to promote project coordination (Figure 3).
The conceptual model presented in this document updates Figure 3. The systems identified in this figure are
described in terms of their existing and potential functionality. For example, the CDX portal concept is expanded to include
a common set of registration services and access services, which are described as an enterprise portal. Within a
functional framework, this model also provides more detail on how the components interact.
Also pertinent to this conceptual model is a set of principles approved by the QIC in October 2001 (EPA(1), 2001).
Based on this approval, Agency Programs are expected to:
-------
Figure 3: June 26, 2001 figure depicting components of the EPA integration architecture
(Elements shown: Partners; Exchange Network; EPA Integration; System of Access (Internal/External); Tools and Access
Mechanisms; Modernized Program and Regional Systems; Foundation: Information Architecture and Data Standards.)
• Rely on a single portal (Central Data Exchange) for all environmental data flows. The Exchange Network
Blueprint calls for each partner to have one node on the Exchange Network. In this model, the CDX is
the major part of an Enterprise Portal concept. It serves to manage all incoming data flows and exchanges
via the Exchange Network.
• Adhere to data standards. Data standards are the cornerstone of the Exchange Network and an important
part of the foundation of the Environmental Information Architecture. EPA currently has six data standards in
place. The idea that data standards will be managed and enforced within EPA is a guiding principle of this
document.
• Rely on registries as the Authoritative Data Sources. There is a core set of data that is used repeatedly throughout
the Agency and among stakeholders that forms the basis for most cross-media, cross-program analysis. This
information includes items such as facility information and chemical information. Establishing one authoritative
source for this type of data is an essential step in reducing duplication and incompatibility among EPA's data.
Registries can also serve as automated normalizing agents to all standardized data received and transferred by the
Agency. Information that varies from standardized protocol would be normalized or would include a notice
indicating the discrepancy.
• Provide access to integrated Agency data and shared datasets. The ability to seamlessly access and analyze data
from various Program Offices has long been considered an ability that would help EPA and environmental
stakeholders get a better picture of the environment. This can be achieved by making the data that is needed for
cross-media analysis easily accessible using standard access methods and protocols. In this model the System of Access
and Enterprise Repository are proposed to support this principle.
• Consolidate and integrate systems consistent with the vision. All Programs are expected to continue to modernize
their systems to meet their changing information needs. Just as this model seeks to promote integration across
Programs, many Program Offices already have efforts underway to consolidate and integrate the systems within
their offices.
• Rely on Enterprise Architecture for all IT planning and investment. The Enterprise Architecture will serve as the
primary tool for planning and managing information technology. This document will be the basis for developing the
target EIA.
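The idea of a registry acting as an automated normalizing agent can be sketched as follows. This is a minimal illustration under stated assumptions: the registry contents, the identifiers (FAC-0001, FAC-0002), and the field names are hypothetical and are not drawn from any EPA registry. An incoming record either resolves to an authoritative identifier or carries a notice indicating the discrepancy.

```python
# Hypothetical authoritative registry: known name variants -> registry ID.
FACILITY_REGISTRY = {
    "acme chemical plant": "FAC-0001",
    "acme chem. plant": "FAC-0001",
    "riverside water works": "FAC-0002",
}

def normalize_facility(record):
    """Return a copy of the record with a registry ID, or with a discrepancy notice."""
    # Collapse case and whitespace differences before the lookup.
    key = " ".join(record["facility_name"].lower().split())
    result = dict(record)
    registry_id = FACILITY_REGISTRY.get(key)
    if registry_id is not None:
        result["registry_id"] = registry_id
        result["notice"] = None
    else:
        result["registry_id"] = None
        result["notice"] = f"facility name {record['facility_name']!r} not found in registry"
    return result

matched = normalize_facility({"facility_name": "  ACME Chem. Plant ", "program": "air"})
unmatched = normalize_facility({"facility_name": "Unknown Site", "program": "water"})
print(matched)
print(unmatched)
```

Two name variants resolving to the same identifier is what lets records from different Programs be linked for cross-media analysis.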
-------
1.4 OVERVIEW
INFORMATION LIFECYCLE
At the outset, it is important to establish a basic framework to identify and assess the IT functionality that supports data
management, from data collection planning through data use.
The Agency recently drafted a Data and Information Quality Strategic Plan, which contains six major recommendations
to address the Agency's overarching data and information quality vulnerabilities (EPA(4), 2002). This plan establishes
an information lifecycle, as depicted in Figure 4. The lifecycle is broken into five universal stages (planning,
collection/analysis, assessment, transfer/storage, and use) and steps within the stages. Each step is further defined to
help identify vulnerabilities, develop corrective actions, and measure quality.
Figure 4: Information Lifecycle
(Panel title: Information Life Cycle Model for identifying data quality vulnerabilities.)
Across the Agency, numerous processes, data, applications, and technology support each of the five major stages of
the information quality lifecycle. Integration can occur at many points, e.g., at the point at which reporting requirements
are consolidated, or at the point at which data are analyzed, such as geographical integration through the use of a
geospatial tool. The model in this document presents four major IT functions that align with the later stages of the
Information Lifecycle: from the point at which data are collected by EPA from an external source (Transfer/Storage) to the
point at which they are used (Use), i.e., Steps 9-12 in Figure 4.
Table 1 shows how the IT functions described in this document align with the information quality lifecycle and
measurable quality characteristics.
Table 1: Alignment of Information Lifecycle with Model for Information Integration
Information Lifecycle Stage | Measurable Quality Characteristics | Model for Information Integration IT Function
Planning | completeness, correctness, representativeness, validity, consistency, variability, etc. | Not in the scope of this document
Collection/Analysis | completeness, accuracy, timeliness, measurement quality, adherence to standards, oversight/audits | Connect and Exchange
Transfer/Storage (Data entry; Transfer/Process; Archive to master database) | completeness, correctness, conforms to specifications, e-format, verification, validation, error checks | Connect and Exchange; Process and Stage; Store for Use
Use | completeness, integrity, usability, accessibility, presentation | Use
-------
Data or Information?
This document defines data as information translated into a form that is more convenient to move or
process. Information is data presented to meet user expectations. Data presentation must be user
friendly and impart some meaning to the data (Worthington, 2001).
The connection between these two efforts is important. In late FY2001, OMB finalized guidelines that require federal
agencies to develop and institutionalize practices to "Ensure and Maximize the Quality, Objectivity, Utility, and Integrity of
Information Disseminated by Federal Agencies" (OMB, 2001).
The OMB guidelines require that Agencies set a quality performance goal and measure progress towards that goal
"throughout the creation, collection, maintenance, and dissemination" of information. These stages align well with the EPA
Information Lifecycle and can be linked to the IT functions in this model. Information Lifecycle stages and follow-on
guidelines should take into consideration and leverage the IT functions in this model and other emerging information value
chain models (EPA(12), 2002). One approach is to use the architecture as the structure in which to implement the Data
and Information Quality Strategic Plan recommendations and follow-on development of quality indicators.
MODEL FOR INFORMATION INTEGRATION
It is important to note that in EPA's current computing environment each of the four IT functions presented in this
document is carried out by a majority of EPA's Program systems. This functional framework is thus a useful
mechanism to inventory and analyze business processes and technology to identify redundancies. The framework can be used
to develop solutions for more efficient use of EPA's IT resources.
The model proposed in this document recasts existing EPA systems, services, and policies and proposes new ones in
terms of this IT functional framework. The new EIA core components and their associated functions are depicted in
Figure 5 and defined in Table 2.
It is important to note that in this model operational processing is supported by the Central Data Exchange and the
operational database components. The decision support processing is supported by the Enterprise Repository and
analytical tools. Collectively these decision support components support business intelligence, i.e., they capture
organizational data from separate sources and present it to decision makers in a user friendly way (Microsoft, 2001).
To illustrate how the components work, a sequence of activities is presented for two common information
transactions.
Figure 5: The Model for Information Integration
(Panel title: EPA Target Architecture. Elements shown: Exchange Networks; Use, with Non-government Partners, Government
Partners, and EPA Users, the Intranet, Public Access, and Extranet, and Program Support and Decision Support; Store for
Use, with the Enterprise Repository, Metadata Holdings Catalog, Shared Geospatial Data, Central Registries, and Data
Warehouse; Operational Databases & Applications; Management Practices (Architecture, Policies, Standards, Security).)
-------
ENVIRONMENTAL REPORTING
An external user (an Exchange Network partner or a member of the regulated community) registers once to submit and
access information. Access rights are defined by user type and the nature of the data submission. Following registration, a
user logs on and connects to EPA services through an Enterprise Portal. The Enterprise Portal is simply defined as a
gateway to the many services the Agency maintains. The user accesses CDX services and transmits the data. CDX
extracts the data, validates them, perhaps against a registry, and sends a confirmation to the user. Data are passed on to the
Program, where they are temporarily stored for further quality control services. Validation and verification against registries
may occur, as well as error correction and quick informal data analyses. Once complete, a Data Steward approves the
data for storage in the Enterprise Repository. Data in the Enterprise Repository are considered authoritative and ready for
Agency-wide and external analysis.
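The reporting sequence above can be sketched as a small pipeline. This is an illustrative sketch only: the function names, status values, facility identifiers, and the single quality check are assumptions made for the example and do not describe the actual CDX or Program system interfaces.

```python
from dataclasses import dataclass, field

KNOWN_FACILITIES = {"FAC-0001", "FAC-0002"}  # hypothetical registry contents

@dataclass
class Submission:
    user: str
    facility_id: str
    payload: dict
    status: str = "received"
    history: list = field(default_factory=list)

def cdx_validate(sub):
    """CDX step: validate against the registry and confirm receipt to the user."""
    if sub.facility_id not in KNOWN_FACILITIES:
        sub.status = "rejected"
        sub.history.append("cdx: unknown facility, submission rejected")
        return False
    sub.status = "validated"
    sub.history.append(f"cdx: confirmation sent to {sub.user}")
    return True

def program_quality_control(sub):
    """Program step: stage the data temporarily and run a simple error check."""
    ok = all(value is not None for value in sub.payload.values())
    sub.status = "staged" if ok else "needs_correction"
    sub.history.append(f"program: quality control {'passed' if ok else 'failed'}")
    return ok

def steward_approve(sub, repository):
    """Data Steward step: approve staged data for the Enterprise Repository."""
    if sub.status != "staged":
        return False
    repository.append(sub)
    sub.status = "authoritative"
    sub.history.append("steward: approved for Enterprise Repository")
    return True

def process(sub, repository):
    """Run the steps in order, stopping at the first failure."""
    return (
        cdx_validate(sub)
        and program_quality_control(sub)
        and steward_approve(sub, repository)
    )

enterprise_repository = []
good = Submission("state_partner", "FAC-0001", {"chemical": "benzene", "pounds": 120.0})
bad = Submission("state_partner", "FAC-9999", {"chemical": "lead", "pounds": 3.5})
process(good, enterprise_repository)
process(bad, enterprise_repository)
print(good.status, good.history)
print(bad.status, bad.history)
```

Only submissions that pass every step reach the repository, which mirrors the statement that data in the Enterprise Repository are considered authoritative.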
DATA AND TOOL ACCESS
An internal or external information user logs on and connects to EPA services through an Enterprise Portal. The
System of Access, using the data contained in the key registries, directs the user to data held in the Enterprise Repository
as well as a suite of decision support tools. Communication channel and level of access will be dependent on user type.
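A minimal sketch of how channel and level of access might depend on user type follows. The user types, access levels, resource names, and the pairing of user types with the intranet, extranet, and public access channels are all assumptions made for illustration; the document does not specify these rules.

```python
# Hypothetical mapping of user type to (communication channel, access level).
ACCESS_RULES = {
    "epa_user": ("intranet", "full"),
    "exchange_partner": ("extranet", "partner"),
    "public": ("public_access", "read_only"),
}

# Hypothetical catalog: resource name -> access levels allowed to use it.
CATALOG = {
    "data_warehouse_query": {"full", "partner", "read_only"},
    "draft_program_data": {"full"},
    "decision_support_tool": {"full", "partner"},
}

def route_user(user_type):
    """Direct a user to a channel and to the resources the access level allows.

    Unknown user types fall back to the most restrictive (public) rule.
    """
    channel, level = ACCESS_RULES.get(user_type, ACCESS_RULES["public"])
    resources = sorted(name for name, levels in CATALOG.items() if level in levels)
    return {"channel": channel, "level": level, "resources": resources}

for user_type in ("epa_user", "exchange_partner", "public"):
    print(user_type, route_user(user_type))
```

Falling back to the most restrictive rule for unrecognized user types is a design choice of this sketch, not a stated Agency policy.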
Within the EIA, the high-level IT
functions are supported by core
components. In turn, the components
are defined in terms of services,
policies, and processes. These are
further supported by projects or
systems. This is the basis for the organization
of the remaining chapters.
It is important to note that this model
presents a cohesive set of technologies
and services when, in fact, many of the
supporting projects operate independently.
Some of the core components,
like the Enterprise Portal, are at this
time merely concepts presented for
discussion. This model also presents
a very prominent data warehouse and
emphasizes a processing role for
Program and Regional systems. There
are a variety of options for maintaining
Program and Regional systems; they
will be explored as part of the EIA
research discussed in Chapter 6.
IT function: Connect and Exchange (Operational Processing)
Core component: EPA Enterprise Portal
Services: User Registration and Authentication Services; Data Collection and Exchange Network Services; Access Services
Supporting projects or systems: CDX registration, TSSMS; CDX; System of Access, www.epa.gov, EPA Public Access Strategy

IT function: Process and Stage (Operational Processing)
Core component: Program Systems
Services: Operational Database Services; Transformation, Load, Maintenance, and Quality Control Services
Supporting projects or systems: e.g., SDWIS, AQS

IT function: Store for Use (Decision Support)
Core component: Enterprise Repository
Services: Data Warehousing Services; Central Data Registry Services; Geospatial (Geo) Services; Metadata & Holdings Catalog; Metadata Services
Supporting projects or systems: Envirofacts; Geospatial Data Services; FRS, SRS, TRS; Integrated Geo Database, Geospatial Blueprint; EIMS, EDR, IRRS

IT function: Use (Decision Support)
Core component: Decision Support Tools
Services: Environmental Analysis Services; Analytical Models & Guidelines
Supporting projects or systems: Window to My Env; TRI Explorer; Guidance for Data Quality Assessment: Practical Methods for Data Analysis

IT function: Foundation
Core components: Enterprise Architecture; Data Standards Program; IT Policies; Security
Services: Administration and IRM
Supporting projects or systems: Data Standards Program; XML TAG; Security Program

Table 2: EIA Core Components & Functions
-------
1.5 BENEFITS
The overall benefits of integration are increased efficiency, effectiveness, and quality. These benefits will be derived
from efforts to streamline processes, to standardize some aspects of data and technology, and to promote the use of
common core component services. Benefits of the components themselves are described in each chapter but generally
fall into these four broad categories:
BETTER ALLOCATION OF RESOURCES
Adopting the core components described in this document has the potential to free Agency Programs from the
mechanics of information management, i.e., the collection, staging, and storage of data. These core component services
will permit more cost effective allocation of resources, both budget and human, to focus on the planning and guidance
needed to "get the right data" for supporting the Agency's mission and goals.
IMPROVED QUALITY
The enterprise architecture is the means by which organizations plan for quality (Spewak, 1992). Spewak makes a
direct link between Deming's 14 Points of Quality and the 14 Points of data quality derived through the definition and
implementation of an architecture. The enterprise view and prospective IT planning derived from the architecture will
enable the Agency to implement standards and policies, simplify processes, and implement a common structure for
monitoring data quality throughout the lifecycle. These will lead to overall improvements in quality.
BETTER USE OF DATA AND INFORMATION
By relieving Agency Programs from the mundane operational tasks of "keeping the data right," resources can be
focused on "getting the right data" to manage an environmental problem. This can range from identifying new data
needs and arranging data partnerships to developing data collection guidelines or standards to assure consistency and
comparability. EPA resources may also be re-allocated to focus on the development of analytical models and analytical
tools that will enable the Agency to focus on business intelligence, i.e., the ability to predict the future impact of current
decisions (Inmon, et al, 2001).
RESPONSIVENESS TO THE NEEDS OF EPA STAKEHOLDERS AND INFORMATION USERS
Integration also has direct benefits for information users as many of the EIA components will enable either timely
access to needed information or more productive transactions with the Agency. These benefits can be cast in terms of
the type of EIA information user.
EPA Decision-makers - The Centralized Data Registries within the Enterprise Repository are intended to provide easy
access, through linkages, references, and the maintenance of authoritative lists, to the wealth of existing data and
information currently held throughout the Agency. Metadata maintained in these registries will also promote a better
understanding of the data. Currently, EPA scientists and other analysts have "to fish," either by Web browser or through word
of mouth, for existing data and information resources. Once received, data may not be adequately characterized
or documented to ensure credible analysis. Because this can be frustrating and time-consuming, contractors are often
hired to do this, or worse, work is replicated because information is "hidden" somewhere in the Agency.
-------
Data Partner - The Central Data Exchange will be EPA's "front door for environmental reporting." To date, States
with the delegated authority to collect environmental data and information have had to "feed" multiple Program systems,
using a variety of transmission formats, to fulfill reporting obligations. This is burdensome and has slowed States in their
efforts to integrate and modernize their own systems. The purpose of the National Environmental Information Exchange
Network, thus, is to reduce the burden of reporting and to decouple State IT management from that of EPA.
Member of the public - The System of Access presented in this model will enable members of the public to get to the
data, information, and tools they need, when they need them. While great strides have been made over the last eight
years to improve public access through resources like www.epa.gov and the Envirofacts warehouse, and tools like
AirNow and Surf Your Watershed, members of the public, like EPA scientists, still have to search. This is
compounded by a lack of familiarity with EPA's organizational structure. The System of Access will help to get external
users to the resources they need when they need them.
SUMMARY OF INTEGRATION STRATEGY
The EIA is the mechanism for planning integration efforts that create efficiencies or enable more effective use of information. This
model presents a set of core components that will help to streamline the mechanics of information collection, processing,
storage, and access across the Agency. The implementation and adoption of some set of core component services has
the potential to produce significant efficiencies to save time, money, and human resources.
This model also contains a set of decision support components - the Enterprise Repository and Decision Support
tools - which will support more effective use of information. The Enterprise Repository and Decision Support tools
together are proposed to capture data from disparate sources to present them to users in an easy-to-use format. Their
purpose is to enable study and management of key issues that require data from multiple sources both inside and outside
of EPA. Examples include urban sprawl, pesticide spray drift, total maximum daily loads to a waterbody, and homeland
security. These components, along with analytical models and guidelines, can enable analysis of existing conditions from
multiple sources to forecast trends and environmental outcomes. Investing in these decision support capabilities will help
to: (1) link decisions and policy to environmental results; and (2) identify the activities that will make EPA more effective
in carrying out its mission.
-------
2.0 ENVIRONMENTAL INFORMATION ARCHITECTURE —
CONNECT AND EXCHANGE
This model begins by presenting the basic functionality needed to support information users, both internal and
external, as they interact with EPA's information and technology resources. The Connect and Exchange function is
the basis for analyzing EPA's management of user connections and the flow of information into and out of the Agency.
The Connect function involves the coordinated management of user identification, registration, and security
procedures, allowing information users easy interaction with the Agency.
Figure 6: Enterprise Portal Manages Flow In and Out of the Agency
(Diagram labels: Connect and Exchange; Enterprise Portal; Flows out of Decision Support Components; Flows into the Agency.)
The Exchange function
encompasses coordinated
management of the information flows into and out of the Agency. It must be supported by processes, technologies,
and services that streamline data submission transactions. The Exchange function must also allow information users to
locate and access the data.
In this model, the concept of an Enterprise Portal is proposed as a means to encourage the design of a complementary
set of processes, technologies, and services that support the Connect and Exchange function. This concept was
created in response to recent QIC approval to "Maintain a single portal (i.e., CDX) for all environmental data flows"
(EPA(1), 2001). Currently, the CDX portal provides user registration and supports incoming data flows for the
National Environmental Information Exchange Network. In this model, the portal concept is expanded to include access
services.
The Enterprise Portal will provide the following services:
• User Registration Management
• Data Collection and Exchange Network Services through the CDX Portal
• Access Services through the System of Access
Each service is described in the "Critical Features and Operations" section.
Model for Information Integration — 2.0 EIA: Connect & Exchange
-------
EPA Enterprise Portal
An interface through which
people and organizations
electronically access EPA's
environmental information.
2.1 BACKGROUND
ENTERPRISE PORTAL
There are many definitions of a portal (Phifer, et al, 2001). In this document, a
portal is simply a gateway to services. The Enterprise Portal is the means by which
to structure access to services, and it includes an interface that people and organizations
use to electronically connect to EPA and access its environmental information.
In EPA's current computing environment, it is often difficult for users to connect to the EPA environment to either
transmit data or access information. The longstanding approach to connecting directly to EPA's environment requires
EPA, State, local, and tribal users to maintain multiple logon IDs and interact with many contacts. To send data to
EPA, a State or regulated entity may be required to make data submissions to several different Programs. To retrieve
information from EPA, a user must first locate the appropriate data and access tools, which means diligently exploring
EPA's Web site. This can be a difficult task because EPA has a variety of tools, each tool is independently managed,
and there is little coordination among the tools. Depending on which tool the user chooses, he or she may get a
different result because the tools may use different data sources.
An Enterprise Portal is proposed to help coordinate the management of user ID's. Users will need only one user ID
that will support their needs regardless of the type of transaction. Also, unlike the current access services which empha-
size access for the public and external stakeholders, the target environment described in this model provides special
focus on the access needs of EPA employees.
EPA Node
The collection of capabilities, processes, data, and infrastructure supporting EPA services for the
National Environmental Information Exchange Network. EPA's node is the CDX.
The Enterprise Portal concept was derived largely from the
National Environmental Information Exchange Network ("the
Network"). In order to be a Network partner, EPA must
maintain a node as follows (State/EPA IMWG, 2000):
• Each Network partner has only one node, although that
node may handle many kinds and types of data.
• The node is the only route for Network delivery and
receipt of information.
• The node is the single place for each member to present its standard node catalog of available information and
associated network metadata, e.g., its Trading Partner Agreements (TPAs). To be on the Network, the node must
present data and associated information.
• The node is the single place where each member implements the essential transport, security, and query protocols
described in the Exchange Network Blueprint and specified in a TPA.
• The node is the only place where a member's compliance with a TPA can be demonstrated or evaluated.
CDX is recognized as EPA's node on the Exchange Network. As described in Table 3, there are other features of
the Enterprise Portal, like the System of Access, and other core components, like the Enterprise Repository, that support
access by Network partners.
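The node rules listed above (one node per partner, a published catalog, and exchanges governed by a TPA) can be illustrated with a short sketch. It is a toy model under stated assumptions: the `Node` and `Network` classes, their method names, and the sample flow name are hypothetical and are not taken from the Exchange Network Blueprint.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    partner: str
    catalog: dict = field(default_factory=dict)  # flow name -> description
    tpas: set = field(default_factory=set)       # (other partner, flow name)

    def publish(self, flow: str, description: str) -> None:
        """Add a data flow to the node catalog."""
        self.catalog[flow] = description

    def sign_tpa(self, other_partner: str, flow: str) -> None:
        """Record a Trading Partner Agreement for a cataloged flow."""
        if flow not in self.catalog:
            raise KeyError(f"{flow!r} is not in the node catalog")
        self.tpas.add((other_partner, flow))

    def request(self, requester: str, flow: str) -> str:
        """Serve a flow only when a TPA covers the requester."""
        if (requester, flow) not in self.tpas:
            raise PermissionError(f"no TPA covers {flow!r} for {requester}")
        return f"{self.partner} -> {requester}: {self.catalog[flow]}"


class Network:
    def __init__(self) -> None:
        self._nodes = {}

    def join(self, node: Node) -> None:
        """Admit a node, enforcing one node per partner and a non-empty catalog."""
        if node.partner in self._nodes:
            raise ValueError(f"{node.partner} already has a node")
        if not node.catalog:
            raise ValueError("a node must present a catalog to be on the Network")
        self._nodes[node.partner] = node

    def node_for(self, partner: str) -> Node:
        return self._nodes[partner]


network = Network()
epa = Node("EPA")
epa.publish("facility-registry", "Facility identification records")
epa.sign_tpa("State A", "facility-registry")
network.join(epa)
print(network.node_for("EPA").request("State A", "facility-registry"))
```

Placing the TPA check inside the node reflects the statement that the node is the only place where compliance with a TPA can be demonstrated or evaluated.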
-------
2.2 CRITICAL FEATURES AND OPERATIONS
Operationally, the Enterprise Portal will provide three groups of services:
• User Registration Services will coordinate management of user connections to the EPA's computing environment.
• Data Collection and Exchange Network Services will provide a central point for all EPA data collections and
support EPA's Node on the Exchange Network.
• Access Services will provide coordinated management of the flow of information out of the Agency through the
System of Access.
2.3 USER REGISTRATION SERVICES
A key feature of the Enterprise Portal is the control of user access to data. Registration services are currently divided
between external and internal users:
REGISTRATION FOR EXTERNAL USERS
CDX customer registration supports EPA registration and authentication functions for legal filings to EPA's Programs
by the regulated community and Network flows with our many external agency partners. TSSMS ID supports regis-
tration functions for both internal and external (States, Tribes, Universities) users of our EPA Program systems.
REGISTRATION FOR INTERNAL USERS
Other registration services include ORACLE ID, LAN ID, and Email ID to name a few. These are internal registra-
tion services used to maintain security over EPA's internal systems. User Registration Services will provide coordination
to allow a user to have only one EPA user ID with access to multiple EPA systems. An initial segmentation of informa-
tion users, their access characteristics, and the communication channel they may use to conduct business with the
Agency is provided in Table-3.
PRIMARY SUPPORTING SYSTEMS
EPA has several systems that perform Registration Services but none of them may be considered a primary support-
ing system. Currently, Registration Services are supported by the Time Sharing Services Management System
(TSSMS), Oracle DBMS, the EPA LAN, and CDX. Additional analysis is required to develop a strategy to coordi-
nate these activities.
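One way to picture the coordination described above is a directory that maps a single EPA user ID to the logins a person already holds in each system. This is a hypothetical sketch for discussion, not a proposed design: the `RegistrationDirectory` class, its methods, and the sample IDs are invented for illustration; only the system names (TSSMS, Oracle, LAN, CDX) come from the text.

```python
class RegistrationDirectory:
    """Maps one EPA-wide user ID to system-specific logins."""

    def __init__(self) -> None:
        self._links = {}  # epa_user_id -> {system name: local login}

    def register(self, epa_user_id: str) -> None:
        if epa_user_id in self._links:
            raise ValueError(f"{epa_user_id} is already registered")
        self._links[epa_user_id] = {}

    def link(self, epa_user_id: str, system: str, local_login: str) -> None:
        """Attach an existing system login to the single EPA user ID."""
        self._links[epa_user_id][system] = local_login

    def resolve(self, epa_user_id: str, system: str) -> str:
        """Return the login to use on a given system, if access exists."""
        try:
            return self._links[epa_user_id][system]
        except KeyError:
            raise LookupError(f"{epa_user_id} has no access to {system}") from None

    def systems(self, epa_user_id: str) -> list:
        return sorted(self._links[epa_user_id])


directory = RegistrationDirectory()
directory.register("jdoe")
directory.link("jdoe", "TSSMS", "JD1")
directory.link("jdoe", "CDX", "jdoe@example.org")
print(directory.systems("jdoe"))
print(directory.resolve("jdoe", "TSSMS"))
```

A mapping layer of this kind leaves the existing registration systems in place, which is one of several options the additional analysis would need to weigh.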
-------
User Set: Anonymous Public
Description: Any person.
Access Characteristics: Read-only access to EPA datasets, registries, analysis capabilities, and unrestricted information.
Communication Channel: Internet

User Set: Registered Public
Description: Any person that pre-registers with EPA for password-protected access.
Access Characteristics: Read-only access to EPA datasets, registries, analysis capabilities, and restricted information made available according to an account registration agreement.
Communication Channel: EPA extranet, Virtual Private Network, or secure Internet channel

User Set: Regulated Facility
Description: Data submission from regulated community by authorized representatives.
Access Characteristics: Legal transmissions, electronic signature, Public Key Infrastructure.
Communication Channel: Secure Internet channel

User Set: Exchange Network Partners
Description: Authorized representative of other Federal Agencies, States, Tribes. Must pre-register with EPA for authorized password-protected access.
Access Characteristics: Access to Exchange Network capabilities at the EPA node. For example:
• Supporting data exchange services using data exchange templates (DETs).
• Providing specific data for authorized requesters.
• Providing a catalog of available information and metadata on the node.
• Supporting access to registries; access to reliable and authoritative sources for commonly used data.
Communication Channel: EPA extranet, Virtual Private Network, or secure Internet channel

User Set: EPA Users
Description: Any EPA authorized user. Must pre-register with EPA for authorized password-protected access.
Access Characteristics: Access to all services at the EPA Enterprise Portal. Access to some capabilities may require further authorization before a user can access them.
Communication Channel: EPA intranet

Table 3: EPA Enterprise Portal User Sets
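The user sets in Table 3 lend themselves to a simple lookup structure. The sketch below encodes only the communication channels and the pre-registration requirement; the enum, the channel labels, and the function names are assumptions made for illustration. Treating regulated facilities as registered users is also an assumption, based on the CDX customer registration described in Section 2.3 rather than on the table itself.

```python
from enum import Enum


class UserSet(Enum):
    ANONYMOUS_PUBLIC = "Anonymous Public"
    REGISTERED_PUBLIC = "Registered Public"
    REGULATED_FACILITY = "Regulated Facility"
    NETWORK_PARTNER = "Exchange Network Partner"
    EPA_USER = "EPA User"


# Channel labels are shorthand for the entries in Table 3.
SECURE_EXTERNAL = frozenset({"extranet", "vpn", "secure_internet"})

CHANNELS = {
    UserSet.ANONYMOUS_PUBLIC: frozenset({"internet"}),
    UserSet.REGISTERED_PUBLIC: SECURE_EXTERNAL,
    UserSet.REGULATED_FACILITY: frozenset({"secure_internet"}),
    UserSet.NETWORK_PARTNER: SECURE_EXTERNAL,
    UserSet.EPA_USER: frozenset({"intranet"}),
}


def requires_registration(user_set: UserSet) -> bool:
    """Every user set except the anonymous public is assumed to register."""
    return user_set is not UserSet.ANONYMOUS_PUBLIC


def may_connect(user_set: UserSet, channel: str) -> bool:
    """Check whether a user set may use a given communication channel."""
    return channel in CHANNELS[user_set]


for user_set in UserSet:
    print(user_set.value, sorted(CHANNELS[user_set]), requires_registration(user_set))
```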
2.4 DATA COLLECTION AND EXCHANGE NETWORK SERVICES
Data Collection Services provide: (1) a central point for collection of all data submissions to various EPA Program
systems and databases; and (2) the functionality and interface protocols necessary for EPA to maintain a node on the
Network.
Early in FY'01 the State/EPA Information Management Workgroup endorsed the Exchange Network Blueprint
which sets the minimum requirements for Network participation. It is important to note that these services encompass
both incoming and outgoing flows of data. Hence it is envisioned that some of these services will be provided by CDX,
EPA's node on the Network, while others will be supported in a complementary way by the System of Access and the
Registries (Chapter 4). These distinctions are noted in the summary of the following requirements:
-------
Exchange Network Partner Requirement: Support the creation of a single point of exchange for all EPA Program systems data.
Primary Supporting Project(s): CDX

Exchange Network Partner Requirement: Support data exchange services using data exchange templates (DETs), including data flows to and from Exchange Network Partners, as governed by Trading Partner Agreements (TPAs).
Primary Supporting Project(s): CDX (flows into Agency); System of Access (flows out of Agency to partners)

Exchange Network Partner Requirement: Provide specified data for authorized requester.
Primary Supporting Project(s): System of Access

Exchange Network Partner Requirement: Provide a catalog of available information and metadata on the node.
Primary Supporting Project(s): Metadata & Holdings Catalog through System of Access

Exchange Network Partner Requirement: Support access to registries; access to reliable and authoritative sources for commonly used data or code sets made available on the Network.
Primary Supporting Project(s): System of Access/Centralized Data Registries

Exchange Network Partner Requirement: Provide information on Network data standards; used for building DETs.
Primary Supporting Project(s): Environmental Data Registry through System of Access

Exchange Network Partner Requirement: Provide security support; PKI, SSL, Secure HTTP, support for security levels 1-4.
Primary Supporting Project(s): CDX & Foundation Component

Table 4: Exchange Network Partner Requirements and Primary Supporting Projects
PRIMARY PROJECT - CENTRAL DATA EXCHANGE (CDX)
When fully implemented, the Central Data Exchange (CDX) will serve as a single point of entry into EPA for environ-
mental compliance reporting in both electronic and paper forms. CDX is also a record-keeping and distribution point
for submissions to various Agency systems and databases. It supports all submitters of environmental information,
including industry, States, EPA Programs and systems, and other Federal agencies. CDX enables data transfer with
Exchange Network partners as well as access to the EPA data holdings for the purpose of data confirmation, data
update, and data status review.
CDX provides the following services:
• User registration
• Secure transmission of legal submissions
• Data receipt and notification
• Maintain user mailboxes
• Archive data
• Manage security (e.g., authentication, encryption, virus scans)
• User support
• Training assistance
• Data format translation - XML
• Transaction documentation, including transfer transactions
-------
Status columns: Implemented; Underway; Proposed FY02; To Be Decided
National Emission Inventory (NEI)
Unregulated Contaminant Monitoring Rule (UCMR)
Facility Registry System Flows
Permit Compliance System (PCS)*
Resource Conservation and Recovery Act Info (RCRAInfo)*
Toxics Release Inventory System (TRIS)*
Toxic Substances Control Act Forms
TSCATS Forms
Safe Drinking Water Info System (SDWIS)*
Air Quality System (AQS)*
Storage and Retrieval System (STORET)*
Air Facility Subsystem (AFS)*
System for Risk Management Planning (SRMP)*
Continuous Release Emergency Response Notification (CR-ERNS)
Table 5: Current and Anticipated Use of CDX
In early FY'02, the QIC discussed a goal of implementing CDX to support flows to EPA's "REI systems"
(Appendix-D) by FY04 (EPA(1), 2001). A summary of current and proposed system flows, including additional
systems, appears in Table-5.
System of Access
A tool that will allow information users (Public,
Partners, and EPA Employees) to locate EPA's data
and the tools to access and analyze that data accord-
ing to their authorization level.
2.5 ACCESS SERVICES
The primary goal of Access Services is to streamline access
to Agency information resources. Access Services will
provide the policies, procedures, standards, and technology to
allow seamless access to EPA's data without requiring the user to understand a great deal about EPA's computing
environment or organizational structure. Access Services will be supported by the "System of Access".
It should be noted that Access Services are intended to provide access to internal as well as external data users. In
the past, EPA has focused on making integrated environmental information available to the public, with the assumption
that the public access tools could also support the needs of environmental decision makers within the Agency. Access
Services will provide a coordinated approach to managing access to EPA's data and tools that will make the information
and tools easier to access for internal as well as external users.
PRIMARY SUPPORTING PROJECT — SYSTEM OF ACCESS
When fully implemented, the "System of Access" component will allow internal and external information users to
easily locate EPA's data and the tools to access and analyze that data according to their authorization level. Through the
System of Access users will be able to connect to EPA's environment, log in (if necessary), search EPA's information
holdings, select items of interest, and download information or use the appropriate EPA-provided tools to access or
analyze the information. The System of Access will allow read access to the data and will not support data collection or
data maintenance.
At its highest conceptual level, the System of Access is not only a "system" in the traditional sense of an application; it
is also a "system" in the sense that it combines features of the Enterprise Portal, Decision Support Tools, and the
Enterprise Repository into a cohesive unit that provides seamless access to EPA's data. Critical to the success of the
System of Access is the ability to coordinate among the various Decision Support Tools. As a coordination mechanism,
the System of Access will provide the structure to make it easier for Program Offices and other stakeholders to
develop and deploy Decision Support Tools that draw on the data in the Enterprise Repository.
System of Access services will be accessed through the EPA Enterprise Portal. Portal visitors will not actually see
the term "System of Access" at the EPA Enterprise Portal Web site but will be using the System of Access when
accessing appropriate hyperlinks.
System of Access services will include:
• A search capability that allows users to search the Centralized Data Registries, the Metadata and Holdings
Catalog, and the XML registry for DETs, TPAs, databases, geospatial tools, and Decision support tools.
• A data/information selector - After performing a search, the user should be able to select an item from their search
results list and be linked directly to the data or the appropriate analytical tool to retrieve the desired information.
• Convenient, flexible linkages to tools that provide the access and analysis of EPA's data holdings, i.e., Decision
support tools.
• A customizable portal that allows users easy access to commonly-used features.
• Access to related metadata information needed to understand the meaning and the implications of the data and
recommended background and contextual information, such as caveats and explanations, that promote informed
and responsible data use.
• A "help" capability, with well-organized materials prepared in advance, as well as some user support/response
function, and that may include pointers to other Network nodes.
• A feedback mechanism that gives visitors the ability to provide feedback, as well as an automated capability to
capture and summarize relevant performance measures.
• An error correction feature that allows users to report erroneous information for EPA to investigate and resolve.
The exact technical requirements and design options for the System of Access will be addressed as part of the Data
Warehouse Master Plan recently funded by the 2002 Systems Modernization Fund (SMF).
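As an illustration of the search and selection services listed above, the following sketch filters a small set of holdings by search term and by the user's authorization level. It is a minimal sketch under stated assumptions: the `Holding` record, the three ordered authorization levels, and the sample entries are hypothetical and do not anticipate the requirements to be set by the Data Warehouse Master Plan.

```python
from dataclasses import dataclass

# Authorization levels ordered from least to most privileged. The numeric
# ordering is an assumption for illustration only.
LEVELS = {"public": 0, "partner": 1, "epa": 2}


@dataclass(frozen=True)
class Holding:
    title: str
    source: str     # catalog or registry that lists the item
    kind: str       # e.g. "dataset", "tool", "DET", "TPA"
    min_level: str  # lowest authorization level allowed to see the item


def search(holdings, term: str, user_level: str) -> list:
    """Return holdings whose title matches and that the user may see."""
    term = term.lower()
    return [
        h for h in holdings
        if term in h.title.lower() and LEVELS[h.min_level] <= LEVELS[user_level]
    ]


# Hypothetical sample entries.
HOLDINGS = [
    Holding("Air quality monitoring data", "Metadata and Holdings Catalog", "dataset", "public"),
    Holding("Air facility draft records", "Centralized Data Registries", "dataset", "epa"),
    Holding("Air emissions exchange template", "XML registry", "DET", "partner"),
]

for hit in search(HOLDINGS, "air", "partner"):
    print(hit.kind, "|", hit.title, "|", hit.source)
```

Returning the source alongside each hit corresponds to the data/information selector, which links a search result directly to the data or the appropriate tool.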
PRIMARY SUPPORTING PROJECT — EPA's PUBLIC ACCESS STRATEGY
The implementation of Access Services must align with EPA's emerging Public Access Strategy. The purpose of this
Strategy is to define the direction and scope of EPA's public access activities over the next 5-7 years. This Strategy
represents the Agency's commitment regarding what EPA provides, to whom it provides it, and how it operates in
developing and disseminating data and information products. These are complex issues involving broad, inter-related
topics, many of which include high degrees of uncertainty. This Strategy attempts to approach these issues from a high
enough level to present the interconnections, and from a low enough level to set meaningful strategic action. It attempts
to assess the Agency's and its audiences' current needs, to project future public access trends of both, and to position
the Agency's public access efforts in ways that will be most beneficial.
-------
2.6 BENEFITS OF THE ENTERPRISE PORTAL
A single portal for receiving and accessing environmental information makes it easier for the Agency to manage its
data flows and easier for customers to work with EPA. Other benefits of the EPA Enterprise Portal include:
• User Registration Services will make it easier for users to work with EPA. Coordinated management will require
users to maintain only one user ID to access multiple EPA systems. These services will also reduce duplication of
effort by establishing one point for the management of user registrations.
• Data Collection and Exchange Network Services will reduce the burden on EPA's regulated community by
streamlining the data submission process. Instead of a separate data submission for each Program, a regulated
entity can make one data submission and the data will be parsed and relayed to the appropriate Program Offices.
• System of Access Services make it easier for customers to interact with EPA by providing one location for users
to access all EPA Decision support tools. Access Services also support more efficient use of resources (human
and budget) by coordinating the development and deployment of Decision support tools. Finally, the System of
Access leads to more consistent environmental analysis by using a single source, the Enterprise Repository, for
EPA's analyses.
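The "one submission, parsed and relayed" benefit described in the second bullet can be sketched as a simple router. This is an illustrative assumption, not CDX's actual logic: the record format, the `ROUTES` table, and the `relay` function are hypothetical, and the pairing of record types with Program systems is chosen only to make the example concrete.

```python
from collections import defaultdict

# Hypothetical routing table: record type -> receiving Program system.
ROUTES = {
    "air": "AQS",
    "drinking_water": "SDWIS",
    "discharge": "PCS",
}


def relay(submission: list) -> tuple:
    """Split one consolidated submission into per-system batches.

    Returns (routed, unrouted): a dict of system name -> records, and a
    list of records whose type has no known destination.
    """
    routed = defaultdict(list)
    unrouted = []
    for record in submission:
        system = ROUTES.get(record.get("type"))
        if system is None:
            unrouted.append(record)
        else:
            routed[system].append(record)
    return dict(routed), unrouted


submission = [
    {"type": "air", "facility": "FAC-001", "value": 3.2},
    {"type": "drinking_water", "facility": "FAC-001", "value": 0.4},
    {"type": "noise", "facility": "FAC-001", "value": 70},
]
routed, unrouted = relay(submission)
print(sorted(routed))
print(len(unrouted))
```

Records without a known destination are returned rather than dropped, so they can be reported back to the submitter.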
2.7 ISSUES & NEXT STEPS
ISSUES
• The Enterprise Portal currently exists as a concept. It is presented in this model to encourage simplification and
unification of the processes EPA has created over the years to interact with stakeholders, data partners,
intermediaries, and the general public (EPA(17), 2002). There are compelling reasons to streamline processes like
environmental reporting; however, further analysis is needed to more fully identify and characterize all interactions with the
external world so that streamlined services fully meet the needs of Programs and do not replicate existing
processes.
• If the Agency pursues the Enterprise Portal concept, its design should take into consideration EPA's emerging
Public Access Strategy (EPA(17), 2002). The model in this document presents a rough cut of user segments and
is entirely "e-centric," i.e., it does not address a number of "human touch" services that currently support public
access to information. The Public Access Strategy principles and guidance for non-electronic access, as well as
market research on customer segments and information needs, can provide more depth to the Enterprise Portal
design.
• Agreeing on the scope of the services to be provided.
- Registration services for internal users are currently fragmented. The Agency needs to reach formal agreement
on the goals, objectives, and scope of integrated user registration services for internal users. Similarly, the
Agency needs to determine the degree and sequence under which the TSSMS system should be integrated
into the CDX registration services for the Agency's diverse set of external users.
- While the principle of CDX was agreed upon within the Agency in 1998, and mandated by the QIC in early
2002, the question of when or whether legacy exchange operations should be retired in favor of the CDX
portal has not been discussed. Key factors that influence this decision are the readiness of State data
partners and the degree to which integrating key legacy flows into CDX is desirable or practical.
- Access services have typically been viewed as intended to benefit the general public. While serving the public is
essential, this view assumes that the needs of EPA internal users and data partners are similarly met by general
purpose access. The Architecture Team must account for the needs of internal EPA users. This should in
turn guide the scope of the access services.
• Resolving resource issues associated with provisioning portal services. At this point, most costs of data
exchange are carried within Program and Regional budgets. How are the Operating and Maintenance costs of
CDX to be supported as it takes on a major role in providing Agency-wide services? Similarly, most access costs
are implicitly or explicitly carried within Program and Regional budgets. How are the costs of integrated access to
be allocated, particularly if this includes substantial focus on access by internal EPA users and data partners? A
draft CDX funding plan is under development.
• Scope of System of Access services. This model places greater value on the data that are maintained and used
by the Decision Support components (data warehouse, registries, and tools) because it is envisioned that these data
are approved for use by both internal and external users. This raises important questions about control of user
access, e.g., whether access should be limited to just the centrally maintained data stores, and if so, how this transition will be planned and managed.
NEXT STEPS
1. Determine options for the EPA Portal Services. This includes a requirements analysis and user assessment
(including EPA users and data partners) to determine portal service needs. Architectural options are
particularly needed for registration and access services, including, for the latter, the scope of information to which access is
granted (to all or part of EPA holdings, to external sources, including the multi-partner content of the Exchange
Network).
2. Determine the operational and technical options for portal structure and function as part of defining the overall
architecture of these functions. This includes identification of appropriate policies, procedures, and standards that
may be needed to support an Enterprise Portal. On a technical level, EPA needs to describe the mechanisms of
and process for establishing Web-based and desk-top access to the services available.
3. Clarify the role of CDX in providing access services, including user registration. In supporting the Network
Services for the Agency, CDX provides "access" for EPA's partners on the Exchange Network. This must be
coordinated with the Access Services to ensure consistency and to avoid duplication of effort. For example,
reporting or accessing through TSMSS often requires multiple IDs and passwords, as well as the deployment of
secure remote. An enterprise portal effort will need to determine whether and when migrating or integrating certain
aspects of the current TSMSS into the CDX portal is desirable, or whether they should be left as stand-alone
services. Similarly, the degree to which TSMSS services remain or are integrated with other internal registration
systems (Oracle DBMS, Novell directory, etc.) needs to be worked out.
4. Clarify CDX support for Exchange Network Services. While substantial agreement exists on the Network
exchange functions of CDX, clarification is needed on aspects of CDX function, including its role in: (1) data
exchange with geospatial systems; (2) transfer of multi-Program consolidated reports from States; (3) exchange of
confidential information; (4) exchange of datasets created within EPA (e.g., from a Region); (5) receiving
non-electronic submissions; and (6) verification and validation of data against a DET or data standard.
PROCESS AND STAGE
Figure 7: Operational Databases Prepare Data for Storage & Use
After data have been collected by CDX, they pass to the Programs for further refinement and review prior to use.
Process and Stage is the next function in this model, where data will be manipulated and transformed prior to being
Stored for Use. This is a major function for EPA because it includes the often lengthy series of steps involved in the
"clean-up" of Program and Regional data that is often necessary prior to use. Once complete, the processed data are
transferred to a data warehouse that is part of the Enterprise Repository. The Process and Stage function thus includes
the principal quality control checks necessary to meet the Agency's quality objectives for exchanging data.
The Process and Stage function is supported primarily by the Program/Regional Systems component (databases and
supporting applications), although some Process and Stage services are supported by CDX and the Enterprise
Repository (ER) component.
Since Program and Regional systems already exist and have multiple functions, a key focus of this chapter is defining
how the roles and functions of Program and Regional systems might evolve as this target architecture is implemented.
3.1 BACKGROUND
In EPA's current computing environment, Program and Regional systems serve all the IT functions presented in this
model. They support the collection, processing, storage and access to environmental information in their domain. In
carrying out these functions, these systems are linked in only a limited way to other systems with parallel functions. The
responsibilities of a Program or Region to collect, authenticate, and analyze environmental information are tightly
coupled to the actual data processing associated with each step.
This document envisions an arrangement under which a Program or Region retains responsibility for, and authority
over, the quality of data collected within its domain. This includes data revision, correction, characterization, and
appropriate uses, but shifts responsibility for most of the actual data processing steps to enterprise services. While the
exact scope and architecture of a data warehouse is not yet defined, this document envisions that Programs and Regions
would move data ready for analysis and use to a data warehouse which is part of the Enterprise Repository.
Version control and updates of data in the ER are the responsibility of the Program data owners. The data will be
made available for use by any office, Program, employee, partner or stakeholder with interest, including the general
public, as appropriate.
Program Systems are defined as those systems used by individual Program Offices and Regions to accomplish their
particular missions or goals. Program systems include, for example, SDWIS, PCS, RCRA Info, and NEI. This definition
includes the Toxics Release Inventory System (TRIS), which, while organizationally not in a media Program office or
Region, collects primary data across environmental media for a Programmatic purpose.
Regional systems are data collections and the associated applications that collect and use programmatic and
environmental data not required in national Program systems but needed to meet Regional needs.
Operational Database is defined as a database used to hold data while it is being checked for completeness and
accuracy. Data contained in an operational database is considered interim and subject to change.
Consistent use of standardized, documented, published access methods will allow interested parties to develop
Decision Support tools that utilize the data. As outlined in Chapter 5, Programs and Regions would redirect most
access tools (i.e., applications that read the data for analysis and display) from local data sources to the data
warehouse for the relevant information. Program Offices could design and develop additional access tools, reports,
and analysis tools to access and/or retrieve the data to leverage the increased consistency and completeness of
data in the warehouse.
Much of data collection, data storage for use, and access tools would be centrally administered under direction from
the Programs and Regions whose business needs must be met. The Process and Stage function would remain largely
with the Program and Regional office responsible for the collection of a particular set of information.
3.2 CRITICAL FEATURES AND OPERATIONS
PROGRAM & REGIONAL SYSTEMS MODERNIZED FOR INTEGRATION
This chapter addresses a more complex relationship between the function (Process and Stage) and the component
(Program and Regional systems) than exists for some of the other functions. For example, it is envisioned that the
exchange function discussed in Chapter 2 will be addressed in part by CDX over time, which is a new component
expressly created for this purpose. In Chapter 4, the data warehouse is proposed as an expansion of Envirofacts, a
component which, while preexisting, still serves primarily the Store for Use functions envisioned in this document. By
contrast, Program and Regional systems are not new, nor have their functions been limited to the agency-wide
function envisioned for them in this document: the processing and staging of information prior to transfer to a data
warehouse.
The features of Program/Regional systems modernized for integration are: (1) the function these systems serve in the
agency-wide integrated architecture; and (2) the Program/Region-specific functions that may still need to be met by
these systems. In addressing these functions, the implications for Program and Regional systems as they modernize
within an integrated agency architecture are also noted.
The Function of Program/Regional Systems in Agency-wide Integration. In this architecture, the Process and
Stage function becomes the primary role of Program and Regional systems. This function includes the operational
Program/Regional database and the tools that allow the Program and Regional data owners to create, update, and
delete the operational data. The Process and Stage function includes the services of data transformation, quality control,
interim data maintenance and transfer to the enterprise repository. These functions are elaborated below.
Program/Region-Specific functions of integrated systems. As suggested above, certain needs are unique to
Program/Regional Systems. These business needs may require specialized data and forms of analysis that have little or
no agency wide implication. For these functions, full agency wide integration may be of minimal benefit. This implies
that on an as-needed and limited basis Programs and Regions will independently maintain their own interfaces and
databases and conduct the four major IT functions to meet their needs.
Implications for Program and Regional systems. At the most basic level, this document implies that Program
and Regional systems need to evolve to serve two functions: the Process and Stage function within the overall agency
architecture, and the function of meeting Program/Region-specific information needs that cannot be effectively
addressed with agency-wide resources.
To accomplish this evolution, the first need is conceptual and cultural. Program and Regional data managers will need
to recognize that their current data operations are bundles of functions (exchanging, processing, storing and using) that
can be separated and managed in different ways. They will need to accept the distinction between: (A) authority over/
responsibility for information content, quality and use and (B) management of information processing steps. "A" is
always a Program/Regional role. "B" should be done centrally or locally based on efficiency and effectiveness in meeting
agency-wide and Program/Regional business needs. The distinction between data "in process" and data "ready for use"
needs to be recognized explicitly. This distinction is key to recognizing the distinct roles of an operational database and
a warehouse in meeting Agency-wide and Program/Regional business needs.
The evolution of Program/Regional systems also requires system changes. To ensure seamless transferability and
consistency, Program and Regional operational databases will need to conform to the policies and standards of the
Enterprise Repository and, in particular, those of the data warehouse.
Other system changes might also be required. The metaphor of Program/Regional systems as "stove pipes" or silos
oversimplifies the actual situation. Within Programs and Regions are multiple "mini-silos," ad hoc and special-purpose
databases and data collections that complicate the ability of Programs and Regions to ensure the quality and consistency
even of information that is uniquely their own. Consolidating and rationalizing this collection of "mini-silos" is key to
fulfilling the Agency-wide Process and Stage function, to better addressing unique business needs and to leveraging
agency-wide resources such as a warehouse to meet business needs that are common across Programs and Regions.
Many of these changes will require planning and systematic review of how information is used within a Program or by
a Region.
DATA MAINTENANCE AND QUALITY CONTROL SERVICES
Data Steward - EPA Program and Regional staff with knowledge of, and responsibility for, data.
In this proposed model, CDX performs the exchange and extraction functions by collecting incoming data, performing
some initial validation functions, and transferring the data to an interim Program or Regional operational database
for further processing. The next step, data transformation, is part of the Process and Stage function and is typically
performed by the Program/Regional Systems. It includes: (1) recasting data contents to conform to existing standards
as necessary (e.g., date and latitude/longitude); (2) carrying out a defined set of validation checks to assure the
completeness and quality of the data; (3) developing aggregate records as needed (e.g., summarizing a series of hourly
monitoring observations into a daily average); and (4) assuring consistency with applicable agency-wide registries and
data standards.
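The first three transformation steps listed above can be illustrated with a minimal sketch. It is not drawn from any EPA system: the field names, the accepted input date formats, and the YYYYMMDD target format are assumptions chosen only for illustration.

```python
from datetime import datetime
from statistics import mean

# Assumed input formats; a real Program system would define its own list.
ASSUMED_INPUT_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def recast_date(raw):
    """Step 1: recast a date string into one target format (assumed YYYYMMDD)."""
    for fmt in ASSUMED_INPUT_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y%m%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def validate(record):
    """Step 2: return a list of problems; an empty list means the record passed."""
    problems = []
    # Completeness check on hypothetical required fields.
    for field in ("facility_id", "date", "latitude", "longitude"):
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Range checks on coordinates expressed in decimal degrees.
    lat, lon = record.get("latitude"), record.get("longitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("longitude out of range")
    return problems


def daily_average(hourly_values):
    """Step 3: summarize a series of hourly observations into a daily average."""
    return mean(hourly_values)
```

In such a sketch, a record that returns problems from `validate` would stay in the operational database until corrected; step (4), checking against agency-wide registries, would require a registry lookup and is not shown.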
The Process and Stage function also includes the generation of the appropriate metadata linked to the data as it
moves to a data warehouse for analysis and use. Ensuring that this information is developed and properly linked to the
dataset is another of the key quality functions exercised in this function.
Once the Program and Regional stewards of the process and stage function are satisfied with the data, the process of
transferring the data to a warehouse can begin. This includes source-to-target mapping, where data elements in the
submission are mapped to data elements in the host EPA warehouse, and the mechanics of the actual data transfer to
populate the warehouse as described in Chapter 4 as part of the "Store" function.
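Source-to-target mapping can be sketched as a simple lookup from submission element names to warehouse element names. The element names below are hypothetical; an actual mapping would be defined by the Program data stewards and the warehouse design.

```python
# Hypothetical mapping: submission element name -> warehouse element name.
SOURCE_TO_TARGET = {
    "FacID": "facility_id",
    "SampleDate": "sample_date",
    "Result": "measured_value",
}


def map_record(submission, mapping=SOURCE_TO_TARGET):
    """Rename each submitted element to its warehouse element.

    Elements with no defined target raise an error rather than being
    silently dropped, so unmapped data stays visible to the steward.
    """
    unmapped = sorted(set(submission) - set(mapping))
    if unmapped:
        raise KeyError(f"no target mapping for: {unmapped}")
    return {mapping[source]: value for source, value in submission.items()}
```

Raising on unmapped elements is one possible design choice; a system could instead log them for steward review.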
OPERATIONAL DATABASE SERVICES
While this series of checks, reviews, and modifications is occurring, the data reside in an operational Program or
Regional database. This database may have associated applications designed to track process issues such as the
degree to which a particular data submission has been cleaned-up or the overall completeness of a data exchange. Data
transformation within this operational database can occur as an incremental process, as a result, for example, of
ongoing dialogue with a series of data submitters. These operational databases should be accessible to queries
originating from an Agency data warehouse. However, due to the transitory nature of the data they contain, operational
databases are not intended to function as the primary source of data for analysis. This role assumes a central data
warehouse. However, a study to determine the appropriate ER model for EPA is pending. One approach may be to link
these databases via shared data and standards to achieve virtual integration.
To serve Program or Regional needs, the operational database may also serve as an archive for data provided in one
level of detail but used at another. For example, hourly observations from a particular monitor may be received but a
daily average constructed for oversight purposes. The stream of raw hourly data might reside in the operational
database under agreed-upon retention rules, while the daily average, when checked for quality, would be transferred to
a data warehouse.
There are a variety of options for maintaining operational Program and Regional databases. Program offices can elect
to move their data directly into the ER and let OEI take responsibility for their data storage needs. This method creates
the greatest cost savings. If a Program prefers to maintain control of its database management functions, the Program/
Region may be required to bring its databases into compliance with the ER policies, procedures, and standards that
enable some form of virtual linkage.
PRIMARY SUPPORTING PROJECTS
Because the Process and Stage function is carried out in a variety of ways across the Agency, there are no key
supporting projects to highlight this specific function. Rather, work is necessary within each Program to modernize and
integrate key systems.
Because the Program and Regional system role is new, systems that contain all the elements needed to be deemed
"modernized for integration" and that serve this specific Process and Stage role do not exist.
However, aspects of the model and key steps in the planning process can be found in several Program office efforts.
OECA has taken on the "mini-silo" problem in the design of its ICIS system, which consolidates a number of legacy
enforcement and compliance information systems. OW has initiated a thorough review of data needs leading to
modernization plans for its major systems explicitly linked to agency-wide integration components. OPPTS has
consolidated a number of its operational systems in OPPIN. OAR has initiated an information planning process within
OAQPS that has highlighted the usefulness to the Program of distinguishing between operational and warehoused data.
Also notable are Program efforts to implement key data standards. As systems implement data standards, they will be
increasingly prepared to exchange and integrate data and information.
3.3 BENEFITS OF INTEGRATING PROGRAM AND REGIONAL SYSTEMS
There are a number of benefits in assigning Program/Lab/Regional Systems the roles and functions outlined above.
These include:
• Programs and Regions will be able to make better use of existing resources by reducing or eliminating the number
of "mini-silos" with duplicate operations and functions and by focusing on the Process and Stage function of their
overall information management activities.
• Because the existing Program and Regional infrastructure evolves rather than is replaced, achievement of
enterprise goals can leverage the major investments already made in Program and Regional systems.
• Data stewards (Program and Regional staff with knowledge of and responsibility for the data) have the authority
to maintain the integrity, quality, and timeliness of the data.
• The development, transmittal, and collection of metadata becomes an explicit function and responsibility prior to
the associated information being stored and/or released for use.
• Linking the operational Program/Regional databases to the Enterprise Repository makes more data accessible for
use by the Agency.
• Programs and Regions can dedicate more resources to information planning and decision support, as well as
continue to support unique, mission-critical needs.
3.4 ISSUES AND NEXT STEPS
ISSUES
• Need to ensure broad understanding, acceptance, and implementation of the role envisioned for
integrated Program/Regional systems. This includes senior leadership willingness to actively support a plan
whereby the EIA is translated into a series of specific steps by OEI and Program/Regional systems staff.
• The appropriate balance must be defined between enterprise wide service roles and those retained in
Program and Regional systems. While striking the right balance may follow some general criteria, each case
should be presented to the QIC or similar senior management body for consideration. This balancing must
recognize that schedules drive decisions, so that decisions made in the short term must be reviewed with longer-term
considerations in mind.
• Incentives for system owners to migrate their systems into the EIA technical discipline. While significant
benefits to integration exist, it is also important that financial charge back systems such as the Working Capital
Fund (WCF) be structured to provide incentives to participate, or at least to avoid creating disincentives.
• A strategy to achieve Program/Region information integration needs to be developed and agreed upon
based on services provided through the EIA. An appropriate mix of incentives, guidance, and technical
assistance must be available to support Program and Regional offices during initial integration efforts as "mini-stove
pipes" are re-configured within their domain.
• Implementation of Data Standards. Data standards play a fundamental role in the EIA. As new systems are
designed and legacy systems are re-engineered, there must be strong incentives for standards implementation.
• Data stewards play a very important role in this model: they approve operational data prior to its Storage for
Use. They serve as "sentries for quality." As the EIA evolves, the Agency must closely examine the role of the
Data Steward: (1) to identify and leverage existing roles; (2) to ensure enough resources are allocated to support
this role; (3) to ensure error correction and approval procedures are consistently followed; and (4) to ensure that
their efforts support EPA's Quality System, as well as forthcoming decisions about the Data and Information Quality
Strategic Plan (EPA(4), 2002) recommendations.
NEXT STEPS
There are a number of specific steps Programs and Regions can take:
1. Implement existing EPA data standards. These data standards include Biological Taxonomy, Chemical
Identification, Date, Facility Identification, Latitude/Longitude, and SIC/NAICS.
2. Track the development of other standards (Enforcement and Compliance, Geolocation, Permitting, Tribal
Identifiers) via the Environmental Data Registry and anticipate, to the extent possible, what will be required.
3. Work with OEI to develop Exchange Network DETs and TPAs, and begin receiving data through CDX.
4. Modify Program systems to rely on registries to meet metadata needs and in some cases to conform to standards.
5. Make the Program system data available in the Enterprise Repository.
6. Modify applications to access the data in the Enterprise Repository.
7. Take specific steps to use and share enterprise data, tools, and services.
Additional steps that address the "mini-silo" problem include:
8. Examine existing systems within Programs for their purpose and their consistency with each other. Consolidate
these where appropriate, in light of an office/Region-wide understanding of data and information needs, as part of
modernization efforts.
9. Consider the design and development of a Program/Regional repository that extracts data from several operational
systems into an architecture that allows Program and Regional business needs to be consistently and efficiently
met. This should be linked to Agency-wide repository efforts to produce an architecture that allows all
repositories to be accessed seamlessly.
4.0 ENVIRONMENTAL INFORMATION ARCHITECTURE —
STORE FOR USE
[Figure 8 labels: Store for Use; Enterprise Repository; Data Warehouse (Public, Internal); Central Data Registries;
Geospatial Data; Metadata Holdings Catalog]
This next chapter addresses the storage of data, specifically databases that enable easier access to the Agency's
information resources and decision support. The IT function, Store for Use, is the basis for examining the way EPA's
enterprise data are stored, managed, and made available for use. This function begins when Data Stewards (HQ Program,
Regional Program, or State) approve the data found in the operational databases (Chapter 3) and make it available to
the rest of the Agency and other public users as appropriate. The Store for Use function consists of the activities
necessary to ensure the data are available and ready for analysis.
The Store for Use function will be supported by a set of coordinated databases that are collectively referred to as
the Enterprise Repository (ER). These databases are managed under a consistent set of policies, procedures, and
standards that promote data integration. The concept of the Enterprise Repository is derived from the QIC-approved
principle that "EPA will provide access to integrated Agency data and shared datasets." Figure 8 shows the collection
of databases within the Enterprise Repository.
4.1 BACKGROUND
ENTERPRISE REPOSITORY
In EPA's current IT environment, the procedures for processing data and then appropriately storing it for use vary
widely across the Agency. Variations in the procedures governing version control, access control, archiving, error
correction, and documentation of these processes make it difficult to locate and access data, particularly if a user
is not closely associated with the Program. These circumstances make analyses difficult to replicate, which casts
doubt on the credibility of EPA analyses.
Figure 8: Enterprise Repository Decision Support Components
Enterprise Repository (ER) - A centrally coordinated set of databases that conform to common policies, procedures,
and standards.
One way that EPA currently addresses the problem of access is by copying (or linking) data from various Program
Office databases and storing it in the Envirofacts Information Warehouse. Currently databases "in" Envirofacts share a
common set of access methods, policies, procedures and standards, which allow for efficient storage, access,
maintenance, and integration of the data. In EPA's decentralized computing environment, this approach has proven to be the
only way to integrate EPA's environmental data and make it accessible for public use. However, this arrangement is
imperfect. Because data supplied to Envirofacts is a copy of a Program database, error corrections at the Program
database level are not always reflected in Envirofacts. Also, database copy updates are voluntary and unpredictable.
This arrangement also does not address the inconsistency in policies and procedures that guide processing and storage
across the Agency. Finally, the types of data available in Envirofacts are limited because Envirofacts is voluntary, and
Programs are charged to use it through the Working Capital Fund (WCF).
4.2 CRITICAL FEATURES AND OPERATIONS
The target IT environment proposed in this model seeks to establish a robust data warehousing environment that is
specifically structured to satisfy the query and reporting needs of EPA's internal and external information users. Program
Office participation in the design of the data warehouse is critical to ensure requirements of all stakeholders are satisfied
and to ensure the inter-office coordination necessary to maintain the quality of the data. The key to the success of this
target environment will rest on the effectiveness of the policies, procedures, and standards that underpin data warehouse
development and implementation.
A data warehouse can be implemented in large organizations such as EPA in a variety of ways (Inmon, et al, 2001).
A single "global" data warehouse is one approach. Another approach is to build "local" warehouses for Divisions or
Offices and to connect them virtually to create a "global warehouse." Model selection should be based on which model
best meets Agency business needs. The process for choosing among these models is noted in the "Issues and Next
Steps" section of this chapter.
Critical to the success of the Enterprise Repository is not the technology, but the management structure that surrounds
it. The Enterprise Repository will provide policies, procedures, and standards for database management, query
support, and database administration. It is important to note that the Enterprise Repository will not be a single
physical storage unit, but a coordinated set of databases. Although these databases may be independently managed, they
will conform to a common set of policies, procedures, and standards to enable data integration and access to
information maintained across the Agency. Databases that are part of the ER simply conform to ER policies.
The Enterprise Repository will store EPA's "enterprise data," broadly defined as data that enables cross-media,
cross-program analysis. Examples include data reported by States or the regulated community to EPA, geospatial data,
or any other data that supports analysis, assessment, and modeling. Further analysis is needed to define the full scope,
including the needs of both internal and external analysts.
The Enterprise Repository will provide the following services:
• Data Warehouse Services.
• Registry Services.
• Geospatial Data Services.
• Metadata Management.
4.3 DATA WAREHOUSE SERVICES
The Data Warehouse will be the part of the Enterprise Repository that supports EPA's enterprise reporting needs.
This document defines a data warehouse as a collection of integrated subject-oriented databases designed for decision
support. In consultation with the Program data stewards or systems owners, the Data Warehouse will be designed to
illustrate the appropriate relationships among all the data holdings and to ensure that data integration and data
consolidation are maintained at the highest possible level. The data in the warehouse will be reviewed, processed, and
cleared by the Program before it is copied (or linked) from the operational databases.
Decision Support tools will analyze the data in the Data Warehouse. The standard access methods, query support, and
documentation provided by the Data Warehouse will make it easier for EPA and stakeholders to build tools that access
the data for analysis, reporting, and decision support. A Data Warehouse Master Plan development study was recently
funded and will explore EPA's requirements and technical options for a data warehouse.
Data Warehouse (DW) - A collection of integrated subject-oriented databases designed for decision support (Inmon, et
al, 2001).
Decision Support - Analysis of many units to aid in learning, discovery, and problem solving (Inmon, et al, 2001).
Primary Supporting System - Envirofacts Information Warehouse
The Envirofacts Information Warehouse currently
provides Data Warehouse Services at EPA. The
Envirofacts Warehouse is a collection of EPA environmental databases, largely derived from EPA source databases.
The Envirofacts Web site is an interface to the Envirofacts warehouse that allows the public access to numerous EPA
environmental databases using the internet. Envirofacts users can retrieve environmental information from databases on
Air, Chemicals, Facility Information, Grants/Funding, Hazardous Waste, Risk Management Plans, Superfund, Toxic
Releases, Water Discharge Permits, Drinking Water, Drinking Water Contaminant Occurrence, and Drinking Water
Microbial and Disinfection Byproduct Information. Users may retrieve information from several databases at once, or
from one database at a time. Online queries allow users to retrieve data from these sources and create reports, or
generate maps of environmental information.
In its current configuration, the Envirofacts Information Warehouse is more like a "Federated Repository" than a
"Data Warehouse." A Federated Repository is a collection of databases from independent systems that have been
combined (either virtually or physically) into a common repository. This common repository allows the databases to be
integrated using common fields such as facility ID and chemical ID. The databases in this "Federated Repository" all
share a common set of access methods, policies, procedures and standards, which allow for efficient storage, access,
and maintenance of the data.
The weakness of the Federated Repository is that queries of multiple databases can sometimes produce inconsistent
results, since the data elements belong to independent (or "federated") databases with independent data schemas. In a
true data warehouse, the databases are integrated into one homogeneous data schema designed specifically to handle
queries with an enterprise focus.
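As a rough illustration of how a Federated Repository integrates independent databases through a common field, the following sketch joins two small in-memory tables on a facility ID. The table contents and field names are invented for the example and do not represent Envirofacts data or its actual query mechanism.

```python
# Two independent ("federated") databases, sketched as lists of rows.
# All identifiers and values are invented for illustration.
air_db = [
    {"facility_id": "F001", "pollutant": "SO2", "tons": 12.0},
    {"facility_id": "F002", "pollutant": "NOx", "tons": 4.5},
]
waste_db = [
    {"facility_id": "F001", "waste_code": "D001"},
    {"facility_id": "F003", "waste_code": "D002"},
]


def join_on(left, right, key):
    """Combine rows from two tables wherever the shared key matches."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Only facilities present in both databases appear in the result.
combined = join_on(air_db, waste_db, "facility_id")
```

Here only F001 appears in the combined result, because F002 and F003 each exist in just one database. This is one simple way in which a query across independently maintained databases can return results that depend on how each source was populated.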
4.4 REGISTRY SERVICES
Within the ER, a special set of databases called the Centralized Data Registries will provide the authoritative source
of datasets that are critical to data integration and information exchanges between EPA and its partners. Registry
Services provide the means for coordinating the management, access, and use of EPA's Centralized Data Registries.
Figure 9: Envirofacts Information Warehouse
[Figure 9 labels: Demographics 2000 Database; Integrated GeoSpatial Database; National Shapefile Repository]
Data Schema - The structure of data in a data warehouse, usually indicated by data fields, formats, field attributes,
and relations to each other.
WHAT IS A REGISTRY?
This document defines a registry as "an official and authoritative list of
specific, well-defined items of interest." The Facility Registry System (FRS)
is the Agency's keystone registry as it supports better management and
integration of data most closely associated with EPA's bottom line: regulation
of facilities. Within this broad definition, a registry can provide one or more of the following functions:
Registration - As the name implies, registries allow users to
register new items (add new items to the authoritative list) or
screen existing items (check to see if items are already regis-
tered). For example, Programs and some States are currently
populating and validating facility identifiers in FRS. It is envi-
sioned that eventually the CDX will use FRS to validate or add
facility IDs submitted in reports via the Exchange Network.
Centralized Data Registries - EPA's core "registry" systems: Facility Registry System (FRS), Substance Registry System
(SRS), Environmental Data Registry (EDR), and Environmental Information Management System (EIMS). Previously known
as the "System of Registries."
Linkages - A registry can be used to establish linkages between items of interest. For example, currently the same
facility may be called Facility A in one system and Facility B in another system. The Facility Linkage Application within
FRS performs the link between unique IDs in each system.
Metadata Reference Database - In addition to listing the authoritative terms, registries may also store additional
information about the item of interest or links to other related information. For example, the FRS contains the name
and address of the parent company of a facility.
Registry - An official and authoritative list of specific, well-defined items of interest to an organization.
Validation/Verification - Registries can serve as automated normalizing agents for all standardized data received and
transferred by the Agency. Information that varies from the standardized protocol would be normalized or would include
a notice indicating the discrepancy. For example, in the CRS the validation/verification function will identify where a
chemical name does not match the reported Chemical Abstracts Service (CAS) Number.
Discovery - Registries can assist users in information searching and discovery. They may also serve as indexes for
finding other information in the Enterprise Repository.
Cross-Referencing - Registries can allow for multiple representations of common data in disparate locations and systems.
Application systems would not necessarily be required to change their representation of common data but could use the
registry to map to equivalent concepts expressed elsewhere.
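The Validation/Verification function described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CRS implementation: the two-entry substance table and the function names are hypothetical, and the check-digit test (an addition not discussed in this document) follows the published CAS Registry Number convention of weighting digits from right to left.

```python
# Hypothetical authoritative list: CAS number -> registered chemical name.
REGISTERED_SUBSTANCES = {
    "71-43-2": "benzene",
    "7732-18-5": "water",
}


def cas_check_digit_ok(cas_number):
    """Verify the final check digit of a CAS number written as digits-NN-N."""
    parts = cas_number.split("-")
    if len(parts) != 3 or not all(part.isdecimal() for part in parts):
        return False
    if len(parts[1]) != 2 or len(parts[2]) != 1:
        return False
    body = parts[0] + parts[1]
    # Weight digits 1, 2, 3, ... from right to left; the sum modulo 10 is the check digit.
    total = sum(weight * int(digit) for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(parts[2])


def validate_submission(chemical_name, cas_number):
    """Return a list of discrepancy notices; an empty list means the pair is consistent."""
    if not cas_check_digit_ok(cas_number):
        return [f"CAS number {cas_number} is malformed or fails its check digit"]
    registered_name = REGISTERED_SUBSTANCES.get(cas_number)
    if registered_name is None:
        return [f"CAS number {cas_number} is not in the registry"]
    if chemical_name.strip().lower() != registered_name:
        return [f"Name '{chemical_name}' does not match registered name '{registered_name}'"]
    return []


print(validate_submission("Toluene", "71-43-2"))
```

Returning notices rather than rejecting the record mirrors the text above, in which information that varies from the standard "would include a notice indicating the discrepancy."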
At EPA, all registries serve as official and authoritative lists of specific, well-defined items of interest to EPA. Some
registries are indexes that let information users know what's available and other registries are simply sources of
metadata. At a minimum, a registry will perform the first function and optionally may perform functions two and/or three.
Key registry systems are described below in Table 6.
-------
Registry Name / Primary Supporting Project(s)       Functions
Facility Registry System (FRS)                      Registration, Linkage, Metadata Reference,
                                                    Validation/Verification, and Discovery
Environmental Data Registry (EDR)                   Registration, Linkage, Metadata Reference Database,
                                                    Validation/Verification, Discovery, and Cross Referencing
Substance Registry System (SRS)                     Registration, Linkage, Metadata Reference Database,
  (contains both the Chemical Registry System       Validation/Verification, and Discovery
  (CRS) and the Biology Registry System (BioRS)
  as subsets)
Environmental Information Management System (EIMS)  Registration, Linkage, Metadata Reference Database,
                                                    Validation/Verification, Discovery, and Cross Referencing
Terminology Reference System (TRS)                  Linkage, Metadata Reference Database
Information Resource Registry System                Registration, Linkage, Metadata Reference Database,
                                                    and Discovery
XML Registry                                        Registration, Linkage, Metadata Reference Database,
                                                    and Cross Referencing
Table 6: EPA's Registry Systems
PRIMARY SUPPORTING SYSTEM - FACILITY REGISTRY SYSTEM (FRS)
As discussed, the purpose of FRS is to provide EPA with a central database of facility identification records and to
provide links to all facility-oriented Program system records. The independent management of EPA Program Systems
has led to multiple unique identifiers for single facilities.
A strategy under consideration is whether FRS should evolve into the only source of facility identification. This
approach requires two fundamental changes: (1) OEI will own and maintain the physical facility identification record;
and (2) Agency Programs will become dependent on the FRS as their only source for facility identification data. The
strategy is to seek out all reliable sources of accurate facility identification data and, with appropriate documentation,
populate the FRS records. This will improve efficiency and quality; however, this approach has both technical and
policy implications. The issues must be analyzed and debated as part of the planned "Registry Linkage" options study
(Chapter 6).
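The linking role described for FRS, in which several Program system identifiers refer to one facility, can be sketched as a simple lookup table. The Python below is an illustrative assumption only: the class, the identifier formats, and the program system names are hypothetical and do not describe the actual Facility Linkage Application.

```python
class FacilityLinkage:
    """Toy linkage table: many program-system IDs map to one registry ID."""

    def __init__(self):
        # (program_system, program_id) -> registry_id
        self._links = {}

    def link(self, registry_id, program_system, program_id):
        """Record that a program-system ID refers to the given registry ID."""
        key = (program_system, program_id)
        existing = self._links.get(key)
        if existing is not None and existing != registry_id:
            raise ValueError(f"{key} is already linked to {existing}")
        self._links[key] = registry_id

    def registry_id_for(self, program_system, program_id):
        """Resolve a program-system ID to its registry ID, or None if unlinked."""
        return self._links.get((program_system, program_id))

    def program_ids_for(self, registry_id):
        """List every (program_system, program_id) pair linked to a registry ID."""
        return sorted(key for key, rid in self._links.items() if rid == registry_id)


linkage = FacilityLinkage()
linkage.link("REG-0001", "AIR", "Facility A")
linkage.link("REG-0001", "WATER", "Facility B")
print(linkage.program_ids_for("REG-0001"))
```

Rejecting a second, conflicting link for the same Program identifier is one possible design choice; a production registry would more likely route such conflicts to a review step.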
PRIMARY SUPPORTING SYSTEMS - EDR, EIMS, IRRS, & XML
ENVIRONMENTAL DATA REGISTRY (EDR)
The Environmental Data Registry (EDR) is a comprehensive, authoritative source of reference information about the
definition, source, and uses of environmental data. The EDR catalogs major data collections and helps locate environ-
mental information of interest. As the major tool supporting the Agency's data standards program, the EDR records and
disseminates information about Agency data standards and the standard-setting process. The EDR is also affiliated with
the Substance Registry System (which includes the Chemical Registry System and Biology Registry System as subsets) and
the Terminology Reference System.
-------
ENVIRONMENTAL INFORMATION MANAGEMENT SYSTEM (EIMS)
EIMS is a repository of information products and metadata. EIMS stores, manages, and delivers descriptive infor-
mation, i.e., metadata, for data sets, databases, documents, models, multimedia projects, and spatial information. The
EIMS user community includes environmental scientists, resource managers, and other stakeholders within EPA, in
the research community, and among the general public. Users can search within EIMS to find information sources
of interest based upon topic or defined criteria related to types of environmental resources, geographical extent, date, or
content origin. The EIMS repository of scientific documentation, accessed with standard web browsers, places a virtual
library on the desktop of EPA staff and others with Internet access. The EIMS architecture also supports the manage-
ment of complex data, such as remote sensing data, Geographic Information System (GIS) coverages, and other types
of data.
WHAT OTHER REGISTRIES ARE NEEDED?
There are plans to link the EDR and EIMS in a meaningful way so that both can provide key inputs to the effort to
support the development of the Centralized Data Registries and EPA's node catalog for the Network. Currently, Facility and
Chemical identification have received the most attention as key data elements for integration. However, "regulation" and
"business sector" have also been cited as useful for cross-program integration. Additional analysis needs to define
Agency needs, and determine how these needs can be fulfilled using existing registries. Appendix C provides prelimi-
nary suggestions for other registries that may support EPA's business needs. Efforts are already underway on the
registries needed.
INFORMATION RESOURCE REGISTRY SYSTEM (IRRS)
An information resource registry would start with an application systems registry to catalog information resources as
required by numerous Federal regulations, policies, and oversight agencies. (One specific need is related to the
requirement that the Agency certify that all of its information systems meet security requirements.) This information
was stored in the Information Systems Inventory (ISI) until the mid-1990s. Much of the information from this system
was used in populating information resource metadata records as part of the Government Information Locator Service
(GILS) effort. Application system information currently exists in the EDR registries and could be extracted to populate
this registry. Additionally, the results of the Y2K inventory could contribute to this registry population. OEI is currently
working with the EPA's Enterprise Architecture initiative to construct this registry. It is projected to be operational
(although not fully populated) by the end of calendar year 2002.
XML REGISTRY
An XML registry/repository is envisioned to make reusable data components and specifications available. The
Agency has identified the need for a registry to manage XML objects, to support consistency of XML development,
and to support effective and consistent technology use and implementation. As XML objects, including tags and
schemas, are closely related to data elements, this registry could be implemented as part of the EDR registries. The
Data Standards Branch in conjunction with the EDSC has contracted with the National Institute of Standards and
Technology to create a pilot XML registry. It will be operational by the end of April 2002. This registry will be oper-
ated on a trial basis for six to nine months. At the end of the trial period, a decision will be made as to where the
registry will be hosted and whether it will be integrated into the EDR.
-------
4.5 GEOSPATIAL DATA SERVICES
Geospatial Data Services provide the means for coordinating the management, access, and use of EPA's Geospatial
Information. Geospatial data are currently acquired and managed independently by many different EPA Program
Offices and, as a result, there is often duplication of effort and resources.
There are several components necessary to eliminate redundancies and better leverage resources. First, a Geospatial
Data Index (GDI) will enable users to easily find and identify geospatial data holdings within EPA, easily access
metadata about that data, and where web linkage exists, access that data. It is envisioned that EIMS (the Agency's
federal geographic data standard node for FGDC/NSDI) will be the engine for the GDI. Second, one core enterprise
geospatial dataset necessary to implement key EPA business operations will be accessible to staff and partners. This
accessibility will be achieved via one or more integrated geospatial databases and/or through linkages to master files
housed at partner organizations. Third, the full implementation of the Agency data standard for latitude/longitude will
further enhance data exchange and sharing.
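To make the Geospatial Data Index idea more concrete, the following Python sketch searches a small index of metadata records by latitude/longitude bounding box and returns a web link only where one exists. The record structure, titles, coordinates, and URL are hypothetical, and the overlap test assumes boxes that do not cross the 180-degree meridian.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexRecord:
    """One hypothetical index entry: descriptive metadata plus an optional web link."""
    title: str
    west: float   # bounding box in decimal degrees
    south: float
    east: float
    north: float
    url: Optional[str] = None  # set only where a web linkage exists


def overlaps(record, west, south, east, north):
    """True if the record's bounding box intersects the query box."""
    return (record.west <= east and west <= record.east
            and record.south <= north and south <= record.north)


def find_holdings(index, west, south, east, north):
    """Return (title, url) pairs for holdings that overlap the query box."""
    return [(r.title, r.url) for r in index if overlaps(r, west, south, east, north)]


INDEX = [
    IndexRecord("Regional watershed boundaries", -85.0, 38.0, -80.0, 42.0,
                "https://example.org/watersheds"),
    IndexRecord("Coastal wetlands survey", -77.0, 36.0, -75.0, 39.0),
]

hits = find_holdings(INDEX, -84.0, 39.0, -82.0, 41.0)
print(hits)
```

A search of this kind depends on every holding carrying consistent coordinates, which is why the latitude/longitude data standard matters to the index.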
It is envisioned that the geospatial technical infrastructure will consist of a series of linked Headquarters, Regional,
and ORD Laboratory nodes which will support seamless access to distributed geographic data and services by EPA
staff, partners, and stakeholders.
The key success factors are: (1) having metadata associated with all data; and (2) the computing and telecommunica-
tions capacity for anyone in the Agency to access that data whether it be on the integrated Headquarters server or on an
integrated Regional and/or ORD Laboratory server.
PRIMARY PLANNING MECHANISM - GEOSPATIAL BLUEPRINT
In June 2001, EPA's Office of Environmental Information (OEI) completed an assessment of the current use of
geospatial data and technologies throughout the Agency. The resulting "Geospatial Activities Baseline Assessment"
(EPA, 2001) describes the use of geospatial data and technologies in support of the Agency's business operations, and
documents current data sets, hardware, software, users, expenditures, applications, and issues related to geospatial
technologies. More than 350 individuals across all Regions and headquarters Program offices actively contributed to
the development of the Baseline, confirming the pervasive and critical role that these data and technologies play in the
Agency. Contributors indicated that most of the Agency's business operations are tied explicitly to geographic locations
and are currently supported to some extent by the use of geospatial technologies. Many users, however, expressed
additional needs for geospatial data and analyses, as well as concerns about their ability to fully utilize the technologies
due to a variety of issues that characterize the present EPA organizational and information management environment.
The Geospatial Blueprint, slated for completion in the Spring of 2002, will likely recommend mechanisms to eliminate
redundancies and provide for more efficient management and use of geospatial information by managing geospatial
information as a corporate resource.
The Geospatial Blueprint will likely recommend an approach to more effectively organize, coordinate, and leverage
geospatial activities at the enterprise level within EPA and with its partners in environmental protection. The intent is to
have the Agency operate on a common vision, move as an organization in a defined direction, and create an environ-
ment where geospatial data/tools are shared resources and incorporated into daily operations.
4.6 METADATA SERVICES
Metadata are data about data, and they are critical to locating and understanding EPA's data holdings. Metadata Services
within the ER provide coordinated management of metadata across the Agency.
-------
In EPA's current IT environment, metadata have been buried in a myriad of independent systems, which makes them
difficult to use. Format and content of metadata information vary across EPA. Some metadata records are duplicated
and inconsistent (EPA(14), 2001). Moving data to a data warehouse has little effect if users cannot locate and identify
the data in the warehouse.
Metadata Services help to ensure a single source of consistent and current information, version control, and
availability. An enterprise commitment to metadata management is a foundation for future data standards that will
support consistent data interpretation.
Metadata and Holdings Catalog - The authoritative source of EPA's metadata. Also referred to simply as the "Holdings
Catalog."
METADATA AND HOLDINGS CATALOG
The Metadata and Holdings Catalog will be a special registry that supports Metadata Services.
It is envisioned that the Holdings Catalog will support metadata management by serving as the authoritative source for
tracking and managing metadata for EPA. Just as information about EPA-regulated facilities is stored in the Facility
Registry, information about EPA's information resources will be stored in the Metadata and Holdings Catalog. The
Metadata Strategy currently under development at EPA supports the establishment of an authoritative source of
metadata and describes how this registry can be used as a vital tool in the management of EPA's metadata.
The Metadata and Holdings Catalog will support the System of Access by serving as the source of information about
EPA's data holdings. This will give the System of Access the capability to allow users to search for information
about EPA's databases based on a user's authorization level. The Holdings Catalog may contain information such as a listing
of datasets, registries and their contents, record field names, associated data types, and formatting for records in the
warehouse or in the operational data stores. The Holdings Catalog may also contain active metadata used by Access
and Decision support tools to perform online queries.
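The authorization-based search described above can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the entry fields, the numeric authorization levels (0 for public, higher values for more restricted access), and the sample entries are all hypothetical, since detailed requirements for the Holdings Catalog have not yet been defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One hypothetical Holdings Catalog record describing a dataset."""
    title: str
    keywords: tuple
    min_level: int  # assumed scale: 0 = public, larger = more restricted


CATALOG = [
    CatalogEntry("Toxics release summaries", ("chemical", "release"), 0),
    CatalogEntry("Facility enforcement drafts", ("facility", "enforcement"), 2),
    CatalogEntry("Facility identification records", ("facility", "location"), 0),
]


def search(catalog, term, user_level):
    """Return titles matching the term that the user's level permits them to see."""
    term = term.lower()
    return [
        entry.title
        for entry in catalog
        if entry.min_level <= user_level
        and (term in entry.title.lower()
             or any(term == keyword.lower() for keyword in entry.keywords))
    ]


public_results = search(CATALOG, "facility", user_level=0)
staff_results = search(CATALOG, "facility", user_level=2)
print(public_results)
print(staff_results)
```

The same query returns a longer list for a more highly authorized user, which is the behavior the System of Access would rely on the catalog to support.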
With regard to operations, the Metadata and Holdings Catalog will be part of the Enterprise Repository. As required
for the other databases discussed, the Holdings Catalog will be required to conform to the policies, procedures and
standards of the Enterprise Repository. The Holdings Catalog may have an operational component where the data are
staged, managed, and reviewed for quality. It may also have a public access component that is part of the Data Ware-
house. Additional research needs to be done to identify the detailed requirements and specifications for the Metadata
and Holdings Catalog. This research will be done through EPA's Metadata Strategy.
EXCHANGE NETWORK NODE CATALOG
The Exchange Network Blueprint envisions that a subset of the Metadata and Holdings Catalog will be made avail-
able as the Exchange Network Node Catalog for EPA's network node. Each partner in the network will provide a
similar Catalog detailing the holdings of that partner being made available to the Exchange Network. Each network
node will thus have a catalog advertising the information resources which are available on that node and information on
how to access that resource.
-------
4.7 BENEFITS OF ENTERPRISE REPOSITORY
The Enterprise Repository provides the framework for managing EPA's data assets as a corporate resource by
applying a common set of policies, procedures, and standards to EPA's Data Warehouse, the Centralized Data
Registries, Geospatial Data, and the Metadata and Holdings Catalog. This coordinated management allows the Enterprise
Repository to provide the following benefits:
• More Efficient Data Management and Use of Resources
• Improved Access and Use of Agency Data
• Improved Data Quality
MORE EFFICIENT DATA MANAGEMENT AND USE OF RESOURCES
The Data Warehouse promotes more efficient data management and use of agency resources by helping maximize the
expertise within the Agency for database modeling, development, management, and access/security. The integrated
design and management of the Data Warehouse will promote more efficient data retrieval. The consistent standards and
documentation of the warehouse will allow for a more efficient and faster development schedule for tools and applica-
tions. As a result of the Data Warehouse, the Agency will gain more staff for analysis and Program operations. Expert
Program staff can spend more time on the data rather than on the operations of the database system.
The Centralized Data Registries will help promote more efficient data management and use of agency resources by
reducing redundant data collection and storage, and providing an accessible source of authoritative data that can be
reused or linked to other Program databases, and by enabling conformance with some data standards.
Geospatial Services will promote more efficient use of resources by helping reduce the duplication of effort in the
acquisition, management and use of geospatial data and tools. The Metadata and Holdings Catalog will provide the
database and the management structure to improve overall coordination of EPA's metadata assets and make EPA's data
assets easier to share and reuse.
IMPROVED ACCESS AND USE OF AGENCY DATA
The Data Warehouse will help improve access and use of Agency data by making EPA's mission-critical data cen-
trally accessible to all EPA employees, partners, and stakeholders from one logical location. The Data Warehouse will
also provide the consistent data and query support needed to produce consistent, replicable, cross-media analysis
and reporting.
Consistent standards and documentation of the environmental data, combined with coordinated management of
geospatial data, will help EPA provide analytical tools that portray a more accurate picture of the environment. The
Centralized Data Registries will provide the linkages between the databases that are necessary to enable effective,
consistent, cross-media analysis. Cataloguing the data and tools through the Metadata and Holdings Catalog will make it
easier to locate the data and the tools to access and analyze it. The policies of the ER will also help coordinate access
to the tools through the System of Access.
-------
4.8 ISSUES AND NEXT STEPS
ISSUES
The primary QIC/senior EPA management issues associated with the Enterprise Repository concern the Data Ware-
house. These include:
• Defining the function and scope of the Data Warehouse. While EPA has used Envirofacts as a data ware-
house, it has not committed to the basic principles embodied in the broader warehouse concept put forward in this
document. Perhaps most prominent is the redirection of Program applications to the Enterprise Repository.
Agreement on the basic scope and function issue and on the general architecture of the warehouse is essential
before final design and implementation begin. This will be examined as part of the Data Warehouse Master Plan
recently funded through the SMF investment process.
• Agreeing on the data and transition activities for the Warehouse. This includes Agency-wide agreement on
what constitutes "enterprise data," the processes for providing data to the ER, as well as definition of roles and
responsibilities of OEI and Programs.
• Resolving resource issues associated with the Warehouse. In terms of scale, the data warehouse concept
proposed is essentially new to EPA. It contemplates that most existing applications will be redirecting their input
data to the data warehouse and that most of the Agency data will be moved from operational data systems in
Programs and Regions to the warehouse. This involves development, operational, and data transfer costs not
currently budgeted or incorporated within overall maintenance costs of specific Program systems.
Issues also exist with regard to other elements of the Enterprise Repository.
• Determining the scope of the Centralized Data Registries and the Metadata and Holdings Catalog. The
Agency must reach a management level agreement on the number of registries and how they will work together to
provide an authoritative summary of the key entities (e.g., facilities, chemicals, regulations) that EPA addresses in
order to fulfill its functions. This includes agreement on the architecture of registries and of the Metadata and
Holdings Catalog.
• Making the Registries Authoritative Sources. Registries serve the important function of providing an enterprise
with a holistic view of its information and technology resources. But registries are only authoritative if they are
populated, representative, regularly refreshed, and easy to use. Measures must be taken to ensure that EPA's
Centralized Data Registries contain regularly refreshed data.
• Building a "System of Registries." This model proposes a cohesive, interdependent set, i.e., a "system" of
authoritative registries. Currently EPA maintains a number of separate registries and is proposing to build additional
new ones (EPA(15), 2001). Prior to the design and implementation of the Metadata & Holdings Catalog pro-
posed in this model, EPA should look at all the registries currently in operation and develop a strategy to stream-
line and connect them in a meaningful way. Although the EDR, SRS, and TRS have been developed and operated
together, other registries come packaged with their own development and maintenance needs to make them an
"authoritative source." As the number of separate registries increases, it becomes less likely that EPA staff will want
or be able to keep them current and authoritative.
-------
• Over the last few years, the concept of a "Place Registry" has been bandied about as a critical need for the
Agency. Clearly, "place" is a key integrating element. "Place" can be designated as a point - a lat/long value for a
discharge pipe (currently supplied by the FRS), or a polygon, e.g., a watershed area, or a designated wetland
area. Many have argued, however, that the latter are actual geographic (geospatial) coverages that can be
purchased or accessed in a data partner's database. On the other hand there are others who feel these "places"
(polygons) of interest should be registered and inventoried in an EPA Registry for integrated analysis. In the
coming year, the "Place" registry needs a clear definition, including its relationship, if any, to the FRS, the other
Centralized Data Registries, and the Geospatial Data Services.
• Resolving resource issues associated with registries and the catalog. While the resource issues associated with
these elements are less than those associated with the data warehouse, it is important to address increased devel-
opment costs associated with modification of registries to provide more general access and increased data trans-
actions associated with flow of information among registries and with Program systems.
• Establish and implement geospatial data management policies. While some core enterprise geospatial data will be
continued on the Headquarters Integrated Geospatial Database, others will be stored on servers in the Regional
and ORD laboratory nodes of the enterprise geospatial system. It will be critical to apply uniform data standards
to all the geospatial data and make it accessible via the geospatial technical infrastructure.
NEXT STEPS
There are a number of next steps associated with the Enterprise Repository.
1. Identifying Data Warehouse architectural, procedural and technical options. This includes an examination of
agency-wide and Program/Region specific data warehousing requirements. Included here is the question of what
data belongs in the Data Warehouse. Once requirements have been identified, options for meeting those require-
ments need to be developed. These options could range from status quo, to purchasing a commercial data
warehousing solution, to developing a customized in-house data warehouse solution. The options analysis might
recommend a central data warehouse, or a set of small data warehouses for each office, or some combination of
all of the above.
Procedurally, policies are needed to address migration requirements for all eligible databases and datasets, as well
as a procedure for identifying exceptions. Also needed are options for interim access methods for getting data
from legacy systems as the elements of the enterprise repository are being developed and implemented.
2. Finalizing the architecture of geospatial data and services within the overall Enterprise Architecture. This is needed
to complete the process of fully integrating geospatial data with other programmatic environmental information.
3. Defining architectural, procedural and technical options for the Centralized Data Registries. Options for the
scope, number, and architecture of Agency registries need to be developed. Key here is determining how to link
registries and to grant access to them seamlessly given the differing content and data structures of the current
registries.
4. Defining architectural, procedural and technical options for the Metadata and Holdings Catalog. The relationship
between the Metadata and Holdings Catalog and the datasets it references needs to be defined as well as the
scope, e.g., whether it includes references to holdings outside EPA in the overall Exchange Network. Critical also is
the relationship between EPA's node catalog and other Network node catalogs with the overall development,
implementation, and on-going operation of a metadata strategy for the Agency.
-------
5.0 ENVIRONMENTAL INFORMATION ARCHITECTURE — USE
The last function envisioned in this model is the
Use function. The EIA components that support
data use are mainly Decision Support Tools.
However, the System of Access, the Enterprise
Repository, as well as the models and algorithms
that support analysis and decision making also
play a role.
5.1 BACKGROUND
In EPA's current computing environment data
are used in a variety of ways to support business
needs ranging from one-time direct queries for
compliance information to cross-media
geospatial analysis.
Figure 10: Decision Support Tools Use Data in the Enterprise Repository
The Use function is also supported by a variety of Decision Support Tools that are either customized or purchased
to meet a specific analytical need. Further, the data sources used are also highly variable and include Programmatic or
Regional systems, centrally managed stores like Envirofacts, federal sources, or commercially available datasets &
geographic coverages.
Tool - Any device that aids in accomplishing a task. In this document, "tool" is often used synonymously with
"application."
Decision Support Tool - Tools which enable analysis of many units to aid in learning, discovery, and problem solving.
There are three major issues associated with the
current cross-agency support of the Use function that
the EIA must address:
(1) As discussed in Chapter 4, the procedures for
storage, archiving, and access of information are
highly variable across the Agency. Tools avail-
able to the public may not always draw upon the same data sources or datasets that were used to generate key
EPA reports that have been made public. Thus, it is possible that the results of analyses will vary, casting doubt
upon the credibility of EPA's analyses and decisions.
(2) Because of the independent ownership and management of tools across the Agency, they are often hard to
locate.
(3) The methodologies and tools employed may not always be appropriate for the intended use of the data. This is
not an issue that can be resolved with technology. It involves the deployment of data use and quality assurance
guidelines, training, and oversight mechanisms (e.g., peer review).
5.2 CRITICAL FEATURES AND OPERATIONS
Key to this chapter are the tools themselves. Decision support tools will provide a variety of services to support the
access and analysis of EPA's data. The services may be enabled with graphical displays or interfaces similar to those
available in tools such as ArcView, ArcExplorer, and IBM Data Explorer. Specific capabilities are dependent upon the business
need, user needs, data sources, and the applications implemented.
40
Model for Information Integration — 5.0 EIA Use
-------
In order to eliminate redundant purchases of datasets and Commercial Off-The-Shelf (COTS) software and
improve the credibility of analyses, three key policies are critical: (1) enterprise data must be made accessible in the ER;
(2) tools must use data deemed ready for use (e.g., "Stored for Use," and located in the ER); and (3) design and
deployment of tools must be coordinated and tracked Agency-wide via mechanisms like the Information Resource
Registry System to ensure they are available via the System of Access.
In this model, decision support tools are made available via the web-enabled Enterprise Portal and located using the
System of Access. In some cases, access to locally provided services may include click-throughs via a networked
computer desktop (e.g., a Windows desktop on a Novell network).
The types of Decision support tools that Program Offices might provide are quite varied. Below are some example
capabilities, summarized from review of a number of EPA tools (EPA(9), 2001):
• Review answers to database queries on EIA datasets.
• Prepare reports.
• Plot variables at different scales.
• Create maps featuring overlays.
• Drill down spatial data from national, to state, and to county levels.
• Review trans-boundary air toxics movement provided by remote sensors.
• Explore relationships of different environmental media in ecosystems.
• Use TRI data for overlaying risks and hazards on a chemical-by-chemical basis.
• Monitor and assess the status and trends of national ecological resources.
• Perform analyses involving multiple independent datasets, some of which are outside the EPA (e.g., health data).
• Estimate spatial distribution of biogenic emissions.
• Model human exposure to urban air pollution.
• Display database extracts in a geographic context on maps.
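As one concrete illustration of the capabilities listed above, the "drill down" item can be sketched in Python as rolling county-level values up to state and national totals and then drilling back down. The release figures, units, and function names are hypothetical and are not taken from any EPA tool or dataset.

```python
from collections import defaultdict

# Hypothetical county-level values: (state, county) -> pounds released.
COUNTY_RELEASES = {
    ("OH", "Franklin"): 120.0,
    ("OH", "Cuyahoga"): 80.0,
    ("PA", "Allegheny"): 200.0,
}


def roll_up(county_values):
    """Aggregate county values into state totals and a single national total."""
    state_totals = defaultdict(float)
    for (state, _county), value in county_values.items():
        state_totals[state] += value
    return dict(state_totals), sum(state_totals.values())


def drill_down(county_values, state):
    """Return the county-level breakdown behind one state's total."""
    return {
        county: value
        for (entry_state, county), value in county_values.items()
        if entry_state == state
    }


state_totals, national_total = roll_up(COUNTY_RELEASES)
ohio_counties = drill_down(COUNTY_RELEASES, "OH")
print(national_total, state_totals, ohio_counties)
```

Because every level is derived from the same county records, the national, state, and county views stay consistent with one another, which is the credibility concern raised earlier in this chapter.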
Other critical features of the Use function are the policies, guidelines, and procedures that promote credible data and
information to support the use of decision support analyses. The Agency currently supports a Peer Review process as
well as a Quality System, which together direct the collection and use of data and information and support credible analysis.
Unfortunately, the procedures associated with these systems are overlooked.
These circumstances are somewhat fueled by the ease with which tools and data can be purchased and used. The
drive for analytical innovation has sometimes lead to creative uses of data not intended at the initial point of collection.
The coordinated management of tools and data, either associated with the ER or the System of Access, must take into
account the analytical controls established to ensure credibility of EPA analyses. These include the recommendations in
the analytical Best Practices and EPA Guidelines to Ensure and Maximize the Quality, Objectivity, Utility, and Integrity
of Information Disseminated, both forthcoming in FY2002.
PRIMARY SUPPORTING PROJECT - TRI EXPLORER
Toxics Release Inventory (TRI) Explorer is a Web-based analytical tool that allows users to generate reports on
specific chemicals and reported chemical releases by industry sectors, environmental media, geographic area, and
individual facilities. TRI data users can compile their own reports on-line. The TRI Explorer allows users to easily
determine what toxic chemicals might be present in their neighborhood, how the reported releases are changing over
time, and how their own situation compares to other communities around the country. TRI Explorer provides data for
all reporting years since 1988. The data are synchronized with the published Public Data Release documents (Lai,
2001). Data used in TRI Explorer are from Envirofacts.
PRIMARY SUPPORTING PROJECT - WINDOW TO MY ENVIRONMENT
Window to My Environment (WME) is another Web-based analytical tool. WME combines interactive maps with
links to federal, state, and local environmental data to provide the public with information on environmental issues and
conditions affecting their community or location of interest. Developed through an EPA-State partnership, WME
answers popular questions about a community's air, land and water, as well as what is being done locally to protect the
environment.
Particular features of WME include:
• Interactive Mapping Tools: WME lets users choose the area to map and view the location of regulated facilities,
monitoring sites, waterbodies and watersheds, and demographics, as well as traditional geographic designations
like streets, counties, schools, and so on. Users can zoom and pan around the area, and the information displayed
shifts dynamically to reflect the newly chosen area. Users can also look at a three-dimensional view of local
land use patterns.
• Data on "Ambient" Environmental Conditions: WME provides daily Ultra-violet (UV) Index readings, advice on
health effects of exposure to sunlight, locations of and reports from local air and water quality monitoring
sites, land cover characteristics, and more.
• Access to Analytical and Reporting Tools: WME links to EPA's Envirofacts, TRI Explorer and Surf Your Watershed
tools, as well as State tools like Pennsylvania's "E-Facts" and Delaware's "Environmental Navigator,"
providing the user with the ability to generate custom reports on specific chemicals, facilities and trends in a
selected area.
• Local Governmental Services and Contacts: WME links the user to dozens of government and non-government
organizations and contacts with information on local issues in the user's area of interest.
5.3 BENEFITS OF DECISION SUPPORT TOOLS
A coordinated suite of decision support tools that draws upon a consistent set of data and is used in accordance with
guidelines and processes that support credible analysis is invaluable. A coordinated suite of tools will also:
• Save the Agency money. Coordination will eliminate redundant tools, datasets, and partnerships to gain access to
the same types of data. This will enable the Agency to leverage resources for other analytical endeavors.
• Help to better serve analysts, both internal and external, by: (1) making tools more prominent and accessible; (2)
enabling consistent content and interfaces; (3) meeting graphical and geographical analytical requirements; (4)
enabling "what if" scenarios with data; and (5) supporting analyses of multiple independent datasets.
5.4 ISSUES AND NEXT STEPS
ISSUES
The issues currently identified for Decision Support Tools are:
• Linking Tool Design, Management, and Use to Quality Guidelines and Best Practices. As discussed in Chapter
1, EPA will soon have to implement guidelines to support the quality, objectivity, utility, and integrity of information
disseminated. These emerging guidelines must be integrated into future tool design, management, and use to
document the credibility of EPA analyses.
• Datamarts. Many integrated architecture models, like the Corporate Information Factory (Inmon, et al, 2001)
highlight the usefulness of datamarts, i.e., subsets of a data warehouse which are customized to address a business
need and support a specific tool. They are beneficial as they are "owned" and managed at a departmental level;
they are less expensive than maintaining a departmental warehouse; and they are flexible and can be customized
for departmental reporting and analysis (Inmon et al, 2001). The Master Plan for a Data Warehouse effort should
take into consideration these benefits for Programs as well as the potential role of datamarts in the development of
the Enterprise Repository.
• Linking Use of Data to Planning for Data. The Information Quality Lifecycle adopted in this model is cyclical, i.e.,
the Use of data and information influences future Planning for data. As the Agency uses data to understand
emerging environmental problems, it is unclear whether there are formal mechanisms in place to capture gaps and
new data needs. Governor Whitman recently launched an Environmental Indicators Initiative. In the coming year
the Agency will produce a "First Report that will provide an inventory of EPA indicators, identify promising
indicators that allow us to report on the environment, as well as identify data gaps..." (EPA(7), 2001). This Report
should influence future planning for data. The Report should also influence the types of data made available in the
ER and the future design of tools to support the results.
• Committing to enterprise-wide decision support. EPA has historically focused on data collection and monitoring.
It has spent large sums in developing systems to collect and store this data. Typically, attention to analysis and use
of the data to support internal decision-making has been ad hoc and resources to do so limited. The vision of
Decision Support Tools contained in this document requires Agency agreement on two themes that depart from
this traditional approach: (1) more attention and resources are needed to develop and utilize tools which support
decision-making by organizing our information; and (2) this effort needs to be systematic, multi-program in scope, and
focused on leveraging resources to address agency-wide as well as specific Program and Regional needs. The
second theme implies a commitment to having many Program applications used for decision making use the data
warehouse rather than a Program database.
• Agreeing on Scope, timelines and transition to decision support services. As an essential element of commitment
to a vision of enterprise-wide decision support, agreement is needed on the scope, timeline for implementation,
and transition plan for this function.
• Addressing decision-support resource issues. As envisioned, this decision support function is a new agency-wide
commitment. Current costs for this activity are often folded into on-going development and operation of the
systems from which the decision support tools obtain the data. Separating this cost from these legacy systems,
and funding agency-wide services meeting this function, will require careful consideration from senior leadership.
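To make the datamart issue raised above more concrete, the following minimal Python sketch carves a program-specific subset out of a shared warehouse table using the standard sqlite3 module. The table name warehouse_releases, its columns, the sample rows, and the function names are hypothetical and serve only to illustrate the idea of a customized subset of a data warehouse; they do not describe the Enterprise Repository.

```python
import sqlite3


def make_sample_warehouse():
    """Build a tiny in-memory stand-in for a shared warehouse table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE warehouse_releases "
        "(program TEXT, facility_id TEXT, chemical TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO warehouse_releases VALUES (?, ?, ?, ?)",
        [
            ("air", "F001", "benzene", 12.5),
            ("air", "F002", "toluene", 3.0),
            ("water", "F001", "lead", 0.4),
        ],
    )
    return conn


def build_datamart(conn, program):
    """Copy one program's rows into its own table, i.e. a datamart."""
    table = f"mart_{program}"
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not table.isidentifier():
        raise ValueError("program must be a simple identifier")
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(
        f"CREATE TABLE {table} (facility_id TEXT, chemical TEXT, amount REAL)"
    )
    conn.execute(
        f"INSERT INTO {table} (facility_id, chemical, amount) "
        "SELECT facility_id, chemical, amount "
        "FROM warehouse_releases WHERE program = ?",
        (program,),
    )
    return table
```

In this sketch the datamart is a physical copy that a Program could manage and reshape on its own schedule; a view over the warehouse would be a lighter-weight alternative with different ownership trade-offs.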
NEXT STEPS
1. Building a registry of current decision support tools available to the Agency. It is proposed that this be part of the
Information Resource Registry System, and build on tools identified in the EPA Public Access Tool Inventory managed
by OEI and the tools identified in the Geospatial Baseline Assessment (EPA(9), 2001). This is necessary to
understand our current assets and to focus on tools that might be expanded for agency-wide use, thus leveraging
our existing knowledge.
2. Developing options for a decision support architecture - essentially an Agency applications architecture, and for a
transition plan that can achieve this in a reasonable and timely fashion.
6.0 ENVIRONMENTAL INFORMATION ARCHITECTURE - FOUNDATION
So far, this document has introduced a series of IT functions and a proposed set of "core component" technologies,
policies, plans, and services. In order for these components to be interconnected and interdependent, they must be
designed according to a blueprint (the enterprise architecture) and implemented and managed in accordance with
standards and policies. The Foundation component, thus, is EPA's enterprise architecture itself, and the standards and
policies that govern IT development and management. The Foundation is the "glue" that links the IT functions and
connects core components together in a meaningful way.
This section briefly describes some of the history of standards, policies, and enterprise architecture development at
EPA; the process for developing an enterprise architecture; and a more detailed description of how EPA's architecture
effort is divided. It then presents some of the key standards and policies framed within the layers of an enterprise
architecture that are pivotal to systems interoperability and data integration.
6.1 BACKGROUND
As discussed in Chapter 1, EPA's information and information technology has largely been divided and independently
managed along programmatic lines. Over the years, declining budgets, the need for information that crosses programmatic
boundaries, government-wide IT policies, and stakeholder demands have driven the Agency to develop and
adopt policies and standards to ensure consistency in some aspects of technology design, implementation, and management.
For the purposes of this document, a standard is a set of criteria, or a convention (FAWG, 2001) used to maintain
consistency across multiple entities. For example, EPA has a data standard for calendar date representation in Agency
information systems. A policy is a statement that is binding on entities within its scope. At EPA, some standards are
expressed as policies through the Agency Directives System. Policies help to maintain consistency and compliance
within an organization. Standards enable interconnection of processes, applications, and information (Cook, 1996).
Typically, standards and policies are developed with input from parties affected and carry with them some penalty for
violating them (Spewak, 1992).
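As a small illustration of what a data standard such as the calendar date standard buys in practice, the sketch below normalizes dates arriving in several legacy layouts into one representation. It assumes, purely for illustration, that the target convention is an eight-digit YYYYMMDD string; the actual format is defined by the Agency's standard and is not specified in this document. The list of legacy formats and the function name are likewise hypothetical.

```python
from datetime import datetime

# Assumption for illustration only: the target representation is YYYYMMDD.
ASSUMED_STANDARD_FORMAT = "%Y%m%d"

# Hypothetical legacy layouts that individual systems might still emit.
LEGACY_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def to_standard_date(raw):
    """Convert a date string in any known layout to the assumed standard form."""
    raw = raw.strip()
    for fmt in (ASSUMED_STANDARD_FORMAT,) + LEGACY_FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue  # Not this layout; try the next one.
        return parsed.strftime(ASSUMED_STANDARD_FORMAT)
    raise ValueError(f"unrecognized date: {raw!r}")
```

Once every system emits the same representation, dates from different programs can be compared and sorted without per-system conversion logic, which is the consistency the definition above describes.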
In general, EPA's information services organization (initially the Office of Information Resources and Management,
and over the last two years, the Office of Environmental Information) leads Agency-wide processes to create IT-related
policies and standards. These policies range from a topic as broad as "System Life Cycle Management" (EPA(19),
1994) to something as narrow as the "Personal Use of Agency Equipment" (EPA(16), 1998).
The development of these policies and standards has been an important initial step towards linking systems, integrating
information, and streamlining processes. For example, EPA has developed and approved a Facility Identification
standard which has been instrumental in enabling an integrated view of a single facility's performance across several
environmental Programs. Another is the policy to use Lotus Notes as the Agency's standard for e-mail communication.
While burdensome for some to move to a new system, the standardization of e-mail software has made intra-agency
communications more efficient and led to significant cost savings.
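The value of a shared facility identifier can be sketched in a few lines of Python: when each Program system carries the same identifier, records can be grouped into a single cross-program view without fuzzy matching on names or addresses. The field name facility_id, the sample records, and the function name below are hypothetical and do not reflect the actual Facility Identification standard.

```python
from collections import defaultdict

# Hypothetical extracts from two Program systems that share one identifier field.
AIR = [{"facility_id": "F001", "permit": "AIR-17"}]
WATER = [
    {"facility_id": "F001", "discharge": "OUT-3"},
    {"facility_id": "F002", "discharge": "OUT-9"},
]


def integrated_view(**program_records):
    """Group records from several programs under their shared facility identifier."""
    view = defaultdict(dict)
    for program, records in program_records.items():
        for record in records:
            # Everything except the shared key is program-specific detail.
            details = {k: v for k, v in record.items() if k != "facility_id"}
            view[record["facility_id"]].setdefault(program, []).append(details)
    return dict(view)
```

For example, integrated_view(air=AIR, water=WATER) places the air permit and the water discharge for facility F001 side by side, which is only possible because both systems use the same identifier.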
However, the IT policies and standards to date have been created in the absence of a blueprint describing how the
Agency will collectively manage its information and technology assets in support of the Agency mission and goals. In the
absence of an enterprise architecture, IT policy and standards setting has proceeded in a slow, piecemeal fashion, yielding limited
gains in integrating processes, data, and systems.
6.2 CRITICAL FEATURES AND OPERATIONS OF THE FOUNDATION
As indicated in Table-7 there are a number of activities that are common to the enterprise architecture and standards
and policy development. Common to all of these is the use of collaborative processes, involving multiple stakeholder
groups for development and implementation.
Foundation Component | Activity | Example
Enterprise Architecture | Development/Defining | Baseline, target architectures
Enterprise Architecture | Planning | Sequencing plan
Enterprise Architecture | Management | Overseeing implementation
Standards | Identifying business need; Development; Implementation | Facility ID standard; XML standards
Policies | Identifying business need; Development; Implementation | Security policy; Technology policy
Table 7: Foundation Components and Activities
ENTERPRISE ARCHITECTURE
An enterprise architecture is simply a definition of an organization's business and a description of the processes, data,
applications, and technology that support it (Spewak, 1992). Typically enterprise architecture efforts start by developing
a baseline - a portrayal of the existing business. Once the baseline is complete, the "to be" or target portrayal of
processes, data, applications, and technology is undertaken and generally captured in an enterprise's strategic thinking
(FAWG, 2001). The strategy for transitioning from the baseline to target is the sequencing plan which includes a
schedule of multiple, concurrent, and incremental builds that evolve the enterprise (FAWG, 2001).
EPA's enterprise architecture will:
• Allow the CIO and Quality and Information Council (QIC) to make more informed IT investment decisions;
• Enable data integration by documenting the desired relationships among EPA's applications and data stores;
• Improve interoperability of EPA's applications by minimizing the number of system interchanges through reliance
upon standards and common data repositories; and
• Allow the Agency to respond more quickly to changing business requirements by establishing direct relationships
between its IT portfolio and business functions.
As depicted in Figure-11, EPA's architecture allows the Agency to ensure its investment in information technology is
aligned with the Agency's mission and supports the Agency in its quest to meet its strategic goals.
The enterprise architecture planning process provides a methodology to break down and model the Agency along 5
distinct (horizontal) layers: business, data, applications, technology, and security. Each architectural layer will be
defined following a standard modeling protocol, resulting in a baseline and target environment model architecture being
defined by the Agency enterprise architecture team (see text box below).
To accomplish this in manageable components, EPA's enterprise architecture development will happen in three
semi-concurrent projects focused on three major business domains of the Agency: the environmental programs of the
Agency (the Environmental Information Architecture), the Agency's large research and development function (the
Research and Development architecture), and the administration and finance functions (Administrative Systems
Architecture).
Figure 11: EPA Enterprise Architecture Framework
[Figure labels: Conceptual Framework; Environmental Business Architecture; Information Architecture; Goals; Agency
Processes; Data; Applications; Technology]
EPA's 10 Strategic Goals:
1. Clean Air
2. Clean and Safe Water
3. Safe Food
4. Preventing and Reducing Pollution
5. Effective Waste Management
6. Reduction of Global and Cross Border Pollution
7. Right to Know Initiatives
8. Sound Science
9. Deterrent to Pollution and Greater Compliance with the Law
10. Effective Management
Definition of EPA's Architectural Layers (EPA, 2001)
• Business: Models of functional areas (e.g., permitting, compliance, and monitoring), business processes
within the functional areas, and relationships between those functions and processes across the Agency.
• Data/Metadata: Models of EPA's information holdings to identify holdings, what they are used for, and where
they are housed.
• Applications: EPA's environmental and non-environmental information systems.
• Technology: EPA's network resources (hardware, network, non-system software) that enable the Agency's
applications.
• Security: EPA's information security concerns related to business process, data storage and access,
system-level controls, and EPA's network and communications.
The business process, data, and applications architectures will be individually defined in FY2002 for each of these three
business segments, and merged together later on. The technical infrastructure and security architectures, inherently
'enterprise' in function, will be developed across all three business domains from the start. In addition to developing
an applications architecture for the Agency's traditional database environment, the architecture planning process
must also focus on the unique needs of geospatial information and the document management needs of the Agency. For
each information medium (database, geospatial, and document), architectural 'views' will be created. Since much work is
underway in all of these areas, the enterprise architecture planning effort will serve as the unifying core, thus its foundation
status. Figure-12 depicts the three-dimensional inter-relationships of the various architectural components.
These architectural layers provide a framework for identifying and developing policies and standards needed for
Agency-wide integration, whether it be at the business process, data, applications, technology, or security levels.
The rest of this section highlights a few key standards and policies that can be closely associated with the data,
technology, and security layers of EPA's enterprise architecture.
DATA STANDARDS
Within the Foundation, key data standards are identified, developed, and implemented. As depicted in Figure-13,
data standards are developed through a collaborative process involving Agency Programs and State data partners.
EPA, as a participant in the Environmental Data Standards Council, has developed and approved six key data
standards:
• Facility identification standard
• Biological taxonomy data standard
• Chemical identification data standard
• Date format standard
• Latitude/longitude standard
• Standard Industrial Code (SIC)/North American Industry Classification System (NAICS) standard
In FY'02, EPA is working to finalize and approve standards for:
• Enforcement and Compliance;
• Permitting; and
• Tribal Identifiers.
TECHNOLOGY POLICY
[Figure: EPA EA Design Methodology Framework. Recoverable labels: EPA EA Conceptual Framework; Strategic
Architecture (Mission, Vision, Goals, Perf Measures); Federal Business Arch]
[Figure: EPA Data Standard Process flowchart, organized by stage (Proposal, Development, Implementation) and
responsible party. Recoverable steps: Submit Request for Data Standard; Review; Form Subject Matter Action Team;
Review Existing Standards for Adoption; Develop Data Standard; Develop Business Rules; Resolve Issues; Post
Business Rules in EDR; Implement Data Standard in EPA Systems; Update]
Figure 13: Data Standards Setting Process
• The Systems Life Cycle Management Policy will define the process which Programs must follow in deploying
information technology. The SLC policy will dictate an up-front architecture check-in phase in the systems
development process to assist Programs in ensuring that their systems development projects are architecturally
compliant from the project initiation phase.
• A Capital Planning and Investment Control Policy will codify that Agency information technology development
projects be formally reviewed prior to funding, and thus provide the Agency the ability to ensure that new
projects are architecturally compliant with the integration vision and clearly linked to Agency mission. While the
CPIC process is not new, this policy will institutionalize its enforcement. In addition, the budget threshold for
project inclusion in the process is expected to drop significantly in the next two years, thus assuring that the
majority of the Agency's projects are routed through the process.
• Other new policies under development, such as EPA Standards of Behavior for Security of Information
Resources, PDA Policy, Telecommuting, and Remote Access, all reflect how changing business paradigms
affect the technical infrastructure. These policies must remain current with the integration vision. For example, the
proliferation of new hand-held computing devices will dictate architectural specifications to the design of the
System of Access. The relationship of new workforce habits and new technologies must be coordinated to reach
the integration vision. All of this must remain current with Agency policy.
SECURITY POLICY
Key concerns around security dictate the issuance of new security policies. By design, the integration vision for the
Agency affords the opportunity to 'design-in' security (e.g., requiring inbound data flows to come through CDX).
Increasing access to information must always be balanced with increased security measures.
Recommended security policy will include:
• Use of CDX for inbound data flows - will allow a consistent application of security measures at the CDX portal,
versus many Programs creating individual portals and adding diversity in the deployment of security protocols.
• Use of "System of Access" for accessing EPA data collections - will control and limit the various routes into the
Agency. Of course, exceptions will occur and must be planned for.
• Intrusion Detection and Perimeterization policies are needed. As more and more data are available via the System
of Access, the potential damage that someone could do increases. Policies must be established to require implementation
of practices that alert appropriate personnel when an unauthorized person enters the Agency network,
and once detected, we must be able to immediately stop the intruder from going any further by establishing a
secure perimeter around the intruder (i.e., perimeterization).
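The recommended policies above can be pictured with a deliberately simplified Python sketch: inbound requests are accepted only through a single designated gateway, an unauthorized user triggers an alert, and the offending source is then fenced off so that no further requests from it succeed. The class name, the string labels, and the in-memory alert list are hypothetical; an operational intrusion detection or perimeterization capability would be far more involved.

```python
class PerimeterMonitor:
    """Toy model of single-gateway entry, intrusion alerting, and perimeterization."""

    def __init__(self, authorized_users, gateway="CDX"):
        self.authorized = set(authorized_users)
        self.gateway = gateway      # the one sanctioned inbound route
        self.quarantined = set()    # sources fenced off after detection
        self.alerts = []            # stand-in for notifying personnel

    def handle(self, user, source, via):
        """Return 'allowed', 'rejected', or 'blocked' for one inbound request."""
        if source in self.quarantined:
            return "blocked"  # perimeter already established around this source
        if via != self.gateway:
            self.alerts.append(f"inbound flow from {source} bypassed {self.gateway}")
            return "rejected"
        if user not in self.authorized:
            self.alerts.append(f"unauthorized access by {user} from {source}")
            self.quarantined.add(source)  # stop the intruder going any further
            return "blocked"
        return "allowed"
```

The design point the sketch tries to capture is that one controlled entry route gives a single place to apply checks consistently, which is the argument the text makes for routing inbound flows through CDX.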
PRIMARY SUPPORTING PROGRAMS AND WORKGROUPS
Several existing EPA Programs and workgroups support facets of the Foundation component including:
• Enterprise Architecture Program - OEI's Office of Technology, Operations, and Planning manages the
Agency's enterprise architecture planning process.
• Technology Architecture Change Management (TACM) process - The Agency has several mechanisms in
place to manage the deployment of new technologies into the Agency. These existing mechanisms will be utilized
in transition to the future target enterprise architecture. Given the rapid pace of technology change, we must have
consistent technology standards upon which to build our infrastructure and a process to update them. The
Agency deploys the Technical Architecture Change Management (TACM) process to conduct research on new
technologies, and to manage the implementation of new technologies into the Agency. A significant example is the
desktop conversion task of deciding EPA's future direction in providing word processing, spreadsheet, and
presentation functionality to EPA employees. These IT standards are housed in the IT Roadmap, a core component
of the enterprise architecture.
• Data Standards Program - OEI's Office of Information Collection manages this program which is responsible
for leading the collaborative identification, development, and implementation of data standards.
• XML Technical Advisory Group (XML TAG) - A cross-agency ad hoc workgroup that helps set standards
related to XML. The purpose of the workgroup is to help address issues and influence policy related to the use of
XML for Network and non-Network exchanges of data and information.
• Security Program - Managed out of OEI's Office of Technology, Operations, and Planning, the Technical
Information Security Staff (TISS) defines and oversees the implementation of security policy at the EPA. Through
the recent creation of a Security Program, the management aspects of handling security will complement the
development of a security architecture.
6.3 BENEFITS OF THE FOUNDATION (ARCHITECTURE, STANDARDS, AND POLICY)
Up to this point, component benefits have been presented according to how each saves the Agency money, improves
the quality of data, makes data and information easier to use, and is responsive to the needs of EPA employees, as well
as external stakeholders. Clearly, the enterprise architecture, and its associated transition plan have the potential to
achieve all of these simply by the comprehensive, global nature of this activity, and the clear, deliberate linkage between
IT planning and Agency mission.
Data standards and policies, as discussed, produce uniformity and compliance across an entity. Implementing
policies and standards that enable EPA's organizational units to work in synchrony, whether it be in areas of data collection,
processing, storage, access, or use, will produce these desired outcomes. For example, standardization of the way
in which key data elements are captured in EPA systems will go a long way towards improving the quality of that data
type and making it easier to use/integrate. This is not to suggest that everything be standardized; instead, standards and
policies that govern IT management must be set within the context of overall Agency direction and the IT plan,
i.e., the enterprise architecture.
6.4 ISSUES AND NEXT STEPS
• Management constructs for institutionalizing the enterprise architecture planning process need to be
established. Recent communications from the Inspector General have pointed to weaknesses in EPA's ability to:
implement an enterprise architecture planning process, establish sufficient authority for enterprise architecture
approval, and effectively establish an integrated (enterprise) program management approach to our information
systems planning. They further point out that, as a consequence of not having an enterprise architecture, our ability to
properly secure our IT environment, or to make appropriate IT investment decisions, is weakened. Steps are
currently being taken to address senior leadership steering roles and to ensure that the Capital Planning and
Investment Control (CPIC) process is based upon the architecture. These pieces are essential to make the integration
vision a reality.
• Implementation of agency-wide standards and policies needs to be more even and consistent. Standards and
policies are the "glue" that will hold the implemented EIA together. A number of standards and policies exist at
EPA; however, their implementation status in systems and datasets is uneven across the Agency. Adherence to
standards is still largely up to system or dataset owners. A more reliable, systematic approach is needed to make
the use of standards, pseudo-standards, and policies more consistent across the enterprise. The bottom line is that
in an integrated environment, where multiple sets of users are dependent on data collected from a variety of
sources, it is imperative that standards be rigorously applied and enforced if the system is to achieve its intended
use. Completing the EIA and using it as a policy development framework is intended to facilitate this process.
Secondly, codifying when standards and policies take effect in Systems Life Cycle Management Policy will also
serve to harmonize implementation of standards.
• Administration of Core Services. Several corporate applications are being developed (CDX, Registries,
Enterprise Repository, System of Access) and there is no clear management plan. OEI must decide how best to
manage their operations and maintenance being cognizant of the distinction in roles between the business process
of managing data, and the technical management issues of supporting the technical environment and database
environment. In the Registries and Enterprise Repository, a new paradigm of shared-data management responsibilities
will exist for the data. This must involve the Programs. Yet managing the database environment may be
best handled as an OEI responsibility. Currently, operating the technology of centralized applications happens in
various offices within OEI, mostly outside the domain of the Office of Technology Operations and Planning.
Clearly, a preferred future management strategy should be explored and considered. Since many of these central
services are on their third year of Systems Management Fund (SMF) funding, that discussion needs to happen in
FY2002.
• Need to improve IT competencies of the Agency IT workforce. One of the core responsibilities identified as
required by the Clinger-Cohen Act (CCA) is ensuring that agencies maintain appropriate and current IT
competencies within the workforce. The basic architectural concepts behind the integration vision are based upon
current industry trends in data warehousing. Yet, many of our systems managers don't have the time to keep
current in these areas. Our approach to implementing new technologies may result in sub-optimal benefit if EPA is
not aware of the current knowledge on a wide breadth of information management issues. For example, simply
migrating data flow formats to XML and not looking at new paradigms of information sharing, as opposed to 'feeding
EPA information', is functionally 'paving the cow path', something OMB clearly warns agencies against doing.
The IT Workforce Development team (OTOP) should prioritize training in appropriate areas.
• Trust, competence, and standards of service. If system owners begin to use centrally provided data services, or
retool their out-year planning assuming a core component is there, it must be there. System owners must be
assured of reliable, secure, and efficient service, else the option to go outside the system will always be the preferred
path. This is both in terms of systems performance and customer service.
• IT Sequencing Plan must coordinate with transition towards Exchange Network - Much unnecessary expense will
occur if individual systems are retooled towards complying with new internal EPA architecture policies separately
from when they retool information exchanges with external partners. While there will be a push in both directions,
careful coordinated planning is recommended to minimize transition costs where feasible.
Because of the centrality of the Enterprise Architecture and of the development of the EIA component, it is useful to
detail some of the specific next steps required of the Enterprise Architecture Team in FY2002. These include:
• Baseline Architecture
- Complete the EIA baseline architecture
- Review and analyze the baseline architecture - ensure that the business process, data, and applications
architecture are accurate.
• Technical Options Research
- Conduct technical research on applications architecture conceptual structures.
- Conduct research on metadata architecture strategies, and identify appropriate approach commensurate
with evolving target architecture.
- Conduct technical research on system of access options; specifically, research Customer Relationship
Management (CRM) strategies and examine portal development options.
• Target Architecture Development
- Begin EIA target architecture via a series of collaborative architectural development sessions with national and
Regional Program professionals.
- Develop the target EIA in three major phases for the duration of FY2002.
- For each phase, define the business processes and data models for the Enterprise Repository, define
options for Program system linkage to core Registries, define sequencing plan for CDX alignment.
- Seek outside peer-review assistance.
• Transition Management
- Develop a transition plan (Sequencing Plan) to move EPA from its current state to the future vision.
- Develop a change management process for the Enterprise Architecture and specifically EIA components.
- Develop a configuration management strategy for core EIA components.
- Develop a performance metrics strategy for EA/EIA implementation.
- Develop an OEI assistance strategy for programmatic migration to the EA/EIA.
-------
7.0 SUMMARY OF ISSUES AND NEXT STEPS
7.1 OVERVIEW
The preceding chapters outlined a vision for the integration of EPA information resources. This model builds on the
major investments the Agency has already made to achieve integration, and introduces a number of new concepts and
mechanisms that have significant implications for the organization, management, and funding of the Agency's information
activities.
This document builds on the Agency's existing commitment to a Central Data Exchange to meet a variety of Agency
and data partner needs through participation in the National Environmental Information Exchange Network. It goes well
beyond the concept of registry as embodied in the Facility Registry System and Substance Registry System by proposing a
system of linked, accessible registries in a new Agency resource called the Enterprise Repository. The Enterprise
Repository also contains an Agency Data Warehouse, which, while leveraging the functionality and expertise of
Envirofacts, significantly extends the centrality and scope of this storage and access mechanism.
This vision explicitly addresses how information is used to meet business needs by proposing decision-support tools
as a distinct information management function. It links quality and consistency concerns with this emphasis on decision
support by proposing that the tools associated with this function be directed to the Agency data warehouse, not
programmatic databases.
This vision builds on our existing innovative Web site services and desktop functionality by proposing access, both internal
and for the public and data partners, as a major objective and function of an integrated architecture. Finally, this
vision proposes a significant change for Program and Regional systems within an integrated Enterprise Architecture.
As noted in the individual chapters, the success with which this vision is implemented within the Agency depends on
the ability of EPA decision-makers to address and resolve a number of key issues. The following list contains those
issues that are common across most of the functions and components proposed in this document for EPA's Environmental
Information Architecture:
1. Agreement on the nature and scope of integration components. As summarized above, this document
both proposes new mechanisms for integration and extends existing mechanisms in important ways. In each case,
senior management acceptance of these mechanisms, and willingness to work toward their detailed design and
implementation, is critical for success. This must start with agreement on the nature and scope of each of these mechanisms,
including agreement on the implications for existing systems and for modernization efforts.
2. Agreement on general timeframe for implementation and transition. EPA senior managers need to agree
on an overall time frame for implementation and on separate timelines for progress on each of the major component
areas. This commitment will ensure sustained progress on multiple components of this vision. Closely linked to
agreement on the time frame for implementation is agreement on how the transition from our current operations to
the target Enterprise Architecture will occur. This includes collaborative development of outcomes and milestones
to track progress. For example, it is difficult to track progress across the core components against a common set
of Programmatic Systems.
3. Agreement on Governance and stewardship functions. This vision assumes that responsibility, stewardship
and control can be exercised separate from direct management of each information processing step. It assumes as
well that Program business needs can be met effectively by Agency-wide services. Neither assumption sits well
with a parochial view of the world. This vision will require active effort to develop organizational arrangements
that work adequately from the outset and can be quickly and smoothly adjusted to reflect experience gained and
changing circumstances. Agreement on these organizational arrangements by senior managers and sustained effort
on their part to implement these within their Programs and Regions is central to the credibility of the overall effort.
4. Agreement on Resources Issues. The transition to core component services like the Central Data Exchange
or an Enterprise Repository has the potential to make more efficient use of, and extract greater value from, limited
resources. However, the transition also raises four important resource issues. A resourcing strategy is needed to
achieve and sustain an integrated IT environment.
• Funding Core Components. Many of the primary supporting projects in this model are investments
funded through the OEI base budget, Agency Integration funds, and the Systems Modernization Fund. As these
core components take on more customers, add services, and mature into operations and maintenance,
they may require a modification or concomitant adjustment of funding sources. The Working Capital Fund
stands as the current solution.
• Component Service Disincentives. During the transition to core component services, some Programs
will have to fund "legacy processes," as well as pay for the use of new core component services, and
provide staff to help negotiate the transition. These circumstances make transition to core component
services a challenging prospect. A budget and resource strategy should recognize these circumstances
and find ways to encourage the transition and make it more equitable.
• Tracking IT Costs. The same IT functions presented in this document (Collection, Process & Stage,
Storage, and Decision Support) are generally carried out in association with each independently managed
system. The discrete costs associated with these functions, however, have never been explicitly
tracked and are likely to be bundled under each system's operation and maintenance costs. Identifying
these costs would enable objective cost/benefit analyses for integrating with core component services.
These discrete costs would also enable benchmarking of what it will take to design, implement, and
sustain core component services.
• New Costs. New costs, for example to sustain an enterprise portal or decision support services, are
implied by this model. These new costs will result both from the need to expand the functionality of existing
systems (e.g., CDX) and from the need to create new mechanisms for integration (e.g., the System
of Access). Once expanded or created, these Agency resources will need to be operated, maintained, and
tracked.
7.2 MANAGEMENT ACTIONS
To address the issues raised above, a number of management actions must be undertaken. Listed below are the
major steps required, each of which may entail several planning and implementation activities.
• Determining the process by which each of the issues noted above will be addressed and the timeframe within
which this should occur.
• Finalizing the management plan for internal EPA integration and obtaining Agency agreement on it. (The plan should
address the specific issues and products noted in this section.)
• Finalizing how the integration effort will be managed within OEI and across the Agency. This includes the critical
need to address how governance and stewardship will be exercised.
• Completing the Enterprise Architecture and the Agency-wide plan for transition to it.
Several information management activities require some form of senior management stewardship: integration/
architecture, quality assurance, geospatial information management, and administrative systems architecture to
name a few. Senior management must develop a more streamlined, holistic approach to more efficiently govern
these efforts and to encourage connections, where appropriate.
-------
APPENDIX A: GLOSSARY OF TERMS
Active Metadata — Metadata used by online tools to perform operations, such as validating fields mentioned in
queries.
Decision Support Tools — Any device which enables analysis of many units to aid in learning, discovery, and
problem solving.
Application—The term application is a shorter form of application program. An application program is designed to
perform a specific function directly for the user or, in some cases, for another application program. Examples of
applications include word processors, database programs, Web browsers, development tools, drawing, paint, image editing
programs, and communication programs.
Business Intelligence — A series of components that capture organizational data from disparate sources and present
it to users in a user-friendly format.
Centralized Data Registries—EPA's core "registry" systems: Facility Registry System (FRS), Terminology Registry
System (TRS), Substance Registry System (SRS), Environmental Data Registry (EDR), and Environmental Information
Management System (EIMS).
Component—A technology, policy, plan, or service that enables more efficient and effective management of information
resources and supports EPA's participation in the National Environmental Information Exchange Network.
Data—In science, data are a gathered body of facts. In computing, data are information that has been translated into
a form that is more convenient to move or process.
Database—A database is a collection of data organized so that its contents can easily be accessed, managed, and
updated. The most prevalent type of database is the relational database, a tabular database in which data are defined so
that they can be reorganized and accessed in a number of different ways. A distributed database is one that can be
dispersed or replicated among different points in a network.
Datamart—A subset of a data warehouse customized to address a business need and support a specific tool.
Data Schema—The structure of data in a database. Usually indicated by showing data fields, field formats, field
attributes, and their relationships to each other.
Dataset—Named set of data with formatting (data schemas) and content.
Data Staging—Data cleanup, data schema extraction, and loading of the dataset into an Enterprise Repository
database.
Data Warehouse—A persistent collection of summary and detailed data whose data schemas are fully coordinated.
Data Warehouse Services — A centrally coordinated set of databases that enables integration and standardization of
cross-media, cross-program data for environmental analysis and reporting.
Enterprise Repository — Services providing access (including queries) to datasets accessible by a common access
mechanism, and whose requirements for data integration meet the more stringent requirements of a data warehouse.
Enterprise Technical Architecture—A comprehensive series of principles, guidelines, diagrams, and standards that
enable an organization to align the acquisition, development, and coordination of its information technology (IT) assets
with its business goals and functions
EPA Enterprise Portal—An interface through which people and organizations access electronically the EPA and its
services.
EPA Environmental Information Architecture (EIA)—The collection of services, processes, data, and infrastructure
supporting EPA internally and its external stakeholders.
Exchange Network Services—Provides the operational capabilities required for participation in the Exchange
Network
EPA Node—The collection of services, processes, data, and infrastructure supporting EPA services for the Exchange
Network.
Foundation—A collection of services and policies needed to implement/maintain the EIA. For example, it addresses
security planning, use and management of metadata, data standards, use of XML, and consistency with EPA's enterprise
architecture.
Information—Information is data presented to meet user expectations. Data presentation must be user friendly and
impart some meaning to the data.
Holdings—Collection of what parts comprise the EIA, and also a directory to all items in which users of the EIA are
interested.
Integration—The unification of processes, data, applications, and technology, either logically, physically, or in
combination to achieve efficiency and more effective use of information and technology.
Metadata — Data about data.
Metadata and Holdings Catalog — Contains information about EPA's data assets. Provides a single source for
tracking metadata
Node (Exchange Network)—A participant's single, managed portal for providing and receiving information via the
National Environmental Information Exchange Network.
Node Catalog (Exchange Network) — Information and associated network metadata (e.g., trading partner agreements,
description of the information) available at an Exchange Network node.
Passive Metadata—Metadata, such as documentation, that is not used directly by tools. It is usually for human
reference.
Platform—The operating systems, database management systems, and/or computers undergirding an application.
Policy—A statement that is binding on entities within its scope.
Portal — (1) a heterogeneous set of services available at a Web site; (2) a gateway to services
Process — A baseline set of steps to accomplish a task or perform a service
Program/Lab/Regional Systems—Those systems used by individual Program Offices, Laboratories, and Regions to
accomplish their particular missions or goals.
Pseudo-standard —A convention that has evolved into use at EPA, but which has yet to receive official approval by
EPA.
Registry — (1) An official and authoritative list of specific, well-defined items of interest to an organization; (2) A
specially compiled index for finding related items grouped by subject area; (3) A source of metadata.
Registry Services—Provide services for accessing and using authoritative lists of names or identifiers.
Service—A conceptual capability provided by an entity.
Staging—Preparing data for loading into a repository or dataset. Data are formatted consistently, content corrected,
and metadata extracted. Usually this is part of data quality engineering.
Standard — A set of criteria, or a convention.
System—An actually implemented entity.
System of Access — A tool that will allow customers (Public, Partners, and EPA Employees) to locate and access
Decision Support Tools according to their authorization level.
System of Registries — See Centralized Data Registries.
Tool—A device or capability that aids in accomplishing a task. In this document, "tool" is used synonymously with
"application."
-------
APPENDIX B: REFERENCES
"America the Unready." The Economist (1). 22 December 2001:25.
Carr, Judith. "IT Governance: Models for E-Government." Gartner Commentary on Government. Gartner RAS
Services. Online. GartnerGroup, Inc. 18 Oct 00.
Clinger-Cohen Act of 1996 (formerly, Information Technology Management Reform Act [ITMRA]), Public Law 104-106.
10 February 1996.
Cook, Melissa. Building Enterprise Information Architectures. Reengineering Information Systems.
Upper Saddle River, NJ: Prentice Hall PTR, 1996.
Drucker, Peter F. The Essential Drucker, Selection of Management Works of Peter Drucker. New York, NY: Harper
Collins Publishers, 2001.
Federal Architecture Workgroup (FAWG). Enterprise Interoperability and Emerging Information Technology
Committee of the Federal Chief Information Officer Council. A Practical Guide of Federal Enterprise Architecture
Version 1.0. Washington, DC: February, 2001.
Forman, Mark. "Achieving the Vision of e-Government." Quicksilver Task Force Meeting. Washington, DC. 03
August 2001.
Inmon, W.H., Imhoff, C., Sousa, R. Corporate Information Factory (2nd ed.). New York: John Wiley and Sons, 2001.
Kimball, Ralph, Reeves, L., Ross, M., and Thornthwaite, W. The Data Warehouse Lifecycle Toolkit. New York: John
Wiley and Sons, 1998.
Lal, Rashmi. Email description of TRI Explorer. 27 August 2001.
Merriam Webster's Collegiate ® Dictionary. Entry for, "integration." From Merriam Webster Online ©2002 by
Merriam-Webster, Incorporated publisher of Merriam-Webster ® dictionaries.
Microsoft, Inc. Business Intelligence. How Agencies Can Breathe New Life Into Old Data. Online Article at
http://www.microsoftgovernment.com/bi/. As of 24 January 2002.
Phifer, G. Berg, T. "Portal:" The Most Abused Term in IT. Gartner Research Note. 25 September 2000:1.
Petruccelli, Kathy. "QIC Investment Subcommittee Update." Quality and Information Council Meeting. Washington,
DC. Ronald Reagan Building, 08 August 2001.
Sowa, J.F. and J.A. Zachman. 1992. Extending and Formalising the Framework for Information Systems Architecture,
IBM Systems Journal, Vol. 31, No. 3.
Spewak, Steven. Enterprise Architecture Planning. Developing a Blueprint for Data, Applications, and Technology.
New York, NY: John Wiley & Sons, Inc, 1992.
SRA International, Inc. Blueprint for EPA Information Integration. Deliverable 3.3 under the Information Infrastructure
and Architectural Support Contract (68-W-99-038), Work Assignment #046. 30 November 2001.
—. (2) Risk Assessment for the Interim Central Data Exchange (CDX) Facility, Deliverable 2-2, Work Assignment
Number 041, 19 April 2001.
State/EPA Information Management Workgroup (IMWG). Blueprint for a National Environmental Information
Exchange Network. 30 October 2000.
State/EPA Interim Network Steering Group (INSG). Implementation Plan for National Environmental Information
Exchange Network. (Draft). 11 January 2002.
Sullivan, John. Durman, Gene. "Proposed Target Architecture." Presentation at EPA IIP Retreat. Arlington, VA, 26
June 2001.
"Timely Technology." The Economist (2). 02 February 2002:5.
U.S. Environmental Protection Agency (EPA). Agency Information Integration: FY 2002 Priorities. Quality and
Information Council Briefing & Discussion. Washington, DC. Ronald Reagan Building, 17 October 2001.
—. (2) Business Operations Supported by Geospatial Tools, White Paper, 2001.
—. (3) The Costs and Benefits of the National Environmental Information Exchange Network. Phase One: Preliminary
Assessment of the Central Data Exchange and Selected Flows. Washington, DC: Office of Environmental Information,
November 2001.
—. (4) Data and Information Quality Strategic Plan. (Draft). Washington, DC: Office of Environmental Information,
January, 2002.
—. (5) Enterprise Architecture Planning Process: FY2002 Project Management Plan. Office of Environmental
Information, (Draft) 15 November 2001.
—. (6) Enterprise Architecture. Submission to U.S. Office of Management and Budget. Office of Environmental
Information, 29 March 2001.
—. (7) Environmental Indicators Initiative. Memorandum. Washington, DC: Office of the Administrator, 13 November
2001.
—. (8) FY 2001 Information Integration Initiative Management Plan. Washington, DC: Office of Environmental
Information, 07 December 2001.
—. (9) Geospatial Baseline Assessment. Washington, DC: Office of Environmental Information, 29 June 2001.
—. (10) Information Agenda. Washington, DC: Office of Environmental Information, (Draft) 16 January 2002.
—. (11) Information Integration Program Management Plan. Washington, DC: Office of Environmental Information,
(Draft) 31 January 2002.
—. (12) Information Quality Vision. Washington, DC: Office of Environmental Information, (Draft Presentation),
Date?
—. (13) A Management System for Information Quality. Washington, DC: Office of Environmental Information, (Draft
White Paper), Date?
—. (14) Metadata Strategy for the Environmental Protection Agency. (Draft) Washington, DC: Office of Environmental
Information, 31 July 2001.
—. (15) An Overview of EPA's System of Registries, (Draft white paper), August, 2001
—. (16) Personal Use of Agency Equipment, 1998. (Policy) http://intranet.epa.gov/rmpolicy/im/equipuse.htm, as of 19
January 2002.
—. (17) Public Access Strategy. (Draft) Washington, DC: Office of Environmental Information, 24 January 2002.
—. (18) REI: A Plan for Change-Our Commitments (Action Plan). 02 December 1997.
http://www.epa.gov/reinvent/onestop/acplan/plan.htm as of 18 January 2002.
—. (19) System Life Cycle Management, 1994. (Policy) http://www.epa.gov/irmpoli8/pohnan/, as of 19 January 2002.
—. (20) Window to My Environment: A "Geographic Portal" to Community-Based Environmental Information, White
paper, August, 2001.
U.S. Office of Management and Budget, Executive Office of the President. Guidelines for Ensuring and Maximizing the
Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies. (Final). Washington, DC: 28
September 2001.
-------
APPENDIX C: OTHER REGISTRIES
What Other Registries Are Needed (EPA(15), 2001)
The list below provides some suggestions for types of registries that might be useful in supporting EPA's business
needs. These registries could be developed, in many circumstances, from information in part already present in OEI
systems. From an information management perspective, these may not necessarily be separate, standalone applications
but could be different views of common, shared information maintained in a central location as part of the Centralized
Data Registries.
Business Sector Registry
A sector registry is envisioned to provide a Standard Industrial Classification (SIC)/North American Industry Classification
System (NAICS) crosswalk in support of the migration of Agency data from SIC to NAICS codes in response to the
Agency's SIC/NAICS data standard. This registry would be based on the NAICS and SIC code sets.
The registry would support analysis of EPA data
by industrial sector and would support anticipated further specialization of NAICS values. This registry could link to
the FRS and the Organization Registry.
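As a rough illustration of the crosswalk idea described above, the sketch below indexes a few SIC/NAICS pairs in both directions and attaches candidate NAICS codes to a facility record. Everything in it is an assumption made for illustration: the function names, the record fields, and the three sample rows, which should be checked against the official crosswalk before any real use. It is not a description of how an EPA registry is or would be implemented.

```python
from collections import defaultdict

# Illustrative rows only (SIC code, NAICS code). A real registry would load
# the complete, authoritative crosswalk instead of this hand-picked sample.
SAMPLE_CROSSWALK = [
    ("2911", "324110"),  # petroleum refining
    ("4911", "221112"),  # electric services -> fossil fuel power generation
    ("4911", "221113"),  # electric services -> nuclear power generation
]


def build_indexes(rows):
    """Build lookup tables in both directions from (sic, naics) pairs."""
    sic_to_naics = defaultdict(list)
    naics_to_sic = defaultdict(list)
    for sic, naics in rows:
        sic_to_naics[sic].append(naics)
        naics_to_sic[naics].append(sic)
    return dict(sic_to_naics), dict(naics_to_sic)


def migrate_record(record, sic_to_naics):
    """Return a copy of a facility record with candidate NAICS codes attached.

    A SIC code with zero or several NAICS matches cannot be migrated
    automatically, so the record is flagged for review instead of guessing.
    """
    candidates = sic_to_naics.get(record["sic"], [])
    migrated = dict(record)
    migrated["naics_candidates"] = candidates
    migrated["needs_review"] = len(candidates) != 1
    return migrated


SIC_TO_NAICS, NAICS_TO_SIC = build_indexes(SAMPLE_CROSSWALK)

if __name__ == "__main__":
    # Hypothetical facility records, used only to exercise the lookup.
    print(migrate_record({"facility": "Example Refinery", "sic": "2911"}, SIC_TO_NAICS))
    print(migrate_record({"facility": "Example Utility", "sic": "4911"}, SIC_TO_NAICS))
```

The one-to-many case is the reason a registry, rather than a simple code substitution, is useful here: a SIC code that splits into several NAICS codes needs a steward's decision, and the flag keeps those records visible.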
Regulation Registry
This registry is envisioned to provide a crosswalk between statutes and regulations, based on data elements included
in the draft Enforcement/Compliance data standard. This authoritative listing of laws and regulations of interest to EPA
would support any data standard or program system dealing with enforcement/compliance issues. This registry could
link to the SRS. Much of the content required to initially populate this registry is currently stored in the TRS and the
SRS.
Geopolitical Registry
This registry could store location information including counties, states, townships, tribes, and congressional districts,
that would support maintenance of location information in most EPA program databases. This registry could link to the
FRS, the Place Registry, and the Organization Registry.
Organization Registry
This registry would provide an authoritative list of organizations within EPA (Regions, offices, laboratories) and
States, other federal agencies, tribes, and corporations that are doing business with EPA. This registry would have
linkages to most other registries, as it could store information on data stewards and submitting organizations participating
in the Exchange Network, as well as corporations associated with facilities in the Facility Registry System. Many of
these organizations are currently included in the EDR registries.
-------
APPENDIX D — EPA'S NATIONAL "REINVENTING ENVIRONMENTAL INFORMATION (REI)" SYSTEMS
In 1998, EPA identified the 13 national systems below as priorities for re-engineering efforts, including data standards
implementation and modernization to accept electronic reports (EPA(18), 1998). Building on progress made with
these systems, the EIA will continue to use them as a foundation for further integration design and planning.
• AIRS Air Quality System (AQS)
• AIRS Air Facility Subsystem (AFS)
• Permit Compliance System (PCS)
• SRMP (System for Risk Mgmt Planning)
• Biennial Reporting System (BRS)
• RCRAInfo
• CR-ERNS (Continuous-Emergency Response Notification System)
• Safe Drinking Water Information System (SDWIS)
• Toxic Release Inventory System (TRIS)
• Water Quality Information System (STORET)
• National Compliance Database (NCDB)
• OECA Docket (DOCKET)
• Envirofacts