Model for Information
Integration


A Preview of the
EPA's Target
Architecture [EIA]
Components of
    Information
    OFFICE OF
    ENVIRONMENTAL
    INFORMATION
      July
                           2002

-------
                                  Table of Contents

Preface
   Purpose of the Document	1
   Intended Audience	1
   Scope of the Document	1
   Relationship to Other IT Planning & Design Documents	2
   Structure of the Document	3
   Acknowledgment	4

Chapter 1:  Introduction
   1.1  Integration Defined	5
   1.2  Why an Environmental Information Architecture	5
   1.3  Progress	7
   1.4  Overview	9
   1.5  Benefits	12

Chapter 2:  EIA: Connect & Exchange
   2.1  Background	15
   2.2  Critical Features and Operations	16
   2.3  User Registration Services	16
   2.4  Data Collection and Exchange Network Services	16
   2.5  Access Services	18
   2.6  Benefits of the Enterprise Portal	20
   2.7  Issues and Next Steps	21

Chapter 3:  EIA: Process & Stage
   3.1  Background	23
   3.2  Critical Features and Operations	24
   3.3  Benefits of Integrating Program and Regional Systems	27
   3.4  Issues and Next Steps	27

Chapter 4:  EIA: Store for Use
   4.1  Background	29
   4.2  Critical Features and Operations	30
   4.3  Data Warehouse Services	30
   4.4  Registry Services	31
   4.5  Geospatial Data Services	34
   4.6  Metadata Services	35
   4.7  Benefits of the Enterprise Repository	36
   4.8  Issues and Next Steps	37

Chapter 5:  EIA: Use
   5.1  Background	40
   5.2  Critical Features and Operations	40
   5.3  Benefits of Decision Support Tools	42
   5.4  Issues and Next Steps	43
                             Model for Information Integration — Preface

-------
                              Table of Contents (continued)
Chapter 6:  EIA: Foundation
   6.1  Background	45
   6.2  Critical Features and Operations	46
   6.3  Benefits of the Foundation (Architecture, Standards, and Policy)	50
   6.4  Issues and Next Steps	51

Chapter 7:  Summary of Issues and Next Steps
   7.1  Overview	53
   7.2  Management Actions	54
Appendix-A: Glossary of Terms
Appendix-B: References
Appendix-C: Other Registries
Appendix-D: EPA's "Reinventing Environmental Information (REI)" Systems
                              List of Figures and Tables

Figure 1: EPA Enterprise Architecture Business Domains	1
Figure 2: Relationship of Architecture and Exchange Network Documents	2
Figure 3: June 26, 2001 figure depicting components of EPA integration architecture	8
Figure 4: Information Lifecycle	9
Figure 5: The Model for Information Integration	10
Figure 6: Enterprise Portal Manages Flows in and out of Agency	14
Figure 7: Operational Database Prepare Data for Storage and Use	23
Figure 8: Enterprise Repository Decision Support Components	29
Figure 9: Envirofacts Information Warehouse	31
Figure 10: Decision Support Tools Use Data in the Enterprise Repository	40
Figure 11: EPA Enterprise Architecture Framework	47
Figure 12: EPA Enterprise Architecture Conceptual Framework (Expanded)	48
Figure 13: Data Standards Setting Process	49
Table 1: Alignment of Information Quality Lifecycle with Model for Information Integration	9
Table 2: EIA Core Components & Functions	11
Table 3: EPA Enterprise Portal User Sets	17
Table 4: Exchange Network Partner Requirements & Supporting Projects	18
Table 5: Current and Anticipated Use of CDX	19
Table 6: EPA's Registry Systems	33
Table 7: Foundation Components and Activities	46

-------
PREFACE
PURPOSE OF THE DOCUMENT

  This document is to be used as a high-level architecture guideline for integration at the Environmental Protection
Agency (EPA). It proposes a series of "core components," i.e.,  technologies, policies, plans, and services that will
enable more efficient and effective management of information resources and will support EPA's participation in the
National Environmental Information Exchange Network. This document will also serve as a basis for business case
analysis, user requirements analysis, and other decision-making needed to complete the target Environmental
Information Architecture (EIA).

INTENDED AUDIENCE

  The intended audience of this document includes: (1) members of the Quality and Information Council; (2) managers
and staff who oversee Program system development and operations; and (3) Agency staff who will develop the EIA.
This document requires that the reader have some basic knowledge of Information Technology (IT) management and
enterprise architecture development.

SCOPE OF THE DOCUMENT

  EPA is in the initial stages of developing an enterprise architecture, i.e., a definition of its business and a description of
the processes, data, applications, and technology that support it (Spewak, 1992). This document addresses core
components in the EIA, one of the three business domains of EPA's overall enterprise architecture (Figure 1). When
complete, the EIA will define the processes, data, applications, and technology that support environmental management.
As depicted in Figure 1, the EIA is segmented further to reflect the Agency's major business functions. The EIA will
serve as a blueprint for future design and implementation of an integrated infrastructure.

                    Figure 1: EPA Enterprise Architecture Business Domains
  [Figure 1 shows three business domains: the Environmental Information Architecture (EIA), the Administrative
  Architecture, and the Research & Development Architecture. The EIA is segmented into Ambient Monitoring,
  Substance Risk & Hazards, and Pollution Sources.]

-------
RELATIONSHIP TO OTHER IT PLANNING AND DESIGN DOCUMENTS

  Over the last two years, a number of key planning documents encompassing the Network, EPA's infrastructure, and
individual projects have been generated under the aegis of the Information Integration Program (IIP). Figure 2 arrays
these documents to show their relationship. The rest of this section briefly describes the purpose of each document and
its relationship to the Model for Information Integration.

           Figure 2:  Relationships of Architecture & Exchange Network Documents
  [Figure 2 arrays the planning documents as follows.
   Overarching: Information Agenda (Vision); Information Integration Program Management Plan (Mgmt. & Oversight).
   Activities: Architecture (Defining); Construction (Design & Implementation).
   EPA Architecture (Internal): Model for Information Integration; Geospatial Baseline Assessment; EPA Enterprise
   Architecture (EIA, Administrative, Research and Development (R&D)); Architecture Sequencing Plan; Individual
   Component Systems Design, Implementation, or Modernization plans.
   National Environmental Information Exchange Network (External): Network Blueprint; Network Implementation Plan.]

  Information Agenda (EPA(10), 2002) sets a cross-Agency vision for how information and technology will be
managed to support EPA's mission and overall strategy. The Enterprise Architecture provides the blueprint for achieving
this vision.

  Information Integration Program Management Plan (EPA(11), 2002) provides an umbrella structure for
tracking and linking the design and implementation of EPA's enterprise architecture with the National Environmental
Information Exchange Network.

  EPA Enterprise Architecture (EPA(6), 2001). EPA's enterprise architecture will define the business, information,
technologies, and transitional processes necessary to support the Agency's mission and strategy and to respond to
changing needs. The Enterprise Architecture is broken into three business domains: Environmental Information;
Administrative; and Research and Development. This document defines the core functions and components of the target EIA.

  Geospatial Baseline Assessment (EPA(9), 2001). This document is a product of EPA's National Geospatial
Program. It contains an overview of geospatial data, tools, and technologies used throughout EPA and of how geospatial
resources are used to implement the Agency's mission. As such it is a direct input to the baseline EIA. The Geospatial
Blueprint, due in Spring, 2002, will help link the three architectures together because geography (i.e., place) is a
common variable to all three of the EPA enterprise architecture business domains.

-------
  EPA Project Documentation. This model uses existing blueprints, plans, and documentation for EPA systems,
Programs like the National Geospatial Program, and EPA's Data and Information Quality Strategic Plan (EPA(4),
2002).  For example, it highlights the Central Data Exchange (CDX) design to illustrate EPA's participation in the
Network and to inspire the design of a set of access services that complement and build upon the existing CDX
services.

  Blueprint for a National Environmental Information Exchange Network (State/EPA IMWG, 2000)
expresses an initial conceptual model of the Exchange Network. It is an analog to this document. Both documents are
linked by way of the descriptions of EPA's node responsibilities as a partner on the Exchange Network.

  National Environmental Information Exchange Network Implementation Plan (State/EPA INSG, 2002)
provides a roadmap for building the Exchange Network. It contains a business case and an implementation strategy, and
identifies key projects, objectives, and milestones for tracking progress.

STRUCTURE OF THE DOCUMENT

  This document begins with an introductory chapter that provides an overview of EPA's IT environment. The first
chapter also cites drivers for integration, gives an overview of the conceptual model for integration, and illustrates the
alignment with EPA's Information Quality Lifecycle. The remaining chapters are organized by IT function and introduce
the core components for integration. Each of these chapters introduces the function; gives background on how this
functionality is currently carried out; describes critical features and operations; and identifies existing EPA projects and
efforts that either directly or potentially play a role.  Most importantly, these chapters highlight key technical and policy
issues that must be analyzed, debated, and decided upon as part of the development of the target EIA. The last chapter
presents key technical and policy issues associated with defining and implementing the EIA in the next few years. A
glossary (Appendix A), a list of references (Appendix B), other registries (Appendix C), and a list of EPA national
systems (Appendix D) are provided for the interested reader.

ACKNOWLEDGMENT

  The components proposed in this model evolved from two key preliminary architecture documents: (1) the FY2001
Information Integration Initiative Management Plan (EPA, 2000); and (2) March 2001 EPA Enterprise Architecture
submission to the Office of Management and Budget (EPA(6), 2001).

  The development of this document began with a June 26, 2001 IIP Team retreat with over thirty EPA staff. In this
meeting a proposed target EIA (Sullivan, et al., 2001) was presented and discussed. The conceptual model presented in
this document is based on a number of interviews and meetings with project managers who oversee many of the
projects described in this document.

  This document builds upon the Blueprint for EPA Environmental Information Integration deliverable produced under
Contract 68-W-99-038, Work Assignment #046.

-------
The following EPA staff contributed to this document:
                 Primary Authors
        Heather Anne Case
        Linphord Darlington
        Eugene Durman
        John Sullivan
Bewanda Alexander
Jennifer Cranford
Evangeline Cummings
Diane Esanu
Patrick Garvey
Sara Hisel-McCoy
Barbara Jarvis
Matthew Leopard
Debra Villari
Jeffrey Wells
Jeffrey Worthington
                              Key Contributors
John Armstead
Wendy Blake-Coleman
Tim Crawford
Ron Decesare
Larry Fitzwater
Debra Forman
Steven Goranson
Bill Grabsch
Steve Hufford
Rashmi Lai
Bill Muldrow
Ling Wan
Dave Wolf
Steve Young

-------
1.0 INTRODUCTION
  This document proposes a set of core component technologies, policies, plans, and services that will enable
integration at EPA. It shows how these components support four high-level IT functions. This model is an input to the
target Environmental Information Architecture (EIA). It is intended to be a vehicle for discussion and development over
the 2002 fiscal year.

1.1    INTEGRATION DEFINED

  Integration has a variety of meanings at EPA. In the context of systems, it is used synonymously with
"interoperability," i.e., the ability of a system to use the parts of another system. For processes, integration is
synonymous with "consolidation," i.e., the process of uniting.

  In the context of data and information it has special meanings:

  Graphical Integration - Data elements co-located on a map, table, or statistical graph.

  Data Integration/Reconciliation - Linking a single data element across multiple programs or media (facility, chemical,
or substance) to reconcile differences in how EPA Programs/States collect the same data.

  Analytical Integration - Summation of multiple indicators into a single index of performance, compliance, or health.

  Database Integration - Combining data elements from multiple Program systems or databases into a single
collaborative data schema for cross-media, cross-program reporting.

  Merriam Webster defines integration as the ability to "form, coordinate, or blend into a functioning or unified whole"
(Merriam Webster, 2002). For the purposes of this document, integration is broadly defined as the unification of
processes, data, applications, or technology.  Integration can be physical, virtual, or some combination of the two.
Data and databases need not be centralized to be integrated. For example, distributed data may be virtually
integrated through the use of multi-system queries. The overall purpose is to increase efficiency and enable
more effective use of data and technology resources.
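
  As an illustration only, the multi-system query idea can be sketched in a few lines of Python. The sketch assumes
two hypothetical Program databases (named "air" and "water" here, with made-up tables, columns, and facility
identifiers) that remain physically separate and are combined by a single query on a shared facility identifier.
SQLite's ATTACH DATABASE stands in for whatever query mechanism an actual implementation might use; none of the
names below describe real EPA systems.

```python
import sqlite3


def build_demo_connection():
    """Create two separate in-memory databases that stand in for two Program systems."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates its own independent database; the data are never merged.
    conn.execute("ATTACH DATABASE ':memory:' AS air")
    conn.execute("ATTACH DATABASE ':memory:' AS water")
    # Hypothetical tables and columns, for illustration only.
    conn.execute("CREATE TABLE air.permits (facility_id TEXT, pollutant TEXT)")
    conn.execute("CREATE TABLE water.discharges (facility_id TEXT, pollutant TEXT)")
    conn.executemany(
        "INSERT INTO air.permits VALUES (?, ?)",
        [("F001", "SO2"), ("F002", "NOx")],
    )
    conn.executemany(
        "INSERT INTO water.discharges VALUES (?, ?)",
        [("F001", "mercury"), ("F003", "nitrate")],
    )
    return conn


def cross_media_facilities(conn):
    """Return facilities that appear in both systems, using one query across both databases."""
    return conn.execute(
        """
        SELECT a.facility_id, a.pollutant, w.pollutant
        FROM air.permits AS a
        JOIN water.discharges AS w ON a.facility_id = w.facility_id
        ORDER BY a.facility_id
        """
    ).fetchall()


if __name__ == "__main__":
    for row in cross_media_facilities(build_demo_connection()):
        print(row)
```

  The join works only because both hypothetical systems use the same facility identifier, which is the role this
document assigns to data standards and registries.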

1.2   RATIONALE FOR AN  ENVIRONMENTAL INFORMATION ARCHITECTURE (EIA)

  The target EIA will be derived from a thorough analysis of existing Agency goals, business processes, data,
applications, and technology associated with environmental management. Once complete, it will enable an enterprise view,
i.e., the ability to look across EPA's collection of information and technology resources, to better align these resources
to support Agency goals and operations.

HISTORY

  To date, an enterprise view has been hampered by the decentralized nature of EPA's Information
Technology/Information Management (IT/IM) environment. This arose largely from the manner in which the EPA Programs
were created. EPA's enabling statutes are generally single media in scope, which has led to independent program
management and systems development. EPA is not the single entity many outside its walls may perceive, but a disparate
set of environmental and administrative programs.
                        Model for Information Integration — 1.0 Introduction

-------
  These circumstances are common to private and public sector organizations. Historically, the falling prices of
technology and the growing ease with which technology could be implemented promoted decentralized IT management,
typically along organizational boundaries (Cook, 1996).  While this supports flexibility, it can also lead to redundant
processes, a limited enterprise view, and use of incompatible technologies. These issues were recently illustrated at a
national level when U.S. government agencies scrambled to coordinate efforts to respond to the September 11, 2001
terrorist attacks. Since then many officials recognize and share a serious concern about the "lack of intelligence sharing
by the government" (The Economist(1), 2001).

  The effects are felt by EPA stakeholders, the public, and EPA employees. States and the regulated community have
to "feed" multiple Program systems, using a variety of transmission formats, to fulfill environmental reporting obligations.
The public, seeking information about their safety, receive data that mirrors EPA's organizational structure or falls short
of providing a complete picture of environmental conditions of interest. And finally, these circumstances have slowed
EPA from pursuing the cross-media environmental protection strategies necessary to address the emerging
environmental problems. Seemingly simple queries to support, for example, a cross-agency chemical initiative are
burdensome tasks for EPA.

  The IIP mission is to transform the Agency's business and operations through the use of information technology and
policy. This will require evolution from our current environment to a distributed environment marked by a cohesive,
interdependent set of processes, data, applications, and technology. A distributed IT/IM environment is highly desirable
as it maximizes the flexibility of decentralized computing and the coordination advantages of a centralized approach
(Cook, 1996).

  The challenges that EPA faces are much like those the U.S. government must overcome to fuse different levels of
government together to produce a sound, cohesive base of intelligence. The problems we face in filling the gaps and
making the linkages, whether it be across government or across EPA, are caused by budget distribution, lack of
coordination, and political will (The Economist(1), 2002).

  In order to achieve a distributed environment, the Agency must adopt data and technology standards, promote the
use of some centrally managed services and systems, and approach IT investment and decision-making with an "All
For One - One For All" mindset. Only when organizations within an enterprise agree that data and systems do not
belong to their own area, but are instead resources to be shared by the enterprise, will the benefits of distributed
computing be realized (Carr, 2000). The EIA is the mechanism for planning and managing this change.

FEDERAL REQUIREMENTS

  Many government agencies have undertaken enterprise architecture efforts to assess their IT resources and the extent
to which they support organizational goals. At the federal level, enterprise architecture efforts are conducted not only as
a good business practice, but also to fulfill the requirements of the Clinger-Cohen Act, which requires that agencies'
Chief Information Officers maintain and implement a sound integrated architecture (Clinger-Cohen Act, 1996).

  While this requirement has been on the books for over six years, many agencies have been slow to develop and
incorporate an enterprise architecture into their planning and investment processes. Because of this lag, the Office of
Management and Budget (OMB) has begun to monitor Agency IT investment budgets with a close eye towards
redundant technology investments and how each Agency manages those investments to accomplish its mission
(Petruccelli, 2001).

-------
E-GOVERNMENT

  Enterprise architecture has also received considerable attention in light of the Bush Administration's focus on
"e-government," defined as "The use of digital technologies to transform government operations in order to improve
effectiveness, efficiency, and service delivery" (Forman, 2001). The focus on e-government has been further expressed
as the "Quicksilver Initiative," managed out of OMB. Governed by the President's Management Council, the
Quicksilver Initiative rallies Federal agencies to "simplify and unify" processes and technology to deliver better
government services to stakeholders and the public. E-government funds have been allocated to federal agencies that
apply technology to enable: (1) intergovernmental exchange; (2) government-to-citizen services; (3)
government-to-business services; and (4) internal efficiency and effectiveness.

  This last aspect of the Administration's e-gov strategy, "internal efficiency and effectiveness," is receiving particular
attention. This is demonstrated in part by OMB's growing scrutiny of federal IT investments, in which perceived
duplicative funding is being questioned and in some cases rejected. It is also likely that this scrutiny will continue as IT
budgets shrink given the re-distribution of federal dollars to support emerging military and homeland security priorities.

  The development and management of an enterprise architecture is the means by which EPA will achieve "internal
efficiency and effectiveness." EPA's baseline enterprise architecture will be completed early in FY02 (EPA(5), 2001)
and will define existing business processes, data, applications, and technology. This "as is" inventory, analyzed within
the context of the Agency's mission, is the basis for identifying redundancies and proposing IT configurations that will
enable the Agency to be more effective in achieving its mission.

  Efficiency, defined as "doing the job right," and effectiveness, defined as "doing the right job" (Drucker, 2001), must
both be goals of EPA's target EIA.  It is important to note that streamlining redundant data collections, applications,
and processes will improve efficiency; however, it does not ensure that the Agency is "doing the right jobs" to be most
effective in carrying out its mission. For example, the Quality and Information Council (QIC) has endorsed the Central
Data Exchange as the single portal for all incoming environmental data flows (EPA(1), 2001). In some cases, Programs
are even coupling the use of CDX with consolidated reporting requirements (EPA(3), 2001). This generates efficiency
gains in two areas: business process and data collection. However, this decision does not ensure that the data collected,
or the manner in which they are collected, improve EPA's ability to carry out its mission. This situation underscores the
need to examine architectural options within the context of Administration priorities, governmental trends, and emerging
scientific research so that technology is used not just to "pave the cow paths" for efficiency's sake, but to transform
the business of environmental protection.

1.3   PROGRESS

  An initial vision for the EIA was described in the Enterprise Architecture submission to OMB, dated March, 2001
(EPA(6), 2001) and presented at the June 26, 2001 IIP Team retreat to promote project coordination (Figure 3).

  The conceptual model presented in this document updates Figure 3. The systems identified in this figure are
described in terms of their existing and potential functionality. For example, the CDX portal concept is expanded to
include a common set of registration services and access services which are described as an enterprise portal. Within a
functional framework this model also provides more detail on how the components interact.

      Figure 3: June 26, 2001 figure depicting components of the EPA Integration architecture
  [Figure 3 shows: Partners; Exchange Network; EPA Integration; System of Access (Internal/External) Tools and Access
  Mechanisms; Modernized Program and Regional Systems; Foundation: Information Architecture and Data Standards.]

  Also pertinent to this conceptual model is a set of principles approved by the QIC in October, 2001 (EPA(1), 2001).
Based on this approval Agency Programs are expected to:

•  Rely on a single portal (Central Data Exchange) for all environmental data flows. The Exchange Network
   Blueprint calls for each partner to have one node on the Exchange Network.  In this model, the CDX is the major
   part of an Enterprise Portal concept. It serves to manage all incoming data flows and exchanges via the Exchange
   Network.

•  Adhere to data standards. Data standards are the cornerstone of the Exchange Network and an important part of
   the foundation of the Environmental Information Architecture.  EPA currently has six data standards in place. The
   idea that data standards will be managed and enforced within EPA is a guiding principle of this document.

•  Rely on registries as the Authoritative Data Sources. There is a core set of data that is used repeatedly throughout
   the Agency and among stakeholders that forms the basis for most cross-media, cross-program analysis. This
   information includes items such as facility information and chemical information. Establishing one authoritative
   source for this type of data is an essential step in reducing duplication and incompatibility among EPA's data.
   Registries can also serve as automated normalizing agents to all standardized data received and transferred by the
   Agency. Information that varies from standardized protocol would be normalized or would include a notice
   indicating the discrepancy.

•  Provide access to integrated Agency data and shared datasets. The ability to seamlessly access and analyze data
   from various Program Offices has long been considered an ability that would help EPA and environmental
   stakeholders get a better picture of the environment. This can be achieved by making the data that is needed for
   cross-media analysis easily accessible using standard access methods and protocols. In this model the System of
   Access and Enterprise Repository are proposed to support this principle.

•  Consolidate and integrate systems consistent with the vision. All Programs are expected to continue to modernize
   their systems to meet their changing information needs. Just as this model seeks to promote integration across
   Programs, many Program Offices already have efforts underway to consolidate and integrate the systems within
   their offices.

•  Rely on Enterprise Architecture for all IT planning and investment. The Enterprise Architecture will serve as the
   primary tool for planning and managing information technology. This document will be the basis for developing the
   target EIA.
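
  The registry principle above, in which a registry acts as a normalizing agent and either normalizes a reported value
or attaches a notice of the discrepancy, can be sketched as follows. This is a minimal illustration under stated
assumptions: the registry contents, the function name normalize_chemical, and the notice wording are all hypothetical
and do not describe any existing EPA registry service.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: maps lower-cased reported names to one authoritative name.
CHEMICAL_REGISTRY = {
    "benzene": "Benzene",
    "benzol": "Benzene",
    "mercury": "Mercury",
    "quicksilver": "Mercury",
}


@dataclass
class NormalizedValue:
    value: str                     # authoritative value, or the original if unknown
    notice: Optional[str] = None   # set when the input varied from the standard


def normalize_chemical(reported: str) -> NormalizedValue:
    """Normalize a reported chemical name against the (hypothetical) registry."""
    authoritative = CHEMICAL_REGISTRY.get(reported.strip().lower())
    if authoritative is None:
        # Not in the registry: pass the value through with a discrepancy notice.
        return NormalizedValue(reported, "not found in registry")
    if authoritative != reported:
        # Known synonym or variant spelling: normalize and record what changed.
        return NormalizedValue(authoritative, f"normalized from '{reported}'")
    return NormalizedValue(authoritative)


if __name__ == "__main__":
    for name in ("Benzene", "benzol", "xylene"):
        print(name, "->", normalize_chemical(name))
```

  Keeping the notice alongside the normalized value lets the receiving system accept the data while still showing that
the submission departed from the standard.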

-------
1.4   OVERVIEW

INFORMATION LIFECYCLE

  At the outset, it is important to establish a basic framework to identify and assess the IT functionality that supports
data management: from data collection planning through data use.

  The Agency recently drafted a Data and Information Quality Strategic Plan which contains six major
recommendations to address the Agency's overarching data and information quality vulnerabilities (EPA(4), 2002). This
plan establishes an information lifecycle as depicted in Figure 4. The lifecycle is broken into five universal stages
(planning - collection/analysis - assessment - transfer/storage - use) and steps within the stages. Each step is further
defined to help identify vulnerabilities, develop corrective actions, and measure quality.

       Figure 4: Information Lifecycle
  [Diagram: Information Life Cycle Model for identifying data quality vulnerabilities.]

  Across the Agency numerous processes, data, applications, and technology support each of the five major stages of
the information quality lifecycle. Integration can occur at many points, e.g., at the point at which reporting requirements
are consolidated, or at the point at which data are analyzed, such as geographical integration through the use of a
geospatial tool. The model in this document presents four major IT functions that align with the later stages of the
Information Lifecycle: from the point at which data are collected by EPA from an external source (Transfer/Storage) to
the point at which they are used (Use), i.e., Steps 9-12 in Figure 4.

  Table 1 shows how the IT functions described in this document align with the information quality lifecycle and
measurable quality characteristics.

Information Lifecycle Stage      Measurable Quality Characteristics            Model for Information Integration -
                                                                               IT Function

Planning                         completeness, correctness,                    Not in the scope of
                                 representativeness, validity, consistency,    this document
                                 variability, etc.

Collection/Analysis              completeness, accuracy, timeliness,           Connect and Exchange
                                 measurement quality, adherence to
                                 standards, oversight/audits

Transfer/Storage                 completeness, correctness, conforms to
- Data entry                     specifications, e-format, verification,       Connect and Exchange
- Transfer/Process               validation, error checks                      Process and Stage
- Archive to master database                                                   Store for Use

Use                              completeness, integrity, usability,           Use
                                 accessibility, presentation

        Table 1: Alignment of Information Lifecycle with Model for Information Integration

-------
  Data or Information?
  This document defines data as information translated into a form that is more convenient to move or process.
  Information is data presented to meet user expectations. Data presentation must be user friendly and impart some
  meaning to the data (Worthington, 2001).

  The connection between these two efforts is important. In late FY2001, OMB finalized guidelines that require federal
agencies to develop and institutionalize practices to "Ensure and Maximize the Quality, Objectivity, Utility, and Integrity
of Information Disseminated by Federal Agencies" (OMB, 2001).

  The OMB guidelines require that Agencies set a quality performance goal and measure progress towards that goal
"throughout the creation, collection, maintenance, and dissemination" of information. These stages align well with the
EPA Information Lifecycle and can be linked to the IT functions in this model. Information Lifecycle stages and follow-on
guidelines should take into consideration and leverage IT functions in this model and other emerging information value
chain models (EPA(12), 2002). One approach is to use the architecture as the structure in which to implement the Data
and Information Quality Strategic Plan recommendations and follow-on development of quality indicators.

MODEL FOR INFORMATION INTEGRATION

  It is important to note that in EPA's current computing environment each of the four IT functions presented in this
document is carried out by a majority of EPA's Program systems. This functional framework is therefore a useful
mechanism to inventory and analyze business processes and technology to identify redundancies. The framework can be
used to develop solutions for more efficient use of EPA's IT resources.

  The model proposed in this document recasts existing EPA systems, services, and policies and proposes new ones in
terms of this IT functional framework. The new EIA core components and their associated functions are depicted in
Figure 5 and defined in Table 2.

  It is important to note that in this model operational processing is supported by the Central Data Exchange and the
operational databases components. The decision support processing is supported by the Enterprise Repository and
analytical tools. Collectively these decision support components support business intelligence, i.e., they capture
organizational data from separate sources and present it to decision makers in a user-friendly way (Microsoft, 2001).

  To illustrate how the components work, a sequence of activities is presented for two common information transactions.
[Figure 5: The Model for Information Integration. Diagram titled "EPA Target Architecture" showing: Exchange Networks
(Non-government Partners, Government Partners); EPA Users (Intranet, Extranet); Use (Program Support, Public Access,
Decision Support); Store for Use (Enterprise Repository: Metadata Holdings Catalog, Shared Geospatial Data, Central
Registries, Data Warehouse); Operational Databases & Applications; and Management Practices (Architecture, Policies,
Standards, Security).]

-------
ENVIRONMENTAL REPORTING

  An external user, an Exchange Network partner, or a member of the regulated community registers once to submit and
access information. Access rights are defined by user type and the nature of the data submission. Following registration,
a user logs on and connects to EPA services through an Enterprise Portal. The Enterprise Portal is simply defined as a
gateway to the many services the Agency maintains. The user accesses CDX services and transmits the data. CDX
extracts the data, validates them, perhaps against a registry, and sends a confirmation to the user. The data are passed
on to the Program, where they are temporarily stored for further quality control services. Validation and verification
against registries may occur, as well as error correction and quick informal data analyses. Once these steps are
complete, a Data Steward approves the data for storage in the Enterprise Repository. Data in the Enterprise Repository
are considered authoritative and ready for Agency-wide and external analysis.
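
  The sequence above can be read as a small state machine. The sketch below is illustrative only and is not an EPA
implementation: the class name, function names, status labels, and the facility identifiers are all assumptions
introduced for this example. It shows a submission moving from CDX validation, to Program staging, to Data Steward
approval.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """A hypothetical data submission moving through the reporting flow."""
    submitter: str
    facility_id: str
    status: str = "received"
    history: list = field(default_factory=list)


def cdx_validate(sub, facility_registry):
    """CDX step: validate the submission against a registry and record the result."""
    sub.status = "validated" if sub.facility_id in facility_registry else "rejected"
    sub.history.append(f"CDX: {sub.status}")
    return sub


def program_stage(sub):
    """Program step: hold validated data temporarily for further quality control."""
    if sub.status == "validated":
        sub.status = "staged"
        sub.history.append("Program: staged for quality control")
    return sub


def steward_approve(sub, repository):
    """Data Steward step: approve staged data for the Enterprise Repository."""
    if sub.status == "staged":
        sub.status = "authoritative"
        repository.append(sub)
        sub.history.append("Steward: approved for Enterprise Repository")
    return sub


def run_reporting_flow(sub, facility_registry, repository):
    """Run the three steps in order; rejected data never reach the repository."""
    sub = cdx_validate(sub, facility_registry)
    sub = program_stage(sub)
    return steward_approve(sub, repository)
```

  In this sketch a submission that fails registry validation stops at CDX, so only approved data are ever appended to
the repository list, which mirrors the statement that repository data are considered authoritative.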

DATA AND TOOL ACCESS

  An internal or external information user logs on and connects to EPA services through an Enterprise Portal. The
System of Access, using the data contained in the key registries, directs the user to data held in the Enterprise
Repository as well as to a suite of decision support tools. Communication channel and level of access will be dependent
on user type.
  Within the EIA, the high-level IT functions are supported by core components. In turn, the components are defined in
terms of services, policies, and processes. These are further supported by projects or systems. This is the basis for the
organization of the remaining chapters.

  It is important to note that this model presents a cohesive set of technologies and services when, in fact, many of the
supporting projects operate independently. Some of the core components, like the Enterprise Portal, are at this time
merely concepts presented for discussion. This model also presents a very prominent data warehouse and emphasizes a
processing role for Program and Regional systems. There are a variety of options for maintaining Program and Regional
systems; they will be explored as part of the EIA research discussed in Chapter 6.

IT Function: Connect and Exchange (Operational Processing)
  Core Component: EPA Enterprise Portal
  Services: User Registration and Authentication Services; Data Collection and Exchange Network Services;
    Access Services
  Supporting Projects: CDX registration, TSSMS; CDX; System of Access, www.epa.gov, EPA Public Access Strategy

IT Function: Process and Stage (Operational Processing)
  Core Component: Program Systems
  Services: Operational Database Services; Transformation, Load, Maintenance, and Quality Control Services
  Supporting Projects: e.g., SDWIS, AQS

IT Function: Store for Use (Decision Support)
  Core Component: Enterprise Repository
  Services: Data Warehousing Services; Central Data Registry Services; Geospatial (Geo) Services;
    Metadata & Holdings Catalog; Metadata Services
  Supporting Projects: Envirofacts; Geospatial Data Services; FRS, SRS, TRS; Integrated Geo Database,
    Geospatial Blueprint; EIMS, EDR, IRRS

IT Function: Use (Decision Support)
  Core Component: Decision Support Tools
  Services: Environmental Analysis Services; Analytical Models & Guidelines
  Supporting Projects: Window to My Env, TRI Explorer; Guidance for Data Quality Assessment: Practical Methods
    for Data Analysis

IT Function: Foundation
  Core Components: Enterprise Architecture; Data Standards Program; IT Policies; Security
  Services: Administration and IRM
  Supporting Projects: Data Standards Program; XML TAG; Security Program

                                              Table 2: EIA Core Components & Functions

-------
 1.5   BENEFITS

   The overall benefits of integration are increased efficiency, effectiveness, and quality. These benefits will be derived
 from efforts to streamline processes, to standardize some aspects of data and technology, and to promote the use of
 common core component services. Benefits of the components themselves are described in each chapter but generally
 fall into these four broad categories:

 BETTER ALLOCATION OF RESOURCES

   Adopting the core components described in this document has the potential to free Agency Programs from the
 mechanics of information management, i.e., the collection, staging, and storage of data. These core component services
 will permit more cost effective allocation of resources, both budget and human, to focus on the planning and guidance
 needed to "get the right data" for supporting the Agency's mission and goals.

 IMPROVED QUALITY

   The enterprise architecture is the means by which organizations plan for quality (Spewak, 1992).  Spewak makes a
 direct link between Deming's 14 Points of Quality and the 14 Points of data quality derived through the definition and
 implementation of an architecture. The enterprise view and prospective IT planning derived from the architecture will
 enable the Agency to implement standards and policies, simplify processes, and implement a common structure for
 monitoring data quality throughout the lifecycle. These will lead to overall improvements in quality.

 BETTER USE OF DATA AND INFORMATION

   By relieving Agency Programs from the mundane operational tasks of "keeping the data right," resources can be
 focused on "getting the right data" to manage an environmental problem. This can range from identifying new data
 needs and arranging data partnerships to developing data collection guidelines or standards to assure consistency and
 comparability. EPA resources may also be re-allocated to focus on the development of analytical models and analytical
 tools that will enable the Agency to focus on business intelligence, i.e., the ability to predict the future impact of current
 decisions (Inmon et al., 2001).

 RESPONSIVENESS TO THE NEEDS OF EPA STAKEHOLDERS AND INFORMATION USERS

   Integration also has direct benefits for information users as many of the EIA components will enable either timely
 access to needed information or more productive transactions with the Agency. These benefits can be cast in terms of
 the type of EIA information user.

   EPA Decision-makers - The Centralized Data Registries within the Enterprise Repository are intended to provide easy
 access, through linkages, references, and the maintenance of authoritative lists, to the wealth of existing data and
 information currently held throughout the Agency. Metadata maintained in these registries will also promote a better
 understanding of the data. Currently EPA scientists and other analysts have "to fish," either by Web browser or through
 word of mouth, to look for existing data and information resources. Once received, data may not be adequately
 characterized or documented to ensure credible analysis. Because this can be frustrating and time-consuming,
 contractors are often hired to do this, or worse, work is replicated because information is "hidden" somewhere in the
 Agency.

-------
  Data Partner - The Central Data Exchange will be EPA's "front door for environmental reporting." To date, States
with the delegated authority to collect environmental data and information have had to "feed" multiple Program systems,
using a variety of transmission formats, to fulfill reporting obligations. This is burdensome and has slowed States in their
efforts to integrate and modernize their own systems. The purpose of the National Environmental Information Exchange
Network is thus to reduce the burden of reporting and to decouple State IT management from that of EPA.

  Member of the public - The System of Access presented in this model will enable members of the public to get to the
data, information, and tools they need, when they need them. While great strides have been made over the last eight
years to improve public access through resources like www.epa.gov, the Envirofacts warehouse, and tools like
AirNow and Surf Your Watershed, members of the public, like the EPA scientists, still have to search. This is
compounded by a lack of familiarity with EPA's organizational structure. The System of Access will help to get external
users to the resources they need when they need them.

SUMMARY OF INTEGRATION STRATEGY

  The EIA is the mechanism to plan integration efforts that create efficiencies or enable more effective use of
information. This model presents a set of core components that will help to streamline the mechanics of information
collection, processing, storage, and access across the Agency. The implementation and adoption of some set of core
component services has the potential to produce significant efficiencies that save time, money, and human resources.

  This model also contains a set of decision support components - the Enterprise Repository and Decision Support
tools - which will support more effective use of information. The Enterprise Repository and Decision Support tools
together are proposed to capture data from disparate sources to present them to users in an easy-to-use format. Their
purpose is to enable study and management of key issues that require data from multiple sources both inside and outside
of EPA. Examples include urban sprawl, pesticide spray drift, total maximum daily loads to a waterbody, and homeland
security. These components, along with analytical models and guidelines, can enable analysis of existing conditions from
multiple sources to forecast trends and environmental outcomes. Investing in these decision support capabilities will help
to: (1) link decisions and policy to environmental results; and (2) identify the activities that will make EPA more effective
in carrying out its mission.

-------
 2.0  ENVIRONMENTAL  INFORMATION ARCHITECTURE —
 CONNECT AND EXCHANGE
   This model begins by presenting the basic functionality needed to support information users, both internal and
 external, as they interact with EPA's information and technology resources. The Connect and Exchange function is
 the basis for analyzing EPA's management of user connections and the flow of information into and out of the Agency.

   The Connect function involves the coordinated management of user identification, registration, and security
 procedures allowing information users easy interaction with the Agency.
[Figure 6: Enterprise Portal Manages Flow In and Out of the Agency. Diagram of the Connect and Exchange function
showing the Enterprise Portal, flows out of the Decision Support Components, and flows into the Agency.]

  The Exchange function encompasses coordinated management of the information flows into and out of the Agency. It
must be supported by processes, technologies, and services that streamline data submission transactions. The Exchange
function must also allow information users to locate and access the data.

  In this model, the concept of an Enterprise Portal is proposed as a means to encourage the design of a complementary
set of processes, technologies, and services that support the Connect and Exchange function. This concept was
created in response to recent QIC approval to "Maintain a single portal (i.e., CDX) for all environmental data flows"
(EPA(1), 2001). Currently, the CDX portal provides user registration and supports incoming data flows for the
National Environmental Information Exchange Network. In this model, the portal concept is expanded to include access
services.

  The Enterprise Portal will provide the following services:

  •   User Registration Management
  •   Data Collection and Exchange Network Services through the CDX Portal
  •   Access Services through the System of Access

  Each service is described in the "Critical Features and Operations" section.

-------
                                                         EPA Enterprise Portal

                                                      An interface through which
                                                      people and organizations
                                                      electronically access EPA's
                                                      environmental information.
2.1 BACKGROUND

ENTERPRISE PORTAL

  There are many definitions of a portal (Phifer et al., 2001). In this document a
portal is simply a gateway to services. The Enterprise Portal is the means by which
to structure access to services and includes an interface that people and organizations
use to electronically connect to EPA and access environmental information.

  In EPA's current computing environment, it is often difficult for users to connect to the EPA environment to either
transmit data or access information. The longstanding approach to connecting directly to EPA's environment requires
EPA, State, local, and tribal users to maintain multiple logon IDs and interact with many contacts. To send data to
EPA, a State or regulated entity may be required to make data submissions to several different Programs. To retrieve
information from EPA, a user must first locate the appropriate data and access tools. This can be a difficult task
because EPA has a variety of tools, each tool is independently managed, and there is little coordination among the tools.
A user must diligently explore EPA's Web site to locate the appropriate tool. Depending on which tool the user chooses,
he or she may get a different result because the tools may use different data sources.

  An Enterprise Portal is proposed to help coordinate the management of user IDs. Users will need only one user ID
that will support their needs regardless of the type of transaction. Also, unlike the current access services, which
emphasize access for the public and external stakeholders, the target environment described in this model provides
special focus on the access needs of EPA employees.

                                                      EPA Node

                                     The collection of capabilities, processes, data, and
                                     infrastructure supporting EPA services for the
                                     National Environmental Information Exchange
                                     Network. EPA's node is the CDX.

  The Enterprise Portal concept was derived largely from the National Environmental Information Exchange Network
("the Network"). In order to be a Network partner, EPA must maintain a node as follows (State/EPA IMWG, 2000):

  •  Each Network partner has only one node, although that
     node may handle many kinds and types of data.

  •  The node is the only route for Network delivery and
     receipt of information.

  •  The node is the single place for each member to present its standard node catalog of available information and
     associated network metadata, e.g., its Trading Partner Agreements (TPAs). To be on the Network, the node must
     present data and associated information.

  •  The node is the single place where each member implements the essential transport, security, and query protocols
     described in the Exchange Network Blueprint and specified in a TPA.

  •  The node is the only place where a member's compliance with a TPA can be demonstrated or evaluated.

  CDX is recognized as EPA's node of the Exchange Network. As described in Table 3, there are other features of
the Enterprise Portal, like the System of Access, and other core components, like the Enterprise Repository, that support
access by Network partners.

-------
 2.2 CRITICAL FEATURES AND OPERATIONS

   Operationally, the Enterprise Portal will provide three groups of services:

   •  User Registration Services will coordinate management of user connections to EPA's computing environment.
   •  Data Collection and Exchange Network Services will provide a central point for all EPA data collections and
      support EPA's Node on the Exchange Network.
   •  Access Services will provide coordinated management of the flow of information out of the Agency through the
      System of Access.

 2.3   USER REGISTRATION SERVICES

   A key feature of the Enterprise Portal is the control of user access to data. Registration services are currently
 divided between external and internal users:

 REGISTRATION FOR EXTERNAL USERS

   CDX customer registration supports EPA registration and authentication functions for legal filings to EPA's Programs
 by the regulated community and Network flows with our many external agency partners. TSSMS ID supports
 registration functions for both internal and external (States, Tribes, Universities) users of our EPA Program systems.

 REGISTRATION FOR INTERNAL USERS

   Other registration services include ORACLE ID, LAN ID, and Email ID, to name a few. These are internal
 registration services used to maintain security over EPA's internal systems. User Registration Services will provide
 coordination to allow a user to have only one EPA user ID with access to multiple EPA systems. An initial segmentation
 of information users, their access characteristics, and the communication channel they may use to conduct business with
 the Agency is provided in Table 3.

 PRIMARY SUPPORTING SYSTEMS

   EPA has several systems that perform Registration Services, but none of them may be considered a primary
 supporting system. Currently, Registration Services are supported by the Time Sharing Services Management System
 (TSSMS), Oracle DBMS, the EPA LAN, and CDX. Additional analysis is required to develop a strategy to coordinate
 these activities.
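
   One way to picture the "one EPA user ID" goal is a single enterprise identity that carries links to the existing
 registration services. The sketch below is a hypothetical illustration, not a description of any EPA system: the class,
 method names, and sample IDs are assumptions, and only the four service names come from the text above.

```python
from dataclasses import dataclass, field

# Registration services named in the text; everything else here is illustrative.
LEGACY_SERVICES = ("TSSMS", "ORACLE", "LAN", "CDX")


@dataclass
class EnterpriseUser:
    """A single enterprise user ID linked to IDs held by existing services."""
    enterprise_id: str
    legacy_ids: dict = field(default_factory=dict)

    def link(self, service, legacy_id):
        """Attach an existing service-specific ID to this enterprise ID."""
        if service not in LEGACY_SERVICES:
            raise ValueError(f"unknown registration service: {service}")
        self.legacy_ids[service] = legacy_id

    def resolve(self, service):
        """Return the service-specific ID, or None if the user has no account there."""
        return self.legacy_ids.get(service)
```

   Under this assumption the user remembers only the enterprise ID, while each existing system continues to see the ID
 it already knows.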

-------
User Set: Anonymous Public
  Description: Any person.
  Access Characteristics: Read-only access to EPA datasets, registries, analysis capabilities, and unrestricted
    information.
  Communication Channel: Internet

User Set: Registered Public
  Description: Any person that pre-registers with EPA for password-protected access.
  Access Characteristics: Read-only access to EPA datasets, registries, analysis capabilities, and restricted
    information made available according to an account registration agreement.
  Communication Channel: EPA extranet, Virtual Private Network, or secure Internet channel

User Set: Regulated Facility
  Description: Data submission from the regulated community by authorized representatives.
  Access Characteristics: Legal transmissions, electronic signature, Public Key Infrastructure.
  Communication Channel: Secure Internet channel

User Set: Exchange Network Partners
  Description: Authorized representative of other Federal Agencies, States, Tribes. Must pre-register with EPA for
    authorized password-protected access.
  Access Characteristics: Access to Exchange Network capabilities at the EPA node. For example:
    •  Supporting data exchange services using data exchange templates (DETs).
    •  Providing specific data for authorized requesters.
    •  Providing a catalog of available information and metadata on the node.
    •  Supporting access to registries; access to reliable and authoritative sources for commonly used data.
  Communication Channel: EPA extranet, Virtual Private Network, or secure Internet channel

User Set: EPA Users
  Description: Any EPA authorized user. Must pre-register with EPA for authorized password-protected access.
  Access Characteristics: Access to all services at the EPA Enterprise Portal. Access to some capabilities may
    require further authorization before a user can access them.
  Communication Channel: EPA intranet

                             Table 3: EPA Enterprise Portal User Sets
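
  The user sets in Table 3 can be expressed as a simple lookup that a portal might consult before opening a session.
This is a minimal sketch under stated assumptions: the dictionary keys, channel labels, and the function name are
invented for illustration, and the registration flag for the Regulated Facility set is inferred from the Environmental
Reporting description in Chapter 1 rather than stated in the table.

```python
# Hypothetical encoding of Table 3; keys and channel labels are illustrative.
USER_SETS = {
    "anonymous_public": {
        "requires_registration": False,
        "channels": {"internet"},
    },
    "registered_public": {
        "requires_registration": True,
        "channels": {"extranet", "vpn", "secure_internet"},
    },
    "regulated_facility": {
        "requires_registration": True,  # assumption, see lead-in
        "channels": {"secure_internet"},
    },
    "exchange_network_partner": {
        "requires_registration": True,
        "channels": {"extranet", "vpn", "secure_internet"},
    },
    "epa_user": {
        "requires_registration": True,
        "channels": {"intranet"},
    },
}


def may_connect(user_set, channel, is_registered):
    """Return True if this user set may use this channel in its current state."""
    rule = USER_SETS[user_set]
    if rule["requires_registration"] and not is_registered:
        return False
    return channel in rule["channels"]
```

  Keeping the rules in data rather than in code would let the segmentation be revised, as the text anticipates, without
changing the checking logic.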
2.4 DATA COLLECTION AND EXCHANGE NETWORK SERVICES

  Data Collection Services provide: (1) a central point for collection of all data submissions to various EPA Program
systems and databases; and (2) the functionality and interface protocols necessary for EPA to maintain a node on the
Network.

  Early in FY'01 the State/EPA Information Management Workgroup endorsed the Exchange Network Blueprint
which sets the minimum requirements for Network participation. It is important to note that these services encompass
both incoming and outgoing flows of data. Hence it is envisioned that some of these services will be provided by CDX,
EPA's node on the Network, while others will be supported in a complementary way by the System of Access and the
Registries (Chapter 4). These distinctions are noted in the summary of the following requirements:

-------
1. Exchange Network Partner Requirement: Support the creation of a single point of exchange for all EPA Program
   systems data.
   Primary Supporting Project(s): CDX

2. Exchange Network Partner Requirement: Support data exchange services using data exchange templates (DETs),
   including data flows to and from Exchange Network Partners, as governed by Trading Partner Agreements (TPAs).
   Primary Supporting Project(s): CDX (flows into Agency); System of Access (flows out of Agency to partners)

3. Exchange Network Partner Requirement: Provide specified data for authorized requester.
   Primary Supporting Project(s): System of Access

4. Exchange Network Partner Requirement: Provide a catalog of available information and metadata on the node.
   Primary Supporting Project(s): Metadata & Holdings Catalog through System of Access

5. Exchange Network Partner Requirement: Support access to registries; access to reliable and authoritative sources
   for commonly used data or code sets made available on the Network.
   Primary Supporting Project(s): System of Access/Centralized Data Registries

6. Exchange Network Partner Requirement: Provide information on Network data standards; used for building DETs.
   Primary Supporting Project(s): Environmental Data Registry through System of Access

7. Exchange Network Partner Requirement: Provide security support; PKI, SSL, Secure HTTP, support for security
   levels 1-4.
   Primary Supporting Project(s): CDX & Foundation Component

       Table 4: Exchange Network Partner Requirements and Primary Supporting Projects


 PRIMARY PROJECT - CENTRAL DATA EXCHANGE (CDX)

   When fully implemented, the Central Data Exchange (CDX) will serve as a single point of entry into EPA for
 environmental compliance reporting in both electronic and paper forms. CDX is also a record-keeping and distribution
 point for submissions to various Agency systems and databases. It supports all submitters of environmental information,
 including industry, States, EPA Programs and systems, and other Federal agencies. CDX enables data transfer with
 Exchange Network partners as well as access to the EPA data holdings for the purpose of data confirmation, data
 update, and data status review.

   CDX provides the following services:

   •  User registration
   •  Secure transmission of legal submissions
   •  Data receipt and notification
   •  Maintain user mailboxes
   •  Archive data
   •  Manage security (e.g., authentication, encryption, virus scans)
   •  User support
   •  Training assistance
   •  Data format translation - XML
   •  Transaction documentation, including transfer transactions

-------
Implemented   Underway   Proposed FY02   To Be Decided

National Emission Inventory (NEI)
Unregulated Contaminant Monitoring Rule (UCMR)
Facility Registry System Flows

Permit Compliance System (PCS)*
Resources Conservation and Recovery Act Info (RCRAInfo)*
Toxic Release Inventory System (TRIS)*
Toxic Substances Control Act Forms
TSCATS Forms
Safe Drinking Water Info System (SDWIS)*
Air Quality System (AQS)*
Storage and Retrieval System (STORET)*

Air Facility Subsystem (AFS)*
System for Risk Management Planning (SRMP)*
Continuous Release Emergency Response Notification (CR-ERNS)

                           Table 5:  Current and Anticipated Use of CDX
  In early FY'02 the QIC discussed a goal set forth for implementing CDX to support flows to EPA's "REI systems"
(Appendix D) by FY04 (EPA(1), 2001). A summary of current and proposed system flows, including additional
systems, appears in Table 5.
                                                                        System of Access

                                                         A tool that will allow information users (Public,
                                                         Partners, and EPA Employees) to locate EPA's data
                                                         and the tools to access and analyze that data
                                                         according to their authorization level.
2.5 ACCESS SERVICES

  The primary goal of Access Services is to streamline access
to Agency information resources. Access Services will
provide the policies, procedures, standards, and technology to
allow seamless access to EPA's data without requiring the user to understand a great deal about EPA's computing
environment or organizational structure.  Access Services will be supported by the "System of Access".

  It should be noted that Access Services are intended to provide access to internal as well as external data users. In
the past, EPA has focused on making integrated environmental information available to the public, with the assumption
that the public access tools could also support the needs of environmental decision makers within the Agency. Access
Services will provide a coordinated approach to managing access to EPA's data and tools that will make the information
and tools easier to access for internal as well as external users.

PRIMARY SUPPORTING PROJECT  — SYSTEM OF ACCESS

  When fully implemented, the "System of Access" component will allow internal and external information users to
easily locate EPA's data and the tools to access and analyze that data according to their authorization level. Through the
System of Access users will be able to connect to EPA's environment, log in (if necessary), search EPA's information
holdings, select items of interest, and download information or use the appropriate EPA-provided tools to access or
analyze the information. The System of Access will allow read access to the data and will not support data collection or
data maintenance.

  At its highest conceptual level, the System of Access is not only a "system" in the traditional sense of an application;
it is also a "system" in the sense that it combines features of the Enterprise Portal, Decision Support Tools, and the
Enterprise Repository into a cohesive unit that provides seamless access to EPA's data. Critical to the success of the

-------
 System of Access is the ability to coordinate among the various Decision Support Tools. As a coordination mechanism,
 the System of Access will provide the structure to make it easier for Program Offices and other stakeholders to
 develop and deploy Decision Support Tools that draw on the data in the Enterprise Repository.

   System of Access services will be accessed through the EPA Enterprise Portal. Portal visitors will not actually see
 the term "System of Access" at the EPA Enterprise Portal Web site but will be using the System of Access when
 accessing appropriate hyperlinks.

   System of Access services will include:

   •  A search capability that allows users to search the Centralized Data Registries, the Metadata and Holdings
      Catalog, and the XML registry for DETs, TPAs, databases, geospatial tools, and Decision Support Tools.

   •  A data/information selector - After performing a search, the user should be able to select an item from their search
      results list and be linked directly to the data or the appropriate analytical tool to retrieve the desired information.

   •  Convenient, flexible linkages to tools that provide the access and analysis of EPA's data holdings, i.e., Decision
      support tools.

   •  A customizable portal that allows users easy access to commonly-used features.

   •  Access to related metadata information needed to understand the meaning and the implications of the data and
      recommended background and contextual information, such as caveats and explanations, that promote informed
      and responsible data use.

   •  A "help" capability, with well-organized materials prepared in advance, as well as some user support/response
      function, and that may include pointers to other Network nodes.

   •  A feedback mechanism that gives visitors the ability to provide feedback, as well as an automated capability to
      capture and summarize relevant performance measures.

   •  An error correction feature that allows users to report erroneous information for EPA to investigate and resolve.

   The exact technical requirements and design options for the System of Access will be addressed as part of the Data
 Warehouse Master Plan recently funded by the 2002 Systems Modernization Fund (SMF).

 PRIMARY SUPPORTING PROJECT — EPA's PUBLIC ACCESS STRATEGY

   The implementation of Access Services must align with EPA's emerging Public Access Strategy. The purpose of this
 Strategy is to define the direction and scope of EPA's public access activities over the next 5-7 years. This Strategy
 represents the Agency's commitment regarding what EPA provides, to whom it provides it, and how it operates in
 developing and disseminating data and information products. These are complex issues involving broad, inter-related
 topics, many of which include high degrees of uncertainty. This Strategy attempts to approach these issues from a high
 enough level to present the interconnections, and from a low enough level to set meaningful strategic action. It attempts
 to assess the Agency's and its audiences' current needs, to project future public access trends of both, and to position
 the Agency's public access efforts in ways that will be most beneficial.
20                       Model for Information Integration — 2.0 EIA: Connect & Exchange

-------
2.6 BENEFITS OF THE ENTERPRISE PORTAL

  A single portal for receiving and accessing environmental information makes it easier for the Agency to manage its
data flows and easier for customers to work with EPA. Other benefits of the EPA Enterprise Portal include:

  •  User Registration Services will make it easier for users to work with EPA. Coordinated management will require
     users to maintain only one user ID to access multiple EPA systems. These services will also reduce duplication of
     effort by establishing one point for the management of user registrations.

  •  Data Collection and Exchange Network Services will reduce the burden on EPA's regulated community by
     streamlining the data submission process. Instead of a separate data submission for each Program, a regulated
     entity can make one data submission and the data will be parsed and relayed to the appropriate Program Offices.

  •  System of Access Services make it easier for customers to interact with EPA by providing one location for users
     to access all EPA Decision Support Tools. Access Services also support more efficient use of resources (human
     and budget) by coordinating the development and deployment of Decision Support Tools. Finally, the System of
     Access leads to more consistent environmental analysis by using a single source, the Enterprise Repository, for
     EPA's analyses.
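
  As a rough illustration of the single-submission idea in the Data Collection and Exchange Network Services bullet
above, the following Python sketch splits one consolidated submission into batches for the receiving Program Offices.
The record layout, the `program` field, and the routing table are hypothetical placeholders chosen for illustration;
they do not describe any actual CDX interface.

```python
from collections import defaultdict

# Hypothetical routing table: record type -> receiving Program Office.
ROUTES = {
    "air": "OAR",
    "water": "OW",
}

def relay_submission(records):
    """Split one consolidated submission into per-office batches.

    Records whose type has no route are returned separately so they
    can be reviewed instead of being silently dropped.
    """
    batches = defaultdict(list)
    unrouted = []
    for record in records:
        office = ROUTES.get(record.get("program"))
        if office is None:
            unrouted.append(record)
        else:
            batches[office].append(record)
    return dict(batches), unrouted

# One submission from a single regulated entity covering several Programs.
submission = [
    {"facility_id": "F-001", "program": "air", "value": 12.5},
    {"facility_id": "F-001", "program": "water", "value": 3.1},
    {"facility_id": "F-001", "program": "noise", "value": 70.0},
]
batches, unrouted = relay_submission(submission)
```

  Returning the unrouted records alongside the batches is one possible design choice; it keeps records that match no
Program visible for follow-up with the submitter.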

2.7 ISSUES &  NEXT STEPS

ISSUES

  •  The Enterprise Portal currently exists as a concept. It is presented in this model to encourage simplification and
     unification of the processes EPA has created over the years to interact with stakeholders, data partners,
     intermediaries, and the general public (EPA(17), 2002).  There are compelling reasons to streamline processes like
     environmental reporting; however, further analysis is needed to more fully identify and characterize all
     interactions with the external world so that streamlined services fully meet the needs of Programs and do not
     replicate existing processes.

  •  If the Agency pursues the Enterprise Portal concept, its design should take into consideration EPA's emerging
     Public Access Strategy (EPA(17), 2002).  The model in this document presents a rough cut of user segments and
     is entirely "e-centric," i.e., it does not address a number of "human touch" services that currently support public
     access to information. The Public Access Strategy principles and guidance for non-electronic access, as well as
     market research on customer segments and information needs, can provide more depth to the Enterprise Portal
     design.

  •  Agreeing on the scope of the services to be provided.

     - Registration services for internal users are currently fragmented. The Agency needs to reach formal agree-
        ment on the goals, objectives, and scope of integrated user registration services for internal users. Similarly, the
        Agency needs to determine the degree and sequence under which the TSMSS system should be integrated
        into the CDX registration services for the Agency's diverse set of external users.

     - While the principle of CDX was agreed upon within the Agency in 1998, and mandated by the QIC in early
        2002, a concrete decision on when or whether legacy exchange operations should be retired in favor of the CDX
        portal has not been discussed. One key factor that influences this decision is the readiness of State data
        partners and the degree to which integrating key legacy flows into CDX is desirable or practical.



-------
   - Access services have typically been viewed as intended to benefit the general public. While serving the public is
      essential, this view assumes that the needs of EPA internal users and data partners are similarly met by general
      purpose access. The Architecture Team must account for the needs of serving internal EPA users. This should in
      turn guide the scope of the access services.

   •  Resolving resource issues associated with provisioning portal services. At this point, most costs of data
      exchange are carried within Program and Regional budgets. How are the Operating and Maintenance costs of
      CDX to be supported as it takes on a major role in providing Agency-wide services? Similarly, most access costs
      are implicitly or explicitly carried within Program and Regional budgets. How are the costs of integrated access to
      be allocated, particularly if this includes substantial focus on access by internal EPA users and data partners? A
      draft CDX funding plan is under development.

   •  Scope of System of Access services. This model places greater value on the data that are maintained and used
      by the Decision Support components (data warehouse, registries, and tools) because it is envisioned that these data
      are approved for use by both internal and external users.  This raises important questions about control of user
      access (e.g., whether access should be limited to the centrally maintained data stores and, if so, how this
      transition will be planned and managed).

 NEXT STEPS

   1. Determine options for the EPA Portal Services. This includes a requirements analysis and user assessment
      (including EPA users and data partners) to determine portal service needs. Architectural options are particularly
      needed for registration and access services, including, for the latter, the scope of information to which access is
      granted (to all or part of EPA holdings, or to external sources, including the multi-partner content of the Exchange
      Network).

   2. Determine the operational and technical options for portal structure and function as part of defining the overall
      architecture of these functions. This includes identification of appropriate policies, procedures, and standards that
      may be needed to support an Enterprise Portal. On a technical level, EPA needs to describe the mechanisms of
      and process for establishing Web-based and desk-top access to the services available.

   3. Clarify the role of CDX in providing access services, including user registration. In supporting the Network
      Services for the Agency, CDX provides "access" for EPA's partners on the Exchange Network.  This must be
      coordinated with the Access Services to ensure consistency and to avoid duplication of effort. For example,
      reporting or accessing through TSMSS often requires multiple IDs and passwords, as well as the deployment of
      secure remote. Planning for an enterprise portal will need to determine whether and when migrating or integrating
      certain aspects of the current TSMSS into the CDX portal is desirable, or whether TSMSS should be left as a
      stand-alone system. Similarly, the degree to which TSMSS services remain separate or are integrated with other
      internal registration systems (Oracle DBMS, Novell directory, etc.) needs to be worked out.

   4. Clarify CDX support for Exchange Network Services. While substantial agreement exists on the Network
      exchange functions of CDX, clarification is needed on aspects of CDX function, including its role in: (1) data
      exchange with geospatial systems; (2) transfer of multi-Program consolidated reports from States; (3) exchange of
      confidential information; (4) exchange of datasets created within EPA (e.g., from a Region); (5) receipt of
      non-electronic submissions; and (6) verification and validation of data against a DET or data standard.

-------
PROCESS AND STAGE

  [Figure 7: Operational Databases Prepare Data for Storage & Use]

  After data has been collected by CDX, it passes to the Programs for further refinement and review prior to use.
Process and Stage is the next function in this model, where data will be manipulated and transformed prior to being
Stored for Use. This is a major function for EPA because it includes the often lengthy series of steps in the
"clean-up" of Program and Regional data that is often necessary prior to use. Once complete, the processed data are
transferred to a data warehouse that is part of the Enterprise Repository. The Process and Stage function thus
includes the principal quality control checks necessary to meet the Agency's quality objectives for exchanging data.

  The Process and Stage function is supported primarily by the Program/Regional Systems component (databases and
supporting applications), although some Process and Stage services are supported by CDX and the Enterprise Reposi-
tory (ER) component.

  Since Program and Regional systems already exist and have multiple functions, a key focus of this chapter is defining
how the roles and functions of Program and Regional systems might evolve as this target architecture is implemented.

3.1 BACKGROUND

  In EPA's current computing environment, Program and Regional systems serve all the IT functions presented in this
model. They support the collection, processing, storage and access to environmental information in their domain. In
carrying out these functions, these systems are linked in only a limited way to other systems with parallel functions. The
responsibilities of a Program or Region to collect, authenticate, and analyze environmental information are tightly
coupled to the actual data processing associated with each step.

  This document envisions an arrangement under which a Program or Region retains responsibility for, and authority
over, the quality of data collected within its domain. This includes data revision, correction, characterization, and
appropriate uses, but shifts responsibility for most of the actual data processing steps to enterprise services.  While the
exact scope and architecture of a data warehouse is not yet defined, this document envisions that Programs and Regions
would move data ready for analysis and use to a data warehouse which is part of the Enterprise Repository.

  Version control and updates of data in the ER are the responsibility of the Program data owners. The data will be
made available for use by any office, Program, employee, partner or stakeholder with interest, including the general
public, as appropriate.
                          Model for Information Integration — 3.0 EIA: Process & Stage
                                                                                                    23

-------
  Program Systems are defined as those systems used by individual Program Offices and Regions to accomplish their
particular missions or goals. Program systems include, for example, SDWIS, PCS, RCRA Info, and NEI. This definition
includes the Toxics Release Inventory System (TRIS), which, while organizationally not in a media Program office or
Region, collects primary data across environmental media for a Programmatic purpose.

  Regional systems are data collections and the associated applications that collect and use programmatic and
environmental data not required in national Program systems but needed to meet Regional needs.

  Operational Database is defined as a database used to hold data while it is being checked for completeness and
accuracy. Data contained in an operational database is considered interim and subject to change.

  Consistent use of standardized, documented, published access methods will allow interested parties to develop
Decision Support Tools that utilize the data. As outlined in Chapter 5, Programs and Regions would redirect most
access tools (i.e., applications that read the data for analysis and display) from local data sources to the data
warehouse for the relevant information. Program Offices could design and develop additional access tools, reports, and
analysis tools to access and/or retrieve the data to leverage the increased consistency and completeness of data in
the warehouse.

  Much of the data collection, data storage for use, and access tools would be centrally administered under direction
from the Programs and Regions whose business needs must be met.  The Process and Stage function would remain largely
with the Program and Regional office responsible for the collection of a particular set of information.

3.2 CRITICAL FEATURES AND OPERATIONS

PROGRAM & REGIONAL SYSTEMS MODERNIZED FOR
INTEGRATION
   This chapter addresses a more complex relationship between the function (Process and Stage) and the component
 (Program and Regional systems) than exists for some of the other functions. For example, it is envisioned that the
 exchange function discussed in Chapter 2 will be addressed over time in part by CDX, a new component
 expressly created for this purpose. In Chapter 4, the data warehouse is proposed as an expansion of Envirofacts, a
 component which, while preexisting, still serves primarily the Store for Use functions envisioned in this document. By
 contrast, Program and Regional systems are not new, nor have their functions been limited to the agency-wide
 function envisioned for them in this document: the processing and staging of information prior to transfer to a data
 warehouse.

   The features of Program/Regional systems modernized for integration are: (1) the function of these systems in the
 agency-wide integrated architecture; and (2) the Program/Region-specific functions that may still need to be met by
 these systems. In addressing these functions, the implications for Program and Regional systems as they modernize
 within an integrated agency architecture are also noted.

   The Function of Program/Regional Systems in Agency-wide Integration. In this architecture, the Process and
 Stage function becomes the primary role of Program and Regional systems. This function includes the operational
 Program/Regional database and the tools that allow the Program and Regional data owners to create, update, and
 delete the operational data. The Process and Stage function includes the services of data transformation,  quality control,
 interim data maintenance and transfer to the enterprise repository. These functions are elaborated below.

   Program/Region-Specific functions of integrated systems. As suggested above, certain needs are unique to
 Program/Regional Systems. These business needs may require specialized data and forms of analysis that have little or
 no agency wide implication. For these functions, full agency wide integration may be of minimal benefit.  This implies
 that on an as-needed and limited basis Programs and Regions will independently maintain their own interfaces and
 databases and conduct the four major IT functions to meet their needs.

-------
   Implications for Program and Regional systems. At the most basic level, this document implies that Program
and Regional systems need to evolve to serve two functions: the Process and Stage function within the overall agency
architecture, and the function of meeting Program/Region-specific information needs that cannot be effectively
addressed with agency-wide resources.

  To accomplish this evolution, the first need is conceptual and cultural. Program and Regional data managers will need
to recognize that their current data operations are bundles of functions (exchanging, processing, storing and using) that
can be separated and managed in different ways. They will need to accept the distinction between: (A) authority over/
responsibility for information content, quality and use and (B) management of information processing steps. "A" is
always a Program/Regional role. "B" should be done centrally or locally based on efficiency and effectiveness in meeting
agency-wide and Program/Regional business needs. The distinction between data "in process" and data "ready for use"
needs to be recognized explicitly. This distinction is key to recognizing the distinct roles of an operational database and
a warehouse in meeting Agency-wide and Program/Regional business needs.

  The evolution of Program/Regional systems also requires system changes. To ensure seamless transferability and
consistency, Program and Regional operational databases will need to conform to the policies and standards of the
Enterprise Repository and, in particular, those of the data warehouse.

  Other system changes might also be required. The metaphor of Program/Regional systems as "stove pipes" or silos
oversimplifies the actual situation. Within Programs and Regions are multiple "mini-silos," ad hoc and special-purpose
databases and data collections that complicate the ability of Programs and Regions to ensure the quality and consistency
even of information that is uniquely their own. Consolidating and rationalizing this collection of "mini-silos" is key to
fulfilling the Agency-wide Process and Stage function, to better addressing unique business needs and to leveraging
agency-wide resources such as a warehouse to meet business needs that are common across Programs and Regions.

  Many of these changes will require planning and systematic review of how information is used within a Program or by
a Region.

DATA MAINTENANCE AND QUALITY CONTROL SERVICES
  Data Steward: EPA Program and Regional staff with knowledge of, and responsibility for, data.

  In this proposed model, CDX performs the exchange and extraction functions by collecting incoming data, performing
some initial validation functions, and transferring the data to an interim Program or Regional operational database
for further processing. The next step, data transformation, is part of the Process and Stage function and is typically
performed by the Program/Regional Systems. It includes: (1) recasting data contents to conform to existing standards
as necessary (e.g., date and latitude/longitude); (2) carrying out a defined set of validation checks to assure the
completeness and quality of the data; (3) developing aggregate records as needed (e.g., summarizing a series of hourly
monitoring observations into a daily average); and (4) assuring consistency with applicable agency-wide registries and
data standards.
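
  The first three transformation steps listed above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the record fields, the MM/DD/YYYY input format, the YYYYMMDD target format,
and the specific range checks are hypothetical and are not taken from any EPA data standard or system.

```python
from datetime import datetime
from statistics import mean

def recast_date(text):
    """Recast a date from MM/DD/YYYY (assumed input) to YYYYMMDD (assumed target)."""
    return datetime.strptime(text, "%m/%d/%Y").strftime("%Y%m%d")

def validate(record):
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    if not -90.0 <= record["latitude"] <= 90.0:
        problems.append("latitude out of range")
    if not -180.0 <= record["longitude"] <= 180.0:
        problems.append("longitude out of range")
    if record["value"] is None:
        problems.append("missing value")
    return problems

def daily_average(hourly_records):
    """Summarize hourly observations into one average per (monitor, date)."""
    groups = {}
    for rec in hourly_records:
        key = (rec["monitor"], recast_date(rec["date"]))
        groups.setdefault(key, []).append(rec["value"])
    return {key: mean(values) for key, values in groups.items()}

hourly = [
    {"monitor": "M1", "date": "07/04/2002", "latitude": 38.9, "longitude": -77.0, "value": 10.0},
    {"monitor": "M1", "date": "07/04/2002", "latitude": 38.9, "longitude": -77.0, "value": 14.0},
    {"monitor": "M1", "date": "07/04/2002", "latitude": 95.0, "longitude": -77.0, "value": 99.0},
]
# Keep only records that pass every check, then aggregate them.
clean = [rec for rec in hourly if not validate(rec)]
averages = daily_average(clean)
```

  In this sketch a record that fails validation is simply excluded from the aggregate; in practice a data steward
would more likely resolve the problem with the submitter before the data move on.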

  The Process and Stage function also includes the generation of the appropriate metadata linked to the data as it
moves to a data warehouse for analysis and use. Ensuring that this information is developed and properly linked to the
dataset is another of the key quality functions exercised in this function.

  Once the Program and Regional stewards of the process and stage function are satisfied with the data, the process of
transferring the data to a warehouse can begin. This includes source-to-target mapping, where data elements in the
submission are mapped to data elements in the host EPA warehouse, and the mechanics of the actual data transfer to
populate the warehouse as described in Chapter 4 as part of the "Store" function.
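
  Source-to-target mapping, as described above, amounts to a lookup from each data element in the submission to the
corresponding element in the warehouse. A minimal Python sketch follows; all field and column names are hypothetical
and are used only to show the shape of the mapping.

```python
# Hypothetical source-to-target map: submission field -> warehouse column.
SOURCE_TO_TARGET = {
    "fac_id": "facility_identifier",
    "lat": "latitude_measure",
    "lon": "longitude_measure",
    "obs_dt": "observation_date",
}

def map_record(source_record, mapping=SOURCE_TO_TARGET):
    """Rename mapped fields and report any source fields that have no target."""
    target = {}
    unmapped = []
    for field, value in source_record.items():
        if field in mapping:
            target[mapping[field]] = value
        else:
            unmapped.append(field)
    return target, sorted(unmapped)

submitted = {"fac_id": "F-001", "lat": 38.9, "lon": -77.0, "obs_dt": "20020704", "notes": "n/a"}
warehouse_row, unmapped_fields = map_record(submitted)
```

  Reporting unmapped fields, rather than discarding them, gives the steward a chance to decide whether the mapping
or the submission needs to change before the transfer proceeds.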


-------
 OPERATIONAL DATABASE SERVICES

    While this series of checks, reviews, and modifications is occurring, the data reside in an operational Program or
 Regional database. This database may have associated applications designed to track process issues such as the
 degree to which a particular data submission has been cleaned up or the overall completeness of a data exchange. Data
 transformation within this operational database can occur as an incremental process, as a result, for example, of
 ongoing dialogue with a series of data submitters. These operational databases should be accessible to queries
 originating from an Agency data warehouse. However, due to the transitory nature of the data they contain, operational
 databases are not intended to function as the primary source of data for analysis. This role assumes a central data
 warehouse. However, a study to determine the appropriate ER model for EPA is pending. One approach may be to link
 these databases via shared data and standards to achieve virtual integration.

    To serve Program or Regional needs, the operational database may also serve as an archive for data provided in one
 level of detail but used at another. For example, hourly observations from a particular monitor may be received but a
 daily average constructed for oversight purposes. The stream of raw hourly data might reside in the operational
 database under agreed-upon retention rules, while the daily average, once checked for quality, would be transferred to
 a data warehouse.
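
    The hourly-versus-daily example above can be sketched with an in-memory SQLite database standing in for both the
 operational database and the warehouse. The table names, columns, and the `approved` flag are assumptions made for
 illustration; the document does not prescribe a schema or an approval mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE operational_hourly (monitor TEXT, obs_date TEXT, obs_hour INTEGER, value REAL);
    CREATE TABLE operational_daily (monitor TEXT, obs_date TEXT, avg_value REAL, approved INTEGER DEFAULT 0);
    CREATE TABLE warehouse_daily (monitor TEXT, obs_date TEXT, avg_value REAL);
""")

def load_hourly(rows):
    """Store raw hourly observations in the operational database."""
    conn.executemany("INSERT INTO operational_hourly VALUES (?, ?, ?, ?)", rows)

def build_daily():
    """Derive interim daily averages; the raw hourly rows stay in place."""
    conn.execute("""
        INSERT INTO operational_daily (monitor, obs_date, avg_value)
        SELECT monitor, obs_date, AVG(value)
        FROM operational_hourly GROUP BY monitor, obs_date
    """)

def approve(monitor, obs_date):
    """Record a steward's approval of one daily average (0 = pending, 1 = approved)."""
    conn.execute(
        "UPDATE operational_daily SET approved = 1 WHERE monitor = ? AND obs_date = ?",
        (monitor, obs_date),
    )

def transfer_approved():
    """Copy approved daily averages to the warehouse, then mark them transferred (2)."""
    conn.execute("""
        INSERT INTO warehouse_daily
        SELECT monitor, obs_date, avg_value FROM operational_daily WHERE approved = 1
    """)
    conn.execute("UPDATE operational_daily SET approved = 2 WHERE approved = 1")

load_hourly([("M1", "20020704", 0, 10.0), ("M1", "20020704", 1, 14.0), ("M2", "20020704", 0, 5.0)])
build_daily()
approve("M1", "20020704")
transfer_approved()
```

    Only the approved daily average for monitor M1 reaches the warehouse table; the M2 average stays interim, and all
 raw hourly rows remain in the operational table until a retention rule (not modeled here) removes them.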

    There are a variety of options for maintaining operational Program and Regional databases. Program offices can elect
 to move their data directly into the ER and let OEI take responsibility for their data storage needs. This method creates
 the greatest cost savings. If a Program prefers to maintain control of its database management functions, the Program/
 Region may be required to bring its databases into compliance with the ER policies, procedures, and standards that
 enable some form of virtual linkage.

 PRIMARY SUPPORTING PROJECTS

    Because the Process and Stage function is carried out in a variety of ways across the Agency, there are no key
 supporting projects to highlight for this function. Rather, work is necessary within each Program to modernize and
 integrate key systems.

    The newness of the Program and Regional system role means that systems which contain all the elements to be
 deemed "modernized for integration" and which specifically serve this Process and Stage role do not exist.
 However, aspects of the model and key steps in the planning process can be found in several Program office efforts.
 OECA has taken on the "mini silo" problem in the design of its ICIS system which consolidates a number of legacy
 enforcement and compliance information systems. OW has initiated a thorough review of data needs leading to mod-
 ernization plans for its major systems explicitly linked to agency-wide integration components. OPPTS has consolidated
 a number of its operational systems in OPPIN. OAR has initiated an information planning process within OAQPS that
 has highlighted the usefulness to the Program of distinguishing between operational and warehoused data. Also notable
 are Program efforts to implement key data standards. As systems implement data standards they will be increasingly
 prepared to exchange and integrate data and information.

-------
3.3 BENEFITS OF INTEGRATING  PROGRAM AND REGIONAL SYSTEMS

  There are a number of benefits in assigning Program/Lab/Regional Systems the roles and functions outlined above.
These include:

  •  Programs and Regions will be able to make better use of existing resources by reducing or eliminating
     "mini silos" with duplicate operations and functions and by focusing on the Process and Stage function of their
     overall information management activities.

  •  Because the existing Program and Regional infrastructure evolves rather than is replaced, achievement of
     enterprise goals can leverage the major investments already made in Program and Regional systems.

  •  Data stewards (Program and Regional staff with knowledge of and responsibility for the data) have the authority
     to maintain the integrity, quality, and timeliness of the data.

  •  The development, transmittal, and collection of metadata becomes an explicit function and responsibility prior to
     the associated information being stored and/or released for use.

  •  Linking the operational Program/Regional databases to the Enterprise Repository makes more data accessible for
     use by the Agency.

  •  Programs and Regions can dedicate more resources to information planning and decision support, as well as
     continue to support unique, mission-critical needs.

3.4 ISSUES AND NEXT STEPS

ISSUES

  •  Need to ensure broad understanding, acceptance, and implementation of the role envisioned for
     integrated Program/Regional systems. This includes senior leadership willingness to actively support a plan
     whereby the EIA is translated into a series of specific steps by OEI and Program/Regional systems staff.

  •  The appropriate balance must be defined between enterprise-wide service roles and those retained in
     Program and Regional systems. While striking the right balance may follow some general criteria, each case
     should be presented to the QIC or similar senior management body for consideration. This balancing must
     recognize that schedules drive decisions, so decisions made in the short term must be reviewed with longer-term
     considerations in mind.

  •  Incentives for system owners to migrate their systems into the EIA technical discipline. While significant
     benefits to integration exist, it is also important that financial charge back systems such as the Working Capital
     Fund (WCF) be structured to provide incentives to participate, or at least to avoid creating disincentives.

  •  A strategy to achieve Program/Region information integration needs to be developed and agreed upon
     based on services provided through the EIA. An appropriate mix of incentives, guidance, and technical
     assistance must be available to support Program and Regional offices during initial integration efforts as "mini-stove
     pipes" are re-configured within their domain.

-------
   •  Implementation of Data Standards. Data standards play a fundamental role in the EIA. As new systems are
      designed and legacy systems are re-engineered, there must be strong incentives for standards implementation.

   •  Data stewards play a very important role in this model: they approve operational data prior to its Storage for
      Use. They serve as "sentries for quality." As the EIA evolves, the Agency must closely examine the role of the
      Data Steward: (1) to identify and leverage existing roles; (2) to ensure enough resources are allocated to support
      this role; (3) to ensure error correction and approval procedures are consistently followed; and (4) to ensure
      that stewards' efforts support EPA's Quality System, as well as forthcoming decisions about the Data and
      Information Quality Strategic Plan (EPA(4), 2002) recommendations.

 NEXT STEPS

   There are a number of specific steps Programs and Regions can take:

   1. Implement existing EPA data standards. These data standards include Biological Taxonomy, Chemical Identifica-
      tion, Date, Facility Identification, Latitude/Longitude, SIC/NAICS.

   2. Track the development of other standards (Enforcement and Compliance, Geolocation, Permitting, Tribal
      Identifiers) via the Environmental Data Registry and, to the extent possible, anticipate what will be required.

   3. Work with OEI to develop Exchange Network DETs and TPAs, and begin receiving data through CDX.

   4. Modify Program systems to rely on registries to meet metadata needs and in some cases to conform to standards.

   5. Make the Program system data available in the Enterprise Repository.

   6. Modify applications to access the data in the Enterprise Repository.

   7. Take specific steps to use and share enterprise data, tools, and services.

   Additional steps that address the "mini-silo" problem include:

   8. Examine existing systems within Programs for their purpose and for consistency with each other. Consolidate
      these where appropriate, in light of an office/Region-wide understanding of data and information needs, as part
      of modernization efforts.

   9. Consider the design and development of a Program/Regional repository to extract data from several operational
      systems into an architecture that allows Program and Regional business needs to be consistently and efficiently
      met. This should be linked to Agency-wide repository efforts to produce an architecture that allows all
      repositories to be accessed seamlessly.

-------
4.0  ENVIRONMENTAL INFORMATION ARCHITECTURE —
      STORE FOR USE

  [Figure: Store for Use - Enterprise Repository: Data Warehouse (Public, Internal); Central Data Registries;
  Geospatial Data; Metadata Holdings Catalog]

  This chapter addresses the storage of data, specifically databases that enable easier access to the Agency's
information resources and decision support. The IT function, Store for Use, is the basis for examining the way EPA's
enterprise data are stored, managed, and made available for use. This function begins when Data Stewards (HQ Program,
Regional Program, or State) approve the data found in the operational databases (Chapter 3) and make them available
to the rest of the Agency and other public users as appropriate. The Store for Use function consists of the activities
necessary to ensure the data are available and ready for analysis.

  The Store for Use function will be supported by a set of coordinated databases that are collectively referred to as
the Enterprise Repository (ER). These databases are managed under a consistent set of policies, procedures, and
standards that promote data integration. The concept of the Enterprise Repository is derived from the QIC-approved
principle that "EPA will provide access to integrated Agency data and shared datasets." Figure 8 shows the collection
of databases within the Enterprise Repository.

4.1 BACKGROUND

ENTERPRISE REPOSITORY

  In EPA's current IT environment, the procedures for processing data and then appropriately storing it for use vary
widely across the Agency.  Variations in the procedures governing version control, access control, archiving, error
correction, and documentation of these processes make it difficult to locate and access data, particularly if a user is
not closely associated with the Program. These circumstances make analyses difficult to replicate, which casts doubt
on the credibility of EPA analyses.

[Figure 8: Enterprise Repository Decision Support Components]

   Enterprise Repository (ER) - A centrally coordinated set of databases that conform to common policies,
   procedures, and standards.

  One way that EPA currently addresses the problem of access is by copying (or linking) data from various Program
Office databases and storing it in the Envirofacts Information Warehouse. Currently databases "in" Envirofacts share a
common set of access methods, policies, procedures and standards, which allow for efficient storage, access, mainte-
nance, and integration of the data. In EPA's decentralized computing environment, this approach has proven to be the
only way to integrate EPA's environmental data and make it accessible for public use. However, this arrangement is
imperfect. Because data supplied to Envirofacts is a copy of a Program database, error corrections at the Program
 database level are not always reflected in Envirofacts. Also, database copy updates are voluntary and unpredictable.
 This arrangement also does not address the inconsistency in policies and procedures that guide processing and storage
 across the Agency. Finally, the types of data available in Envirofacts are limited because Envirofacts is voluntary, and
 Programs are charged to use it through the Working Capital Fund (WCF).

 4.2 CRITICAL FEATURES AND OPERATIONS

   The target IT environment proposed in this model seeks to establish a robust data warehousing environment that is
 specifically structured to satisfy the query and reporting needs of EPA's internal and external information users. Program
 Office participation in the design of the data warehouse is critical to ensure requirements of all stakeholders are satisfied
 and to ensure the inter-office coordination necessary to maintain the quality of the data. The key to the success of this
 target environment will rest on the effectiveness of the policies, procedures, and standards that underpin data warehouse
 development and implementation.

   A data warehouse can be implemented in large organizations such as EPA in a variety of ways (Inmon, et al, 2001).
 A single "global" data warehouse is one approach. Another approach is to build "local" warehouses for Divisions or
 Offices and to connect them virtually to create a "global warehouse." Model selection should be based on which model
 best meets Agency business needs. The process for choosing among these models is noted in the "Issues and Next
 Steps" section of this chapter.

   Critical to the success of the Enterprise Repository is not the technology, but the management structure that surrounds
 it. The Enterprise Repository will provide policies, procedures, and standards for database management, query sup-
 port, and database administration. It is important to note that the Enterprise Repository will not be a single physical
 storage unit, but a coordinated set of databases. Although these databases may be independently managed, they will
 conform to a common set of policies, procedures, and standards to enable data integration and access to information
 maintained across the Agency. Databases that are part of the ER simply conform to ER policies.

   The Enterprise Repository will store EPA's "enterprise data," broadly defined as data that enables cross-media,
 cross-program analysis. Examples include data reported by States or the regulated community to EPA, geospatial data,
 or any other data that supports analysis, assessment, and modeling. Further analysis is needed to define the full scope,
 including the needs of both internal and external analysts.

   The Enterprise Repository will provide the following services:

   •  Data Warehouse Services.
   •  Registry Services.
   •  Geospatial Data Services.
   •  Metadata Services.

 4.3 DATA WAREHOUSE SERVICES

   The Data Warehouse will be the part of the Enterprise Repository that supports EPA's enterprise reporting needs.
 This document defines a data warehouse as a collection of integrated subject-oriented databases designed for decision
 support. In consultation with the Program data stewards or system owners, the Data Warehouse will be designed to
 illustrate the appropriate relationships among all the data holdings and to ensure that data integration and data
 consolidation are maintained at the highest possible level. The data in the warehouse will be reviewed, processed,
 and cleared by the Program before they are copied (or linked) from the operational databases.

-------
  Decision Support tools will use the data in the Data Warehouse for analysis. The standard access methods,
query support, and documentation provided by the Data Warehouse will make it easier for EPA and stakeholders to
build tools that access the data for analysis, reporting, and decision support. A Data Warehouse Master Plan
development study was recently funded and will explore EPA's requirements and technical options for a data
warehouse.

   Data Warehouse (DW) - A collection of integrated subject-oriented databases designed for decision support
   (Inmon, et al, 2001).

   Decision Support - Analysis of many units to aid in learning, discovery, and problem solving (Inmon, et al, 2001).

PRIMARY SUPPORTING SYSTEM - ENVIROFACTS INFORMATION WAREHOUSE

  The Envirofacts Information Warehouse currently
provides Data Warehouse Services at EPA. The
Envirofacts Warehouse is a collection of EPA environmental databases, largely derived from EPA source databases.
The Envirofacts Web site is an interface to the Envirofacts warehouse that gives the public access to numerous EPA
environmental databases over the Internet. Envirofacts users can retrieve environmental information from databases on
Air, Chemicals, Facility Information, Grants/Funding, Hazardous Waste, Risk Management Plans, Superfund, Toxic
Releases, Water Discharge Permits, Drinking Water, Drinking Water Contaminant Occurrence, and Drinking Water
Microbial and Disinfection Byproduct Information. Users may retrieve information from several databases at once, or
from one database at a time.  Online queries allow users to retrieve data from these sources and create reports, or
generate maps of environmental information.

  In its current configuration, the Envirofacts Information Warehouse is more like a "Federated Repository" than a
"Data Warehouse." A Federated Repository is a collection of databases from independent systems that have been
combined (either virtually or physically) into a common repository. This common repository allows the databases to be
integrated using common fields such as facility ID and chemical ID. The databases in this "Federated Repository" all
share a common set of access methods, policies, procedures and standards, which allow for efficient storage, access,
and maintenance of the data.

  The weakness of the Federated Repository is that queries of multiple databases can sometimes produce inconsistent
results, since the data elements belong to independent (or "federated") databases with independent data schemas. In a
true data warehouse, the databases are integrated into one heterogeneous data schema designed specifically to handle
queries with an enterprise focus.
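
  The weakness described above can be illustrated with a short sketch. It is illustrative only: the two record sets,
their field names, and the facility values are hypothetical and are not drawn from Envirofacts. The sketch links two
independently managed datasets on a shared facility ID, as a federated repository would, and then lists the fields on
which the linked records disagree.

```python
# Hypothetical records from two independently managed Program databases.
# Field names and values are invented for illustration only.
air_records = [
    {"facility_id": "F001", "name": "Acme Plating", "city": "Dayton"},
    {"facility_id": "F002", "name": "River Mill", "city": "Akron"},
]
water_records = [
    {"facility_id": "F001", "name": "ACME Plating Co.", "city": "Dayton"},
    {"facility_id": "F003", "name": "Lakeside Works", "city": "Toledo"},
]


def federated_join(left, right, key="facility_id"):
    """Pair up records from two sources that share the same key value."""
    right_by_key = {rec[key]: rec for rec in right}
    return [(rec, right_by_key[rec[key]]) for rec in left if rec[key] in right_by_key]


def find_conflicts(pairs, fields=("name", "city")):
    """List the fields on which two linked records disagree."""
    conflicts = []
    for left_rec, right_rec in pairs:
        for field_name in fields:
            if left_rec[field_name] != right_rec[field_name]:
                conflicts.append(
                    (left_rec["facility_id"], field_name,
                     left_rec[field_name], right_rec[field_name])
                )
    return conflicts


pairs = federated_join(air_records, water_records)
conflicts = find_conflicts(pairs)
```

  In this sketch the common field makes the link possible, but each source keeps its own version of the facility name,
so a query that spans both sources can return inconsistent answers. An integrated warehouse schema would hold one
reconciled value instead.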

4.4 REGISTRY SERVICES
  Within the ER, a special set of databases called the Centralized Data Registries will provide the authoritative
source of datasets that are critical to data integration and information exchanges between EPA and its partners.
Registry Services provide the means for coordinating the management, access, and use of EPA's Centralized Data
Registries.

[Figure 9: Envirofacts Information Warehouse. Labels recoverable from the figure: Demographics 2000 Database,
Integrated GeoSpatial Database, National Shapefile Repository]

-------
   Data Schema - The structure of data in a data warehouse, usually indicated by data fields, formats, field
   attributes, and relations to each other.

WHAT IS A REGISTRY?

  This document defines a registry as "an official and authoritative list of
specific, well-defined items of interest." The Facility Registry System (FRS)
is the Agency's keystone registry as it supports better management and
integration of data most closely associated with EPA's bottom line: regulation
of facilities. Within this broad definition, a registry can provide one or more of the following functions:
   Registration - As the name implies, registries allow users to register new items (add new items to the
 authoritative list) or screen existing items (check to see if items are already registered). For example, Programs
 and some States are currently populating and validating facility identifiers in FRS. It is envisioned that eventually
 the CDX will use FRS to validate or add facility IDs submitted in reports via the Exchange Network.

   Centralized Data Registries - EPA's core "registry" systems: Facility Registry System (FRS), Substance Registry
   System (SRS), Environmental Data Registry (EDR), and Environmental Information Management System (EIMS).
   Previously known as the "System of Registries."

   Linkages - A registry can be used to establish linkages between items of interest. For example, currently the same
 facility may be called Facility A in one system and Facility B in another system. The Facility Linkage Application within
 FRS performs the link between unique IDs in each system.
   Metadata Reference Database - In addition to listing the authoritative terms, registries may also store additional
 information about the item of interest or links to other related information. For example, the FRS contains the name
 and address of the parent company of a facility.

   Registry - An official and authoritative list of specific, well-defined items of interest to an organization.

   Validation/Verification - Registries serve as automated normalizing agents for all standardized data received and
 transferred by the Agency. Information that varies from the standardized protocol would be normalized or would include
 a notice indicating the discrepancy. For example, in the CRS the validation/verification function will identify where
 a chemical name does not match the reported Chemical Abstract Service Number.

   Discovery - Registries assist users in information searching and discovery. They may also serve as indexes for
 finding other information in the Enterprise Repository.

   Cross-Referencing - Registries allow for multiple representations of common data in disparate locations and systems.
 Application systems would not necessarily be required to change their representation of common data but could use the
 registry to map to equivalent concepts expressed elsewhere.
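
   Several of the functions listed above can be sketched in code. The example below is a minimal illustration under
 stated assumptions, not a description of how FRS or any other EPA registry is built: the class name, method names,
 system names, and identifiers are all hypothetical. It shows registration and screening, linkage of local Program IDs
 to one registry ID, cross-referencing, and a simple validation check.

```python
class MiniRegistry:
    """Toy registry: an authoritative list plus links to Program system IDs.

    Hypothetical design for illustration; not an EPA system.
    """

    def __init__(self):
        self._items = {}  # registry ID -> descriptive attributes (metadata reference)
        self._links = {}  # (system name, local ID) -> registry ID (linkage)

    def register(self, registry_id, **attributes):
        """Registration: add an item unless it is already on the list."""
        if registry_id in self._items:
            return False
        self._items[registry_id] = dict(attributes)
        return True

    def is_registered(self, registry_id):
        """Screening: check whether an item is already registered."""
        return registry_id in self._items

    def link(self, system, local_id, registry_id):
        """Linkage: tie a Program system's local ID to a registered item."""
        if registry_id not in self._items:
            raise KeyError(f"{registry_id} is not registered")
        self._links[(system, local_id)] = registry_id

    def resolve(self, system, local_id):
        """Cross-referencing: map a local ID to the registry ID, or None."""
        return self._links.get((system, local_id))

    def validate(self, registry_id, field_name, reported_value):
        """Validation: compare a reported value with the registry's value."""
        expected = self._items.get(registry_id, {}).get(field_name)
        return expected is not None and expected == reported_value


registry = MiniRegistry()
registry.register("FAC-1", name="Acme Plating")

# The same facility is known as "Facility A" in one system and "Facility B" in another.
registry.link("air_system", "Facility A", "FAC-1")
registry.link("water_system", "Facility B", "FAC-1")

same_facility = (
    registry.resolve("air_system", "Facility A")
    == registry.resolve("water_system", "Facility B")
)
name_ok = registry.validate("FAC-1", "name", "Acme Plating Inc.")
```

   Here both local names resolve to the same registry ID, so the two Program systems keep their own representations
 while the registry supplies the mapping. The validation call returns a mismatch rather than silently accepting the
 reported name, which corresponds to the discrepancy notice described above.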

   At EPA, all registries serve as official and authoritative lists of specific, well-defined items of interest to EPA. Some
 registries are indexes that let information users know what's available and other registries are simply sources of
 metadata. At a minimum, a registry will perform the first function and optionally may perform functions two and/or three.
 Key registry systems are described below in Table 6.

-------
Registry Name / Primary Supporting Project(s)         Functions

Facility Registry System (FRS)                        Registration, Linkage, Metadata Reference Database,
                                                      Validation/Verification, and Discovery

Environmental Data Registry (EDR)                     Registration, Linkage, Metadata Reference Database,
                                                      Validation/Verification, Discovery, and Cross
                                                      Referencing

Substance Registry System (SRS)                       Registration, Linkage, Metadata Reference Database,
  This registry contains both the Chemical            Validation/Verification, and Discovery
  Registry System (CRS) and the Biology Registry
  System (BioRS) as subsets

Environmental Information Management System (EIMS)    Registration, Linkage, Metadata Reference Database,
                                                      Validation/Verification, Discovery, and Cross
                                                      Referencing

Terminology Reference System (TRS)                    Linkage, Metadata Reference Database

Information Resource Registry System (IRRS)           Registration, Linkage, Metadata Reference Database,
                                                      and Discovery

XML Registry                                          Registration, Linkage, Metadata Reference Database,
                                                      and Cross Referencing

                                  Table 6: EPA's Registry Systems

PRIMARY SUPPORTING SYSTEM - FACILITY REGISTRY SYSTEM (FRS)

  As discussed, the purpose of FRS is to provide EPA with a central database of facility identification records and to
provide links to all facility-oriented Program system records. The independent management of EPA Program Systems
has led to multiple unique identifiers for single facilities.

  One strategy under consideration is for FRS to evolve into the only source of facility identification. This
approach requires two fundamental changes: (1) OEI will own and maintain the physical facility identification record;
and (2) Agency Programs will become dependent on the FRS as their only source for facility identification data. The
strategy is to seek out all reliable sources of accurate facility identification data and, with appropriate
documentation, populate the FRS records. This will improve efficiency and quality; however, this approach has both
technical and policy implications. The issues must be analyzed and debated as part of the planned "Registry Linkage"
options study (Chapter 6).

PRIMARY SUPPORTING SYSTEMS - EDR, EIMS, IRRS, & XML

ENVIRONMENTAL DATA REGISTRY (EDR)

  The Environmental Data Registry (EDR) is a comprehensive, authoritative source of reference information about the
definition, source, and uses of environmental data. The EDR catalogs major data collections and helps locate
environmental information of interest. As the major tool supporting the Agency's data standards program, the EDR
records and disseminates information about Agency data standards and the standard-setting process. The EDR is also
affiliated with the Substance Registry System (which contains the Chemical Registry System and Biology Registry
System as subsets) and with the Terminology Reference System.

-------
 ENVIRONMENTAL INFORMATION MANAGEMENT SYSTEM (EIMS)

   EIMS is a repository of information products and metadata. EIMS stores, manages, and delivers descriptive
 information, i.e., metadata, for data sets, databases, documents, models, multimedia projects, and spatial information.
 The EIMS user community includes environmental scientists, resource managers, and other stakeholders within EPA, the
 research community, and the general public. Users can search within EIMS to find information sources
 of interest based upon topic or defined criteria related to types of environmental resources, geographical extent, date, or
 content origin. The EIMS repository of scientific documentation, accessed with standard web browsers, places a virtual
 library on the desktop of EPA staff and others with Internet access. The EIMS architecture also supports the manage-
 ment of complex data, such as remote sensing data, Geographic Information System (GIS) coverages, and other types
 of data.

 WHAT OTHER REGISTRIES ARE NEEDED?

   There are plans to link the EDR and EIMS in a meaningful way so that both can provide key inputs to the effort to
 support the development of the Central Data Registries and EPA's node catalog for the Network. Currently Facility and
 Chemical identification have received the most attention as key data elements for integration. However, "regulation" and
 "business sector" have also been cited as useful for cross-program integration. Additional analysis is needed to
 define Agency needs and determine how these needs can be fulfilled using existing registries. Appendix C provides
 preliminary suggestions for other registries that may support EPA's business needs. Efforts are already underway on the
 registries needed.

 INFORMATION RESOURCE REGISTRY SYSTEM (IRRS)

   An information resource registry would start with an application systems registry to catalog information resources as
 required by numerous Federal regulations, policies, and oversight agencies. (One specific need is related to the
 requirement that the Agency certify that all of its information systems meet security requirements.) This information
 was stored in the Information Systems Inventory (ISI) until the mid-1990s. Much of the information from this system
 was used in populating information resource metadata records as part of the Government Information Locator Service
 (GILS) effort. Application system information currently exists in the EDR registries and could be extracted to populate
 this registry. Additionally, the results of the Y2K inventory could contribute to this registry population. OEI is currently
 working with the EPA's Enterprise Architecture initiative to construct this registry. It is projected to be operational
 (although not fully populated) by the end of calendar year 2002.

 XML REGISTRY

   An XML registry/repository is envisioned to make reusable data components and specifications available. The
 Agency has identified the need for a registry to manage XML objects, to support consistency of XML development,
 and to support effective and consistent technology use and implementation. As XML objects, including tags and
 schemas, are closely related to data elements, this registry could be implemented as part of the EDR registries. The
 Data Standards Branch in conjunction with the EDSC has contracted with the National Institute of Standards and
 Technology to create a pilot XML registry. It will be operational by the end of April 2002. This registry will be oper-
 ated on a trial basis for six to nine months. At the end of the trial period, a decision will be made as to where the
 registry will be hosted and whether it will be integrated into the EDR.

-------
4.5 GEOSPATIAL DATA SERVICES

  Geospatial Data Services provide the means for coordinating the management, access, and use of EPA's Geospatial
Information. Geospatial data are currently acquired and managed independently by many different EPA Program
Offices and, as a result, there is often duplication of effort and resources.

  There are several components necessary to eliminate redundancies and better leverage resources. First, a Geospatial
Data Index (GDI) will enable users to easily find and identify geospatial data holdings within EPA, easily access
metadata about that data, and where web linkage exists, access that data. It is envisioned that EIMS (the Agency's
federal geographic data standard node for FGDC/NSDI) will be the engine for the GDI. Second, one core enterprise
geospatial dataset necessary to implement key EPA business operations will be accessible to staff and partners. This
accessibility will be achieved via one or more integrated geospatial databases and/or through linkages to master files
housed at partner organizations. Third, the full implementation of the Agency data standard for latitude/longitude will
further enhance data exchange and sharing.

  It is envisioned that the geospatial technical infrastructure will consist of a series of linked Headquarters, Regional,
and ORD Laboratory nodes which will support seamless access to distributed geographic data and services by EPA
staff, partners, and stakeholders.

  The key success factors are: (1) having metadata associated with all data; and (2) the computing and telecommunica-
tions capacity for anyone in the Agency to access that data whether it be on the integrated Headquarters server or on an
integrated Regional and/or ORD Laboratory server.

PRIMARY PLANNING  MECHANISM - GEOSPATIAL BLUEPRINT

  In June 2001, EPA's Office of Environmental Information (OEI) completed an assessment of the current use of
geospatial data and technologies throughout the Agency. The resulting "Geospatial Activities Baseline Assessment"
(EPA, 2001) describes the use of geospatial data and technologies in support of the Agency's business operations, and
documents current data sets, hardware, software, users, expenditures, applications, and issues related to geospatial
technologies. More than 350 individuals across all Regions and headquarters Program offices actively contributed to
the development of the Baseline, confirming the pervasive and critical role that these data and technologies play in the
Agency. Contributors indicated that most of the Agency's business operations are tied explicitly to geographic locations
and are currently supported to some extent by the use of geospatial technologies. Many users, however, expressed
additional needs for geospatial data and analyses, as well as concerns about their ability to fully utilize the technologies
due to a variety of issues that characterize the present EPA organizational and information management environment.

  The Geospatial Blueprint, slated for completion in the Spring of 2002, will likely recommend mechanisms to eliminate
redundancies and provide for more efficient management and use of geospatial information by managing geospatial
information as a corporate resource.

  The Geospatial Blueprint will likely recommend an approach to more effectively organize, coordinate, and leverage
geospatial activities at the enterprise level within EPA and with its partners in environmental protection. The intent is to
have the Agency operate on a common vision, move as an organization in a defined direction, and create an environ-
ment where geospatial data/tools are shared resources and incorporated into daily operations.

4.6 METADATA SERVICES

  Metadata are data about data, and they are critical to locating and understanding EPA's data holdings. Metadata Services
within the ER provide coordinated management of metadata across the Agency.



-------
   In EPA's current IT environment, metadata are buried in a myriad of independent systems, which makes them
 difficult to use. Format and content of metadata information vary across EPA. Some metadata records are duplicated
 and inconsistent (EPA(14), 2001). Moving data to a data warehouse has little effect if users cannot locate and identify
 the data in the warehouse.
   Metadata Services help to ensure a single source of consistent and current information, version control, and
 availability. An enterprise commitment to metadata management is a foundation for future data standards that will
 support consistent data interpretation.

   Metadata and Holdings Catalog - The authoritative source of EPA's metadata. Also referred to simply as the
   "Holdings Catalog."

 METADATA AND HOLDINGS CATALOG

   The Metadata and Holdings Catalog will be a special registry that supports Metadata Services.

   It is envisioned that the Holdings Catalog will support metadata management by serving as the authoritative source
 for tracking and managing metadata for EPA. Just as information about EPA-regulated facilities is stored in the
 Facility Registry, information about EPA's information resources will be stored in the Metadata and Holdings Catalog. The
 Metadata Strategy currently under development at EPA supports the establishment of an authoritative source of
 metadata and describes how this registry can be used as a vital tool in the management of EPA's metadata.

   The Metadata and Holdings Catalog will support the System of Access by serving as the source of information about
 EPA's data holdings. This will give the System of Access the capability to allow users to search for information
 about EPA's databases based on a user's authorization level. The Holdings Catalog may contain information such as a listing
 of datasets, registries and their contents, record field names, associated data types, and formatting for records in the
 warehouse or in the operational data stores. The Holdings Catalog may also contain active metadata used by Access
 and Decision support tools to perform online queries.
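
   The kind of catalog entry and authorization-based search described above can be sketched as follows. This is a
 minimal sketch under assumptions: the entry structure, the two access levels ("public" and "internal"), their
 ordering, and the sample datasets are hypothetical, since the detailed requirements for the Holdings Catalog are
 still to be defined.

```python
from dataclasses import dataclass, field

# Assumed ordering of access levels; a higher rank can see everything below it.
ACCESS_RANK = {"public": 0, "internal": 1}


@dataclass
class CatalogEntry:
    """One hypothetical Holdings Catalog record describing a dataset."""
    dataset: str
    description: str
    access_level: str
    columns: dict = field(default_factory=dict)  # record field name -> data type


def search_catalog(entries, keyword, user_level):
    """Return names of datasets matching the keyword that the user may see."""
    allowed = ACCESS_RANK[user_level]
    keyword = keyword.lower()
    return [
        entry.dataset
        for entry in entries
        if ACCESS_RANK[entry.access_level] <= allowed
        and (keyword in entry.dataset.lower() or keyword in entry.description.lower())
    ]


catalog = [
    CatalogEntry(
        "facility_summary",
        "Facility names and locations",
        "public",
        {"facility_id": "text", "latitude": "decimal"},
    ),
    CatalogEntry(
        "facility_review_notes",
        "Draft facility data under review",
        "internal",
        {"facility_id": "text", "note": "text"},
    ),
]

public_hits = search_catalog(catalog, "facility", "public")
internal_hits = search_catalog(catalog, "facility", "internal")
```

   In this sketch a public user sees only the public entry, while an internal user sees both. The field names and data
 types stored with each entry stand in for the record-level metadata that access and decision support tools could use
 to build queries.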

   With regard to operations, the Metadata and Holdings Catalog will be part of the Enterprise Repository. As required
 for the other databases discussed, the Holdings Catalog will be required to conform to the policies, procedures and
 standards of the Enterprise Repository. The Holdings Catalog may have an operational component where the data are
 staged, managed, and reviewed for quality. It may also have a public access component that is part of the Data Ware-
 house. Additional research needs to be done to identify the detailed requirements and specifications for the Metadata
 and Holdings Catalog. This research will be done through EPA's Metadata Strategy.

 EXCHANGE NETWORK NODE CATALOG

   The Exchange Network Blueprint envisions that a subset of the Metadata and Holdings Catalog will be made avail-
 able as the Exchange Network Node Catalog for EPA's network node. Each partner in the network will provide a
 similar Catalog detailing the holdings of that partner being made available to the Exchange Network. Each network
 node will thus have a catalog advertising the information resources that are available on that node and information on
 how to access those resources.

-------
4.7 BENEFITS OF THE ENTERPRISE REPOSITORY

  The Enterprise Repository provides the framework for managing EPA's data assets as a corporate resource by
applying a common set of policies, procedures, and standards to EPA's Data Warehouse, the Centralized Data
Registries, Geospatial Data, and the Metadata and Holdings Catalog. This coordinated management allows the Enterprise
Repository to provide the following benefits:

  •  More Efficient Data Management and Use of Resources
  •  Improved Access and Use of Agency Data
  •  Improved Data Quality

MORE EFFICIENT DATA MANAGEMENT AND USE OF RESOURCES

  The Data Warehouse promotes more efficient data management and use of Agency resources by helping maximize the
expertise within the Agency for database modeling, development, management, and access/security. The integrated
design and management of the Data Warehouse will promote more efficient data retrieval. The consistent standards and
documentation of the warehouse will allow for a more efficient and faster development schedule for tools and
applications. As a result of the Data Warehouse, the Agency will be able to devote more staff time to analysis and
Program operations. Expert Program staff can spend more time on the data rather than on the operations of the
database system.

  The Centralized Data Registries will help promote more efficient data management and use of agency resources by
reducing redundant data collection and storage, providing an accessible source of authoritative data that can be
reused or linked to other Program databases, and enabling conformance with some data standards.

  Geospatial Services will promote more efficient use of resources by helping reduce the duplication of effort in the
acquisition, management and use of geospatial data and tools. The Metadata and Holdings Catalog will provide the
database and the management structure to improve overall coordination of EPA's metadata assets and make EPA's data
assets easier to share and reuse.

IMPROVED ACCESS AND USE OF AGENCY DATA

  The Data Warehouse will help improve access and use of Agency data by making EPA's mission-critical data cen-
trally accessible to all EPA employees, partners, and stakeholders from one logical location. The Data Warehouse will
also provide the consistent data and query support needed to produce consistent, replicable, cross-media analysis
and reporting.

  Consistent standards and documentation of the environmental data, combined with coordinated management of
geospatial data, will help EPA provide analytical tools that portray a more accurate picture of the environment. The
Centralized Data Registries will provide the linkages between the databases that are necessary to enable effective, consistent,
cross-media analysis. Cataloguing the data and tools through the Metadata and Holdings Catalog will make it easier to
locate the data and the tools to access and analyze it. The policies of the ER will also help coordinate access to the
tools through the Systems of Access.

 4.8 ISSUES AND NEXT STEPS

 ISSUES

   The primary QIC/senior EPA management issues associated with the Enterprise Repository concern the Data Ware-
 house. These include:

   •  Defining the function and scope of the Data Warehouse.  While EPA has used Envirofacts as a data ware-
      house, it has not committed to the basic principles embodied in the broader warehouse concept put forward in this
      document. Perhaps most prominent is the redirection of Program applications to the Enterprise Repository.
      Agreement on the basic scope and function issue and on the general architecture of the warehouse is essential
      before final design and implementation begin. This will be examined as part of the Data Warehouse Master Plan
      recently funded through the SMF investment process.

   •  Agreeing on the data and transition activities for the Warehouse. This includes Agency-wide agreement on
      what constitutes "enterprise data," the processes for providing data to the ER, as well as definition of roles and
      responsibilities of OEI and Programs.

   •  Resolving resource issues associated with the Warehouse.  In terms of scale, the data warehouse concept
      proposed is essentially new to EPA. It contemplates that most existing applications will be redirecting their input
      data to the data warehouse and that most of the Agency data will be moved from operational data systems in
      Programs and Regions to the warehouse. This involves development, operational, and data transfer costs not
      currently budgeted or incorporated within overall maintenance costs of specific Program systems.

   Issues also exist with regard to other elements of the Enterprise Repository.

   •  Determining the scope of the Centralized Data Registries and the Metadata and Holdings Catalog. The
      Agency must reach a management-level agreement on the number of registries and how they will work together to
      provide an authoritative summary of the key entities (e.g., facilities, chemicals, regulations) that EPA addresses in
      order to fulfill its functions. This includes agreement on the architecture of registries and of the Metadata and
      Holdings Catalog.

   •  Making the Registries Authoritative Sources. Registries serve the important function of providing an
      enterprise with a holistic view of its information and technology resources. But registries are only authoritative
      if they are populated, representative, regularly refreshed, and easy to use. Measures must be taken to ensure that
      EPA's Centralized Data Registries contain regularly refreshed data.

   •  Building a "System of Registries." This model proposes a cohesive, interdependent set, i.e., a "system," of
      authoritative registries. Currently EPA maintains a number of separate registries and is proposing to build
      new ones (EPA(15), 2001). Prior to the design and implementation of the Metadata & Holdings Catalog
      proposed in this model, EPA should look at all the registries currently in operation and develop a strategy to
      streamline and connect them in a meaningful way. Although the EDR, SRS, and TRS have been developed and operated
      together, other registries come packaged with their own development and maintenance needs to make them an
      "authoritative source." As the number of separate registries increases, it becomes less likely that EPA staff will
      want or be able to keep them current and authoritative.

  •  Over the last few years the concept of a "Place Registry" has been bandied about as a critical need for the
     Agency. Clearly, "place" is a key integrating element. "Place" can be designated as a point, e.g., a lat/long value
     for a discharge pipe (currently supplied by the FRS), or as a polygon, e.g., a watershed area or a designated
     wetland area. Many have argued, however, that the latter are actual geographic (geospatial) coverages that can be
     purchased or accessed in a data partner's database. On the other hand, there are others who feel these "places"
     (polygons) of interest should be registered and inventoried in an EPA Registry for integrated analysis. In the
     coming year, the "Place" registry needs a clear definition, including its relationship, if any, to the FRS, the
     other Centralized Data Registries, and the Geospatial Data Services.

  •  Resolving resource issues associated with registries and the catalog. While the resource issues associated with
     these elements are less than those associated with the data warehouse, it is important to address increased
     development costs associated with modification of registries to provide more general access and increased data
     transactions associated with flow of information among registries and with Program systems.

  •  Establishing and implementing geospatial data management policies. While some core enterprise geospatial data
     will continue to be stored on the Headquarters Integrated Geospatial Database, other data will be stored on servers
     in the Regional and ORD laboratory nodes of the enterprise geospatial system. It will be critical to apply uniform
     data standards to all the geospatial data and make it accessible via the geospatial technical infrastructure.

NEXT STEPS

  There are a number of next steps associated with the Enterprise Repository.

  1.  Identifying Data Warehouse architectural, procedural and technical options. This includes an examination of
     agency-wide and Program/Region specific data warehousing requirements. Included here is the question of what
     data belongs in the Data Warehouse.  Once requirements have been identified, options for meeting those require-
     ments need to be developed. These options could range from status quo, to purchasing a commercial data
     warehousing solution, to developing a customized in-house data warehouse solution. The options analysis might
     recommend a central data warehouse, or a set of small data warehouses for each office, or some combination of
     all of the above.

     Procedurally, policies are needed that address migration requirements for all eligible databases and datasets, as
     well as a procedure for identifying exceptions. Also needed are options for interim access methods for getting data
     from legacy systems as the elements of the Enterprise Repository are being developed and implemented.

  2.  Finalizing the architecture of geospatial data and services within the overall Enterprise Architecture. This is needed
     to complete the process of fully integrating geospatial information with other programmatic environmental information.

  3.  Defining architectural, procedural and technical options for the Centralized Data Registries. Options for the
     scope, number, and architecture of Agency registries need to be developed. Key here is determining how to link
     registries and to grant access to them seamlessly given the differing content and data structures of the current
     registries.

  4.  Defining architectural, procedural and technical options for the Metadata and Holdings Catalog. The relationship
     between the Metadata and Holdings Catalog and the datasets it references needs to be defined, as well as the
     scope, e.g., whether it includes references to holdings outside EPA in the overall Exchange Network. Critical also is
     the relationship between EPA's node catalog and other Network node catalogs with the overall development,
     implementation, and on-going operation of a metadata strategy for the Agency.


  5.0 ENVIRONMENTAL INFORMATION ARCHITECTURE — USE
   The last function envisioned in this model is the
 Use function. The EIA components that support
 data use are mainly Decision Support Tools.
 However, the System of Access, the Enterprise
 Repository, as well as the models and algorithms
 that support analysis and decision making also
 play a role.

 5.1 BACKGROUND

   In EPA's current computing environment data are used in a variety of ways to support business needs ranging from
 one-time direct queries for compliance information to cross-media geospatial analysis.

               Figure 10: Decision Support Tools Use Data in the Enterprise Repository

   The Use function is also supported by a variety of Decision Support Tools that are either customized or purchased
 to meet a specific analytical need. Further, the data sources used are also highly variable and include Programmatic or
 Regional systems, centrally managed stores like Envirofacts, federal sources, or commercially available datasets &
 geographic coverages.

   Tool - Any device or that aids in accomplishing a task. In this document tools are often used synonymously with
 "application."

   Decision Support Tool - Tools which enable analysis of many units to aid in learning, discovery, and problem solving.

   There are three major issues associated with the current cross-agency support of the Use function that the EIA must
 address:

  (1) As discussed in Chapter 4, the procedures for storage, archiving, and access of information are highly variable
     across the Agency. Tools available to the public may not always draw upon the same data sources or datasets that
     were used to generate key EPA reports that have been made public. Thus, it is possible that the results of analyses
     will vary, casting doubt upon the credibility of EPA's analyses and decisions.

  (2) Because of the independent ownership and management of tools across the Agency, they are often hard to
     locate.

  (3) The methodologies and tools employed may not always be appropriate for the intended use of the data. This is
     not an issue that can be resolved with technology. It involves the deployment of data use and quality assurance
     guidelines, training, and oversight mechanisms (e.g., peer review).

5.2 CRITICAL FEATURES AND OPERATIONS

  Key to this chapter are the tools themselves. Decision support tools will provide a variety of services to support the
access and analysis of EPA's data. The services may be enabled with graphical displays or interfaces similar to those
available in tools like ArcView, ArcExplorer, and IBM Data Explorer. Specific capabilities are dependent upon the
business need, user needs, data sources, and the applications implemented.

  In order to eliminate redundant purchases of datasets and Commercial-Off-The-Shelf (COTS) software and
improve the credibility of analyses, three key policies are critical: (1) enterprise data must be made accessible in the ER;
(2) tools must use data deemed ready for use (e.g., "Stored for Use," and located in the ER); and (3) design and
deployment of tools must be coordinated and tracked agency-wide via mechanisms like the Integrated Resource
Registry System to ensure they are available via the System of Access.

  In this model, decision support tools are made available via the web-enabled Enterprise Portal and located using the
System of Access. In some cases, access to locally provided services may include click-throughs via a networked
computer desktop (e.g., a Windows desktop on a Novell network).

  The types of Decision support tools that Program Offices might provide are quite varied. Below are some example
capabilities, summarized from review of a number of EPA tools (EPA(9), 2001):

  •  Review answers to database queries on EIA datasets.
  •  Prepare reports.
  •  Plot variables at different scales.
  •  Create maps featuring overlays.
  •  Drill down spatial data from national, to state, and to county levels.
  •  Review trans-boundary air toxics movement provided by remote sensors.
  •  Explore relationships of different environmental media in ecosystems.
  •  Use TRI data for overlaying risks and hazards on a chemical-by-chemical basis.
  •  Monitor and assess the status and trends of national ecological resources.
  •  Perform analyses involving multiple independent datasets, some of which are outside the EPA (e.g., health data).
  •  Estimate spatial distribution of biogenic emissions.
  •  Model human exposure to urban air pollution.
  •  Display database extracts in a geographic context on maps.
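
  As a minimal sketch of one capability listed above (drilling down from national, to state, to county levels), the
following Python example totals hypothetical release records at each level. The record layout, the sample values, and
the function name drill_down are assumptions made for illustration; they do not describe an actual EPA tool or dataset.

```python
from collections import defaultdict

# Hypothetical release records; field names and values are illustrative only.
RECORDS = [
    {"state": "PA", "county": "Allegheny", "pounds": 120.0},
    {"state": "PA", "county": "Berks", "pounds": 30.0},
    {"state": "DE", "county": "Kent", "pounds": 50.0},
    {"state": "PA", "county": "Allegheny", "pounds": 80.0},
]


def drill_down(records, state=None):
    """Total pounds grouped one level below the requested scope.

    With no state given, totals are grouped by state (the national view).
    With a state given, totals are grouped by county within that state.
    """
    totals = defaultdict(float)
    for rec in records:
        if state is None:
            totals[rec["state"]] += rec["pounds"]
        elif rec["state"] == state:
            totals[rec["county"]] += rec["pounds"]
    return dict(totals)


national_total = sum(rec["pounds"] for rec in RECORDS)  # national level
by_state = drill_down(RECORDS)                          # state level
by_county_pa = drill_down(RECORDS, state="PA")          # county level
```

  In the model described in this document, a tool of this kind would presumably read its records from the ER rather
than from an in-memory list.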

  Other critical features of the Use function are the policies, guidelines, and procedures that promote credible data and
information to support the use of decision support analyses. The Agency currently supports a Peer Review process as
well as a Quality System, which direct the collection and use of data and information and support credible analysis.
Unfortunately, the procedures associated with these systems are overlooked.

   These circumstances are somewhat fueled by the ease with which tools and data can be purchased and used. The
drive for analytical innovation has sometimes led to creative uses of data not intended at the initial point of collection.
The coordinated management of tools and data, either associated with the ER or the System of Access, must take into
account the analytical controls established to ensure credibility of EPA analyses. These include the recommendations in
the analytical Best Practices and EPA Guidelines to Ensure and Maximize the Quality, Objectivity, Utility, and Integrity
of Information Disseminated, both forthcoming in FY2002.

PRIMARY SUPPORTING PROJECT - TRI EXPLORER

  Toxics Release Inventory (TRI) Explorer is a Web-based analytical tool that allows users to generate reports on
specific chemicals and reported chemical releases by industry sectors, environmental media, geographic area, and
individual facilities. TRI data users can compile their own reports on-line. The TRI Explorer allows users to easily
determine what toxic chemicals might be present in their neighborhood, how the reported releases are changing over
time, and how their own situation compares to other communities around the country. TRI Explorer provides data for
all reporting years since 1988. The data are synchronized with the published Public Data Release documents (Lai,
2001). Data used in TRI Explorer are from Envirofacts.


 PRIMARY SUPPORTING PROJECT - WINDOW TO MY ENVIRONMENT

   Window to My Environment (WME) is another Web-based analytical tool. WME combines interactive maps with
 links to federal, state, and local environmental data to provide the public with information on environmental issues and
 conditions affecting their community or location of interest. Developed through an EPA-State partnership, WME
 answers popular questions about a community's air, land and water, as well as what is being done locally to protect the
 environment.

   Particular features of WME include:

   •  Interactive Mapping Tools: WME allows the user to choose the area to map and to view the location of
      regulated facilities, monitoring sites, waterbodies and watersheds, and demographics, as well as traditional
      geographic designations like streets, counties, schools, and so on. The user can zoom, pan, and move all around
      the area, and watch the information displayed shift dynamically to reflect the newly chosen area. The user can
      then look at a three-dimensional view of local land use patterns in that area.

   •  Data on "Ambient" Environmental Conditions: WME provides daily Ultraviolet (UV) Index readings. Also
      available are advice on health effects of exposure to sunlight, locations/reports from local air and water quality
      monitoring sites, land cover characteristics, and more.

   •  Access to Analytical and Reporting Tools: WME links to EPA's Envirofacts, TRI Explorer and Surf Your Water-
      shed tools, as well as State tools like Pennsylvania's "E-Facts" and Delaware's "Environmental Navigator,"
      providing the user with the ability to generate custom reports on specific chemicals, facilities and trends in a
      selected area.

   •  Local Governmental Services and Contacts: WME links the user to dozens of government and non-government
      organizations and contacts with information on local issues in the user's area of interest.

 5.3  BENEFITS OF DECISION SUPPORT TOOLS

   A coordinated suite of decision support tools that draw upon a consistent set of data and are used in accordance with
 guidelines and processes that support credible analysis is invaluable. A coordinated suite of tools will also:

   •  Save the Agency money. Coordination will eliminate redundant tools, datasets, and partnerships to gain access to
      the same types of data. This will enable the Agency to leverage resources for other analytical endeavors.

   •  Help to better serve analysts, both internal and external, by: (1) making tools more prominent and accessible; (2)
      enabling consistent content and interfaces; (3) meeting graphical and geographical analytical requirements; (4)
      enabling "what if" scenarios with data; and (5) supporting analyses of multiple independent datasets.

5.4 ISSUES AND NEXT STEPS

ISSUES

  The issues currently identified for Decision Support Tools are:

  •  Linking Tool Design, Management, and Use to Quality Guidelines and Best Practices. As discussed in Chapter
     1, EPA will soon have to implement guidelines to support the quality, objectivity, utility, and integrity of information
     disseminated. These emerging guidelines must be integrated into future tool design, management, and use to
     document the credibility of EPA analyses.

  •  Datamarts. Many integrated architecture models, like the Corporate Information Factory (Inmon et al., 2001),
     highlight the usefulness of datamarts, i.e., subsets of a data warehouse which are customized to address a business
     need and support a specific tool. They are beneficial as they are "owned" and managed at a departmental level;
     they are less expensive than maintaining a departmental warehouse; and they are flexible and can be customized
     for departmental reporting and analysis (Inmon et al., 2001). The Master Plan for a Data Warehouse effort should
     take into consideration these benefits for Programs as well as the potential role of datamarts in the development of
     the Enterprise Repository.

  •  Linking Use of Data to Planning for Data. The Information Quality Lifecycle adopted in this model is cyclical, i.e.,
     the Use of data and information influences future Planning for data. As the Agency uses data to understand
     emerging environmental problems, it is unclear whether there are formal mechanisms in place to capture gaps and
     new data needs. Governor Whitman recently launched an Environmental Indicators Initiative. In the coming year
     the Agency will produce a "First Report that will provide an inventory of EPA indicators, identify promising
     indicators that allow us to report on the environment, as well as identify data gaps..." (EPA(7), 2001). This Report
     should influence future planning for data. The Report should also influence the types of data made available in the
     ER and the future design of tools to support the results.

  •  Committing to enterprise-wide decision support. EPA has historically focused on data collection and monitoring.
     It has spent large sums in developing systems to collect and store this data. Typically, attention to analysis and use
     of the data to support internal decision-making has been ad hoc and resources to do so limited. The vision of
     Decision Support Tools contained in this document requires Agency agreement on two themes that depart from
     this traditional approach: (1) more attention and resources are needed to develop and utilize tools which support
     decision-making by organizing our information; and (2) this effort needs to be systematic, multi-program in scope,
     and focused on leveraging resources to address agency-wide as well as specific Program and Regional needs. The
     second theme implies a commitment to having many Program applications used for decision making use the data
     warehouse rather than a Program database.

  •  Agreeing on scope, timelines, and transition to decision support services. As an essential element of commitment
     to a vision of enterprise-wide decision support, agreement is needed on the scope, timeline for implementation,
     and transition plan for this function.

  •  Addressing decision-support resource issues. As envisioned, this decision support function is a new agency-wide
     commitment. Current costs for this activity are often folded into on-going development and operation of the
     systems from which the decision support tools obtain  the data. Separating this cost from these legacy systems,
     and funding agency-wide services meeting this function, will require careful consideration from senior leadership.
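
  The datamart concept raised in the issues above, i.e., a subset of a data warehouse customized for one business need,
might be sketched as follows. The example uses Python's built-in sqlite3 module with an in-memory database. The table
names, columns, and sample rows are hypothetical and are not drawn from any EPA system, and copying rows into a
separate table is only one of several ways a datamart could be populated.

```python
import sqlite3

# Hypothetical warehouse rows: (program, facility_id, year, pounds_released).
SAMPLE_ROWS = [
    ("TRI", "F-001", 2000, 120.0),
    ("TRI", "F-002", 2000, 45.5),
    ("AIR", "F-001", 2000, 10.0),
    ("TRI", "F-001", 2001, 98.0),
]


def build_warehouse():
    """Create an in-memory stand-in for the enterprise data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE warehouse "
        "(program TEXT, facility_id TEXT, year INTEGER, pounds REAL)"
    )
    conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def create_datamart(conn, mart_name, program):
    """Copy one program's rows into a separate, smaller table (the datamart)."""
    # The table name is interpolated into SQL, so accept plain identifiers only.
    if not mart_name.isidentifier():
        raise ValueError(f"invalid datamart name: {mart_name!r}")
    conn.execute(
        f"CREATE TABLE {mart_name} (facility_id TEXT, year INTEGER, pounds REAL)"
    )
    conn.execute(
        f"INSERT INTO {mart_name} "
        "SELECT facility_id, year, pounds FROM warehouse WHERE program = ?",
        (program,),
    )


conn = build_warehouse()
create_datamart(conn, "tri_mart", "TRI")
mart_rows = conn.execute(
    "SELECT facility_id, year, pounds FROM tri_mart ORDER BY facility_id, year"
).fetchall()
```

  A tool that needs only one program's data would then query the smaller table instead of the full warehouse.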

  NEXT STEPS

    1.  Building a registry of current decision support tools available to the Agency. It is proposed that this be part of the
       Information Resource Registry System, and build on tools identified in the EPA Public Access Tool Inventory
       managed by OEI and the tools identified in the Geospatial Baseline Assessment (EPA(9), 2001). This is necessary to
       understand our current assets and to focus on tools that might be expanded for agency-wide use, thus leveraging
       our existing knowledge.

    2.  Developing options for a decision support architecture (essentially an Agency applications architecture) and for a
       transition plan that can achieve this in a reasonable and timely fashion.

6.0 ENVIRONMENTAL INFORMATION ARCHITECTURE - FOUNDATION
  So far, this document has introduced a series of IT functions and a proposed set of "core component" technologies,
policies, plans, and services. In order for these components to be interconnected and interdependent, they must be
designed according to a blueprint (the enterprise architecture) and implemented and managed in accordance with
standards and policies. The Foundation component, thus, is EPA's enterprise architecture itself, and the standards and
policies that govern IT development and management. The Foundation is the "glue" that links the IT functions and
connects core components together in a meaningful way.

  This section briefly describes some of the history of standards, policies, and enterprise architecture development at
EPA; the process for developing an enterprise architecture; and a more detailed description of how EPA's architecture
effort is divided. It then presents some of the key standards and policies framed within the layers of an enterprise
architecture that are pivotal to systems interoperability and data integration.

6.1   BACKGROUND

  As discussed in Chapter 1, EPA's information and information technology have largely been divided and independently
managed along programmatic lines. Over the years, declining budgets, the need for information that crosses
programmatic boundaries, government-wide IT policies, and stakeholder demands have driven the Agency to develop and
adopt policies and standards to ensure consistency in some aspects of technology design, implementation, and
management.

  For the purposes of this document, a standard is a set of criteria, or a convention (FAWG, 2001) used to maintain
consistency across multiple entities. For example, EPA has a data standard for calendar date representation in Agency
information systems. A policy is a statement that is binding on entities within its scope. At EPA, some standards  are
expressed as policies through the Agency Directives System. Policies help to maintain consistency and compliance
within an organization. Standards enable interconnection of processes, applications, and information (Cook, 1996).
Typically, standards and policies are developed with input from parties affected and carry with them some penalty for
violating them (Spewak, 1992).
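
  To illustrate how a data standard such as the calendar date standard mentioned above maintains consistency across
systems, the following minimal Python sketch converts dates written in several layouts into one agreed representation.
The target layout (YYYY-MM-DD) and the list of accepted input layouts are assumptions chosen for this example; they
are not taken from the text of EPA's standard.

```python
from datetime import datetime

# Input layouts a legacy system might use; this list is an assumption.
KNOWN_INPUT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")

# Assumed target representation (ISO 8601 calendar date), for illustration only.
STANDARD_FORMAT = "%Y-%m-%d"


def normalize_date(text):
    """Return the date in the assumed standard form, or raise ValueError."""
    for fmt in KNOWN_INPUT_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # layout did not match; try the next one
        return parsed.strftime(STANDARD_FORMAT)
    raise ValueError(f"unrecognized date: {text!r}")


normalized = [normalize_date(d) for d in ("07/04/2002", "20020704", "2002-07-04")]
```

  Once every system stores dates in the same form, records from different systems can be compared or joined on date
without system-specific conversion logic.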

  In general, EPA's information services organization (initially the Office of Information Resources and Management,
and over the last two years, the Office of Environmental Information) leads Agency-wide processes to create IT-related
policies and standards. These policies range from a topic as broad as "System Life Cycle Management" (EPA(19),
1994) to something as narrow as the "Personal Use of Agency Equipment" (EPA(16), 1998).

  The development of these policies and standards has been an important initial step towards linking systems,
integrating information, and streamlining processes. For example, EPA has developed and approved a Facility
Identification standard which has been instrumental in enabling an integrated view of a single facility's performance
across several environmental Programs. Another is the policy to use Lotus Notes as the Agency's standard for e-mail
communication. While burdensome for some to move to a new system, the standardization of e-mail software has made
intra-agency communications more efficient and led to significant cost savings.
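
  The way a shared facility identifier enables an integrated view across Programs can be sketched in a few lines of
Python. The identifiers, program names, and status values below are hypothetical, and the record layout is an
assumption for illustration; it does not reproduce the data elements of the Facility Identification standard.

```python
from collections import defaultdict

# Hypothetical records from two Program systems that share a facility identifier.
AIR_PROGRAM = [
    {"facility_id": "F-001", "status": "in compliance"},
    {"facility_id": "F-002", "status": "violation"},
]
WATER_PROGRAM = [
    {"facility_id": "F-001", "status": "violation"},
]


def integrated_view(records_by_program):
    """Group each Program's status under the shared facility identifier."""
    view = defaultdict(dict)
    for program, records in records_by_program.items():
        for record in records:
            view[record["facility_id"]][program] = record["status"]
    return dict(view)


facility_view = integrated_view({"air": AIR_PROGRAM, "water": WATER_PROGRAM})
```

  Without a common identifier, the same facility could appear under different keys in each system and the grouping step
would not be possible without record-by-record matching.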

  However, the IT policies and standards to date have been created in the absence of a blueprint describing how the
Agency will collectively manage its information and technology assets in support of the Agency mission and goals. In the
absence of an enterprise architecture, IT policy and standards setting has proceeded in a slow, piecemeal fashion,
yielding limited gains in integrating processes, data, and systems.

 6.2   CRITICAL FEATURES AND OPERATIONS OF THE FOUNDATION

   As indicated in Table 7, there are a number of activities that are common to the enterprise architecture and standards
 and policy development. Common to all of these is the use of collaborative processes, involving multiple stakeholder
 groups for development and implementation.

 Foundation Component       Activity                      Example
 Enterprise Architecture    Development/Defining          Baseline, target architectures
                            Planning                      Sequencing plan
                            Management                    Overseeing implementation
 Standards                  Identifying business need     Facility ID standard; XML standards
                            Development
                            Implementation
 Policies                   Identifying business need     Security policy; Technology policy
                            Development
                            Implementation

                            Table 7: Foundation Components and Activities

 ENTERPRISE ARCHITECTURE

   An enterprise architecture is simply a definition of an organization's business and a description of the processes, data,
 applications, and technology that support it (Spewak, 1992). Typically enterprise architecture efforts start by develop-
 ing a baseline - a portrayal of the existing business. Once the baseline is complete, the "to be" or target portrayal of
 processes, data, applications, and technology is undertaken and generally captured in an enterprise's strategic thinking
 (FAWG, 2001). The strategy for transitioning from the baseline to target is the sequencing plan which includes a
 schedule of multiple, concurrent, and incremental builds that evolve the enterprise (FAWG, 2001).

   EPA's enterprise architecture will:

   •  Allow the CIO and Quality and Information Council (QIC) to make more informed IT investment decisions;
   •  Enable data integration by documenting the desired relationships among EPA's applications and data stores;
   •  Improve interoperability of EPA's applications by minimizing the number of system interchanges through reliance
      upon standards and common data repositories; and
   •  Allow the Agency to respond more quickly to changing business requirements by establishing direct relationships
      between its IT portfolio and business functions.

   As depicted in Figure 11, EPA's architecture allows the Agency to ensure its investment in information technology is
 aligned with the Agency's mission and supports the Agency in its quest to meet its strategic goals.

   The enterprise architecture planning process provides a methodology to break down and model the Agency along 5
 distinct (horizontal) layers: business, data, applications, technology, and security. Each architectural layer will be
 defined following a standard modeling protocol, resulting in a baseline and target environment model architecture being
 defined by the Agency enterprise architecture team (See text box below).

   To accomplish this in manageable components, EPA's enterprise architecture development will happen in three
 semi-concurrent projects focused on three major business domains of the Agency: the environmental programs of the
 agency (the Environmental Information Architecture), the Agency's large research and development function (the
 Research and Development architecture), and the administration and finance functions (Administrative Systems
 Architecture).

                                       Figure 11: EPA Enterprise Architecture Framework
   (Figure labels: Conceptual Framework; Environmental Business Architecture; Information Architecture; Goals; Agency
   Processes; Data; Applications; Technology)

   EPA's 10 Strategic Goals
     1.  Clean Air
     2.  Clean and Safe Water
     3.  Safe Food
     4.  Preventing and Reducing Pollution
     5.  Effective Waste Management
     6.  Reduction of Global and Cross Border Pollution
     7.  Right to Know Initiatives
     8.  Sound Science
     9.  Deterrent to Pollution and Greater Compliance with the Law
    10.  Effective Management

           Definition of EPA's Architectural Layers (EPA, 2001)

   Business        Models of functional areas (e.g., permitting, compliance, and monitoring), business processes
                   within the functional areas, and relationships between those functions and processes across the
                   Agency.
   Data/Metadata   Models of EPA's information holdings to identify holdings, what they are used for, and where
                   they are housed.
   Applications    EPA's environmental and non-environmental information systems.
   Technology      EPA's network resources (hardware, network, non-system software) that enable the Agency's
                   applications.
   Security        EPA's information security concerns related to business process, data storage and access,
                   system-level controls, and EPA's network and communications.

    The business process, data, and applications architectures will be individually defined in FY2002 for each of these
  three business segments and merged together later. The technical infrastructure and security architectures, inherently
  'enterprise' in function, will be developed across all three business domains from the start. In addition to developing
  an applications architecture for the Agency's traditional database environment, the architecture planning process
  must also focus on the unique needs of geospatial information and the document management needs of the Agency. For
  each information medium (database, geospatial, and document), architectural 'views' will be created. Since much work is
  underway in all of these areas, the enterprise architecture planning effort will serve as a unifying core, hence its
  Foundation status. Figure 12 depicts the three-dimensional inter-relationships of the various architectural components.

    These architectural layers provide a framework for identifying and developing the policies and standards needed for
  Agency-wide integration, whether at the business process, data, applications, technology, or security level.

    The rest of this section highlights a few key standards and policies that are closely associated with the data,
  technology, and security layers of EPA's enterprise architecture.

-------
  DATA STANDARDS
    Within the Foundation, key data standards are identified, developed, and implemented. As depicted in
  Figure 13, data standards are developed through a collaborative process involving Agency Programs and
  State data partners.

    EPA, as a participant in the Environmental Data Standards Council, has developed and approved six key
  data standards:

    •   Facility identification standard
    •   Biological taxonomy data standard
    •   Chemical identification data standard
    •   Date format standard
    •   Latitude/longitude standard
    •   Standard Industrial Classification (SIC)/North American Industry Classification System (NAICS)
        standard

    In FY'02, EPA is working to finalize
  and approve standards for:

    •   Enforcement and Compliance;
    •   Permitting; and
    •   Tribal Identifiers.
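
    As an illustration of what applying such standards to a system record might look like, the following
  Python sketch checks one record against two of the approved standards (date format and latitude/longitude).
  The field names, the YYYYMMDD date layout, and the decimal-degree ranges are assumptions made for this
  example only; they are not taken from the text of the approved standards.

```python
from datetime import datetime

# Assumed layout for illustration; the approved EPA date standard
# is not reproduced in this document.
ASSUMED_DATE_FORMAT = "%Y%m%d"


def check_record(record):
    """Return a list of human-readable problems found in one record.

    The keys "sample_date", "latitude", and "longitude" are
    hypothetical field names chosen for this sketch.
    """
    problems = []

    # Date format check (assumed layout: YYYYMMDD).
    try:
        datetime.strptime(record.get("sample_date", ""), ASSUMED_DATE_FORMAT)
    except (ValueError, TypeError):
        problems.append("sample_date does not match the assumed date format")

    # Latitude/longitude check (assumed decimal degrees).
    lat = record.get("latitude")
    lon = record.get("longitude")
    if not isinstance(lat, (int, float)) or not -90 <= lat <= 90:
        problems.append("latitude missing or out of range")
    if not isinstance(lon, (int, float)) or not -180 <= lon <= 180:
        problems.append("longitude missing or out of range")

    return problems
```

    A check of this kind could run wherever records enter a system, so that adherence to a standard does not
  depend on each system or dataset owner applying it by hand.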

  [Figure: EPA EA Design Methodology Framework. Recoverable labels: EPA EA Conceptual Framework;
  Strategic Architecture (Mission, Vision, Goals, Perf Measures); Federal Business Arch...]

  [Figure 13: Data Standards Setting Process. Flowchart titled "EPA Data Standard Process" with columns
  "Stages" and "Responsible Party". Stage labels: Proposal, Development, Review, Implementation. Steps
  shown: Submit Request for Data Standard; Form Subject Matter Action Team; Review Existing Standards
  for Adoption; Develop Data Standard; Develop Business Rules / Resolve Issues; Post Business Rules in
  EDR; Implement Data Standard in EPA Systems; Update. Decision points are labeled Yes/No.]

  TECHNOLOGY POLICY

The Systems Life Cycle (SLC) Management Policy will define the process that Programs must follow in deploying
information technology. The SLC policy will require an up-front architecture check-in phase in the systems
development process to help Programs ensure that their systems development projects are architecturally
compliant from the project initiation phase.

A Capital Planning and Investment Control (CPIC) Policy will codify the requirement that Agency information
technology development projects be formally reviewed prior to funding, giving the Agency the ability to ensure that
new projects are architecturally compliant with the integration vision and clearly linked to the Agency mission. While
the CPIC process is not new, this policy will institutionalize its enforcement. In addition, the budget threshold for
project inclusion in the process is expected to drop significantly in the next two years, so that the majority of the
Agency's projects are routed through the process.

Other new policies under development, such as EPA Standards of Behavior for Security of Information
Resources, PDA Policy, Telecommuting, and Remote Access, all reflect how changing business paradigms
affect the technical infrastructure. These policies must remain current with the integration vision. For example, the
proliferation of new hand-held computing devices will dictate architectural specifications for the design of the
System of Access. New workforce habits and new technologies must be coordinated to reach the integration vision,
and all of this must remain current with Agency policy.

-------
 SECURITY POLICY

   Key concerns around security dictate the issuance of new security policies. By design, the integration vision for the
 Agency affords the opportunity to 'design in' security (e.g., by requiring inbound data flows to come through CDX).
 Increasing access to information must always be balanced with increased security measures.

   Recommended security policy will include:

   •  Use of CDX for inbound data flows - will allow a consistent application of security measures at the CDX portal,
      versus many Programs creating individual portals and adding diversity in the deployment of security protocols.

   •  Use of "System of Access" for accessing EPA data collections - will control and limit the various routes into the
      Agency. Of course, exceptions will occur and must be planned for.

   •  Intrusion Detection and Perimeterization policies are needed. As more and more data become available via the
      System of Access, the potential damage that an intruder could do increases. Policies must be established to require
      implementation of practices that alert appropriate personnel when an unauthorized person enters the Agency
      network and, once the intrusion is detected, immediately stop the intruder from going any further by establishing
      a secure perimeter around the intruder (i.e., perimeterization).
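
   The alert-and-isolate practice described in the last bullet can be sketched in a few lines of Python. This is a
 minimal illustration only: the function name, the in-memory sets standing in for an authorization list and a
 quarantine perimeter, and the alert list are all hypothetical and do not describe any EPA system.

```python
def handle_connection(source, authorized, quarantined, alerts):
    """Decide whether one connection attempt is allowed or blocked.

    All arguments are hypothetical stand-ins: `authorized` and
    `quarantined` are sets of source identifiers, and `alerts` is a
    list that collects messages for security personnel.
    """
    # A source already inside the perimeter stays blocked.
    if source in quarantined:
        return "blocked"

    # Unknown source: alert personnel, then isolate the source so it
    # cannot proceed any further (perimeterization).
    if source not in authorized:
        alerts.append(f"unauthorized access attempt from {source}")
        quarantined.add(source)
        return "blocked"

    return "allowed"
```

   In this sketch the alert is raised once, at the moment of detection, and the quarantined source remains blocked on
 every later attempt.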

 PRIMARY SUPPORTING PROGRAMS AND WORKGROUPS

   Several existing EPA Programs and workgroups support facets of the Foundation component including:

   •  Enterprise Architecture Program - OEI's Office of Technology, Operations, and Planning manages the
      Agency's enterprise architecture planning process.

   •  Technology Architecture Change Management (TACM) process - The Agency has several mechanisms in
      place to manage the deployment of new technologies into the Agency. These existing mechanisms will be utilized
      in transition to the future target enterprise architecture. Given the rapid pace of technology change, we must have
      consistent technology standards upon which to build our infrastructure and a process to update them.  The
      Agency deploys the Technical Architecture Change Management (TACM) process to conduct research on new
      technologies, and to manage the implementation of new technologies into the Agency. A significant example is the
      desktop conversion task of deciding EPA's future direction in providing word processing, spreadsheet, and
      presentation functionality to EPA employees. These IT standards are housed in the IT Roadmap, a core
      component of the enterprise architecture.

   •  Data Standards Program - OEI's Office of Information Collection manages this program, which is responsible
      for leading the collaborative identification, development, and implementation of data standards.

   •  XML Technical Advisory Group (XML TAG) - A cross-agency ad hoc workgroup that helps set standards
      related to XML. The purpose of the workgroup is to help address issues and influence policy related to the use of
      XML for Network and non-Network exchanges of data and information.

   •  Security Program - Managed out of OEI's Office of Technology, Operations, and Planning, the Technical
      Information Security Staff (TISS) defines and oversees the implementation of security policy at EPA. Through
      the recent creation of a Security Program, the management aspects of handling security will complement the
      development of a security architecture.

-------
6.3 BENEFITS OF THE FOUNDATION (ARCHITECTURE, STANDARDS, AND POLICY)

  Up to this point, component benefits have been presented according to how each saves the Agency money, improves
the quality of data, makes data and information easier to use, and is responsive to the needs of EPA employees as well
as external stakeholders. Clearly, the enterprise architecture and its associated transition plan have the potential to
achieve all of these through the comprehensive, global nature of this activity and the clear, deliberate linkage between
IT planning and Agency mission.

  Data standards and policies, as discussed, produce uniformity and compliance across an entity. Implementing
policies and standards that enable EPA's organizational units to work in synchrony, whether in data collection,
processing, storage, access, or use, will produce these desired outcomes. For example, standardizing the way
in which key data elements are captured in EPA systems will go a long way towards improving the quality of that data
type and making it easier to use and integrate. This is not to suggest that everything be standardized; instead, standards
and policies that govern IT management must be set within the context of overall Agency direction and the IT plan,
i.e., the enterprise architecture.

6.4 ISSUES AND NEXT STEPS

  •  Management constructs for institutionalizing the enterprise architecture planning process need to be
     established.  Recent communications from the Inspector General have pointed to weaknesses in EPA's ability to
     implement an enterprise architecture planning process, establish sufficient authority for enterprise architecture
     approval, and effectively establish an integrated (enterprise) program management approach to our information
     systems planning. They further point out that, as a consequence of not having an enterprise architecture, our ability
     to properly secure our IT environment or to make appropriate IT investment decisions is weakened. Steps are
     currently being taken to address senior leadership steering roles and to ensure that the Capital Planning and
     Investment Control (CPIC) process is based upon the architecture. These pieces are essential to make the
     integration vision a reality.

  •  Implementation of agency-wide standards and policies needs to be more even and consistent. Standards and
     policies are the "glue" that will hold the implemented EIA together. A number of standards and policies exist at
     EPA; however, their implementation status in systems and datasets is uneven across the Agency. Adherence to
     standards is still largely up to system or dataset owners. A more reliable, systematic approach is needed to make
     the use of standards, pseudo-standards, and policies more consistent across the enterprise. The bottom line is that
     in an integrated environment, where multiple sets of users are dependent on data collected from a variety of
     sources, it is imperative that standards be rigorously applied and enforced if the system is to achieve its intended
     use. Completing the EIA and using it as a policy development framework is intended to facilitate this process.
     In addition, codifying when standards and policies take effect in the Systems Life Cycle Management Policy will
     serve to harmonize implementation of standards.

  •  Administration of Core Services. Several corporate applications are being developed (CDX, Registries,
     Enterprise Repository, System of Access) and there is no clear management plan. OEI must decide how best to
     manage their operations and maintenance, being cognizant of the distinction in roles between the business process
     of managing data and the technical management issues of supporting the technical environment and database
     environment. In the Registries and Enterprise Repository, a new paradigm of shared data management
     responsibilities will exist for the data. This must involve the Programs. Yet managing the database environment may
     be best handled as an OEI responsibility. Currently, operating the technology of centralized applications happens in
     various offices within OEI, mostly outside the domain of the Office of Technology, Operations, and Planning.
     Clearly, a preferred future management strategy should be explored and considered. Since many of these central
     services are in their third year of Systems Management Fund (SMF) funding, that discussion needs to happen in
     FY2002.

    •  Need to improve IT competencies of the Agency IT workforce. One of the core responsibilities identified as
      required by the Clinger-Cohen Act (CCA) is the assurance that agencies maintain appropriate and current IT
      competencies within the workforce. The basic architectural concepts behind the integration vision are based upon
      current industry trends in data warehousing. Yet many of our systems managers don't have the time to keep
      current in these areas. Our approach to implementing new technologies may result in sub-optimal benefit if EPA is
      not aware of current knowledge on a wide breadth of information management issues. For example, simply
      migrating data flow formats to XML without looking at new paradigms of information sharing, as opposed to
      'feeding EPA information', is functionally 'paving the cow path', something OMB clearly warns agencies against
      doing. The IT Workforce Development team (OTOP) should prioritize training in appropriate areas.

    •  Trust, competence, and standards of service. If system owners begin to use centrally provided data services, or
      retool their out-year planning assuming a core component is there, it must be there. System owners must be
      assured of reliable, secure, and efficient service; otherwise the option to go outside the system will always be the
      preferred path. This applies both to systems performance and to customer service.

    •  IT Sequencing Plan must coordinate with the transition towards the Exchange Network. Much unnecessary expense
      will occur if individual systems are retooled to comply with new internal EPA architecture policies separately
      from when they retool information exchanges with external partners. While there will be a push in both directions,
      carefully coordinated planning is recommended to minimize transition costs where feasible.

    Because of the centrality of the Enterprise Architecture and of the development of the EIA component, it is useful to
 detail some of the specific next steps required of the Enterprise Architecture Team in FY2002. These include:

    •  Baseline Architecture
      - Complete the EIA baseline architecture.
      - Review and analyze the baseline architecture to ensure that the business process, data, and applications
        architectures are accurate.

    •  Technical Options Research
      - Conduct technical research on applications architecture conceptual structures.
      - Conduct research on metadata architecture strategies, and identify an appropriate approach commensurate
        with the evolving target architecture.
      - Conduct technical research on System of Access options; specifically, research Customer Relationship
        Management (CRM) strategies and examine portal development options.

    •  Target Architecture Development
      - Begin the EIA target architecture via a series of collaborative architectural development sessions with national
        and Regional Program professionals.
      - Develop the target EIA in three major phases over the duration of FY2002.
      - For each phase, define the business processes and data models for the Enterprise Repository, define
        options for Program system linkage to core Registries, and define the sequencing plan for CDX alignment.
      - Seek outside peer-review assistance.

    •  Transition Management
      - Develop a transition plan (Sequencing Plan) to move EPA from its current state to the future vision.
      - Develop a change management process for the Enterprise Architecture and specifically EIA components.
      - Develop a configuration management strategy for core EIA components.
      - Develop a performance metrics strategy for EA/EIA implementation.
      - Develop an OEI assistance strategy for programmatic migration to the EA/EIA.

-------
 7.0  SUMMARY OF ISSUES AND NEXT STEPS
 7.1  OVERVIEW

   The preceding chapters outlined a vision for the integration of EPA information resources. This model builds on the
 major investments the Agency has already made to achieve integration, and introduces a number of new concepts and
 mechanisms that have significant implications for the organization, management, and funding of the Agency's information
 activities.

   This document builds on the Agency's existing commitment to a Central Data Exchange to meet a variety of Agency
 and data partner needs through participation in the National Environmental Information Exchange Network. It goes well
 beyond the concept of registry as embodied in the Facility Registry System and Substance Registry by proposing a
 system of linked accessible registries in a new Agency resource called the Enterprise Repository. The Enterprise
 Repository also contains an Agency Data Warehouse, which while leveraging the functionality and expertise of
 Envirofacts, significantly extends the centrality and scope of this storage and access mechanism.

   This vision explicitly addresses how information is used to meet business needs by proposing decision-support tools
 as a distinct information management function. It links quality and consistency concerns with this emphasis on decision
 support by proposing that the tools associated with this function be directed to the Agency data warehouse, not
 programmatic databases.

   This vision builds on our existing innovative Web site services and desktop functionality by proposing access, both
 internal and with the public and data partners, as a major objective and function of an integrated architecture. Finally,
 this vision proposes a significant change for Program and Regional systems within an integrated Enterprise Architecture.

   As noted in the individual chapters, the success with which this vision is implemented within the Agency depends on
 the ability of EPA decision-makers to address and resolve a number of key issues. The following list contains those
 issues that are common across most of the functions and components proposed in this document for EPA's Environmental
 Information Architecture:

   1. Agreement on the nature and scope of integration components. As summarized above, this document
      both proposes new mechanisms for integration and extends existing mechanisms in important ways. In each case,
      senior management acceptance of, and willingness to work for, detailed design and implementation of these
      mechanisms is critical for success. This must start with agreement on the nature and scope of each of these
      mechanisms, including agreement on the implications for existing systems and for modernization efforts.

   2. Agreement on general timeframe for implementation and transition. EPA senior managers need to agree
      on an overall timeframe for implementation and on separate timelines for progress in each of the major
      component areas. This commitment will ensure sustained progress on multiple components of this vision. Closely
      linked to agreement on the timeframe for implementation is agreement on how the transition from our current
      operations to the target Enterprise Architecture will occur. This includes collaborative development of outcomes
      and milestones to track progress. For example, it is difficult to track progress across the core components against
      a common set of Programmatic Systems.

   3. Agreement on Governance and stewardship functions. This vision assumes that responsibility, stewardship,
      and control can be exercised separately from direct management of each information processing step. It assumes as
      well that Program business needs can be met effectively by Agency-wide services. Neither assumption sits well
      with a parochial view of the world. This vision will require active effort to develop organizational arrangements
      that work adequately from the outset and can be quickly and smoothly adjusted to reflect experience gained and
      changing circumstances. Agreement on these organizational arrangements by senior managers, and sustained effort
      on their part to implement them within their Programs and Regions, is central to the credibility of the overall effort.

  4.  Agreement on Resource Issues. The transition to core component services like the Central Data Exchange
    or an Enterprise Repository has the potential to make more efficient use of, and extract greater value from, limited
    resources. However, the transition also raises four important resource issues. A resourcing strategy is needed to
    achieve and sustain an integrated IT environment.

    •  Funding Core Components. Many of the primary supporting projects in this model are investments
       funded through the OEI base budget, Agency Integration funds, and the Systems Modernization fund. As these
       core components take on more customers, add services, and mature into an operations & maintenance phase,
       they may require a modification or concomitant adjustment of funding sources. The Working Capital Fund
       stands as the current solution.

    •  Component Service Disincentives. During the transition to core component services, some Programs
       will have to fund "legacy processes," as well as pay for the use of new core component services, and
       provide staff to help negotiate the transition.  These circumstances make transition to core component
       services a challenging prospect. A budget and resource strategy should recognize these circumstances
       and find ways to encourage the transition and make it more equitable.

    •  Tracking IT Costs. The same IT functions presented in this document (Collection, Process & Stage,
       Storage, and Decision Support) are generally carried out in association with each independently managed
       system. The discrete costs associated with these functions, however, have never been explicitly
       tracked and are likely to be bundled under each system's operation and maintenance costs. Identifying
       these costs would enable objective cost/benefit analyses for integrating with core component services.
       These discrete costs would also enable benchmarking of what it will take to design, implement, and
       sustain core component services.

    •  New Costs. New costs, for example to sustain an enterprise portal or decision support services, are
       implied by this model. These new costs will result both from the need to expand the functionality of existing
       systems (e.g., CDX) and from the need to create new mechanisms for integration (e.g., the System
       of Access). Once expanded or created, these Agency resources will need to be operated, maintained, and
       tracked.

7.2 MANAGEMENT ACTIONS

  To address the issues raised above, a number of management actions must be undertaken. Listed below are the
major steps required, each of which may entail several planning and implementation activities.

  •  Determining the process by which each of the issues noted above will be addressed and the timeframe within
    which this should occur.

  •  Finalizing the management plan for internal EPA integration and obtaining Agency agreement on it. (The plan
    should address the specific issues and products noted in this section.)

  •  Finalizing how the integration effort will be managed within OEI and across the Agency. This includes the critical
    need to address how governance and stewardship will be exercised.

  •  Completing the Enterprise Architecture and the Agency-wide plan for transition to it.

  •  Several information management activities require some form of senior management stewardship: integration/
    architecture, quality assurance, geospatial information management, and administrative systems architecture, to
    name a few. Senior management must develop a more streamlined, holistic approach to govern these efforts more
    efficiently and to encourage connections, where appropriate.

-------
Active Metadata — Metadata used by online tools to perform operations, such as validating fields mentioned in
queries.

Decision Support Tools — Any device which enables analysis of many units to aid in learning, discovery, and
problem solving.

Application—The term application is a shorter form of application program. An application program is designed to
perform a specific function directly for the user or, in some cases, for another application program. Examples of
applications include word processors, database programs, Web browsers, development tools, drawing, paint, and image
editing programs, and communication programs.

Business Intelligence — A series of components that capture organizational data from disparate sources and present
it to users in a user-friendly format.

Centralized Data Registries—EPA's core "registry" systems: Facility Registry System (FRS), Terminology Registry
System (TRS), Substance Registry System (SRS), Environmental Data Registry (EDR), and Environmental Information
Management System (EIMS).

Component—A technology, policy, plan, or service that enables more efficient and effective management of
information resources and supports EPA's participation in the National Environmental Information Exchange Network.

Data—In science, data are a gathered body of facts. In computing, data are information that has been translated into
a form that is more convenient to move or process.

Database—A database is a collection of data organized so that its contents can easily be accessed, managed, and
updated. The most prevalent type of database is the relational database, a tabular database in which data are defined so
that they can be reorganized and accessed in a number of different ways. A distributed database is one that can be
dispersed or replicated among different points in a network.

Datamart—A subset of a data warehouse customized to address a business need and support a specific tool.

Data Schema—The structure of data in a database. Usually indicated by showing data fields, field formats, field
attributes, and their relationships to each other.

Dataset—Named set of data with formatting (data schemas) and content.

Data Staging—Data cleanup, data schema extraction, and loading of the dataset into an Enterprise Repository
database.

Data Warehouse—A persistent collection of summary and detailed data whose data schemas are fully coordinated.

Data Warehouse Services — A centrally coordinated set of databases that enables integration and standardization of
cross-media, cross-program data for environmental analysis and reporting.

Enterprise Repository — Services providing access (including queries) to datasets accessible by a common access
mechanism, and whose requirements for data integration meet the more stringent requirements of a data warehouse.
                              Model for Information Integration — Appendix A
57

-------
  Enterprise Technical Architecture—A comprehensive series of principles, guidelines, diagrams, and standards that
  enable an organization to align the acquisition, development, and coordination of its information technology (IT) assets
  with its business goals and functions.

  EPA Enterprise Portal—An interface through which people and organizations access electronically the EPA and its
  services.

  EPA Environmental Information Architecture (EIA)—The collection of services, processes, data, and
  infrastructure supporting EPA internally and its external stakeholders.

  Exchange Network Services—Provides the operational capabilities required for participation in the Exchange
  Network.

  EPA Node—The collection of services, processes, data, and infrastructure supporting EPA services for the Exchange
  Network.

  Foundation—A collection of services and policies needed to implement/maintain the EIA. For example, it addresses
  security planning, use and management of metadata, data standards, use of XML, and consistency with EPA's enterprise
  architecture.

  Information—Information is data presented to meet user expectations. Data presentation must be user friendly and
  impart some meaning to the data.

  Holdings—Collection of what parts comprise the EIA, and also a directory to all items in which users of the EIA are
  interested.

  Integration—The unification of processes, data, applications, and technology, either logically, physically, or in
  combination to achieve efficiency and more effective use of information and technology.

  Metadata — Data about data.

  Metadata and Holdings Catalog — Contains information about EPA's data assets. Provides a single source for
  tracking metadata.

  Node (Exchange Network)—A participant's single, managed portal for providing and receiving information via the
  National Environmental Information Exchange Network.

  Node Catalog (Exchange Network) — Information and associated network metadata (e.g., trading partner
  agreements, description of the information) available at an Exchange Network node.

  Passive Metadata—Metadata, such as documentation, that is not used directly by tools. It is usually for human
  reference.

  Platform—The operating systems, database management systems, and/or computers undergirding an application.

  Policy—A statement that is binding on entities within its scope.

  Portal — (1) A heterogeneous set of services available at a Web site; (2) A gateway to services.
58                               Model for Information Integration — Appendix A

-------
Process — A baseline set of steps to accomplish a task or perform a service.

Program/Lab/Regional Systems—Those systems used by individual Program Offices, Laboratories, and Regions to
accomplish their particular missions or goals.

Pseudo-standard — A convention that has evolved into use at EPA, but which has yet to receive official approval by
EPA.

Registry — (1) An official and authoritative list of specific, well-defined items of interest to an organization; (2) A
specially compiled index for finding related items grouped by subject area; (3) A source of metadata.

Registry Services—Provide services for accessing and using authoritative lists of names or identifiers.

Service—A conceptual capability provided by an entity.

Staging—Preparing data for loading into a repository or dataset. Data are formatted consistently, content corrected,
and metadata extracted. Usually this is part of data quality engineering.

Standard — A set of criteria, or a convention.

System—An actually implemented entity.

System of Access — A tool that will allow customers (Public, Partners, and EPA Employees) to locate and access
Decision Support Tools according to their authorization level.

System of Registries — See Centralized Data Registries.

Tool—A device or capability that aids in accomplishing a task. In this document, "tool" is used synonymously with
"application."

-------
 "America the Unready." The Economist (1). 22 December 2001:25.

 Carr, Judith. "IT Governance: Models for E-Government." Gartner Commentary on Government Gartner RAS
 Services. Online. GartnerGroup, Inc. 18 Oct 00.

 Clinger-Cohen Act of 1996 (formerly, Information Technology Management Reform Act [ITMRA]), Public Law
 104-106. 10 February 1996.

 Cook, Melissa. Building Enterprise Information Architectures: Reengineering Information Systems.
 Upper Saddle River, NJ: Prentice Hall PTR, 1996.

 Drucker, Peter F. The Essential Drucker, Selection of Management Works of Peter Drucker. New York, NY: Harper
 Collins Publishers, 2001.

 Federal Architecture Workgroup (FAWG). Enterprise Interoperability and Emerging Information Technology
 Committee of the Federal Chief Information Officer Council. A Practical Guide to Federal Enterprise Architecture
 Version 1.0. Washington, DC: February, 2001.

 Forman, Mark. "Achieving the Vision of e-Government." Quicksilver Task Force Meeting. Washington, DC. 03
 August 2001.

 Inmon, W.H., Imhoff, C., Sousa, R. Corporate Information Factory (2nd ed.), New York: John Wiley and Sons, 2001.

 Kimball, Ralph, Reeves, L., Ross, M., and Thornthwaite, W. The Data Warehouse Lifecycle Toolkit, New York: John
 Wiley and Sons, 1998.

 Lal, Rashmi. Email description of TRI Explorer. 27 August 2001.

 Merriam-Webster's Collegiate® Dictionary. Entry for "integration." From Merriam-Webster Online ©2002 by
 Merriam-Webster, Incorporated, publisher of Merriam-Webster® dictionaries.

 Microsoft, Inc. Business Intelligence. How Agencies Can Breathe New Life Into Old Data. Online Article at
 http://www.microsoftgovernment.com/bi/. As of 24 January 2002.

 Phifer, G., Berg, T. "'Portal': The Most Abused Term in IT." Gartner Research Note. 25 September 2000:1.

 Petruccelli, Kathy. "QIC Investment Subcommittee Update." Quality and Information Council Meeting. Washington,
 DC. Ronald Reagan Building, 08 August 2001.

 Sowa, J.F. and J.A. Zachman. 1992. Extending and Formalising the Framework for Information Systems
 Architecture, IBM Systems Journal, Vol. 31, No. 3.

 Spewak, Steven. Enterprise Architecture Planning: Developing a Blueprint for Data, Applications, and Technology.
 New York, NY: John Wiley & Sons, Inc, 1992.
60
Model for Information Integration — Appendix B

-------
SRA International, Inc. Blueprint for EPA Information Integration. Deliverable 3.3 under the Information Infrastructure
and Architectural Support Contract (68-W-99-038), Work Assignment #046. 30 November 2001.
—. (2) Risk Assessment for the Interim Central Data Exchange (CDX) Facility, Deliverable 2-2, Work Assignment
Number 041, 19 April 2001.

State/EPA Information Management Workgroup (IMWG). Blueprint for a National Environmental Information
Exchange Network. 30 October 2000.

State/EPA Interim Network Steering Group (INSG). Implementation Plan for National Environmental Information
Exchange Network. (Draft). 11 January 2002.

Sullivan, John, and Durman, Gene. "Proposed Target Architecture." Presentation at EPA IIP Retreat. Arlington, VA, 26
June 2001.

"Timely Technology." The Economist (2). 02 February 2002:5.

U.S. Environmental Protection Agency (EPA), Agency Information Integration: FY 2002 Priorities. Quality and
Information Council Briefing & Discussion. Washington, DC. Ronald Reagan Building, 17 October 2001.

—. (2) Business Operations Supported by Geospatial Tools, White Paper, 2001.

—. (3) The Costs and Benefits of the National Environmental Information Exchange Network. Phase One: Preliminary
Assessment of the Central Data Exchange and Selected Flows. Washington, DC: Office of Environmental Information,
November 2001.

—. (4) Data and Information Quality Strategic Plan. (Draft). Washington, DC: Office of Environmental Information,
January, 2002.

—. (5) Enterprise Architecture Planning Process: FY2002 Project Management Plan. Office of Environmental
Information, (Draft) 15 November 2001.

—. (6) Enterprise Architecture. Submission to U.S. Office of Management and Budget. Office of Environmental
Information, 29 March 2001.

—. (7) Environmental Indicators Initiative. Memorandum. Washington, DC: Office of the Administrator, 13 November
2001.

—. (8) FY 2001 Information Integration Initiative Management Plan. Washington, DC: Office of Environmental
Information, 07 December 2001.

—. (9) Geospatial Baseline Assessment. Washington, DC: Office of Environmental Information, 29 June 2001.

—. (10) Information Agenda. Washington, DC: Office of Environmental Information, (Draft) 16 January 2002.

—. (11) Information Integration Program Management Plan. Washington, DC: Office of Environmental Information,
(Draft) 31 January 2002.

—. (12) Information Quality Vision. Washington, DC: Office of Environmental Information, (Draft Presentation),
Date?


-------
 —. (13) A Management System for Information Quality. Washington, DC: Office of Environmental Information, (Draft
 White Paper), Date?

 —. (14) Metadata Strategy for the Environmental Protection Agency. (Draft) Washington, DC: Office of Environmental
 Information, 31 July 2001.

 —. (15) An Overview of EPA's System of Registries, (Draft white paper), August, 2001.

 —. (16) Personal Use of Agency Equipment, 1998. (Policy) http://intranet.epa.gov/rmpolicy/im/equipuse.htm, as of 19
 January 2002.

 —. (17) Public Access Strategy. (Draft) Washington, DC: Office of Environmental Information, 24 January 2002.

 —. (18) REI: A Plan for Change-Our Commitments (Action Plan). 02 December 1997.
 http://www.epa.gov/reinvent/onestop/acplan/plan.htm, as of 18 January 2002.

 —. (19) System Life Cycle Management, 1994. (Policy) http://www.epa.gov/irmpoli8/polman/, as of 19 January 2002.

 —. (20) Window to My Environment: A "Geographic Portal" to Community-Based Environmental Information, White
 paper, August, 2001.

 U.S. Office of Management and Budget, Executive Office of the President. Guidelines for Ensuring and Maximizing the
 Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies. (Final). Washington, DC: 28
 September 2001.

-------
  What Other Registries Are Needed (EPA (15), 2001)

  The list below provides some suggestions for types of registries that might be useful in supporting EPA's business
needs. These registries could be developed, in many circumstances, from information already partly present in OEI
systems. From an information management perspective, these may not necessarily be separate, standalone applications
but could be different views of common, shared information maintained in a central location as part of the Centralized
Data Registries.

  Business Sector Registry

  A sector registry is envisioned to provide a Standard Industrial Classification (SIC)/North American Industry
Classification System (NAICS) crosswalk in support of the migration of Agency data from SIC to NAICS codes in
response to the Agency's SIC/NAICS data standard. This registry would be based on the NAICS and SIC code sets.
The registry would support analysis of EPA data by industrial sector and would support anticipated further
specialization of NAICS values. This registry could link to the FRS and the Organization Registry.
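
  The crosswalk idea described above can be sketched in a few lines of Python. This is only an illustration under stated
assumptions, not a description of any EPA system: the code values (SIC_A, NAICS_1, and so on) are placeholders rather
than real SIC or NAICS codes, and the function names build_crosswalk and migrate_record are hypothetical. The sketch
shows why a crosswalk has to handle one-to-many and many-to-one relationships when records move from SIC to NAICS.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (SIC code, NAICS code).
# The values are placeholders for illustration, not real classification codes.
CROSSWALK_ROWS = [
    ("SIC_A", "NAICS_1"),
    ("SIC_A", "NAICS_2"),  # one SIC code may split into several NAICS codes
    ("SIC_B", "NAICS_3"),
    ("SIC_C", "NAICS_3"),  # several SIC codes may merge into one NAICS code
]


def build_crosswalk(rows):
    """Index the crosswalk rows in both directions."""
    sic_to_naics = defaultdict(set)
    naics_to_sic = defaultdict(set)
    for sic, naics in rows:
        sic_to_naics[sic].add(naics)
        naics_to_sic[naics].add(sic)
    return sic_to_naics, naics_to_sic


def migrate_record(record, sic_to_naics):
    """Return a copy of the record with its candidate NAICS codes attached.

    A record is flagged for manual review unless its SIC code maps to
    exactly one NAICS code.
    """
    candidates = sorted(sic_to_naics.get(record["sic"], ()))
    migrated = dict(record)
    migrated["naics_candidates"] = candidates
    migrated["needs_review"] = len(candidates) != 1
    return migrated


if __name__ == "__main__":
    sic_to_naics, naics_to_sic = build_crosswalk(CROSSWALK_ROWS)
    print(migrate_record({"facility": "F1", "sic": "SIC_A"}, sic_to_naics))
    print(migrate_record({"facility": "F2", "sic": "SIC_B"}, sic_to_naics))
```

  Keeping the mapping in both directions is a design choice made for this sketch: the forward index supports migration
of existing records, while the reverse index supports analysis by sector across records that still carry SIC codes.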

  Regulation Registry

  This registry is envisioned to provide a crosswalk between statutes and regulations, based on data elements included
in the draft Enforcement/Compliance data standard. This authoritative listing of laws and regulations of interest to EPA
would support any data standard or program system dealing with enforcement/compliance issues. This registry could
link to the SRS. Much of the content required to initially populate this registry is currently stored in the TRS and the
SRS.

  Geopolitical Registry

  This registry could store location information, including counties, states, townships, tribes, and congressional
districts, that would support maintenance of location information in most EPA program databases. This registry could
link to the FRS, the Place Registry, and the Organization Registry.

  Organization Registry

  This registry would provide an authoritative list of organizations within EPA (Regions, offices, laboratories) and
States, other federal agencies, tribes, and corporations that are doing business with EPA. This registry would have
linkages to most other registries, as it could store information on data stewards and submitting organizations participating
in the Exchange Network, as well as corporations associated with facilities in the Facility Registry System. Many of
these organizations are currently included in the EDR registries.
                               Model for Information Integration — Appendix C
63

-------
 APPENDIX D — EPA'S NATIONAL "REINVENTING ENVIRONMENTAL INFORMATION (REI)" SYSTEMS
   In 1998, EPA identified the 13 national systems below as priorities for re-engineering efforts, including data standards
 implementation and modernization to accept electronic reports (EPA (18), 1998). Building on progress made with
 these systems, the EIA will continue to use them as a foundation for further integration design and planning.

   •  AIRS Air Quality System (AQS)
   •  AIRS Air Facility Subsystem (AFS)
   •  Permit Compliance System (PCS)
   •  SRMP (System for Risk Mgmt Planning)
   •  Biennial Reporting System (BRS)
   •  RCRAInfo
   •  CR-ERNS (Continuous-Emergency Response Notification System)
   •  Safe Drinking Water Information System (SDWIS)
   •  Toxic Release Inventory System (TRIS)
   •  Water Quality Information System (STORET)
   •  National Compliance Database (NCDB)
   •  OECA Docket (DOCKET)
   •  Envirofacts
64
Model for Information Integration — Appendix D

-------