OFFICE OF ENVIRONMENTAL INFORMATION

Environmental Information Integration
Requirements
Technical Evaluation of Alternatives
Development Plan

Revised January 30, 2007

Table of Contents
Executive Summary
1. Requirements for Environmental Information Integration
1.1. Methodology
1.2. Findings
1.2.1. The strategic context for OEI environmental information services
1.2.2. What interviewees told us
1.2.3. Supply-driven versus demand-driven information management
1.3. Recommendation
2. Technical Evaluation of Information Integration Alternatives
2.1. Services-oriented Architecture (SOA)
2.2. Data Warehousing
2.3. Enterprise Architecture
2.4. IT Governance
2.5. Program Management Office (PMO)
3. Development Plan for Information Integration
3.1. Option 1: Maintain Status Quo
3.2. Option 2: Improve Governance
3.3. Option 3: Governance with Top-Down Information Needs
3.4. Option 4: Governance with Bottom-Up Information Services
3.5. Recommendation
3.6. Next Steps
Appendix A Interviewees
Appendix B Information Needs Areas
Appendix C Documents Reviewed

Executive Summary

EPA's Office of Information Analysis and Access (OIAA) within the Office of Environmental Information (OEI) engaged DecisionPath Consulting to identify the business requirements for integrated environmental information and make recommendations for how OEI can more effectively manage information, better serve its customers, and further its mission. To accomplish this effort, we interviewed 39 EPA employees, primarily from OEI, and reviewed a number of documents about EPA's mission, strategic plan, management challenges, enterprise architecture, and related business and IT topics.

Major Findings

Both the EPA employees with whom we spoke and the documents we reviewed depict OEI as an organization at a crossroads. Since its inception in 1999, OEI has focused on three major services.
The first service is IT infrastructure and processing for the agency. The second is a common mechanism for data intake from EPA partners: the National Environmental Exchange Network and its supporting structures, such as the System of Registries. The third is a supply-driven approach to providing information to both internal customers and the public. OEI must continue these efforts. However, in order to make a larger contribution to the agency and thereby ensure future funding, it must provide additional customer-oriented information services.

Both interviewee comments and reviewed documents indicate that:

Neither OEI's internal customers nor its employees have a clear understanding of its future direction.
OEI has no mechanism by which it engages with its customers to understand their information requirements and define information services that have value to them.
OIAA collects data mainly to satisfy statutory requirements. It then uses a supply-driven approach to leverage that data for additional purposes and for additional customers. This supply-driven approach, which entails making the collected data available via self-service query tools, satisfies the needs of some customers. However, it does not adequately fulfill specific requirements for strategic business information.
IT governance has focused on satisfying the requirements of environmental statutes and on IT-centric activities such as enterprise architecture. OEI lacks an effective IT governance mechanism to help it prioritize and fund IT investments based on business value.

OEI Strategic Context

Since its formation in 1970, the Environmental Protection Agency (EPA) has had a decentralized organization of media-specific program offices and geographic regions. Political appointees head both the program offices and regions. Each program office has its own appropriations and its own portfolio of environmental statutes to execute.
Historically, each program office and region had its own IT department that developed and managed its own information systems. In 1999, many but not all IT functions were centralized in a new organization, the Office of Environmental Information (OEI). To date, OEI has focused on providing IT infrastructure and operations, and on implementing the National Environmental Exchange Network (and supporting structures, such as the System of Registries) as a common data intake mechanism by which EPA partners submit statutorily required data to EPA.

January 30, 2007 Environmental Information Integration - Final Deliverable (revised) Page 1

These efforts support the collection and processing of data for its primary purpose: satisfying the requirements of environmental statutes. Leveraging the collected data for additional purposes and to serve additional customers is an OEI challenge that falls primarily to OIAA. Examples of these additional purposes and customers include assessing the state of the environment, measuring environmental progress, and providing environmental information to the public. OIAA uses a supply-driven approach to information management: the data collected to satisfy statutory requirements is made available for additional purposes and to additional users largely via the self-service mechanism of simple query tools.

EPA's 2006-2011 Strategic Plan articulates what the agency will do over the next five years. The guidance it provides regarding what EPA considers important represents an opportunity for OEI to focus beyond its current services to provide high-value additional information services to the agency.

IT investments at EPA are governed by the requirements of the various environmental statutes EPA executes, as well as by external mechanisms such as the Clinger-Cohen Act of 1996 (which mandated a CIO), the Federal Enterprise Architecture, and the Capital Planning and Investment Control (CPIC) process.
OEI has been working for several years to construct and implement an enterprise architecture (EA) for EPA. As part of the EA, OEI intends to base EPA's application architecture upon Web services and services-oriented architecture (SOA).

What Interviewees Have Told Us

We interviewed 39 EPA employees, two-thirds of whom were from OEI and the rest from program offices, regional offices, and administrative offices. From OEI employees, non-OEI employees, and reviewed documents, consistent perspectives came to light about OEI, how it engages with and serves its customers, data integration, information management, IT governance, and other topics. Three key points emerged:

Beyond the IT infrastructure that it provides, OEI's value proposition to the agency is not well understood either by its customers in the program offices and regions or by OEI employees. In order to move forward, OEI must clearly articulate how it supports customer information needs, and it must communicate its roadmap for providing future value-added information services to those customers.
The fundamental purpose of OEI's activities (and, in fact, of all EPA activities) is to support the agency mission to protect human health and the environment. OEI does not currently have an effective governance structure that evaluates and recommends potential information service investments based on their relative contribution to supporting that mission.
OEI has a concept of providing a portfolio of Web services using SOA, but it lacks an effective process by which to define, prioritize, design, and govern such a portfolio.

Recommendations and Next Steps

OEI has a vital mission: the creation, management, and use of information as a strategic asset at EPA.
The road to an expanded OEI future and achievement of that mission is clear: in addition to continuing its current services (IT infrastructure and processing, ongoing work to improve data intake from partners, and information delivery via a supply-driven approach), OEI must engage with its customers to understand their business requirements for information and then deliver information services that target those requirements. DecisionPath characterizes such an approach as demand-driven: business requirements "pull" the creation of information services targeted to fulfill those requirements.

DecisionPath evaluated four options for whether and how OEI could adopt the demand-driven approach to information management and more effectively govern its activities in order to deliver additional business value to the agency. (See section 3 for a discussion of the four options and the advantages and disadvantages of each.) Because there is no precedent for wholesale adoption of the demand-driven model, we recommend that OEI take an incremental approach to demand-driven information services by beginning with a prototype. Specifically, we recommend that OEI take the following actions:

Demonstrate the demand-driven approach through a carefully selected prototype with a single EPA component. The first step is to select an appropriate project (with a receptive EPA functional organization) for the demand-driven prototype.
Expand the IT governance role of the OEI program management office (PMO) to include development of a multi-year program plan of IT investments, based on a portfolio of investment opportunities; identification (or collection) of the investment opportunities that make up the portfolio, including specification of the potential business value and implementation risks of each; and monitoring approved and completed projects for achievement of their projected business value.
Apply the demand-driven approach to public information services to deliver supply-driven content with demand-driven information presentation. Develop a program plan for a multi-year effort to extend and enhance the public's access to environmental information.

Section 3.6 outlines a series of next steps for OEI to begin to execute these actions. These actions are the first step toward OEI playing a larger role within EPA and assuring future funding by providing additional information services and incremental business value.

The Statement of Work for this engagement requires DecisionPath to produce three final deliverables: requirements for environmental information integration, technical evaluation of information integration alternatives, and a development plan. This consolidated document contains a section for each of the three deliverables.

The Requirements section describes the methodology we used, the strategic context for OEI services, and a summary of what interviewees told us, and it compares OEI's current supply-driven approach to information management with our recommended demand-driven approach.

The Technical Evaluation section discusses a number of data architectures, data integration approaches, and IT management techniques, specifically: services-oriented architecture, data warehousing, enterprise architecture, IT governance, and program management office. It describes how OEI currently applies these architectures, approaches, and techniques, how it might extend their use, and situations in which they are and are not appropriate.

The Development Plan section considers four options for how OEI might move forward, including the advantages and disadvantages of each, and recommends the option DecisionPath believes OEI should pursue.

1. Requirements for Environmental Information Integration

1.1. Methodology

DecisionPath consultants interviewed 39 EPA employees about the business requirements for integrated information, current methods of integrating data, and related topics. These employees were primarily from OEI: they ranged from technical staff to the Acting CIO. Appendix A contains a list of employees interviewed.

In addition to these requirements-gathering interviews, we reviewed over 100 EPA documents, including:

2006-2011 EPA Strategic Plan: Charting Our Course
EPA's FY2005 Performance and Accountability Report
EPA Draft Report on the Environment
Other pertinent documents

Appendix C provides a complete list of the documents we reviewed.

1.2. Findings

1.2.1. The strategic context for OEI environmental information services

Environmental Protection Agency (EPA)

EPA was created in December 1970 from pieces of numerous federal agencies, including the Departments of the Interior; Agriculture; Health, Education, and Welfare (now HHS); the U.S. Atomic Energy Commission; and others. EPA has always been organized by program (media). Each program office has always had its own set of environmental statutes to execute and its own appropriations. The 10 EPA regional offices have also existed from the beginning. Both the assistant administrators in charge of the program offices and the regional administrators are political appointees.

Office of Environmental Information (OEI)

OEI was formed in October 1999 by centralizing various IT functions. Requirements of the Clinger-Cohen Act of 1996 and other legislation motivated this centralization. The formation of OEI might also have been partially in response to critical reports about EPA IT issues (ineffective information management, lack of data integration, poor data quality, etc.) by GAO and other oversight entities.
EPA Strategic Plan 2006-2011

EPA Strategic Plan: Charting Our Course describes how EPA will accomplish its mission to protect human health and the environment. It outlines five goals, each with a number of supporting objectives. Four of the five goals are media-specific, but the Strategic Plan also contains a number of cross-goal strategies. The cross-goal strategy "Results and Accountability" provides strategic context to OEI in the following areas:

Assessing the state of the environment and measuring progress.
Making information more accessible. EPA will focus on four major areas:
  Analytical capacity
  Governance
  Excellence in information service delivery
  Innovation in information management
Integrating budget and performance information.

Key Management Challenges

Each year, the EPA Inspector General submits to the Administrator a memorandum outlining the agency's key management challenges. Two of the 2006 key management challenges deal with data: (1) data gaps, and (2) data standards and data quality. To quote the memorandum, "If EPA is to manage for results, it needs to decide what environmental and other indicators will be measured; provide data standards so that organization responsible for delivering environmental programs are measuring what is important and are using common definitions; and ensure that data are of sufficient quality for effective decision making."

Another key management challenge is "EPA's Use of Assistance Agreements to Accomplish its Mission." EPA spends more than half of its budget on assistance agreements, which "are a primary means EPA uses to carry out its mission." The assistance agreements are with the partners (states, tribes, and others) to whom EPA has delegated the execution of its programs. With so much of its work done by these partners, their participation with EPA in data quality is critical.
EPA will spend approximately $600 million per year for systems development and maintenance in both FY2006 and FY2007. This spending comprises many systems and projects, ranging in size from less than $100,000 to $25 million. Every year since 1996, the Office of Inspector General (OIG) has listed "Information Resources Management" or a variant as a key management challenge. Despite all the IT governance structures and mechanisms mandated by the Clinger-Cohen Act, effective governance of IT investments to realize business value remains a challenge for EPA.

Research Orientation

A significant portion of EPA employees are scientists, and EPA is a nexus for environmental research. This cultural orientation toward scientific research influences the EPA approach to providing information. Typically, a scientific researcher looking for information:

Is well educated and computer literate.
Obtains his or her information personally (that is, does not have a support staff, except perhaps junior researchers responsible for their own parts of the project).
Has expert knowledge of the data he or she is interpreting.
Has at least a general understanding of the context surrounding the data.
Wants to see all available data on a topic and does not want the data filtered.

The scientific researcher as an information consumer is well served by a self-service information delivery model in which all data is available via simple query tools.

Services-oriented Architecture

OEI uses services-oriented architecture (SOA) for the National Environmental Exchange Network, and the Acting CIO wants to use SOA as the fundamental mechanism for provision of all information services.

Summary of Strategic Context for OEI Environmental Information Services

The strategic context in which OEI operates both provides direction and constrains what it can do.
As detailed above, this context includes:

EPA culture of stovepiped media-specific program offices, with power and funding decentralized in the program offices and regions
Cross-goal strategy to assess the state of the environment and measure progress
Cross-goal strategy to make information more accessible
Cross-goal strategy to achieve budget and performance integration
Key management challenge to improve data quality
Key management challenge to work more effectively with partners
Key management challenge to more effectively govern information management
Cultural orientation toward a scientific researcher model for information delivery
Desire to use services-oriented architecture (SOA) as extensively as possible

1.2.2. What interviewees told us

DecisionPath's initial intention in the requirements-gathering interviews we conducted was to understand and document the business case for data integration at EPA. Over the course of the engagement, our emphasis shifted more toward IT governance. The EPA employees we interviewed spoke on a variety of IT-related topics, including:

The degree of OEI's engagement with its internal customers about their information needs
Data quality, data standards, data stewardship, and metadata management
Data integration: how much there is, and how much is needed

From our interviews, we captured a number of quotes that provide a picture of the current EPA situation as perceived by the employees we interviewed. The comments below are paraphrased and are not attributed to specific employees.

Comments about OEI's level of engagement with its customers:

OEI leadership should engage their customers.
OEI is far from the business.
If [OEI] wants to know what decision makers need, they should talk to them, not to data providers.
OEI should participate more actively in [program office's] business.
[EPA manager] questions how well OEI understands what is currently going on at EPA.
Even where data is available and unambiguous, it is usually not wanted in a standalone form. Politicians and scientists want the context.

Comments about data quality, data standards, data stewardship, and metadata management:

In order to get the information we need, the most critical problem is the quality of the existing data.
The main challenge today is data quality.
Data quality assessment is vital.
EPA does not do data stewardship well.
There is a huge need for metadata.
There are still a lot of data gaps.
Adequate data standards exist but their implementation is inconsistent.

Comments about data integration, or the lack thereof:

The key is to make all the information come together at the right place and time.
EPA can't tell the public as much as it wants to because it lacks the means to add up the data.
EPA uses a lot of manual effort to do integration that should be automated.
The data doesn't have to be integrated, but it does have to be integratable.

Comments about the problems associated with not having integrated data:

For Katrina, the process of moving analytical data from the collection point in the field to EPA headquarters was very slow.
The agency could unite around problems faster with better information sharing.
The EPA has so much data and shares so little of it.

Business Questions and Information Needs Areas

DecisionPath began this engagement anticipating that our requirements-gathering effort would entail interviewing primarily information consumers, that is, representatives from the program offices and regions. As it turned out, most of the EPA employees we interviewed were from OEI.

Typically, our requirements-gathering process yields a list of the business questions that business users and decision makers want to be able to answer.
We then group similar business questions into information needs areas. Because OEI uses a supply-driven approach to information management (see Section 1.2.3 for an explanation of the supply- and demand-driven approaches), it is not accustomed to collecting business requirements for information and responding with systems to provide that information. Nevertheless, our interviews did yield a number of business questions, which we grouped into five information needs areas. These business questions and information needs areas are an example of the demand-driven approach to information management. See Appendix B for the information needs areas and the business questions that comprise them.

Because these business questions came from IT staff relating what information they believe the business users want, they should be validated directly with business users before being acted upon.

Summary of Interviewee Comments

The input we obtained from interviewees and the documents we reviewed coalesced around several key points:

Beyond the IT infrastructure that it provides, OEI's value proposition to the agency is not well understood either by its customers in the program offices and regions or by OEI employees. In order to move forward and be more successful, OEI must clearly articulate how it supports customer information needs and communicate its roadmap for providing future value-added information services to those customers.
OEI has a concept of providing a portfolio of Web services using SOA, but it lacks an effective process by which to define, prioritize, design, and govern such a portfolio.
The fundamental purpose of OEI's activities (and, in fact, of all EPA activities) is to support the agency mission to protect human health and the environment.
OEI does not currently have an effective governance structure that evaluates and recommends potential information service investments based on their relative contribution to supporting that mission.

1.2.3. Supply-driven versus demand-driven information management

The Current Supply-driven Approach

OEI's approach to providing information services can be characterized as supply-driven. The data required by the various environmental statutes is collected from EPA partners via the Exchange Network and stored in program-specific databases and systems. These program-specific databases and systems fulfill the statutory reporting requirements. Although some of these systems run on OEI's IT infrastructure, for the most part they are developed, maintained, and managed by program office and regional IT organizations rather than by OEI.

The Office of Information Analysis and Access (OIAA) within OEI is responsible for enabling the use of this data for additional purposes, especially analytical and cross-media purposes, and by additional audiences, such as the public. OIAA's approach to enabling these additional uses is based on the scientific researcher model described earlier in section 1.2.1. It essentially provides data under the motto "Here's the data we have: come get what you need."

Events in the larger political environment within which EPA operates can cause frequent and rapid changes in information needs and priorities; the supply-driven approach to information services has the advantage of being very flexible. The data is made available to the user in fairly raw form (as opposed to being structured for a specific analytical purpose), and the user structures it according to his or her needs. That flexibility is also the downside of the supply-driven model. The user must:

Obtain the raw data via a query.
Integrate it.
Interpret it and add context to it, in order to turn the data into useful information.
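To make the user's burden under this model concrete, the three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any EPA system: the two program extracts, their field names (`facility_id`, `air_releases_lb`, `water_violations`), and the region lookup are all hypothetical.

```python
# Hypothetical raw extracts from two program-specific systems, as a user
# might obtain them through separate self-service queries.
AIR_RECORDS = [
    {"facility_id": "F001", "year": 2005, "air_releases_lb": 1200.0},
    {"facility_id": "F002", "year": 2004, "air_releases_lb": 450.0},
    {"facility_id": "F002", "year": 2005, "air_releases_lb": 300.0},
]
WATER_RECORDS = [
    {"facility_id": "F001", "year": 2005, "water_violations": 2},
    {"facility_id": "F003", "year": 2005, "water_violations": 1},
]
# Hypothetical business context the user must supply from his or her own knowledge.
FACILITY_CONTEXT = {"F001": "Region 4", "F002": "Region 6", "F003": "Region 9"}


def query(records, year):
    """Step 1: obtain raw data via a simple query (here, a filter on year)."""
    return [r for r in records if r["year"] == year]


def integrate(air, water):
    """Step 2: integrate the two extracts on their shared facility identifier."""
    merged = {}
    for r in air:
        merged.setdefault(r["facility_id"], {})["air_releases_lb"] = r["air_releases_lb"]
    for r in water:
        merged.setdefault(r["facility_id"], {})["water_violations"] = r["water_violations"]
    return merged


def add_context(merged, context):
    """Step 3: interpret the data by attaching business context (the region)."""
    return {
        fid: {**values, "region": context.get(fid, "unknown")}
        for fid, values in merged.items()
    }


report = add_context(
    integrate(query(AIR_RECORDS, 2005), query(WATER_RECORDS, 2005)),
    FACILITY_CONTEXT,
)
for fid in sorted(report):
    print(fid, report[fid])
```

Every user who needs this combined view must repeat the same query, integration, and interpretation work, and must notice on his or her own that facilities F002 and F003 appear in only one of the two sources. Under the demand-driven approach described next, those steps would instead be performed once, by the information service.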
The supply-driven model might serve the needs of a researcher performing pure science, but much of what EPA does consists, instead, of business operations. For example, determining how best to allocate EPA's finite resources to achieve the maximum environmental benefit is an analytical business process, not pure science. Many such analytical business processes take place within individual programs, within program offices, within regions, and at headquarters for the agency as a whole. Business operations, especially analytical and decision-making operations, are not well served by the supply-driven model.

The Demand-driven Approach

An alternative to the supply-driven approach can be characterized as demand-driven. In this approach, data is transformed into information and packaged and presented for a specific mission support, policy-making, decision-making, or other business purpose. The demand-driven approach to information management is driven by the business's requirements for specific information.

In EPA's case, the agency mission and strategic plan provide multiple instances of high-priority demands for information. The need to assess the environment and measure progress, the needs of agency decision makers for specific analyses, the need to improve data quality, and the need to integrate budget and performance all drive specific demands for information services. A key principle of the demand-driven approach is "aim high": focus on the mission, strategic objectives, and other high-level definitions of organizational success, and support their achievement with specifically targeted information.

Contrasting the Supply-driven and Demand-driven Approaches

The supply-driven approach to information management has the advantages of being simple, low cost, and flexible.
It realizes these advantages by relying on user self-service: the user does his or her own querying, integration, data transformation, addition of business context, and interpretation of the data. The supply-driven approach is well suited to a class of users exemplified by the scientific researcher. It always will have a role at EPA.

The self-service and ad hoc nature of the supply-driven approach also is its weakness. It is much less well suited for situations in which:

The requirements are well structured and recurring.
Individual users do not possess the background, experience, or business understanding to add context to the data; therefore, that context must be added for them.
It is critical that multiple users all receive the same information. (This often is referred to as "a single version of the truth.")
The required (historical) data no longer resides in the operational systems.
The solution to the problem requires multi-dimensional analysis.
Data must be integrated from multiple sources.

Because it is "pulled" by business requirements for information, the demand-driven approach is more targeted to providing information that is useful to the customer (user), in a form that is useful to and actionable by him or her. The demand-driven approach to information management requires the IT function to do more and likely costs more, but it has the potential to deliver business value not possible with the supply-driven approach.

Ventana Research, a leading performance management research and advisory services firm, compares the two approaches this way: "Ventana Research continues to see a significant focus on the data-to-user [supply-driven] approach, data warehousing to their information architecture and business requests, instead of a user-to-data [demand-driven] approach that leverages business intelligence and performance management.
The data-to-user approach along with simply picking a best-of-breed tool for data integration, data warehousing, metadata management, and business intelligence tools will not necessarily bring full business value sought by CIO and business management from IT investments."

1.3. Recommendation

A big advantage of the demand-driven approach is its linkage between business requirements for information and the information services that fulfill those requirements. This linkage provides an opportunity to deliver business value in the short term. It facilitates a focus on business value and therefore provides a framework for IT governance.

The demand-driven approach to providing information services is different from, and a valuable complement to, the supply-driven approach. It would require OEI to engage proactively with its customers to determine their information needs and then build information services specifically to meet those needs. The demand-driven approach would also require OEI personnel to have deeper domain knowledge of the work of EPA than the supply-driven approach does, so that they could provide more useful context with the information.

The demand-driven approach holds more potential for OEI to add value to EPA than the supply-driven approach. DecisionPath recommends that OEI adopt the demand-driven model as its primary approach to information management, while continuing the supply-driven approach for those situations in which it is appropriate.

2. Technical Evaluation of Information Integration Alternatives

OEI uses a number of data architectures, data integration approaches, and management techniques to carry out its mission.
This section discusses five such architectures, approaches, and techniques, for the purpose of understanding exactly what they are, how OEI currently uses them, opportunities to extend their use, and the situations in which they are and are not appropriate. The discussion topics are:

Services-oriented architecture (SOA)
Data warehousing
Enterprise architecture
IT governance
Program Management Office (PMO)

2.1. Services-oriented Architecture (SOA)

Services-oriented Architecture (SOA) is an integration technique that loosely couples software services to support business process requirements. SOA implementations typically use a combination of XML-based technologies to implement Web Services.

At its core, SOA is based on the concept of Remote Procedure Call (RPC). In the late 1990s, XML-RPC was introduced and quickly evolved into Web Services. The basic idea in all of these variants is that the service provider and service requester agree on a set of standard communication techniques (an Application Programming Interface, or API). For Web Services, the API is documented in Web Services Description Language (WSDL). The service requester needs only to follow the protocol for invoking the remote service, without being concerned about what is actually happening at the far end of the communication. Web Services use the Simple Object Access Protocol (SOAP).

This infrastructure is further enhanced with a directory service known as Universal Description, Discovery, and Integration (UDDI) that provides for service discovery. A service directory provides API metadata (WSDL) and the location of services that are currently online. In order to locate various API services, the client program needs only to know how to use the service directory. Taken together, these features of SOA provide for dynamic discovery, dynamic location, and dynamic invocation of RPC.
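The discover-then-invoke pattern described above can be illustrated with a small, in-process Python sketch. It is a deliberate simplification under stated assumptions: the `ServiceDirectory` class, the `FacilityLookup` service, its `getFacility` operation, and the example.org location are all hypothetical, and the list of operation names merely stands in for WSDL metadata. A real deployment would publish to a UDDI registry and exchange SOAP messages over HTTP or HTTPS rather than calling a local Python function.

```python
from typing import Callable, Dict, List

# Hypothetical handler signature: (operation name, parameters) -> result.
Handler = Callable[[str, dict], dict]


class ServiceDirectory:
    """Toy in-process stand-in for a UDDI-style service directory.

    Each entry holds a service's location plus the operations it exposes,
    standing in for the WSDL metadata a real directory would publish.
    """

    def __init__(self) -> None:
        self._entries: Dict[str, dict] = {}

    def publish(self, name: str, location: str,
                operations: List[str], handler: Handler) -> None:
        """Provider side: register a service so requesters can find it."""
        self._entries[name] = {
            "location": location,
            "operations": list(operations),
            "handler": handler,
        }

    def find(self, name: str) -> dict:
        """Dynamic discovery and location: look the service up at run time."""
        if name not in self._entries:
            raise LookupError(f"no service published under {name!r}")
        return self._entries[name]


def invoke(directory: ServiceDirectory, service: str,
           operation: str, params: dict) -> dict:
    """Requester side: discover the service, check its contract, then call it."""
    entry = directory.find(service)
    if operation not in entry["operations"]:
        raise ValueError(f"{service} does not offer operation {operation!r}")
    # Dynamic invocation: the requester relies only on the published contract,
    # not on how the provider implements the operation.
    return entry["handler"](operation, params)


def facility_service(operation: str, params: dict) -> dict:
    """Hypothetical provider; its internals are hidden from requesters."""
    facilities = {"F001": {"name": "Example Plant", "state": "GA"}}
    if operation == "getFacility":
        return facilities.get(params["facility_id"], {})
    raise ValueError(f"unsupported operation: {operation}")


directory = ServiceDirectory()
directory.publish("FacilityLookup", "https://example.org/services/facility",
                  ["getFacility"], facility_service)
result = invoke(directory, "FacilityLookup", "getFacility", {"facility_id": "F001"})
print(directory.find("FacilityLookup")["location"], result)
```

The requester code in `invoke` never references `facility_service` directly. Swapping in a different provider under the same name and contract would require no change on the requester side, which is the loose coupling the SOA approach aims for.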
Web Services add a further feature to SOA by using HTTP or HTTPS as their transport mechanism; these are often the only protocols allowed to pass through an organization's firewall. Figure 1 shows a simple diagram of Web Services.

For data integration, SOA typically is used to implement a federation or [limited] propagation architecture. SOA typically performs best when small amounts of data are passed back and forth. When large amounts of data must be moved from point to point, the communication overhead can quickly outweigh the benefits of a loosely coupled architecture. This is especially true of XML-based SOA, such as Web Services, and of using XML to encode large volumes of data in general.

Figure 1: How Web Services work (diagram showing a Service Broker, a Service Requestor, and a Service Provider).

One of the prevailing architectural visions within OEI is that SOA can and should be used as the basis for its entire applications architecture. Its proponents envision an architecture that is a collection of Web Services.

Services-oriented architecture (SOA) is a "hot topic" receiving a lot of coverage in the technical media. SOA is a conceptually sound and viable architectural approach; however, its successful application is not without challenges. A careful review of available literature about successful and unsuccessful implementations of SOA reveals the following:

- Two objectives of SOA are interoperability and service reuse. Service reuse entails using some of the services developed for application 1 in application 2, some services from applications 1 and 2 in application 3, and so on, enabling development of each application to take less time and money than the application that preceded it. The degree of service reuse that can be realized depends heavily on identifying the right business process components to implement as services.
  Correct identification requires understanding both the business processes and the information requirements. "SOA by itself provides no guidance on how to build the right services to meet current business requirements" (Ronald Schmelzer and Jason Bloomberg, Three Roads to the SOA Implementation Framework).
- SOA is still relatively new and unproven. Its promise has not yet been realized in widespread practice.
- SOA success stories are based on using SOA to solve specific business problems. The use of the SOA-based Exchange Network as a common data intake mechanism, through which many EPA partners submit data into many EPA program office systems, is an example of an SOA application that solves a specific business problem: how to obtain data from EPA's partners.
- Business process modeling often is used to define processes that are composed of services, as shown in Figure 2. However, enterprise-wide business process modeling efforts are long-term undertakings and therefore are not practical to complete before beginning an SOA implementation.
- SOA success requires a governance structure that has significant input from the business user/customer. "It's becoming evident from the experiences of early adopters that an SOA requires mechanisms for oversight, policy governance, and change management" (Phil Wainewright, Laying the foundations for SOA).
- Not all IT processes are suitable to be architected as a service.
  Examples of unsuitable processes include:
  - ETL streams that move large volumes of data (either from source to data warehouse or from data warehouse to data mart)
  - ETL streams that entail complex transformation logic
  - ETL streams that perform significant data cleansing
  - Federated queries that require data to be combined from many sources
  - Federated queries with complex logic
  - Queries that return a large volume of data
  In our experience, these processes are much better served by the consolidation approach to data integration, as typified by data warehousing. Business intelligence and data warehousing generally are not good candidates for SOA, no matter what software tool vendors might claim.

Figure 2: A process for developing SOA-based applications.

It is important to have a clear understanding of the current and possible uses of SOA and the Exchange Network by EPA. The SOA-based Exchange Network is used for data intake: the data flows from EPA's partners to EPA, with limited data flow from EPA to the partners. A partner such as a state already has in its own system(s) the environmental data it collected and submitted to EPA. If it wants to use this data, it retrieves the data from its own system(s) rather than "getting it back" from EPA. If an EPA partner needs environmental information that 1) it did not send to EPA and does not have in its own system(s), or that 2) must be integrated from various sources, neither the Exchange Network nor SOA is the right mechanism for that partner to obtain the information it needs from EPA.

2.2. Data Warehousing

Data warehousing is a set of IT techniques to make information available for analysis and decision-making. It uses the consolidation approach to data integration by pre-building special-purpose data stores called data warehouses and data marts.
According to Bill Inmon, the "father of data warehousing," a data warehouse is "a subject-oriented, integrated, non-volatile, time-variant collection of data organized to support management's needs." Although a data warehouse is a database, it differs from a database used for an On-Line Transaction Processing (OLTP) application in that its data is specifically organized for information distribution instead of for transaction processing. Data warehouses typically have a relational structure that might be less normalized than that of an OLTP database.

Data marts are special-purpose data stores optimized for information access. They are designed to facilitate end-user analysis of data. A data mart typically supports a single analytic application used by a distinct set of end users. Data marts can be dependent (that is, sourced from a data warehouse) in a hub-and-spoke data architecture, or independent in a bus architecture. The hub-and-spoke architecture is associated with Bill Inmon, and the bus architecture with Ralph Kimball; Mr. Inmon and Dr. Kimball are noted authorities in the field of data warehousing.

Because data warehouses and data marts use the consolidation approach to data integration, they are well suited to situations in which:

- Large volumes of data must be moved.
- Data from many sources must be integrated.
- The transformation logic is complex.
- The source data requires significant data cleansing.
- User needs for information are repeatable and can be predicted in advance.
- Users need historical or trend data.

Envirofacts is a data warehouse with loose integration by facility, substance, and geolocation. The TRI and AQS data marts are independent data marts, built using a Kimball-oriented bus architecture. The debate within OEI about conformed dimensions is a result of this architecture.
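The practical difference between independent marts in a bus architecture and dependent marts in a hub-and-spoke architecture can be counted directly. The Python sketch below is illustrative only: the mapping of source systems to data marts (`MART_SOURCES`) and the facility rows are hypothetical, not OEI's actual inventory. It counts how many ETL streams must change when one source system changes, following the report's reasoning that with dependent marts only the source-to-warehouse stream changes, whereas with independent marts the source-to-warehouse (Envirofacts) stream and every source-to-mart stream reading that source must change. It also shows why conformed dimensions stop being a debate when each mart's dimension is simply a filtered copy of the warehouse dimension.

```python
from typing import Dict, List, Set

# Hypothetical inventory: which source systems feed each independent data mart directly.
MART_SOURCES: Dict[str, Set[str]] = {
    "TRI_mart": {"TRI_source", "facility_registry"},
    "AQS_mart": {"AQS_source", "facility_registry"},
}


def etl_streams_to_change(source: str, architecture: str) -> int:
    """Count the ETL streams affected when one source system changes.

    hub_and_spoke: only source -> warehouse changes, assuming (as the report
    argues) the warehouse absorbs the change so warehouse -> mart streams are untouched.
    bus: source -> warehouse plus source -> each mart that reads that source.
    """
    if architecture == "hub_and_spoke":
        return 1
    if architecture == "bus":
        return 1 + sum(1 for sources in MART_SOURCES.values() if source in sources)
    raise ValueError(f"unknown architecture: {architecture}")


# Hypothetical warehouse-level facility dimension.
WAREHOUSE_FACILITY_DIM: List[Dict[str, str]] = [
    {"facility_id": "F1", "name": "Plant A", "state": "TX"},
    {"facility_id": "F2", "name": "Plant B", "state": "OH"},
    {"facility_id": "F3", "name": "Plant C", "state": "TX"},
]


def dependent_mart_dimension(state: str) -> List[Dict[str, str]]:
    """A dependent mart's dimension is a filtered copy of the warehouse dimension,
    so its keys and attributes agree with every other mart derived the same way."""
    return [dict(row) for row in WAREHOUSE_FACILITY_DIM if row["state"] == state]


print(etl_streams_to_change("facility_registry", "bus"))            # 3
print(etl_streams_to_change("facility_registry", "hub_and_spoke"))  # 1
print([r["facility_id"] for r in dependent_mart_dimension("TX")])    # ['F1', 'F3']
```

The gap grows with every additional mart that shares a source, which is the maintenance argument for migrating to dependent spokes.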
DecisionPath believes OEI should migrate toward a hub-and-spoke architecture, with Envirofacts as the hub data warehouse and the TRI, AQS, and all future data marts as dependent spokes. Such a hub-and-spoke architecture offers more future flexibility in these ways:

- The debate about conformed dimensions becomes moot: the dimensions in dependent data marts are copies or subsets of the dimensions in the data warehouse.
- Atomic-level history can be stored in the data warehouse rather than in the data marts, which makes redesigning data marts for additional or changed requirements much easier.
- A change to a source system requires changing only one ETL stream (from source to data warehouse), versus changing both the ETL stream from source to data warehouse (Envirofacts) and the ETL streams from source to each data mart that uses that source.

Envirofacts initially was developed circa 1993, before good ETL tools were available. Therefore, most of the ETL that populates Envirofacts is hand-coded. Redeveloping this hand-coded ETL using the Informatica ETL tool would make Envirofacts much easier to maintain. The constraint impeding such a redevelopment effort is lack of funding.

Recap of SOA and Data Warehousing

Both SOA and data warehousing are valid approaches to application architecture and design. The key is to use each in situations for which it is well suited. The table below describes the situations for which each is better suited.
Characteristic or Situation | Best if SOA / Web Services | Best if Data Warehousing
Method of data integration | Federation or propagation | Consolidation
Data to be moved | Individual transactions or messages; small-volume, but perhaps frequent, moves | Large volume
Trigger for data movement | Event-driven; on demand | Usually scheduled; batch
Requirement for latency of data | Near real-time; low latency | Some degree of latency is acceptable
Frequency of data change | Volatile data | Stable data
Pattern(s) of usage | Not well known | Well understood; predictable
User requirements for information | Not known in advance; ad hoc | Predictable and recurring
Other situations | Desired result is a single record; queries are simple; most up-to-date data is needed; service portfolio is dynamic; service providers are transient in location | Data from many sources must be integrated; historical data is required; transformation logic is complex; significant data cleansing is required; "single version of the truth" is important; multi-dimensional analysis is required; aggregated data is needed

DecisionPath believes that some of EPA's information needs are best met using a data warehousing approach and that, therefore, OEI should pursue a combination of SOA and data warehousing rather than a pure SOA application architecture.

2.3. Enterprise Architecture

The Clinger-Cohen Act of 1996 assigned federal agency CIOs the responsibility to develop information technology architectures. The CIO Council began developing the Federal Enterprise Architecture Framework in 1998 "to promote shared development for common Federal processes, interoperability, and sharing of information among the Agencies of the Federal Government and other Government entities."
The CIO Council defines enterprise architecture as "a strategic information asset base, which defines the mission, the information necessary to perform the mission and the technologies necessary to perform the mission, and the transitional processes for implementing new technologies in response to the changing mission needs. An enterprise architecture includes a baseline architecture, target architecture, and a sequencing plan."

The objectives of the Federal Enterprise Architecture Framework are to:

- Promote Federal interoperability
- Promote Agency resource sharing
- Provide potential for Federal and Agency reduced costs
- Improve ability to share information
- Support Federal and Agency capital IT investment planning

OEI has been working since 2001 to implement an EPA Enterprise Architecture (EA) aligned with the Federal Enterprise Architecture. The Chief Architect and others continue to work diligently to develop and communicate EPA's EA. Some of the program offices, most notably the Office of Pollution Prevention and Toxics (OPPT), are documenting their business processes for inclusion in the EA. OPPT has documented thirty-two process flows, and one of them, Inventory Update Rule, is one of the first applications (called iEUR) of ECMS, EPA's implementation of Documentum.

One of the fundamental challenges facing the team working on EPA's EA is to make it relevant, so that it actually influences how IT development funds are spent and how individual projects are designed and built. To be relevant, the EA must be more than an abstract representation of an idealized future state: the sponsors, users, and developers of systems must be able to tell why and how to apply it to their project(s). Widespread adoption of the EA within EPA will be very challenging because of EPA's decentralized and organizationally stovepiped culture.
Most of the EPA employees (primarily from OEI) whom we interviewed volunteered support for the EA, and no one explicitly opposed it. However, we do not see widespread support for the EA in practice. For example, the Chief Architect's September 14, 2006 presentation to the IRM Branch Chiefs Meeting, entitled "Evolving EPA's Enterprise Architecture," listed eleven proposed architecture priorities. However, it also noted that "These 'priorities' highlight how EA can be used, but have not been selected to be acted on."

EPA's enterprise architecture provides guidance for how systems are to be designed and built. However, implementing the EA is a massive, long-term effort that will take many years to complete.

2.4. IT Governance

IT governance is a subset of corporate governance that deals with the connection between an organization's business focus and its IT management. Its primary goals are 1) to assure that investments in IT generate business value, and 2) to mitigate the risks associated with IT. In EPA's context, "business value" can be defined as realization of its mission ("To protect human health and the environment") and achievement of the goals articulated in its 2006-2011 Strategic Plan. Compliance with environmental statutes, other legislation, and federal policy also might be considered business value.

In The Information Paradox, John Thorp describes IT governance using the "Four Ares" model. Figure 3, which shows the "Four Ares," is based on a diagram in Enterprise Value: Governance of IT Investments, The Val IT Framework, published by the IT Governance Institute.
- The strategic question: Are we doing the right things? Is the investment:
  - In line with our vision
  - Consistent with our business principles
  - Contributing to our strategic objectives
  - Providing optimal value, at affordable cost, at an acceptable level of risk
- The architecture question: Are we doing them the right way? Is the investment:
  - In line with our architecture
  - Consistent with our architectural principles
  - Contributing to the population of our architecture
  - In line with other initiatives
- The delivery question: Are we getting them done well? Do we have:
  - Effective and disciplined management, delivery, and change management processes
  - Competent and available technical and business resources to deliver the required capabilities and the organizational changes required to leverage the capabilities
- The value question: Are we getting the benefits? Do we have:
  - A clear and shared understanding of the expected benefits
  - Clear accountability for realizing the benefits
  - Relevant metrics
  - An effective benefits realization process

Figure 3: The "Four Ares" model of IT governance.

The IT Governance Institute has developed two complementary frameworks for IT governance: Val IT and COBIT (Control Objectives for Information and Related Technology). Val IT and COBIT use Thorp's "Four Ares" model. COBIT "provides a comprehensive framework for the management and delivery of high-quality information technology-based services. It sets best practices for the means of contributing to the process of value creation." Val IT "adds best practices for the end, providing the means to unambiguously measure, monitor, and optimize the realization of business value from investment in IT."
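One way a governance body might make the "Four Ares" operational is to record each investment's status against the four questions as a simple checklist. The Python sketch below is hypothetical: the `GovernanceQuestion` class, the condensed checklist wording, and the `open_items` and `questions_for` helpers are illustrative and are not part of Val IT or COBIT. The "answered by" assignments follow the question-ownership table in this section, where the strategic and value questions belong to the business and the architecture and delivery questions belong to IT.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GovernanceQuestion:
    question: str
    answered_by: str   # "Business" or "IT", following the report's table
    checks: List[str]  # condensed from Figure 3; wording is illustrative


FOUR_ARES: Dict[str, GovernanceQuestion] = {
    "strategic": GovernanceQuestion(
        "Are we doing the right things?", "Business",
        ["in line with vision", "consistent with business principles",
         "contributes to strategic objectives", "optimal value at acceptable cost and risk"]),
    "architecture": GovernanceQuestion(
        "Are we doing them the right way?", "IT",
        ["in line with architecture", "consistent with architectural principles",
         "contributes to populating the architecture", "in line with other initiatives"]),
    "delivery": GovernanceQuestion(
        "Are we getting them done well?", "IT",
        ["disciplined management, delivery, and change processes",
         "competent and available technical and business resources"]),
    "value": GovernanceQuestion(
        "Are we getting the benefits?", "Business",
        ["shared understanding of expected benefits", "clear accountability for benefits",
         "relevant metrics", "effective benefits realization process"]),
}


def questions_for(party: str) -> List[str]:
    """Questions a given party ("Business" or "IT") is responsible for answering."""
    return [q.question for q in FOUR_ARES.values() if q.answered_by == party]


def open_items(assessment: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Per question, the checklist items an investment has not yet satisfied.

    `assessment` maps a question key ("strategic", "value", ...) to the checks met.
    """
    return {key: [c for c in q.checks if c not in assessment.get(key, [])]
            for key, q in FOUR_ARES.items()}


# Hypothetical investment: strategic fit documented and benefit ownership assigned,
# but no metrics or benefits-realization process yet; nothing recorded for IT questions.
gaps = open_items({
    "strategic": list(FOUR_ARES["strategic"].checks),
    "value": ["clear accountability for benefits"],
})
print(questions_for("Business"))
print(gaps["value"])
```

Keeping the owner next to each question makes it straightforward to route the strategic and value questions to program-office and regional executives rather than to IT staff.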
The purpose of this discussion of Val IT and COBIT is not to recommend them to OEI, but to demonstrate that IT governance is:

- A mature discipline with well-developed conceptual frameworks
- Fundamentally about recognizing that IT is an investment made to realize business value, and managing it accordingly

The Clinger-Cohen Act of 1996 mandated that certain federal agencies have a CIO, required federal agencies to have an Enterprise Architecture, and created the Capital Planning and Investment Control (CPIC) process. These and other external requirements are a form of IT governance imposed upon EPA, but complying with the letter of the requirements does not necessarily yield effective governance. The externally imposed governance structures of the Clinger-Cohen Act attempt to motivate agency IT behavior by controlling the purse strings, essentially saying, "follow the rules or we won't give you the money for your IT project."

In our interviews, several individuals indicated that IT governance is weak at EPA. On the other hand, in the project kick-off meeting, one of the participants said that OEI had built a big IT governance structure over the past few years. The participant might have been referring to external mandates imposed on all federal agencies, such as the Clinger-Cohen Act of 1996. The level of externally imposed governance varies with the size (amount of funding) of the project. The CPIC process requires the most control for projects greater than $3,000,000, and lesser control for projects from $250,000 to $3,000,000. One interviewee indicated that there is little governance over or accountability for projects less than $250,000.

We believe that the governance mechanisms mandated by the Clinger-Cohen Act (enterprise architecture, CPIC, and so on) do not inherently link IT investments with business value.
There is potential for OEI to improve IT governance to achieve such linkage. To realize business value from an IT investment:

- The business value to be gained must be specifically identified.
- The business process changes by which that business value will be realized must be understood.
- How information and/or information technology will enable those process changes must be specified.

These activities, which normally are part of requirements gathering, require OEI to engage with its business customers and understand their domains. They cannot be done in isolation from the business. Returning to Thorp's "Four Ares" model of IT governance, two of the four questions can be answered only by the business, not by IT, as shown below.

Type of Question | Question | Answered by
Strategic | Are we doing the right things? | Business
Architecture | Are we doing them the right way? | IT
Delivery | Are we getting them done well? | IT
Value | Are we getting the benefits? | Business

2.5. Program Management Office (PMO)

The Project Management Institute (PMI) defines a program as "a group of related projects managed in a coordinated way to obtain benefits and control not available from managing them individually. Programs may include elements of related work outside of the scope of the discrete projects in the program." PMI defines a program management office (PMO) as "the centralized management of a particular program or programs such that corporate benefit is realized by the sharing of resources, methodologies, tools, and techniques, and related high-level project management focus."

In simple terms, a PMO manages a group of related projects to identify and realize synergies among them. These synergies could be reuse of common resources, tools, or objects; coordination of dependencies between projects; and so on.
OEI has a PMO that primarily focuses on providing a toolkit of software tools (sign-on, portal, ETL, and so on) for the program office IT groups to use in developing their own applications. The potential benefits of a PMO with such an emphasis lie primarily in the economics of the IT infrastructure (such as avoiding multiple tools for the same purpose and gaining negotiating leverage for software licensing).

There is also an ECMS (EPA's implementation of Documentum) PMO. An ECMS PMO is a logical and traditional application of a PMO: to prioritize and sequence the individual Documentum implementation projects, realize synergies among these projects, and so on.

The OEI PMO as currently tasked serves a useful purpose. OEI has an opportunity to give its PMO a larger and more value-added mission: to measure, monitor, and optimize the realization of business value from the investments EPA makes in OEI. Such a mission would have two major parts: investment decisions (Are we doing the right things?) and realization of benefits (Are we getting the benefits?). If OEI gives its PMO this mission, several issues will have to be worked out:

- What is the relationship between the IT governance work of the PMO and the Clinger-Cohen Act compliance, enterprise architecture, and CPIC activities performed by other elements of OEI?
- The managers in OEI who currently can initiate "less than $250,000" projects with minimal governance and accountability would now have such projects governed by the PMO. This cultural change might not be well accepted unless it is carefully managed.
- The larger question is to what extent the OEI PMO can control or influence IT investment decisions and projects by the program offices and regions. If it cannot, its impact will be significantly reduced.

3. Development Plan for Information Integration

There are a number of ways OEI can combine the supply- and demand-driven approaches to information management described in section 1 of this document with the technical alternatives described in section 2 (SOA, data warehousing, EA, IT governance, and PMO) into a program plan for moving forward. This section covers four such options, recommends one of them, and outlines next steps for OEI to adopt the recommended option. The four options are:

- Option 1: Maintain status quo
- Option 2: Improve governance
- Option 3: Governance with top-down information needs
- Option 4: Governance with bottom-up information services

3.1. Option 1: Maintain Status Quo

The status quo option can be described as:

- Continue the present supply-driven approach to providing information services to internal EPA customers.
- Continue to provide environmental information to the public via the EPA Web site.
- Incrementally upgrade and extend the capabilities of the Web site.

Advantages

The primary advantage of continuing the status quo is that it requires no change and no incremental effort.

Disadvantages

- OEI's future, including the prospect of additional funding, may be limited.
- Beyond provision of IT infrastructure, OEI's value proposition to its internal customers is not clear to them. Therefore, they turn to their own IT departments for services rather than to OEI.
- OEI's ability to serve its external customers, the public, is limited, because it does not know their specific needs and so can offer them only undifferentiated supply-driven information services.
- There is limited linkage between OEI's services and EPA's business needs. At some point in the future, some linkage might be provided by full implementation of the Enterprise Architecture.
- Like all federal agencies, EPA faces budgetary constraints and must make difficult spending choices and trade-offs.
  Without linkage between OEI's services and EPA's business needs, it is difficult to make IT investment and technology decisions.
- Without a strong business case for its services, it is difficult for OEI to make and defend specific technology investment choices.

3.2. Option 2: Improve Governance

OEI already has components that work on the EPA Enterprise Architecture (EA) and perform the other functions required by the Clinger-Cohen Act, such as the CPIC process. These existing governance mechanisms fall short both in identifying specific business value for EPA and in tracing a link between IT investments (whether in application systems or in tools and technology) and realization of that business value. Without linkage to business value, IT investments risk being technology solutions looking for a problem.

One option to improve OEI's IT governance and achieve a tighter linkage between investment and business value would be to expand the governance role of its program management office (PMO). Such a role would entail:

- Development of a multi-year program plan of IT investments, based on a portfolio of investment opportunities
- Identification (or collection) of the IT investment opportunities that make up the portfolio, including specification of the potential business value of each and the implementation risks involved

Figure 4 graphically illustrates the development and management of the opportunity portfolio, with prioritized implementation of selected opportunities.

Figure 4: Development and Management of an IT Investment Portfolio. (Portfolio development: an information services portfolio drawn from cross-goal management processes, such as environmental performance analysis and measurement, policy impact analysis, and budget/performance integration; program/region processes; and internal operations and infrastructure (from the Federal Enterprise Architecture). Portfolio management: opportunities plotted by performance impact and implementation risk into quadrants labeled Must Haves, Easy Wins, High Risk/Reward, and Why Do It? Implementation: selected opportunities.)

The objective of this PMO governance role is to ensure that IT investments are prioritized and approved based on cost/value trade-offs. To succeed in this expanded role, the PMO should report to the CIO and the OEI Board of Directors and have sufficient authority to develop investment recommendations pertaining to the information services portfolio.

3.3. Option 3: Governance with Top-Down Information Needs

If OEI wants to deliver the maximum business value to the agency, it should combine the improved governance of Option 2 with a change to the demand-driven model for information management, and proactively engage with EPA executives in the program offices and regions to determine their information needs. In this option, OEI would use different approaches for its internal and external customers:

- Internal customers: Canvass agency executives to determine information needs and set agency-wide priorities for enhanced information services.
- External customers: Develop a program plan for a multi-year effort to extend and enhance the public's access to environmental information.

Because "the public" is such a large and heterogeneous customer group, it would be quite difficult for OIAA to engage with it at the level necessary to provide demand-driven information services. However, OIAA could more proactively engage subsets of the public regarding how they want EPA's information presented to them. Such an approach could be characterized as supply-driven content with demand-driven information presentation.
Advantages

- Engaging with agency executives regarding their information needs, and responding to them, would give OEI an opportunity to obtain their support for its value proposition, and potentially make it easier to obtain funding for its proposed information services.
- EPA's external customers, the public, would benefit from a more managed program to provide environmental information to them in formats tailored to their needs.

Disadvantages

- OEI does not have a history of engaging with agency executives regarding their business requirements for information, so the demand-driven approach would be a significant change for both OEI and agency executives.
- Because of EPA's decentralized culture of program office and regional stovepipes, it might not be possible to obtain executive consensus regarding information services priorities. Each program office and region might have its own parochial set of priorities.
- A top-down approach, and building the consensus it requires on information services priorities, takes time. There might be a perception that this approach would take too long to realize benefits.
- The business engagement aspect of this option would be a very significant change from the way OEI currently operates, and therefore would require large cultural change within OEI.

3.4. Option 4: Governance with Bottom-Up Information Services

Option 4 is a more measured undertaking than Option 3. Rather than attempting an agency-wide, top-down approach to providing information services to OEI's internal customers, it starts small, using a "prototype" to demonstrate the demand-driven model of information management. This prototype would be limited to a single selected EPA organization (program office, region, or headquarters component).
It should be an example of a "must have" as shown in Figure 4: that is, it should have high performance impact but low implementation risk. The prototype's purpose would be to demonstrate how the demand-driven model works and the business value it can achieve.

The approach to providing information services to OEI's external customers, the public, would be the same for Option 4 as for Option 3: develop a program plan for a multi-year effort to extend and enhance the public's access to environmental information, then execute that plan.

Advantages

- A prototype is a low-cost, low-risk way for OEI to explore the demand-driven approach to information management with an internal customer.
- OIAA could use a successful demand-driven prototype as a marketing tool to demonstrate a new capability to other internal customers. Additional capabilities and information services are the best way for OIAA to secure future funding and increase its impact on the agency.
- A successful prototype will demonstrate that targeted demand-driven information services:
  - Can quickly deliver business value while longer-term initiatives such as enterprise architecture are in progress.
  - Can coexist with the current supply-driven information services.

3.5. Recommendation

The four options presented are not mutually exclusive. With the exception of Option 1 (maintain status quo), all use the OEI PMO to improve IT governance. Options 3 and 4 both use a program plan to extend and enhance public access to environmental information. The primary difference between Options 3 and 4 is the extent to which they pursue a demand-driven model for information management: Option 3 takes a comprehensive top-down approach, while Option 4 takes a more limited, prototype-based approach.

Option 4, Governance with Bottom-Up Information Services, offers the best combination of benefits and probability of success for OEI.
A successful prototype that demonstrates the applicability of the demand-driven approach and its greater ability to deliver business value will give OEI the opportunity to expand its use of the demand-driven approach further.

3.6. Next Steps

Executing Option 4 has three distinct parts: developing a demand-driven prototype, repurposing the OEI PMO to play a larger governance role, and creating and then executing a program plan for public information services to realize "supply-driven content with demand-driven information presentation."

Demand-Driven Prototype

The first step in developing a demand-driven prototype is to identify an EPA program office, region, or headquarters component with a business need for information that the prototype will satisfy, and then to obtain that organization's agreement to participate. Selecting the right information service to prototype is critical. Because the demand-driven approach is new to OEI, it might want to consider expert assistance with the prototype project, particularly in gathering the business requirements it will satisfy.

Use OEI PMO to Improve Governance

The essence of the additional role for the OEI PMO is to manage a portfolio of investment opportunities (projects to create new or improved information services) in order to achieve maximum business value. This additional role will be a significant change both for the PMO and for OEI as a whole, and the change should be managed as a project. Organizational aspects of this PMO role change include the PMO charter, authority, reporting relationship, size/resources, and the relationship between the PMO's new governance activities and CPIC, enterprise architecture, and other existing governance structures and processes.

The additional work of the PMO includes:

- Collecting and identifying the IT investment opportunities (projects) that will make up the portfolio.
These opportunities include projects already in progress, approved but not yet started, requested but not yet approved, and unmet information needs not yet recognized as projects. For each opportunity in the portfolio, understanding and documenting its potential business value, costs, and business and technical risks. Working with internal customers and OEI executive management to prioritize the opportunities based on business value, cost, and risk. Creating and maintaining the opportunity portfolio, and communicating its contents using mechanisms such as the graphic in Figure 4. Creating and maintaining a multi-year program plan to execute the prioritized and sequenced opportunities in the portfolio. Providing oversight of individual projects in progress. (The project managers of the individual projects will retain responsibility for managing them.) Monitoring completed projects for realization of business value, which often lags behind completion of the project. January 30, 2007 Environmental Information Integration - Final Deliverable (revised) Page 25 ------- The PMO will be the mechanism by which OEI explores the demand-driven approach to information management. Therefore, initially it will be the locus for acquisition of new skills in business requirements elicitation, portfolio management, and demand-driven information services. When the prototype is successful and OEI moves to adopt the demand-driven approach more broadly, these skills will need to diffuse more widely throughout OIAA and eventually throughout OEI. A program management office chartered as described is a mainstream application of the PMO concept, but might be larger in scope and scale than OEI's PMO experience. OEI might want to obtain expert assistance for the PMO until its new role has been completely integrated into the organization. Program Plan for Public Information Services The public is a large and heterogeneous group of customers for OEI's information services. 
Because the environmental data that EPA and its partners collect is driven primarily by statute rather than by customer requests for information, the content of OEI's public information services must be largely supply-driven. However, the way environmental information is presented and delivered to the public is less constrained. OIAA, and more specifically its Information Access Division (IAD), is the public-facing part of OEI via the EPA Web site. IAD already reaches out to the public to determine its information needs through a variety of mechanisms, such as focus groups and satisfaction surveys. These and additional mechanisms can be used to categorize users by type or interest (such as teachers, farmers, and real estate purchasers), so that the EPA Web site presents environmental information in a manner tailored to the specific needs of each group. The result would be demand-driven information presentation for the public. Implementing demand-driven information presentation by category of public customer would best be managed through an overall program plan for public information services.

Summary

These next steps (developing a demand-driven prototype, improving IT governance via an expanded OEI program management office, and creating a program plan for enhanced public information services) will move OEI toward playing a larger role within EPA and delivering more value to both internal and external customers. They require change primarily from OIAA rather than from the Office of Information Collection (OIC) or the Office of Technology Operations and Planning (OTOP). They also begin to define a compelling new role and value proposition for OIAA: the provision of demand-driven information services.
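The PMO portfolio work described under "Use OEI PMO to Improve Governance" (documenting each opportunity's business value, cost, and risk, then prioritizing the portfolio and displaying it in a graphic like Figure 4) can be illustrated with a short Python sketch. It assumes hypothetical 0-10 scores for performance impact and implementation risk, an arbitrary threshold of 5, and invented project names. Only the "must have" label (high impact, low risk) comes from this report, so the other three quadrants are described rather than named.

```python
from dataclasses import dataclass

# Assumed scoring convention (hypothetical): impact and risk are rated 0-10.
THRESHOLD = 5.0

# Quadrant ranking. Only "must have" comes from the report; the rest are descriptive.
QUADRANT_ORDER = {
    "must have": 0,               # high performance impact, low implementation risk
    "high impact, high risk": 1,
    "low impact, low risk": 2,
    "low impact, high risk": 3,
}


@dataclass
class Opportunity:
    name: str
    impact: float  # expected performance impact (hypothetical 0-10 scale)
    risk: float    # implementation risk (hypothetical 0-10 scale)
    cost: float    # estimated cost in arbitrary units; must be positive


def quadrant(opp: Opportunity, threshold: float = THRESHOLD) -> str:
    """Place one opportunity in a two-by-two impact/risk grid."""
    high_impact = opp.impact >= threshold
    low_risk = opp.risk < threshold
    if high_impact and low_risk:
        return "must have"
    if high_impact:
        return "high impact, high risk"
    if low_risk:
        return "low impact, low risk"
    return "low impact, high risk"


def prioritize(portfolio: list[Opportunity]) -> list[tuple[str, str]]:
    """Rank by quadrant first, then by impact per unit of cost (highest first)."""
    ranked = sorted(
        portfolio,
        key=lambda o: (QUADRANT_ORDER[quadrant(o)], -o.impact / o.cost),
    )
    return [(o.name, quadrant(o)) for o in ranked]


if __name__ == "__main__":
    portfolio = [
        Opportunity("Public site redesign", impact=6, risk=7, cost=4),
        Opportunity("Demand-driven prototype", impact=8, risk=3, cost=2),
        Opportunity("Legacy report cleanup", impact=2, risk=2, cost=1),
    ]
    for name, label in prioritize(portfolio):
        print(f"{name}: {label}")
```

With these made-up scores, the prototype is ranked first as a "must have," followed by the redesign (high impact, high risk) and the cleanup (low impact, low risk). Ranking by quadrant and then by impact per unit of cost is only one possible rule. A real PMO would weigh value, cost, and risk according to its own governance process.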
Appendix A Interviewees

DecisionPath held thirty-three information requirements-gathering meetings with 39 EPA employees:

Person Count: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 21 32 33 34 22 35 6 36 8 23 37 26 6 6 7 36 38 39
Group: 1 1 2 3 3 4 4 4 5 6 7 7 8 9 10 11 12 13 13 13 14 14 14 14 15 16 17 18 19 20 21 22 23 23 24 25 26 27 28 28 29 30 31 31 31 32 32 32 32 33
Interviewee: Mike Petruska, Ben Smith, Nancy Wentworth, Leo Gueriguian, Lisa Ayala, Lionel Brown, Emma McNamara, Dalroy Ward, Todd Holderman, Branch Chiefs Meeting, Cindy Dickinson, Larry Fitzwater, Mike Barrette, Odelia Funke, Gene Durman, Lisa Jenkins, Oscar Morales, Connie Dwyer, Chuck Freeman, Chris Clark, Doreen Sterling, Pat Garvey, Sara Hisel-McCoy, Jonda Byrd, Mike Cullen, Kevin Phelps, Mary McCaffery, Mark Hamilton, Kevin Kirby, Brion Cook, David Hindin, John Sullivan, Pat Garvey (2), Alex Klassaeg, John Harman, Connie Haaser, Sara Hisel-McCoy (2), Maryane Tremaine, Lionel Brown (2), Rick Martin, Dalroy Ward (2), Jonda Byrd (2), Craig Hooks, Mary McCaffery (2), Lionel Brown (3), Lionel Brown (4), Emma McNamara (2), Rick Martin (2), Linda Travers, Warren Beer
Assistant Administrator: OEI OEI OEI OARM OCFO OEI OEI OEI OPPTS OEI OEI OECA OEI OAR OSWER OEI OEI OEI OEI OEI OEI OEI OEI OEI OSWER OEI OW OEI OPPTS OECA OEI OEI OEI OEI OSWER OEI Region 7 OEI OEI OEI OEI OEI OEI OEI OEI OEI OEI OEI Region 9
Office: OIAA OIAA OIAA 99? ??? OIAA OIAA OIAA OPPT OIC OIC OC OTOP OPMO PMO OIC OIC OIC OIC OIC OIC OIC OIAA OTOP IMO ... 999 OIC OPPT OC OTOP OIC OIC OIC OEM OIC PLMG OIAA OIAA OIAA OIAA ... OIAA OIAA OIAA OIAA MTSD
Division: TRIPD TRIPD EAD 999 999 IAD IAD IAD IMD CSD CSD EPTDD MISD 99? 999 CSD IESD IESD IESD IESD IESD CSD IAD 999 ... ... ??? IESD IMD EPTDD MISD IESD IESD CSD NPPD? CSD ... IAD IAD IAD ... IAD IAD IAD ... ...
Branch: ... ... ... 999 999 ... ISB SIB DSB DSB 999 ... ??? 99? _ ... IETB IETB ... IEPB ...
PPMB 999 ... _ ??? ISSB ... ... ... IEPB IEPB DSB 999 ... IRM-B ISB PPMB ... ... ... ... ... IRM

A breakdown by organization shows that most of the people we interviewed work in OEI:

Organization                            Number of Interviewees
OEI                                     26
Program offices                          9
Administrative offices (OARM & OCFO)     2
Regional offices                         2
Total                                   39

Appendix B Information Needs Areas

Our information requirements-gathering interviews typically yield a number of business questions that users want to be able to answer. We group these questions into information needs areas, each defined by a common theme and a specific set of users. An information needs area may drive the creation of a data mart that targets that theme and that set of users. Our interviews with EPA personnel have yielded five information needs areas:

- Emergency planning and response
- Environmental management of health risk and prioritization
- Environmental policy analysis
- Public information
- Enforcement and compliance

Because we inferred many of the business questions from our conversations with EPA personnel, and because we have spoken primarily with OEI personnel and have therefore been at least one step removed from the actual users of information, both the business questions and the information needs areas need to be validated with users. We believe direct communication with users in the program offices and regions will yield additional information needs areas.

Emergency Planning and Response

The vast majority of incidents to which EPA's Office of Emergency Management responds are localized and straightforward. EPA handles such incidents well using existing information and business processes.
Major incidents, such as Hurricane Katrina, the space shuttle disintegration, and the Exxon Valdez oil spill, are more problematic. Such incidents occur on a much larger scale, are more geographically dispersed, involve multiple EPA employees and other agencies, and both require and generate much more data. The September 11 attacks and Hurricane Katrina required a paradigm shift from localized emergency response to a coordinated national approach for major incidents. Because these incidents are more challenging, and because coordination and response are time-critical (for example, the need for potable water), EPA needs integrated information to better support emergency response. To respond to such incidents more effectively, EPA needs:

- Better ability to transmit the data collected from multiple field locations and bring it together at EPA headquarters into a holistic picture of the incident.
- Ability to identify, by geolocation, all relevant facilities in the geographic area affected by the incident. While all of the facility records in FRS contain geolocation data (in some cases populated by an address-matching algorithm), the accuracy and usefulness of this data vary. For example, at a large site, the geolocation data in FRS might not correspond to the physical location of a specific item within the site.
- Better ability to integrate facility data stored in regional systems into FRS for use by headquarters personnel.
- Better ability to incorporate information from external sources (such as Tier 2 prevention data from states) into EPA's own systems and to use this combined data quickly in the incident response. The external data most needed is the locations of facilities in the incident area that are known to states and other parties but not to EPA.
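Identifying every facility in an affected area amounts to a spatial query over facility coordinates such as those held in FRS. The Python sketch below is illustrative only. It assumes that each facility record is a hypothetical (facility ID, latitude, longitude, accuracy in km) tuple and approximates the affected area as a circle. It uses the haversine great-circle distance. Because geolocation accuracy varies, the sketch separates facilities that are certainly inside the area from those that might be inside once the stated accuracy is taken into account. A production system would more likely rely on a GIS or spatial database.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def facilities_in_area(facilities, center_lat, center_lon, radius_km):
    """Split facilities into (certainly inside, possibly inside) a circular area.

    facilities: iterable of (facility_id, lat, lon, accuracy_km) tuples, where
    accuracy_km is an assumed uncertainty radius for the stored geolocation.
    """
    inside, possibly_inside = [], []
    for fac_id, lat, lon, accuracy_km in facilities:
        d = haversine_km(center_lat, center_lon, lat, lon)
        if d + accuracy_km <= radius_km:
            inside.append(fac_id)           # inside even with worst-case error
        elif d - accuracy_km <= radius_km:
            possibly_inside.append(fac_id)  # inside only if the error favors it
    return inside, possibly_inside


if __name__ == "__main__":
    # Hypothetical facility records and incident center.
    records = [
        ("FAC-A", 29.95, -90.07, 0.1),
        ("FAC-B", 30.00, -90.07, 1.0),
        ("FAC-C", 31.00, -90.07, 0.1),
    ]
    print(facilities_in_area(records, 29.95, -90.07, radius_km=5.0))
```

With these made-up coordinates and a 5 km radius, FAC-A is reported as inside. FAC-B, about 5.6 km away with a 1 km accuracy, is reported as possibly inside. FAC-C is excluded.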
EPA has limited ability to combine data from multiple incidents to answer questions from the public, Congress, and other stakeholders. Because incident data is stored and organized in a site-specific and incident-specific manner, cross-incident analysis is laborious and often depends on the memory of field response personnel rather than on data. The organization of the data also makes it difficult to analyze incidents by category for continuous improvement of prevention and preparedness.

Representative Business Questions:
- At how many sites did EPA have to clean up chemical X? Which sites were they?
- For a major incident, such as Hurricane Katrina, what potential hazards exist by substance, media, facility name, and geolocation of facility?
- For a major incident, such as Hurricane Katrina, what is the current measure of pollutant/hazard by media, facility name, and geolocation of facility? Is this level hazardous?
- Based on post-incident measures of pollutant/hazard by media, what actions are needed to remedy the problem, by priority and geographical range?
- Based on existing post-incident hazards, what health protection actions (such as warning local residents not to drink the water) are required, by geographical and time range of the problem?

Environmental Management of Health Risk and Prioritization

To support its mission of protecting human health, EPA must be able to correlate exposure to environmental hazards with the potential health risks of such exposure. A national, cross-media picture of current and anticipated future environmental hazards is necessary to give EPA the information it needs to evaluate agency tradeoffs and set agency priorities. Integrated information on environmental hazards is also needed at the geolocation level so that localities can evaluate and prioritize environmental initiatives based on an understanding of the environmental hazards in their area and the risks those hazards pose to people and wildlife.
Finally, EPA, the states, and the tribes need to work with the same integrated environmental information so that they can coordinate their efforts to answer to the public and remedy the highest-priority environmental problems. Currently, while some integrated environmental information exists, it has the following limitations:

Cannot Relate Levels of Exposure to Health Risk Factors

While partial cross-media information at the local level (zip code) exists in TRI and Envirofacts, it does not cover all pollutants and does not provide information that would allow stakeholders to analyze how varying levels of exposure to those pollutants present risks to human health and wildlife. As a result, it is difficult to set priorities at the agency and local levels. In some cases, such as lead exposure in children, direct linkages between levels of exposure to the pollutant and associated health risks may be possible. In other cases, only a loose linkage to health hazards may be feasible (e.g., documents relating levels of exposure to a substance to increased health problems).

Difficult to Identify All Environmental Hazards by Facility

While program offices collect information at the local level, different systems sometimes identify the same facility in different ways, making it difficult to provide a complete picture of the pollutants present at specific locations. A complete picture would allow EPA to better quantify the total environmental impact of a facility across all substances it emits. This information, combined with health risk information, would enable agency-level and local-level cross-media prioritization of facilities based on highest levels of risk.
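The facility-identification problem above is essentially record linkage. Before hazards can be totaled per facility, each program system's local identifier has to be mapped to one shared facility identifier, which is the role a registry such as FRS is meant to play. The Python sketch below assumes that such a crosswalk already exists. The system names, IDs, substances, and amounts are all hypothetical. Records that cannot be matched are returned separately rather than silently dropped, because unmatched identifiers are exactly the gap the text describes.

```python
from collections import defaultdict

# Hypothetical crosswalk: (source system, local facility ID) -> shared registry ID.
CROSSWALK = {
    ("air_system", "A-100"): "REG-1",
    ("water_system", "W-77"): "REG-1",   # same physical facility as A-100
    ("waste_system", "H-9"): "REG-2",
}


def hazards_by_facility(records, crosswalk):
    """Total reported amounts per substance for each registry facility.

    records: iterable of (system, local_id, substance, amount) tuples.
    Returns (totals, unmatched), where totals maps registry ID -> {substance: amount}
    and unmatched lists (system, local_id) pairs missing from the crosswalk.
    """
    totals = defaultdict(lambda: defaultdict(float))
    unmatched = []
    for system, local_id, substance, amount in records:
        registry_id = crosswalk.get((system, local_id))
        if registry_id is None:
            unmatched.append((system, local_id))
            continue
        totals[registry_id][substance] += amount
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {fid: dict(subs) for fid, subs in totals.items()}, unmatched


if __name__ == "__main__":
    records = [
        ("air_system", "A-100", "benzene", 2.0),
        ("water_system", "W-77", "benzene", 1.5),
        ("water_system", "W-77", "lead", 0.5),
        ("waste_system", "H-9", "lead", 3.0),
        ("air_system", "A-999", "toluene", 1.0),  # not in the crosswalk
    ]
    totals, unmatched = hazards_by_facility(records, CROSSWALK)
    print(totals)
    print(unmatched)
```

In this made-up data, the benzene reported separately by the air and water systems is combined under REG-1. The unregistered A-999 record is flagged for follow-up rather than lost.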
Health Risk Information Only Available at Summary Level

While health risk information is addressed at a summary level in the Strategic Plan, it is currently not available by geographical area to guide the tactical planning needed to identify and remedy high-priority environmental hazards.

Conflicting Environmental Information

Information used by the states and tribes to manage local environmental priorities does not always match information used by EPA, because of timing differences and differences in information identifiers. This can cause EPA, the states, and the tribes to reach different, conflicting conclusions regarding environmental hazards and priorities.

While EPA has made solid progress in obtaining information on environmental hazards, integrating this information across media and providing information on known health risks based on levels of exposure would provide improved guidance for agency-level and local-level planning, prioritization, and funding. It would also improve the agency's ability to relate its priorities and funding to improved outcomes.

Representative Business Questions:
- What are the health risks for a given substance, by exposure and demographics?
- What are the health risks for a given location, by demographics?
- What are the health risks for a given practice (e.g., eating shellfish), by source and demographics?
- What is the current state of air quality? How has it changed over time?
- What is the current state of water quality? How has it changed over time?
- What level of water quality problems exists, by contributing source and by location?
- For a given substance, what are the locations (geolocation) of its sources?

Environmental Policy Analysis

EPA's 2006-2011 Strategic Plan continues and reinforces a focus on environmental and human health outcomes.
This focus on outcomes requires the ability to evaluate, select, and implement programs to achieve those outcomes; determine the societal and EPA costs of those programs; understand the relationship between costs and benefits; and assess the extent to which the outcomes are being realized. When existing approaches do not achieve the desired outcomes, EPA needs new or additional approaches to close the outcome gap. Such analyses require sophisticated scenario modeling capability using economic, population, public health, environmental, and other data. EPA currently has limited capability for such scenario modeling. As a result, its analysis is laborious and semi-manual, often uses incomplete data, and does not consider as many variables or scenarios as would be useful. High-quality cross-program environmental data integrated with external economic, public health, and other data would facilitate analysis and contribute to better outcomes.

Representative Business Questions:
- What is the environmental improvement trend based on enforcing an existing EPA regulation? What is the expected improvement versus the actual improvement?
- What environmental problem areas do not appear to be improving over time based on enforcing existing EPA regulations?
- What is the most cost-effective way (among several alternatives) to achieve a targeted environmental or human health outcome? What are the projected costs and benefits of each alternative approach?
- What is the cost (for industry and society in general, for EPA to monitor and enforce, and so on) of a specific regulation? Is the regulation achieving its intended environmental and societal outcomes?
Public Information

Inherent in EPA's mission to protect human health and the environment is the obligation and opportunity to provide information about the environment, environmental hazards and risks, and environmental protection and remediation efforts to the public and other stakeholders. While EPA wants to keep stakeholders informed about the environment, some of EPA's data is sensitive (proprietary, of potential use to terrorists, protected by privacy regulations, and so on). Therefore, access to some types of information must be carefully controlled.

One way EPA provides information to its stakeholders is reactively: responding to FOIA requests, Congressional inquiries, correspondence from citizens, and so on. Such information requests can be wide-ranging. EPA needs to be able to respond to them with the right information in a timely and cost-effective way that is consistent with privacy and sensitivity constraints. Another way EPA provides information is proactively, through self-service programs such as Envirofacts and Window to My Environment delivered via the EPA Web site and other vehicles.

Both reactive and proactive communications are often organized by geographic location. In contrast, EPA's data is collected and organized primarily for specific regulations and is therefore stored in systems by media type. Much of it also comes from external entities, such as regulated industries, states, tribes, and other EPA partners, which makes it challenging for EPA to ensure the data's quality and completeness. Currently, much time and manual effort is spent integrating data from multiple EPA and external systems to answer information requests. Data quality issues also take time and effort to research and resolve, and they sometimes result in less complete or less accurate information being provided to the requestor.
The ability to quickly and easily provide environmental information integrated by geographic location would enable EPA to provide more useful information to the public and other stakeholders.

Representative Business Questions:
- What are the funding needs of each non-federal Superfund NPL (National Priorities List) site? (This was a question from Congress to EPA OIG.)
- Which Superfund sites do not have sufficient funding allocated to maintain clean-up progress? What risks are associated with delaying clean-up completion?

Enforcement and Compliance

EPA employs an integrated approach of assistance, incentives, and civil and criminal enforcement to maximize compliance with applicable laws and regulations and to reduce threats to public health and the environment. Largely through its Office of Enforcement and Compliance Assurance (OECA), EPA inspects sites and facilities, collects and integrates compliance data, develops compliance monitoring programs to support inspections and self-reporting, supports enforcement activities, prosecutes intentional misconduct, and oversees the enforcement of EPA's national hazardous waste cleanup programs.

OECA is a relatively new organization within EPA. It was created by centralizing compliance and enforcement personnel from the program offices. These personnel use data from the various program office databases and need to integrate it into a holistic picture. For example, OECA needs to see all the permits, RMPs, enforcement actions, penalties, and so on for a given corporation, regardless of how many physical locations it has, how many legal entities make up the corporation, how many regions it has facilities in, and how many regulations apply to it.
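OECA needs to see a corporation's complete compliance picture, however many facilities, legal entities, and regions are involved. This can be sketched as a roll-up through an ownership hierarchy. The Python example below is a hypothetical illustration, not a description of IDEA or any EPA system. The ownership map, facility IDs, regions, action types, and penalty amounts are invented, and each facility is assumed to have exactly one owning legal entity.

```python
from collections import Counter

# Hypothetical ownership hierarchy: legal entity -> parent entity (None = top level).
PARENT = {
    "Acme Holdings": None,
    "Acme Chemical LLC": "Acme Holdings",
    "Acme West Inc": "Acme Chemical LLC",
}


def ultimate_parent(entity: str, parent: dict) -> str:
    """Follow parent links up to the top-level corporation."""
    seen = set()
    while parent.get(entity) is not None:
        if entity in seen:
            raise ValueError(f"ownership cycle involving {entity!r}")
        seen.add(entity)
        entity = parent[entity]
    return entity


def corporate_history(corporation, facility_owner, actions, parent):
    """Roll up compliance actions for every facility owned anywhere in a corporation.

    facility_owner: facility_id -> owning legal entity.
    actions: iterable of (facility_id, region, action_type, penalty) tuples.
    """
    facilities, regions = set(), set()
    counts = Counter()
    total_penalty = 0.0
    for facility_id, region, action_type, penalty in actions:
        owner = facility_owner.get(facility_id)
        if owner is None or ultimate_parent(owner, parent) != corporation:
            continue
        facilities.add(facility_id)
        regions.add(region)
        counts[action_type] += 1
        total_penalty += penalty
    return {
        "facilities": sorted(facilities),
        "regions": sorted(regions),
        "actions": dict(counts),
        "total_penalty": total_penalty,
    }


if __name__ == "__main__":
    owners = {"F1": "Acme Chemical LLC", "F2": "Acme West Inc", "F3": "Other Corp"}
    actions = [
        ("F1", "Region 5", "inspection", 0.0),
        ("F2", "Region 9", "penalty", 25000.0),
        ("F2", "Region 9", "inspection", 0.0),
        ("F3", "Region 5", "penalty", 1000.0),
    ]
    print(corporate_history("Acme Holdings", owners, actions, PARENT))
```

In this example, facilities F1 and F2 belong to different subsidiaries in different regions, but both roll up to the same corporation. Facility F3 belongs to an unrelated owner, so it is excluded.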
Although OECA has the IDEA data warehouse, which integrates compliance and enforcement data across programs, IDEA does not satisfy all of its information needs.

Representative Business Questions:
- What were the total enforcement actions by region, medium, industry, or time period?
- Is there a correlation between the number of regulated facilities in a locale and the quality of its drinking water?
- For a given corporation, what is the compliance and enforcement history across all legal entities, facilities, and programs?
- What trends can be seen in penalties assessed over time by region, industry, or program?
- For a given facility, which program offices regulate it?
- How many Superfund sites were in the regulated community before they were designated Superfund sites? Which ones are they? (This was a question from OIG.)
- Which facilities with compliance problems are part of the same corporation?
- Which corporations have the best compliance records?
- What is the best deployment of limited inspection, compliance assistance, enforcement, and prosecution resources across the population of regulated entities and facilities?

Appendix C Documents Reviewed

1. CED-80-18 Stronger Management of EPA's Information Resources Is Critical To Meeting Program Needs. GAO, Mar-80.
2. PEMD-90-3 Hazardous Waste: EPA's Generation and Management Data Need Further Improvement. GAO, Feb-90.
3. T-IMTEC-91-16 Ineffective Information Management Impedes EPA's Enforcement Mission and Cross-Media Initiatives. GAO, Jun-91.
4. IMTEC-92-14 Environmental Enforcement: EPA Needs a Better Strategy to Manage Its Cross-Media Information. GAO, Apr-92.
5. AIMD-94-25 EPA Toxic Substances Program: Long-standing Information Planning Problems Must Be Addressed. GAO, Nov-93.
6. RCED-94-93 Toxic Substances: EPA Needs More Reliable Source Reduction Data and Progress Measures. GAO, Sep-94.
7. AIRS - AQS Modernization Proposal. EPA, Mar-96.
8. System Management Plan for AQS; Executive Summary: AQS Conceptual Design Document. EPA, Jun-97.
9. Measuring Performance and Demonstrating Results of Information Technology Investments. GAO, Mar-98.
10. OIG Office of Water Data Integration Efforts. EPA, Jun-98.
11. Final SIC/NAICS Data Standard. EDSC, Jan-99.
12. Standard Update. EPA, May-99.
13. Standard Update. EPA, Aug-99.
14. Aiming for Excellence - Actions to Encourage Stewardship and Accelerate Environmental Progress. EPA, Aug-99.
15. GAO/RCED-99-261 Environmental Information: EPA Is Taking Steps to Improve Information Management, but Challenges Remain. GAO, Sep-99.
16. Federal Enterprise Architecture Framework. CIO Council, Sep-99.
17. Standard Update. EPA, Nov-99.
18. Standard Update. EPA, Feb-00.
19. Standard Update. EPA, May-00.
20. EPA CIO Levine cut his teeth on states' systems. GCN, Jun-00.
21. AIMD-00-215 Information Security: Fundamental Weaknesses Place EPA Data and Operations at Risk. GAO, Jul-00.
22. Standard Update. EPA, Aug-00.
23. GAO-01-97T Environmental Information: EPA Needs Better Information to Manage Risks and Measure Results. GAO, Oct-00.
24. Blueprint for a National Environmental Information Exchange Network. EPA, Oct-00.
25. Biological Taxonomy Data Standard Business Rules. EPA, Nov-00.
26. Standard Data Elements for Facility Identification. EDSC, Nov-00.
27. Standard Data Elements for Latitude/Longitude. EDSC, Nov-00.
28. Standard Data Element for Date. EDSC, Jan-01.
29. A Practical Guide to Enterprise Architecture. CIO Council, Feb-01.
30. Standard Update. EPA, May-01.
31. Standard Update. EPA, Aug-01.
32. Standard Data Elements for Biological Taxonomy. EDSC, Aug-01.
33. Standard Data Elements for Chemical Identification. EDSC, Oct-01.
34. EPA's Key Management Challenges. EPA, Dec-01.
35. Standard Update. EPA, Feb-02.
36. Background Information on the National Priorities List (NPL). EPA, May-02.
37. Federal Civil Penalties Inflation Adjustment Act. GAO, Jul-02.
38. Standard Update. EPA, Aug-02.
39. EPA's Strategic Plan for Homeland Security. EPA, Sep-02.
40. EPA's Key Management Challenges. EPA, Sep-02.
41. OIG CERCLIS Data Quality. EPA, Sep-02.
42. OIG EPA Management of Information Technology Resources Under The Clinger-Cohen Act. EPA, Sep-02.
43. OIG Information Technology: EPA Management of Information Technology Resources Under The Clinger-Cohen Act. EPA, Sep-02.
44. Registry Update. EPA, Nov-02.
45. Standard Data Elements for Reporting Water Quality Results for Chemical and Microbiological Analytes. EDSC, Dec-02.
46. Standard Data Elements for Contact Information. EDSC, Jan-03.
47. Standard Data Elements for Enforcement/Compliance. EDSC, Jan-03.
48. Standard Data Elements for Tribal Identifier. EDSC, Jan-03.
49. Registry Update. EPA, May-03.
50. EPA's Key Management Challenges. EPA, May-03.
51. Draft Report on the Environment 2003. EPA, Jun-03.
52. Draft Report on the Environment Technical Document. EPA, Jun-03.
53. 2003-2008 EPA Strategic Plan. EPA, Sep-03.
54. Federal Facility Identification Data Standard. EDSC, Nov-03.
55. Standard Data Elements for Permitting Information (Final). EDSC, Nov-03.
56. Enterprise Architecture at EPA. EPA, Nov-03.
57. Registry Update. EPA, Feb-04.
58. Health & Ecoinformatics. EPA, Mar-04.
59. EPA's Key Management Challenges. EPA, Apr-04.
60. OEI Celebrating Five Years of Success Accelerating Our Progress in the Future. EPA, Oct-04.
61. CDX Partner Systems. EPA, Nov-04.
62. CDX Technologies. EPA, Nov-04.
63. Sample TRI Report. EPA, Jan-05.
64. Conformed Dimensions for the EPA Enterprise Architecture: Implementation for the AQS Data Mart. Mimno, Myers, & Holum, Jan-05.
65. Conformed Dimensions for the EPA Enterprise Architecture: Recommended Implementation Strategies. Mimno, Myers, & Holum, Jan-05.
66. AQS Data Mart Beta Testers Page. EPA, Mar-05.
67. EPA's Key Management Challenges. EPA, Apr-05.
68. Registry Update. EPA, May-05.
69. Charge to the Peer Reviewers: Air and Other Relevant Indicators for the U.S. EPA's 2007 Report on the Environment Technical Document. EPA, May-05.
70. A Report of the Environmental Information Consortium. EPA, Jun-05.
71. An Integrated Facility Identification System: Key to Effective Management of Environmental Information at the EPA. NAPA, Jun-05.
72. Annual Performance Plan and Budget Overview. EPA, Jun-05.
73. Franchise Development: A Strategy for Reuse of Methods and SOA Components. EPA/Lockheed Martin, Jun-05.
74. IT Portfolio Management In The Federal Government. Planview, Jun-05.
75. The Toxics Release Inventory (TRI) and Factors to Consider When Using TRI Data. EPA, Jun-05.
76. Improve Environmental Compliance and Analysis. MetaCarta, Jun-05.
77. Substance Conformed Dimension Data Model. EPA, Jun-05.
78. Substance Conformed Dimension Data Modeling Discussion Meeting Minutes. EPA, Jun-05.
79. Substance Conformed Dimension Business Rules. EPA, Jun-05.
80. Substance Conformed Dimension Permissible Values List. EPA, Jun-05.
81. OEI's Applied Analyses and Applications Workshop: OEI's Role in Science and Analysis. EPA, Jul-05.
82. Talking Points for Linda Travers: OEI's Applied Analyses and Applications Workshop. EPA, Jul-05.
83. Registry Update. EPA, Aug-05.
84. OIG EPA Needs to Improve Oversight of Its Information Technology Projects. EPA, Sep-05.
85. U.S. EPA Performance and Accountability Report. EPA, Nov-05.
86. Enterprise Data Integration at EPA. Mimno, Myers, & Holum, Nov-05.
87. FEA The Data Reference Model. EPA, Nov-05.
88. Tools for Effective Program Management: Getting budget, financial and environmental results information to EPA employees. EPA, Dec-05.
89. 2005 President's Management Agenda Results Report. EPA, Jan-06.
90. Poor-Quality Data: The Sure Way to Lose Business and Attract Auditors. Gartner, Jan-06.
91. Biological Taxonomy Data Standard. EDSC, Jan-06.
92. Measure Data Standard. EDSC, Jan-06.
93. COBIT 4.0: Control Objectives For Information And Related Technology. IT Governance Institute, Jan-06.
94. Functions and Services Provided by CDX. EPA, Jan-06.
95. Registry Update. EPA, Feb-06.
96. AQS Data Mart Data Model. EPA, Feb-06.
97. Enterprise Information Management: Getting Value From Information Assets. Gartner, Mar-06.
98. Enterprise Value: Governance Of IT Investments: The Business Case. IT Governance Institute, Mar-06.
99. Enterprise Value: Governance Of IT Investments: The ING Case Study. IT Governance Institute, Mar-06.
100. Enterprise Value: Governance Of IT Investments: The Val IT Framework. IT Governance Institute, Mar-06.
101. Goal 1: Clean Air and Global Climate Change Responses to State & Tribal Issues. EPA, Mar-06.
102. Goal 2: Clean and Safe Water Responses to State & Tribal Issues. EPA, Mar-06.
103. EPA Strategic Plan 2006-2011 Goal 3 - Land Preservation and Restoration Responses to State and Tribal Issues. EPA, Mar-06.
104. Goal 4 - Healthy Communities & Ecosystems Responses to State and Tribal Issues. EPA, Mar-06.
105. Goal 5 - Compliance and Environmental Stewardship Responses to State and Tribal Issues. EPA, Mar-06.
106. EPA's Key Management Challenges. EPA, Apr-06.
107. EPA 2006-2011 Strategic Plan: Charting our Course (Draft). EPA, May-06.
108. AQSP&A Transaction Generator. EPA, Jun-06.
109. FEA Consolidated Reference Model Document. OMB, Jun-06.
110. GAO-06-669 Clean Air Act: EPA Should Improve the Management of Its Air Toxics Program. GAO, Jun-06.
111. Revision Summary Document for the FEA Consolidated Reference Model Version 2.0. OMB, Jun-06.
112. Information Technology Architecture and Systems Issues: Comments from EPA. GAO, Jun-06.
113. GAO-06-780 Particulate Matter: EPA Has Started to Address the National Academies' Recommendations on Estimating Health Benefits, but More Progress Is Needed. GAO, Jul-06.
114. AQS Data Mart Training. EPA, Jul-06.
115. Effective Provision of Environmental Information and Advice: A Scoping Study. Scottish Executive, Aug-06.
116. GAO-06-1032T Chemical Regulation: Actions are Needed to Improve the Effectiveness of EPA's Chemical Review Program. GAO, Aug-06.
117. OEI/IESD Overview. EPA, Sep-06.
118. Evolving EPA's Enterprise Architecture. EPA, Sep-06.
119. 2006-2011 EPA Strategic Plan: Charting Our Course. EPA, Sep-06.
120. Application Solution Architecture: Concept of Operations - Re-alignment to Service Oriented Architecture. Lockheed Martin, Oct-06.
121. Implementing the Federal Enterprise Architecture Framework at EPA. EPA, tbd.