Environmental Protection Agency

Data Management and Standardization Program

Management Information and Data Systems Divisions
Arthur Young

-------
                      TABLE OF CONTENTS

                                                       PAGE

I.      BACKGROUND                                      1
II.     BENEFITS OF DATA RESOURCE MANAGEMENT            3
III.    STUDY APPROACH                                  3
IV.     RECOMMENDATIONS                                 4
V.      CONCLUSION                                      10

-------
                     LIST OF EXHIBITS
                                                    FOLLOWING
                                                      PAGE
1.       STUDY METHODOLOGY                              3

2.       SAMPLE SYSTEMS COMPARISON MATRIX               4

3.       DISTRIBUTED DATA MANAGEMENT ORGANIZATIONAL     5
        STRUCTURE

4.       DISTRIBUTED DATA MANAGEMENT INTERSYSTEM        6
        LIFE CYCLE

5.       PROGRAM LIFE CYCLE                             7

6.       PROGRAM IMPLEMENTATION SCHEDULE                10

7.       COST ESTIMATE MATRIX                           10

8.       FIVE-YEAR COST AFTER FULL IMPLEMENTATION       10

-------
          EXECUTIVE SUMMARY
I.  BACKGROUND
    The  executive who  manages
traditional  resources such as people,
money,  and property, now recognizes the
benefits  of  also  managing data as  a
resource.  Data  resource management
facilitates more effective operational
decision making by enhancing  the
sharing  of   information  across
organizational  boundaries  and   among
information systems. Critical
management decisions typically require
the integration of information from
several sources, which implies the need to
know where relevant information may be
found and the context in which it
should be interpreted. The management
of  data  as  a  resource  may  lead  to
reduced data  inconsistencies and  the
identification  of  conditions  relating
to  the  currency  and accuracy of  the
information.

    The   Data  Management   and
Standardization  Program   Feasibility
Study  conducted  by Arthur  Young  &
Company for  EPA investigates  the  need
for  a  coordinated   data management
program in EPA. From  this study Arthur
Young  &  Company has  developed  an
approach  that will coordinate  the
current  extensive,  yet  diversified,
efforts towards the management of  data
as a resource  in a relatively smooth,
evolutionary  process.   The timing  of
this evolutionary process is such  that
the   implementation  of   the
recommendations should culminate in an
operational  data   management  and
standardization program coinciding with
the  1980's  hardware  procurement
implementation.
    The impetus for this project  came
from  several  diverse factors and
sources  within EPA.   These factors
included:

        Government-wide  recognition
        that ineffective management  of
        information  resources  has
        impaired achievement of Agency-
        wide missions.

        Inability to identify regulated
        facilities consistently across
        program  offices  and   to
        coordinate  and  effectively
        share information  about  these
        facilities among the programs.

        Shift in EPA from unplanned and
        uncoordinated DP growth to the
        introduction  of   formal  DP
        controls (Stage III of ADP
        evolution).

        Lack of a  defined  vehicle  to
        coordinate  and  implement  data
        sharing  requirements  for the
        Interagency Regulatory Liaison
        Group (IRLG).

    In the  following  paragraphs  we
describe how these motivational factors
interacted  to initiate this project.

1.   Management  of  Information as  a
    Resource

    There  are  currently bills  before
Congress  to  revise the data processing
and  information  management
organizations in Federal agencies "To
improve the economy and efficiency  of
the Government and  the private  sector
by improving Federal information
management..." 1/ In addition, "The
President will soon issue an Executive
Order establishing new management
controls over government information
-------
collection activities." 2/ These
actions are direct results of the
reports issued by the Commission on
Federal Paperwork. Both the Commission
and OMB consider "Information Resource
Management" to be the centerpiece of
the Commission's efforts. 3/

    A  number  of individuals  in  MIDSD
and EPA recognized the need to manage
information as a resource long before
the anticipated actions of Congress and
the  President were  known.   The
initiation of the Data Management and
Standardization   Program  Study was  a
concerted effort to document  the need
for such a program  and to identify the
efforts  already  initiated  in EPA  on
components of a  data management program.

2.   Common Facility Identifier

    Attempts to share and integrate
data from systems developed by
different programs or regions have been
difficult because of the use of
different facility definitions and
identifiers within EPA. This problem
has been recognized in EPA for quite a
long time but did not receive much top
level interest until the attempt to
isolate the cause of a cancer epidemic
in New Jersey. Identifying
sources of potentially carcinogenic
substances via computer was nearly
impossible because of the different
facility definitions and identifiers in
use in the air and water program
divisions. Attempts to standardize facility
definitions and coding schemes had been
independently undertaken by the Office
of Water Program Operations (OWPO), the
Office of Enforcement, and several
Regions (Facilities Index System Pilot
Project). For example, an integral part
of OWPO's long-range ADP plan is the
development of a common facility
identifier. Region II has an
ongoing project to interface a number
of systems. A major part of this project
is to cross-reference the facility
identifiers and resolve discrepancies
in order to produce a master facility
identification list for Region II. The
proposal of a data management and
standardization program for EPA offers
the opportunity to coordinate these and
other efforts and to expand them into other
programs in EPA.
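
    The cross-referencing task described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the system names, identifiers, facility names, and the matching rule
(normalized name plus state and county) are hypothetical and are not
taken from the Region II project or any EPA system.

```python
from collections import defaultdict

# Hypothetical records: (system, local facility id, name, state, county).
RECORDS = [
    ("AIR", "A-0017", "Acme Chemical Co.", "NJ", "Essex"),
    ("WATER", "NJ0001234", "ACME CHEMICAL CO", "NJ", "Essex"),
    ("WATER", "NJ0005678", "Bayside Refining", "NJ", "Union"),
]

def match_key(name, state, county):
    """Normalize name and location so spelling variants compare equal."""
    cleaned = "".join(ch for ch in name.upper() if ch.isalnum() or ch == " ")
    words = [w for w in cleaned.split() if w not in {"CO", "INC", "CORP"}]
    return (" ".join(words), state.upper(), county.upper())

def build_master_list(records):
    """Group each system's local identifier under one master identifier.

    If one system reports the same facility twice, the later record wins.
    """
    groups = defaultdict(dict)
    for system, local_id, name, state, county in records:
        groups[match_key(name, state, county)][system] = local_id
    master = {}
    for n, key in enumerate(sorted(groups), start=1):
        master[f"FAC{n:05d}"] = {"key": key, "ids": groups[key]}
    return master

def unmatched(master, systems):
    """List facilities missing from some system: discrepancies to review."""
    gaps = {}
    for master_id, entry in master.items():
        missing = sorted(set(systems) - set(entry["ids"]))
        if missing:
            gaps[master_id] = missing
    return gaps

master = build_master_list(RECORDS)
print(master["FAC00001"]["ids"])
print(unmatched(master, ["AIR", "WATER"]))
```

    In practice the matching rule is the hard part; a real effort would
need manual review of the discrepancies that the last function lists.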

3.   Stage III of Data Processing Growth

    Several individuals within MIDSD
had recognized that EPA's ADP operations
had entered Stage III of data processing
growth as described by Richard L. Nolan,
Ph.D. 4/ The transition from Stage II
(Contagion) to Stage III (Control) is
characterized by a shift from a period
of extremely rapid and uncontrolled DP
growth to a period where formal controls
are introduced and professionalization
of the DP organization takes place. (A
subsequent study performed by Nolan,
Norton, & Company confirmed the
observation that most ADP operations in
EPA had entered Stage III.) The data
management and standardization program
study afforded the opportunity to
document the need for Agency-wide
coordination and to establish the
mechanisms and plan for effecting the
coordination.

4.   Interagency   Regulatory  Liaison
    Group  (IRLG) Support

    Douglas M. Costle, Administrator of
EPA, has  stated  that the Interagency
Regulatory  Liaison  Group  (IRLG)
coordination  of  chemical  substance
regulation has  top priority.  The  IRLG
is  currently conducting  a  project to
identify  and  implement common coding
schemes  vital to the sharing of chemical
information.   It  was recognized early
in  the  conceptual  stages  of the  IRLG
study that implementing the  coding
requirements  in  EPA will  require  a
coordinated,  Agency-wide  data
management program.

-------
II.  BENEFITS  OF  DATA  RESOURCE
    MANAGEMENT
        The full benefits  of a data
    resource management  program may
    require  several  years  to  be
    realized.   However,  immediate
    substantial  benefits  may  be
    obtained   through  phased
    implementation  of  the  program
    through reduction of redundant data
    collection, processing and storage.
    Over the long term the total impact
    will be realized as new information
    systems are  developed  and older
    systems are  upgraded  or replaced.
    The expected benefits  of a data
    resource management program in EPA
    are as  follows:

        Mechanism to assist in assuring
        consistent and  accurate
        responses from the  proper EPA
        sources to Congressional, OMB,
        GAO, and FOI inquiries

        Reduced  data acquisition costs
        to  EPA and industry through the
        sharing  of data  across
        program/office  boundaries
        which will  reduce multiple
        reporting and processing of the
        same data

        Reduced  ADPE   procurement
        conversion  problems  and  costs
        through  standardized system
        documentation and common  data
        definitions  and codes

        Provide   coordination  points,
        standards,  and  pilot  studies
        that  will  lead  to  consistent
        systems which interface across
        programs, regions, and states

        Improve quality of  information
        by providing  an  awareness
        mechanism for  improving  data
        accuracy,   timeliness  and
        consistency.
These  benefits  of  data  resource
management will have significant impact
on how EPA conducts its business in both
the short and long term.

III.  STUDY APPROACH

    The  Data   Management  and
Standardization   Program  Feasibility
Study was performed  in two phases:

        Phase  I  - A  requirements
        analysis phase to identify  and
        determine   EPA's  requirements
        for  data  management  and
        standardization

        Phase  II   -  A  program
        development  phase to  explore
        alternative  strategies for  a
        Data   Management  and
        Standardization Program in  EPA.

    Phase I, the requirements analysis,
was approached from both a quantitative
and  qualitative  perspective.   The
methodology utilized in the performance
of this phase is depicted graphically
in Exhibit 1.

        Quantitative Approach - A major
        objective was  to identify
        concrete examples of EPA's need
        for  data management  and
        standardization.   Fifteen
        representative EPA systems were
        selected by the  EPA Steering
        Committee for this feasibility
        study on the basis of their
        potential for sharing data
        across systems, their
        visibility in EPA, and their value as a
        representative cross section
        of the offices in EPA. The
        analysis focused  on five data
        categories which  were selected
        based on  their potential  for
        use  in  interfacing  files  and
        their current relevance to the
        major issues facing the Agency.
        A summary of the  specific data
        elements  reviewed  in  each

-------
[Exhibit 1: Study Methodology. Flow diagram of the study stages; legible
stage labels include Project Planning, Data Collection, and Identify
Issues. The remaining labels are illegible in the source.]

-------
    category and the systems  in
    which  they were  found  is
    presented  in Exhibit  2.  The
    review of systems  uncovered
    such problems as:

       Inconsistent and confusing
       identification  of
       facilities  exemplified  by
       10  definitions of what
       constitutes a  facility and
       15 different coding schemes
       used  for   identifying
       facilities

       Location boundaries  that
       were  not  consistently
       defined nor used  for cross-
       referencing across systems
       (i.e., locations  defined  by
       county,  SMSA,  and  river
       basin).

       A  minority  of the systems
       used  government-wide
       standard codes such as the
       FIPS state and location
       codes.

These  findings confirmed the
problems   identified  by the
Committee,  and   effectively
demonstrated   the   need   for
improvement in the management of
agency data.

    Qualitative Approach - Another
    objective of Phase  I  was  to
    determine  perceived  needs and
    potential  benefits for an EPA
    data   management  and
    standardization program.  This
    was  accomplished  through
    interviews with cognizant
    individuals  in EPA,  interviews
    with   external  organizations
    such  as  the National Bureau of
    Standards and the Department of
    Defense, and a literature
    review to determine  the state-
    of-the-art in  information
    management.  The  major  result
        obtained was  a determination
        that  most  programs  in EPA
        support the concept  of  a data
        management  program, and many of
        the programs have independently
        begun to develop one or more of
        the  components   of a  data
        management  program.    The
        primary need  of  EPA  was to
        identify all  the  efforts  that
        were taking place, coordinate
        the  individual  efforts to
        assure compatibility,  and
        develop   a  strategy  for
        integrating  and expanding the
        activities throughout EPA.

    Phase II focused on the
identification of the components of the
program that are being developed in EPA
and how best to coordinate and integrate
these efforts into an Agency-wide
program. Several alternative
strategies were identified and
presented to the EPA Advisory Committee on
October 18, 1978. The committee
concurred with our recommendation that
the ongoing efforts be supported as
pilot projects and that a phased
approach be taken to expanding the
successful pilot projects throughout EPA.
A schedule for this controlled,
evolutionary implementation was developed
so that it would coincide with and
complement the 1980s hardware
procurement and installation
activities. The recommendations are
summarized in the remainder of this
summary.

IV.  RECOMMENDATIONS
    An  effective  EPA-wide  data
management and standardization program
must consist of three major components:

        An organization structure which
        supports the program's
        implementation and
        administration,

-------
EXHIBIT 2
Sample Systems Comparison Matrix

[Matrix not recoverable from the source scan. Rows list the data elements
reviewed in each of the five data categories; columns list the sample
systems, with an X marking each system in which an element was found. The
column headings and the placement of the X marks are illegible.]

I.    FACILITY IDENTIFIERS: Company/Authority Name, Company/Authority
      Code, Facility Name, Facility ID Number
II.   MONITORING SAMPLE STATION SITE: Site Name, Site Code
III.  GEOGRAPHICAL LOCATION: Region Code, State Code, State Name,
      County Code, County Name, City Code, City Name, Address,
      ZIP Code, Other Geo.
IV.   PARAMETER UNIT IDENTIFIERS: Unit
V.    QUALITY ASSURANCE CODES: Q.A.

-------
        Policies  and procedures  to
        govern program operations, and

        Data management  tools such as
        standards,  data  element
        dictionary/directory  (DED/D),
        feasibility  studies,  and
        quality  assurance  programs
        which produce program  products.

    A discussion of the recommendations
regarding  the  DRM  program  components
and program  implementation follows.

1.   Recommended  Data  Resource
    Management Program Components

    The multi-faceted nature of the
Agency's programs, as well as the line-
staff  relationship  between
Headquarters, Regions, and Laboratories,
mandates  that there be a division of
responsibility  that  fosters
coordination but not  centralized
control of data.  Although centralized
control of data is appealing  because of
the  need  to share data  across
organizational  boundaries, centralized
control will not succeed because  of the
Agency structure and the programs' needs
for  responding  to  their  specific
missions.   EPA requires a balanced
approach  that   will   encourage
cooperation  among  programs with
appropriate independence of program and
regional  operations.  The strategy
selected  reflects  this  balance  as
follows.

    (1) Organizational Structure

       In performing  the evaluation of
    the alternative organizational
    structures,  we  considered  the
    inherent  ability   of  each
    alternative to obtain:

       Management commitment,

       Programmatic participation, and

       Balance of authority and
       responsibility.

Our recommendation,  a  distributed
approach, is  a hybrid  of  the
classical  centralized  and
decentralized    management
approaches.   The  basic philosophy
is  to   locate   the  DRM
responsibilities  at  the level  in
the organization  best suited  to
perform  the  specific   tasks
consistent  with   broader  EPA
missions.   The  structure  includes
both an EPA ADP  Oversight Committee
and oversight committees for  each
program.  This concept is presented
graphically in Exhibit 3. 5/

    The  ADP  Oversight Committee,
reporting  at  the  Administrator
level, would be responsible for
Data Resource  Management  program
policy.  An oversight committee  in
each  programmatic  area  would
monitor  adherence  to programmatic
level  data management concepts and
support the EPA-wide ADP Oversight
Committee.    A  prototype
programmatic oversight  committee  is
being  formed by OWPO  to support the
Wastewater Treatment (WWT) Facility
Program.   The  chairperson  of the
programmatic  committee would  be  a
member of  the Agency committee.   A
Data Administrator  (DA)  and staff
would  support  the  Committee  by
reviewing  feasibility  studies,
system designs and implementations;
updating the EPA-wide data resource
directory  with  selected input from
programmatic  areas;  auditing
systems   to  assure  standards
compliance; and evaluating  systems
to assure  that  EPA-wide
requirements  were  being  met.  The
programmatic  staffs would  retain
their  current responsibilities for
doing  feasibility studies,  system
design,   implementation,  and
operations.  In  addition, they would
maintain   a program  specific  data

-------
[Exhibit 3: Distributed Data Management Organizational Structure.
Organization chart; only the "Program ADP Oversight Committee" box label
is legible in the source.]

-------
 element  dictionary/directory  and
 provide the Data Administrator with
 the subset of metadata  (data about
 data) for an Agency-wide directory.
 Programmatic areas  should  also
 audit  data  quality  and perform
 system  evaluations.   Exhibit  4
 presents  an  overview  of  the
 responsibilities  in this approach.

 (2) Policies  and  Procedures

    A number  of  policies must  be
 established  by   the   Data
 Administrator and his staff  in
 conjunction  with  the  committees.
 These  would include specific
 assignment  of   program
 responsibilities, tools  to  be
 utilized,  and quality assurance
 criteria.   A  key concept of data
 resource management  is  the
 development of a  consistent  set of
 policies  with  regard  to the
 management of data  or  information.
 This  set of  policies  is the
 cornerstone upon which the program
 is built and which defines the
 authority and limitations of
 related activities. The procedures
 which should be written include the
 areas of:

    Setting and  promulgating  data
    standards,

    Evaluating  and   approving
    requests,

    Auditing  systems,  and

    Updating  data management tools.

 (3) Data  Management Tools

    We recommend that the following
data management  tools  be  employed
to effectively manage data  as  a
 resource.   The use  of  these tools
 is  prescribed,  implemented, and
maintained by the  organizational
component  of   the data management
program through the use of program
policies and procedures.

    Standards - Three major types
    of standards should be
    developed and applied to
    facilitate data management:

        Data Element
        Standardization - involves
        the use of common data
        definitions, data use,
        coding schemes, and naming
        conventions.

        System Design and
        Documentation Standards -
        involve the use of
        standardized methods for
        system design and
        documentation throughout
        the system development life
        cycle, acceptance criteria
        to determine the adequacy
        of the system, and a system
        change approval process.

        Data Acquisition
        Techniques - involve the
        use of data collection
        approval procedures, form
        design and instruction
        guidelines, document
        tracking procedures to
        provide an audit trail for
        locating lost source data,
        and verification
        techniques to control the
        accuracy of data entry.

    Data Element
    Dictionary/Directory (DED/D) -
    A data element
    dictionary/directory is a
    software tool  used to control
    and manage metadata.  It is a
    central   repository  of
    information  about each data
    element in related  systems,  and
    facilitates  access to  and
    control of the data bases. This
    tool does not  manage  the actual

-------
[Exhibit 4: Distributed Data Management Intersystem Life Cycle. Matrix of
life-cycle activities (Policies, Feasibility Studies, Design &
Implementation, Dictionary/Directory, System Operations, Audit, Systems
Evaluation) against the responsible parties (Oversight Committee, Data
Administrator, Programmatic Staff). Legible cell entries include Set,
Promulgate, Monitor, and Feedback.]

-------
        content of the data, but manages
        the    descriptive
        characteristics of  that  data.
        The identification of  the
        individuals responsible for the
        quality  and dissemination of
        data in the specific systems is
        also important.  When a request
        is  received  for  specific
        information  from  other  EPA
        Offices, Congress, GAO, or the
        public,   access  to   the
        dictionary  can help determine
        if  the  information  is
        available, and whom  to contact.

        Feasibility Studies - Planned
        system developments should be
        reviewed  in a data  management
        program  so  that  redundant and
        inconsistent  data  can  be
        identified  before the designs
        are implemented.

        Quality  Assurance Program - A
        common problem  is  a  general
        lack of confidence in  the
        quality  of  the data contained
        in an organization's  systems.
        Quality Assurance is essential
        to information management, for
        without  reliable data and the
        confidence  of  its  users, the
        most  efficient  and powerful
        system is ineffective.
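
    To make the data element dictionary/directory described above more
concrete, the following Python fragment is a minimal sketch of such a
repository. It stores metadata only (definition, coding scheme, holding
systems, and contacts), never the data values themselves. All class,
method, system, and contact names are hypothetical illustrations and do
not describe any particular DED/D software product.

```python
from dataclasses import dataclass, field

@dataclass
class DataElement:
    """Metadata about one data element; no actual data values are held."""
    name: str
    definition: str
    coding_scheme: str
    systems: dict = field(default_factory=dict)  # system name -> contact

class Dictionary:
    """Hypothetical in-memory data element dictionary/directory."""

    def __init__(self):
        self._elements = {}
        self.conflicts = []  # (element, system, differing coding scheme)

    def register(self, name, definition, coding_scheme, system, contact):
        """Record that a system holds an element and who is responsible."""
        key = name.upper()
        element = self._elements.get(key)
        if element is None:
            element = DataElement(key, definition, coding_scheme)
            self._elements[key] = element
        elif element.coding_scheme != coding_scheme:
            # Flag the inconsistency rather than silently accepting it.
            self.conflicts.append((key, system, coding_scheme))
        element.systems[system] = contact
        return element

    def where_is(self, name):
        """Answer an inquiry: which systems hold this element, and whom to contact."""
        element = self._elements.get(name.upper())
        return dict(element.systems) if element else {}

ded = Dictionary()
ded.register("State Code", "State of the facility", "FIPS", "SYSTEM A", "J. Doe")
ded.register("State Code", "State of the facility", "LOCAL", "SYSTEM B", "R. Roe")
print(ded.where_is("state code"))
print(ded.conflicts)
```

    The conflict list illustrates how the same tool can support the
feasibility study review, since a planned system that registers an
element with a differing coding scheme is flagged before it is built.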

2.   Recommended Approach To Data
    Resource   Management  Program
    Implementation

    Arthur Young &  Company  recommends
the  phased  implementation approach
graphically depicted in Exhibit 5.  Use
of  a  phased  program  implementation
will:

        Distribute the resource
        requirements for
        implementation over a
        reasonable period of time,

        Permit appropriate lead time
        for programmatic areas to
        prepare for program
        implementation and respond to
        the educational process, and

        Allow the proposed ADP
        Oversight Committee and
        supporting Data Administrator
        and staff to concentrate on
        interfacing programmatic areas
        as program participants.

Implementation activities include those
phases shown in Exhibit 5 and  follow
the  program plan  to  effect  a full
program implementation.  The first phase
of this  Exhibit, Data Management and
Standardization Program Plan, is shaded
to indicate that it has been  completed
with  the presentation  of this  report.
The  objectives  and activities   of the
remaining phases are discussed below:

    (1) Establish Data Management Task
        Force

        The first step is to  establish
   a Data  Management Task Force.  The
   task force should:

        Identify  members for the ADP
        Oversight  Committee  and
        supporting  program   oversight
        committees throughout EPA,

        Prepare draft  policies  and
        procedures, and

        Recommend  the  specific  levels
        of  administrative authority and
        the method for delegation.

    (2) Establish Organization
        Structure To Support the
        Program

        This  activity  involves
    determination  of  the  working
    procedures  for  both  the  ADP
    Oversight  Committee   and
    programmatic oversight committees.

-------
[Exhibit 5: EPA Data Management and Standardization Program Life Cycle.
Cycle diagram with these legible phase labels: Data Management and
Standardization Program Plan; Phase 2, Establish Data Management Task
Force; Phase 3, Establish Organization Structure to Support Program (ADP
Oversight Committee, DA and Support Staff, Programmatic Level Oversight
Committees); Set and Promulgate Policies and Procedures; Phased
Development and Implementation of Tools; Implement Use of Tools; Phase 7,
User Training; Implement Full Program Operations; Evaluate Program After
Full Implementation.]

-------
It provides  identification of the
authority  and  delegation  of
responsibility  to  the  committee
support  staffs.   It results  in a
schedule  for  phased  program
implementation.

(3) Set  Policies and Procedures

    In this phase, the ADP Oversight
Committee would formalize the data
management  and  standardization
policies and  procedures.   These
policies and  procedures will be
passed to the  programmatic level
oversight committees to  serve as a
basis for planning pilot programs,
actual  implementation of  the
programs,  and  ongoing program
operations.

    Policies must include
assignment of program
responsibility, approval flows,
utilization of general tools, and
issuance  of  quality  assurance
criteria.  Procedures must describe
the  processes  through which data
management  policies  can  be
executed.

(4) Develop  and Implement Tools

    Once the  programmatic  level
oversight   committee  staff  is
assembled,  work  can  begin  on
installation of the tools for data
management and standardization, and
appropriate user  training.   The
plans   for  pilot  program
implementation should be formally
documented  for review by the ADP
Oversight Committee.  The phased
approach would then extend each
tool to the other programmatic
areas in a sequential
implementation effort.  The
specific actions that should be
taken are discussed in the
following paragraphs:

Standards

Data  element  standardization
efforts need to be coordinated
to  assure that the  codes  and
definitions established in  one
project are  compatible with
other  projects  and program
areas.  Cross-reference indices
or  conversion  tables   for
measurement  units should be
used as a method  of relating
data.
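
As a minimal sketch of the cross-reference index and conversion
table idea, assume two hypothetical systems that record the same
measurement under different element names and units.  The system
names, element names, and units below are illustrative only and
are not taken from the study.

```python
# Hypothetical conversion table: factors that convert each unit to a common
# reference unit (milligrams per liter). Unit names are illustrative only.
TO_MG_PER_L = {
    "mg/l": 1.0,
    "ug/l": 0.001,
    "g/l": 1000.0,
}

# Hypothetical cross-reference index: maps each system's local element name
# to a shared standard element name.
CROSS_REFERENCE = {
    ("SYSTEM_A", "PB_CONC"): "LEAD_CONCENTRATION",
    ("SYSTEM_B", "LEAD"): "LEAD_CONCENTRATION",
}


def to_standard(system, element, value, unit):
    """Return (standard element name, value in mg/l) for one reading."""
    standard_name = CROSS_REFERENCE[(system, element)]
    return standard_name, value * TO_MG_PER_L[unit]


# Two readings from different systems become directly comparable.
a = to_standard("SYSTEM_A", "PB_CONC", 15.0, "ug/l")
b = to_standard("SYSTEM_B", "LEAD", 0.5, "mg/l")
```

Keeping the index and the factors as plain tables lets existing
systems retain their own codes while still relating their data.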

System design and documentation
standards efforts should be
focused on  the  establishment
and   promulgation   of
standardized methods for system
design  and  documentation
throughout   the  system
development  life  cycle.
Acceptance criteria and change
approvals should be defined and
developed  to  determine  system
efficiency.  A major  effort for
establishing   these  standards
for documentation  is underway
for the 1980's procurement, and
this impetus  would  provide  a
particular  opportunity to
ensure incorporation of these
standards.  EPA has a unique
opportunity to accelerate this
process, because the 1980's
hardware procurement will
necessitate the conversion
and/or replacement of
applications software.  During
this transition, the
incremental cost of including
and following data management
standards for a selected set of
critical systems should be
relatively small.

Standards for  data acquisition
techniques  should address
design  of  forms,  data
collection   methodologies,
document tracking  procedures
and verification techniques for

-------
data  entry.    A coordinated
effort  is  needed to address
data  acquisition  in   a
systematic  manner.

Data Element
Dictionary/Directory (DED/D)

Our  recommendation  is  to
implement  a  passive  data
element  dictionary/directory
on a hierarchical basis. This
allows  for  the retention of the
current EPA dictionaries with
their  various  formats,
contents,  and  media;  and
overlays  an  Agency-wide
directory.    The  Agency
directory would be organized by
data category  and contain  only
key  identification  data, or
metadata, on  data elements in
the  programmatic  data  bases.
The  current  MIDSD effort to
build  a DED/D  for  selected
systems is the vehicle not only
for setting standards but also
for eventually providing DED/D
capabilities to the other
programmatic areas.  An example
is  the WWT  Data Dictionary
which   is  currently  being
developed  for the Office of
Water   Program Operations
(OWPO).
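
The hierarchical arrangement described above might be sketched as
follows.  This is an illustration under stated assumptions, not
the MIDSD design: the class names, field names, and sample entries
are hypothetical.  The Agency directory holds only key metadata
per data element, grouped by data category, and points back to the
program-level dictionary that keeps the full detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryEntry:
    """Key identification data (metadata) for one data element."""
    element_name: str
    data_category: str
    source_dictionary: str  # program-level dictionary holding the detail
    definition: str


class AgencyDirectory:
    """Passive top-level directory: it records where elements are described
    but does not control or generate the program systems themselves."""

    def __init__(self):
        self._by_category = {}

    def register(self, entry):
        self._by_category.setdefault(entry.data_category, []).append(entry)

    def find(self, data_category):
        """Return entries in a category, ordered by element name."""
        return sorted(self._by_category.get(data_category, []),
                      key=lambda e: e.element_name)

    def dictionaries_for(self, data_category):
        """List the program dictionaries to consult for a category."""
        return sorted({e.source_dictionary for e in self.find(data_category)})


# Hypothetical entries from two program-level dictionaries.
directory = AgencyDirectory()
directory.register(DirectoryEntry(
    "FACILITY_ID", "FACILITY", "PROGRAM_A_DICTIONARY", "Facility identifier"))
directory.register(DirectoryEntry(
    "PLANT_NUMBER", "FACILITY", "PROGRAM_B_DICTIONARY", "Plant identifier"))
sources = directory.dictionaries_for("FACILITY")
```

Because the directory is passive, the existing dictionaries keep
their own formats and media; the directory only tells a user where
to look.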

Finally,   we  recommend   the
evaluation  of an active DED/D
in conjunction with the 1980's
ADPE  procurement.    The
specification  for supporting an
active DED/D could be included
in the RFP  for the procurement.

Feasibility Studies and Reviews

Coordinated  feasibility
studies in a  data management
program reduce  redundancy  and
data    inconsistency.
Feasibility studies  should be
utilized  to  determine  and
justify  appropriate  levels  of
data standardization, required
data     element
dictionary/directory   (DED/D)
contents  and level of detail,
and program  policy  and
procedural  requirements and
impacts.  The proposed
minicomputer Review Group, composed
of the Regional and Laboratory
site ADP managers, can also
provide effective control on
standards for the Regions,
provided sufficient
interaction with national level
personnel is maintained.  Our
recommendation is  that  the
Agency ADP Oversight Committee
endorse  the  MIDSD  review
process, and that the Committee
members from each programmatic
area ensure that the procedures
are followed  for  their
programmatic systems.

Quality Assurance Program

A common problem is a general
lack of confidence in  the
quality of the  data contained
in  EPA's systems.  We recommend
a coordinated effort supported
by management, first, to encourage
systems groups to initiate
system audit activities, and
secondly, to assure that the
programs do not focus just on
"cleaning up" the current files
but also address and correct
the causes.  The steps that can
be taken to  improve the quality
of  the data include:

-   Provision  for  effective
    edit procedures,

-   Standardization of data
    definitions,

-   Revision of forms and
    procedures for simplicity
    and clarity, and

-------
-   Adherence to periodic audit
    procedures.
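
As one illustration of an edit procedure, the sketch below checks
a record against a small set of rules before it is accepted.  The
field names, code list, and range are hypothetical and are not
drawn from any EPA system.

```python
def edit_check(record, rules):
    """Return a list of error messages; an empty list means the record passes."""
    errors = []
    for field, rule in rules.items():
        value = record.get(field)
        if value is None or value == "":
            if rule.get("required", False):
                errors.append(f"{field}: missing required value")
            continue
        allowed = rule.get("allowed")
        if allowed is not None and value not in allowed:
            errors.append(f"{field}: code {value!r} not in standard code list")
        low, high = rule.get("range", (None, None))
        if low is not None and not (low <= value <= high):
            errors.append(f"{field}: {value} outside {low}..{high}")
    return errors


# Hypothetical rules: a standardized code list and a numeric range.
RULES = {
    "state_code": {"required": True, "allowed": {"VA", "MD", "DC"}},
    "ph": {"range": (0.0, 14.0)},
}

clean = edit_check({"state_code": "VA", "ph": 7.2}, RULES)
rejected = edit_check({"state_code": "XX", "ph": 15.0}, RULES)
missing = edit_check({"ph": 7.0}, RULES)
```

Rejecting bad values at entry addresses the cause of poor data
rather than cleaning up files afterwards.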
    (5) Implement Full Program
        Operations
        At the  conclusion of the final
    phase  of  program  implementation,
    all EPA organizational components
    should  be  participating  in  an
    active,  organization-wide  Data
    Management  and  Standardization
    Program.   Due to  the  large scale
    nature of  this  activity the ADP
    Oversight Committee may be required
    to  reassess  original  working
    procedures, delegated authority and
    responsibility.  In addition, this
    committee should begin to plan for
    additional  information management
    activities  to  be  projected  over a
    subsequent  five-year time period.

        At the completion of each phase,
    a document  relating the activities
    performed should be produced.  This
    document should be submitted to EPA
    management   for   review  of
    activities, adherence to schedules,
    and progression  to the next phase
    of the implementation.   Exhibit 6
    presents  an  approximate  schedule
    for the implementation plan.

    (6) Evaluate Program After Full
        Implementation

        A   Post-Implementation
    Evaluation  report   should  be
    developed to include an evaluation
    of program performance, operational
    costs, and areas for improvement and
    future enhancement.

3.   Program Cost Estimates

    The  cost   estimates  for   the
implementation and annual operations of
an  EPA  Data  Management  and
Standardization  Program  are presented
in  Exhibit 7.   Since  the  recommended
implementation  plan  for this  program
includes a phased approach over a five-
year period,  details  for each year of
the five-year implementation plan are
shown  in the detailed  final report.
Exhibit 8 presents the estimated total
annual operations costs for the program
after  full  implementation,  for  an
additional  five-year period.  A 7% cost
escalation factor was added to the cost
estimates  for each  successive  year
beyond  fiscal years  78-79  to account
for anticipated  inflation.    The  cost
estimate matrices  cross-tabulate  the
cost  elements  of  the  program
implementation  and  operations  life
cycles.  The  specific assumptions and
costing  details  for  the recommended
program  are  discussed  in   the  final
report.
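
The effect of the 7% escalation factor can be illustrated with a
short sketch.  The function is not taken from the study; it
assumes the factor is compounded once per fiscal year and that
each year's figure is rounded to the nearest $100, which is an
assumption about presentation.  The starting amount is the
first-year EPA clerical figure from Exhibit 8.

```python
def escalate(base_cost, rate_percent, years, round_to=100):
    """Return a list of yearly costs, compounding rate_percent each year.

    Each year's figure is rounded to the nearest `round_to` dollars
    (an assumed presentation convention, not stated in the report).
    """
    costs = [base_cost]
    for _ in range(years - 1):
        raw = costs[-1] * (100 + rate_percent) / 100
        costs.append(round(raw / round_to) * round_to)
    return costs


# First-year EPA clerical operations cost from Exhibit 8, escalated at 7%.
clerical = escalate(765_300, 7, 5)
five_year_total = sum(clerical)
```

Under these assumptions the sketch reproduces the EPA clerical
row of Exhibit 8, including its five-year total of $4,401,000.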

V.   CONCLUSION
    The management of information and
data as a scarce resource is a concept
that will shortly be introduced and
implemented throughout all Federal
agencies.  EPA recognized the value of
this approach to information management
and ADP management before the current
initiative was being planned by
Congress and the President.  Individual
components of a data resource management
program have been in development and
operation in specific programs of EPA
for several years.  The objective of the
Data Management and Standardization
Program outlined in this report is to
coordinate the individual efforts into
a unified, Agency-wide program that,
along with the 1980's hardware
procurement effort, will put EPA in the
vanguard of modern Federal ADP
operations.

-------
EXHIBIT 6

EPA Data Management and Standardization Program Implementation Schedule

[Schedule chart.  The horizontal axis runs by month (October through
September) across fiscal years FY 79-80, FY 80-81, FY 81-82, FY 82-83,
and FY 83-84.  The scheduled tasks, in order, are:]

-   Establish Data Management Task Force
-   Establish Organization Structure to Support Program
-   Set and Promulgate Policies and Procedures
-   Phased Development and Implementation of Tools
    •  Implement Use of Tools
    •  User Training
-   Implement Full Program Operations
-   Evaluate Program After Full Implementation

[Task duration bars not reproduced.]

-------
EXHIBIT 7

EPA Data Management and Standardization Program

COST ESTIMATE MATRIX

COST SUMMARY - FIVE-YEAR PLAN

                                                          PERSONNEL RESOURCES                                               TOTALS
                                                         EPA           EPA           EPA                OUT-OF-POCKET   OPPORTUNITY       OVERALL
LIFE CYCLE PHASE                                    CLERICAL  PROFESSIONAL       SYSTEMS DOCUMENTATION      EXPENSES          COST         TOTAL

DATA MANAGEMENT PROGRAM DEVELOPMENT COSTS
  Establish Data Management Task Force                $1,400       $49,000        $9,200        $2,000        $2,000       $59,600       $61,600
  Establish Organization Structure                    $4,000       $32,200      $127,000        $1,400        $1,400      $163,200      $164,600
  Set and Promulgate Policies and Procedures          $4,400       $63,100      $269,800        $4,000        $4,000      $337,300      $341,300
  TOTAL DEVELOPMENT COSTS                             $9,800      $144,300      $406,000        $7,400        $7,400      $560,100      $567,500

PROGRAM IMPLEMENTATION COSTS
  Implement Use of Tools                            $120,800      $300,500      $854,900       $27,500       $27,500    $1,276,200    $1,303,700
  User Training                                       $2,200       $24,400       $51,500       $18,600       $18,600       $78,100       $96,700
  TOTAL PROGRAM IMPLEMENTATION COSTS                $123,000      $324,900      $906,400       $46,100       $46,100    $1,354,300    $1,400,400

TOTAL PROGRAM DEVELOPMENT AND
IMPLEMENTATION COSTS                                $132,800      $469,200    $1,312,400       $53,500       $53,500    $1,914,400    $1,967,900

ANNUAL PROGRAM OPERATIONS COSTS                   $1,382,500    $1,938,700    $2,595,100      $179,900      $179,900    $5,916,300    $6,096,200

-------
EXHIBIT 8

EPA Data Management and Standardization Program

FIVE-YEAR COST AFTER FULL IMPLEMENTATION

                          PERSONNEL RESOURCES                                               TOTALS
                         EPA           EPA           EPA                OUT-OF-POCKET   OPPORTUNITY       OVERALL
FISCAL YEAR         CLERICAL  PROFESSIONAL       SYSTEMS DOCUMENTATION      EXPENSES          COST         TOTAL

83-84               $765,300    $1,006,300    $1,392,400      $102,300      $102,300    $3,164,000    $3,266,300
84-85               $818,900    $1,076,800    $1,489,900      $109,500      $109,500    $3,385,600    $3,495,100
85-86               $876,200    $1,152,100    $1,594,200      $117,100      $117,100    $3,622,500    $3,739,600
86-87               $937,500    $1,232,800    $1,705,800      $125,300      $125,300    $3,876,100    $4,001,400
87-88             $1,003,100    $1,319,100    $1,825,200      $134,100      $134,100    $4,147,400    $4,281,500

TOTAL             $4,401,000    $5,787,100    $8,007,500      $588,300      $588,300   $18,195,600   $18,783,900

-------
                         FOOTNOTES
1/ H.R. 3570, "The Proposed Paperwork and Redtape Reduction
   Act of 1979", April 10, 1979, p.1.

2/ "Paperwork and Red Tape - New Perspectives - New Directions,"
   A report to the President and the Congress from the Office
   of Management and Budget, September 1979, p.6.

3/ Ibid., p.19.

4/ Richard L. Nolan, Ph.D., "Computer Managers to Data Resource
   Managers," Nolan, Norton & Company, Lexington, Mass., 1978.

5/ EPA is now establishing a standing DAA Committee on
   Monitoring and Information Management which reports to
   the Administrator.  Data resource management issues will
   be among the issues to be addressed by this committee.
   (Refer to MIM#79-1, memorandum from the EPA Administrator,
   subject: "Monitoring and Information Management in EPA",
   dated September 18, 1979).

-------