            FIPS PUB Usage Report
             Prepared for the
 Information Management Services Division
Office of Information Resource Management
   U.S. Environmental Protection Agency

                 June 1989

           Contract GS-OOK85-1AFD2777
             Task Number N4B688019
AMERICAN MANAGEMENT SYSTEMS, INC.

-------
1.0    Introduction
      1.1    Background
      1.2    Purpose and Scope
      1.3    Procedures Used in Analysis of Data Standards
      1.4    Layout of Paper

2.0    Analysis of FIPS PUBs Standards
      2.1    Calendar Date - FIPS PUB 4-1
      2.2    States and Outlying Areas of the U.S. - FIPS PUB 5-2
      2.3    Counties and County Equivalents - FIPS PUB 6-3
      2.4    Metropolitan Statistical Areas (MSAs) - FIPS PUB 8-5
      2.5    Congressional Districts - FIPS PUB 9
      2.6    Countries of the World - FIPS PUB 10-3 & 104-1
      2.7    Named Populated Places, Primary County Divisions, and Other
            Location Entities of the U.S. - FIPS PUB 55-2
      2.8    Local Time of Day - FIPS PUB 58-1
      2.9    Representations of Universal Time - FIPS PUB 59
      2.10   Standard Industrial Classification (SIC) Codes - FIPS PUB 66
      2.11   Geographical Point Locations - FIPS PUB 70-1
      2.12   Standard Occupational Classification (SOC) Codes - FIPS PUB 92
      2.13   Federal and Federally-Assisted Organizations - FIPS PUB 95
      2.14   Hydrologic Units in the U.S. and Caribbean - FIPS PUB 103 -
            from the U.S. Geological Survey
      2.15   Future FIPS PUB Data Element Standards

3.0    Analysis of Non-NBS Coding Schemes
      3.1    Hazardous Material ID Number
      3.2    Social Security Number
      3.3    Human Sexes
      3.4    ZIP Codes
      3.5    Animal and Plant Taxonomic Schemes

4.0    Current EPA Data Standards Program Development
      4.1    EPA Data Standards Program Policies
      4.2    EPA Standard Data Elements
            4.2.1 Chemical Abstract Service (CAS) Number
            4.2.2 Electronic Transmission of Laboratory Measurement
                 Results
            4.2.3 Facility Identifier
            4.2.4 Geographical Point Locations
            4.2.5 The Electronic Data Interchange (EDI) Work Group

5.0    Summary of Findings and Recommendations
      5.1    Review of FIPS Standards and Usage
      5.2    Review of Other Non-NBS Standard Coding Schemes
      5.3    Review of Data Standards Program

-------
1.0    Introduction

1.1    Background

      EPA has numerous ADP systems to support its management directives and
regulatory and enforcement functions. These systems were developed over the
years as the Agency's information needs expanded.  Because many of these systems
were designed by individual EPA program offices, this pattern of ADP system
development has resulted in many non-uniform coding practices and incompatible
data structures, leading to inefficient data communication across media boundaries.

      To improve EPA's data management, the Agency has undertaken a Data
Standards program.  As part of this program EPA-wide data standards are studied,
approved and promulgated to increase consistency and portability of data within the
Agency and between the Agency and outside sources (such as the States or industry).
A previous report, Analysis for  the EPA Data Standards Program (December, 1982),
outlined the Data Standards Program goals and procedures and analyzed the
National Bureau of Standards (NBS) library of standard data elements. Since the
first publication of the Analysis..., the inventory of EPA information needs and,
hence, information systems has continued to grow.  Simultaneously, the library of
NBS standards has also continued to grow and change. Therefore, a need has arisen
to reevaluate the NBS standards and their applicability to EPA's information needs.

1.2    Purpose and Scope

      This document is intended to support the EPA's ongoing  Data Standards
program. In particular, it analyzes the current NBS standards, the current EPA
standards and the usage of both within Agency systems. Each NBS standard  is
analyzed for:

      •     Applicability to EPA's information needs
      •     Presence of an existing or proposed EPA standard
      •     Usage within Agency systems

-------
      In addition, non-NBS standard data element coding systems listed in FIPS
Publication 19-1 are studied as possible candidates for EPA standardization.

1.3    Procedures Used in Analysis of Data Standards

      The analysis for this document followed a three-step procedure. First, the
current NBS and EPA standard data elements and EPA data standards policies and
procedures were identified. This step was accomplished through review of the
Federal Information Processing Standards Publications (FIPS PUBs), the EPA
Information Resource Management Policy Manual  and other Agency materials.
Second, the applicability and current usage of standards were analyzed. This second
step was accomplished through interviews with key personnel and review of
documentation for a sample of EPA's major systems. Major systems are defined as
those rated as level 1 in the Information Systems Inventory (ISI). An interview was
also conducted with a key person from the Facility Index System (FINDS) due to the
system's role in developing a common coding scheme for EPA-regulated facilities.
The third step involved analyzing the findings from the steps above, determining
recommendations and preparing this report.

1.4    Layout of Paper

      The remainder of this document is divided into four sections.

      •     Section 2 "Analysis of FIPS PUBs Standards" - defines each NBS
            standard for data elements,  describes the findings of its applicability and
            usage within EPA systems,  and recommends whether or not to adopt it
            as an EPA standard;

      •     Section 3 "Analysis of Non-NBS Coding Schemes" - defines those
            non-NBS standards listed in FIPS PUB 19-1 that may be applicable to
            the EPA and briefly summarizes findings regarding each coding scheme;

      •     Section 4 "Current EPA Data Standards Program Development" -
            discusses EPA's current and proposed data standards policies and
            standard data elements; and

      •     Section 5 "Summary of Findings and Recommendations" - contains a
            brief summary of findings regarding EPA's Data Standards Program.

-------
2.0    Analysis of FIPS PUBs Standards

      The National Bureau of Standards (NBS) develops and maintains a library of
standardized data elements and representations for use in Federal automated data
systems.  These standards are published in the Federal Information Processing
Standards Publications (FIPS PUBs) and are issued to all Federal agencies. Currently
there are fourteen areas covered by data standards. This chapter examines each of
these standards for possible adoption by EPA and recommends actions to be taken by
the Agency.

      Each of the fourteen data standards areas is discussed separately below.
Discussion of each data standard is broken into four sections:

      •     Description of the standard
      •     Applicability of the standard to EPA's information needs
      •     Current usage of the data standard within EPA systems - includes
            existence of current or proposed EPA standard
      •     Recommendations for EPA data standards program

      Exhibit 1 summarizes the descriptions of each FIPS standard data element
reviewed. Exhibit 2 summarizes the recommendations for each FIPS.

2.1    Calendar Date - FIPS PUB 4-1

                                 Description

      This standard specifies that dates must be recorded using the Gregorian
calendar in a YYYYMMDD format. These dates should be stored in numeric  fields.
This standard updates the old FIPS standard of YYMMDD calendar dates.

-------
                                  Exhibit 1
                         Standard NBS Data Elements

FIPS      Subject                           Definition

4-1       Calendar Date                     8-digit numeric, Gregorian Calendar
                                            YYYYMMDD

5-2       States and Outlying Areas         2-digit numeric and 2-character
                                            alphabetic abbreviation (Postal Codes)

6-3       Counties and County               3-digit numeric, unique within each
          Equivalents                       state

8-5       Metropolitan Statistical Areas    4-digit numeric for metro stat areas;
          (MSAs, CMSAs, PMSAs)              3-digit for consolidated metro areas

9         Congressional Districts           2-digit numeric, unique within each
                                            state

10-3/     Countries and Principal           2-character country code and 4-char
104       Administrative Divisions          primary divisions

55-2      Populated Places - Cities,        5-digit numeric unique to each U.S.
          Towns, County Divisions           location

58-1      Local Time of Day                 Representations of 12 & 24 hour clock
                                            times

59        Universal Time, Time              Representations of 12 & 24 hour
          Differentials & Time Zones        universal time, 3-char time zones, &
                                            time differentials

66        Standard Industrial Class         2-4 digit numeric identifying major
          Codes (SIC)                       business classes and sub-classes

70-1      Geographical Point Location       3 representations of point locations -
                                            Lat/Long, UTM, & State Plane Coord.

92        Standard Occupational Class       2-4 digit numeric identifying major
          (SOC) Codes                       occupation classes & sub-classes

95        Federal & Federally Assisted      4-digit numeric: 2-digit Treasury
          Organizations                     Agency Symbol + 2-digit division code

103       Hydrologic Units                  2-, 4-, 6-, or 8-digit numeric for U.S.
                                            bodies of water

-------
                                     Exhibit 2
                  Recommendations for NBS Standard Data Elements

Subject                          Adopt  Do Not  Study    Comments
                                        Adopt   Further

Calendar Date                      X                     Should be standard for all existing and
                                                         future systems; existing non-standard
                                                         data codes may be exempted if conversion
                                                         is too costly.

States and Outlying Areas          X                     Should be standard for all existing and
                                                         future systems; existing non-standard
                                                         data codes may be exempted if conversion
                                                         is too costly.

Counties and County                X                     Should be standard for all existing and
Equivalents                                              future systems, where applicable; existing
                                                         non-standard data codes may be exempted
                                                         if conversion is too costly or information
                                                         would be lost by conversion.

Metropolitan Statistical Areas     X                     Should be standard for all existing and
(MSAs, CMSAs, PMSAs)                                     future systems; standard essentially in
                                                         place already.

Congressional Districts            X                     Should be standard for all existing and
                                                         future systems; standard essentially in
                                                         place already.

Countries and Principal            X                     EPA should adopt the ISO standard instead
Administrative Divisions                                 of the alternate NBS standard in FIPS
                                                         10-3; the ISO standard is more widely
                                                         accepted.

-------
                             Exhibit 2 (continued)

                Recommendations for NBS Standard Data Elements

Subject                             Comments

Populated Places - Cities, Towns,   Should be standard for all future systems;
County Divisions                    conversion of existing systems is recommended only
                                    where benefits outweigh costs.

Local Time of Day                   Should be standard for all existing and future
                                    systems; existing non-standard data codes may be
                                    exempted if conversion is too costly.

Universal Time, Time                Not currently applicable to EPA, but should be
Differentials & Time Zones          adopted to avoid future problems.

Standard Industrial Class Codes     Should be standard for systems where SIC codes
(SIC)                               provide the necessary detail; conversion should be
                                    determined on an individual basis.

Geographical Point Location         Adopt latitude/longitude representation in all future
                                    systems and convert existing systems where benefits
                                    exceed costs.

Standard Occupational Class         Not applicable to EPA currently, but should be
(SOC) Codes                         adopted to avoid future problems.

-------
                             Exhibit 2 (continued)

                Recommendations for NBS Standard Data Elements

Subject                             Comments

Federal & Federally Assisted        Not applicable to EPA.
Organizations

Hydrologic Units                    Should be standard for all existing and future
                                    systems; existing non-standard data codes may be
                                    exempted if conversion is too costly.

-------
                                 Applicability

      This element is used in record structures to identify when the information
was last updated.  Administrative and enforcement divisions use this code to
schedule and chronologize events.  EPA programs use date information extensively
to perform their missions.

                                    Usage

      The 6-digit YYMMDD format of the Gregorian calendar is the most common
representation within Agency systems. Few systems use the 8-digit version.
Variations of order as well as structure exist.  Some systems rely on a five-digit
Julian calendar code which identifies the year and the day of the year (1 through
365), while others use the Gregorian calendar but store dates in MMDDYY format.
Currently the only EPA standard regarding this element is a standard format for
electronically transmitting dates. The standard is contained in the Transmittal
Order "Data Standards for the Electronic Transmission  of Laboratory Measurements
Results" and specifies that date formats should be in the format YYMMDD.
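
      As an illustration only, the conversion of the date variations described above to the 8-digit YYYYMMDD form might be sketched as follows. The function name to_fips_date, the layout labels, and the default century of 1900 are assumptions made for this example; none of the older formats carries a century, so the caller must supply one.

```python
from datetime import date, timedelta


def to_fips_date(value: str, layout: str, century: int = 1900) -> str:
    """Convert a legacy date string to the 8-digit YYYYMMDD form.

    layout is 'YYMMDD', 'MMDDYY', or 'YYDDD' (two-digit year plus day of year).
    century is an assumption supplied by the caller, since the legacy
    formats do not record it.
    """
    if not value.isdigit():
        raise ValueError(f"non-numeric date: {value!r}")

    if layout == "YYMMDD" and len(value) == 6:
        yy, mm, dd = int(value[0:2]), int(value[2:4]), int(value[4:6])
        result = date(century + yy, mm, dd)  # raises ValueError if invalid
    elif layout == "MMDDYY" and len(value) == 6:
        mm, dd, yy = int(value[0:2]), int(value[2:4]), int(value[4:6])
        result = date(century + yy, mm, dd)
    elif layout == "YYDDD" and len(value) == 5:
        yy, day_of_year = int(value[0:2]), int(value[2:5])
        start = date(century + yy, 1, 1)
        result = start + timedelta(days=day_of_year - 1)
        # Reject day numbers that fall outside the stated year.
        if day_of_year < 1 or result.year != start.year:
            raise ValueError(f"day of year out of range: {value!r}")
    else:
        raise ValueError(f"value {value!r} does not match layout {layout!r}")

    return result.strftime("%Y%m%d")


if __name__ == "__main__":
    print(to_fips_date("890615", "YYMMDD"))  # 19890615
    print(to_fips_date("061589", "MMDDYY"))  # 19890615
    print(to_fips_date("89166", "YYDDD"))    # 19890615
```

      The three sample values all describe 15 June 1989. A real conversion effort would also need a rule for choosing the century of each record, which this sketch leaves to the caller.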

                              Recommendation

      Since so much of the Agency's information is date-dependent, we
recommend the adoption of this FIPS standard and the update of the "Data
Standards for  the Electronic Transmission of Laboratory Measurements Results"
directive.  Although the "Electronic Transmission..." paper has already established a
standard, the omission of a century identifier will require a larger conversion effort
of electronically transmitted data in the future, when the year 2000 approaches.
      A common internal coding scheme will facilitate the exchange of
information across program boundaries.  Adopting this standard with a four-digit
code for the calendar year will prepare the EPA for the changing of the century,
eleven years from now.  Existing file structures which do not conform to the
proposed standard will be granted exemptions if the effort required to convert the
codes is greater than any benefit which would result from standardization. Since
most date formats are readily convertible and many already conform to the old
6-digit YYMMDD FIPS standard, no significant difficulties are foreseen in
implementing this standard.

2.2    States and Outlying Areas of the U.S. - FIPS PUB 5-2

                                 Description

      Two-character alphabetic abbreviations and two-digit numeric codes are
defined for each of the states and U.S. territories. The abbreviations are identical to
the Postal Service codes and therefore are useful for identifying establishment
locations and for producing mailing addresses.

                                 Applicability

      State codes are used by Agency programs for location, jurisdiction, and
mailing address identification.  The digital format is applicable where machine and
numeric sorting are required, since a sort on the numeric code arranges the States in
alphabetical sequence.

                                    Usage

      Presently, the Agency uses several different alphabetic and numeric coding
schemes for states. The FIPS standard is used most widely, especially for mailing
addresses, since the standard uses the Postal Service codes. However, some systems
use a hybrid of the Postal codes, adding additional program-specific codes for
internal use.  Some air program systems use an alternative numeric code based on
the code found in SAROAD. Currently there is no EPA standard for this data
element.

                              Recommendation

      This standard should be adopted by the EPA for internal use. The Agency will
benefit from adopting a common  coding scheme for identifying the 50 states and
other territories.  It will improve communications between the program offices,
between EPA and external organizations, and it will support the concept of
establishing unique location identifiers. This standard is particularly critical because
much of EPA's funding and enforcement activity is state-specific. Also, states are
assuming many of the data gathering responsibilities which were previously
performed by the EPA. A well-defined state data element will aid in the transfer of
information between the state and Federal levels.

      Since the fifty states are well defined, there should be little difficulty in
conversion between alternative coding systems. However, different systems have
different needs for identifying the outlying areas and trust territories of the United
States and other locations or entities outside of the states. An example is including
an extra code for each region to identify locations specific to a region but not to a
state. Conversion of these other territories may lead to more significant difficulties.
As such, EPA should adopt the Postal codes as the standard to be used for the 50
states, adopt the Postal codes for the outlying areas to be used where possible, and
allow the codes to be expanded to include additional codes where necessary.

2.3    Counties and County Equivalents - FIPS PUB 6-3

                                  Description

      The FIPS representation of this data element consists of three-digit numeric
codes for each county or county equivalent in the U.S. The counties are arranged in
alphabetical order and the corresponding codes are unique within each state. For
example, the first counties in Alaska and Wyoming will each be designated '001'.
This sequence, when combined with the FIPS numeric state code discussed earlier,
will unambiguously identify each county in the United States.
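
      The combination of state and county codes described above can be sketched as follows. This is a minimal illustration, not part of the FIPS publication: the function name county_id is hypothetical, and the state codes shown for Alaska (02) and Wyoming (56) are the FIPS PUB 5-2 numeric codes as understood here.

```python
def county_id(state_code: str, county_code: str) -> str:
    """Join a 2-digit state code and a 3-digit county code into one identifier."""
    if len(state_code) != 2 or not state_code.isdigit():
        raise ValueError(f"state code must be 2 digits: {state_code!r}")
    if len(county_code) != 3 or not county_code.isdigit():
        raise ValueError(f"county code must be 3 digits: {county_code!r}")
    # Codes are kept as strings so that leading zeros are preserved.
    return state_code + county_code


if __name__ == "__main__":
    # County '001' exists in both states; the state prefix removes the ambiguity.
    print(county_id("02", "001"))  # 02001
    print(county_id("56", "001"))  # 56001
```

      Storing the codes as character strings rather than integers is a design choice made here to keep leading zeros intact.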

                                 Applicability

      Many EPA systems use county codes as a means of identifying important
areas or locations.  Counties are used to track problem pollution areas, resource
allocation and funding levels.

-------
                                   Usage

      Currently, the Agency uses at least two different coding schemes for county
identifiers.  The FIPS standard is used in many EPA systems, while the SAROAD
county identifier is used in several air program systems. Currently there are no EPA
standards for county codes.

                              Recommendation

      The Agency should adopt standard county codes to increase consistency and
portability of data.  Since there is no current EPA standard, either formally or
informally, the standard published in FIPS PUB 6-3 should be adopted to conform
with Federal guidelines.

2.4    Metropolitan Statistical Areas (MSAs) - FIPS PUB 8-5

                                 Description

      MSA (formerly SMSA) codes are elements which identify integrated social
and economic units. Each contains a unique four-digit numeric identifier plus a
one-character alphabetic population level code.  Nine criteria, including population, are
used to identify MSAs, which may cross state boundaries. MSAs are used to define
large residential centers and other areas of dense population which are of particular
concern to the Agency.  In October 1984, FIPS PUB 8-5 revised the definitions of
MSAs. The updated standard  revised the criteria used to identify MSAs and also
established three new statistical areas: the Primary Metropolitan Statistical Area
(PMSA), the Consolidated Metropolitan Statistical Area (CMSA), and the New
England County Metropolitan  Area (NECMA).  The PMSA is an area analogous to
the MSA but is slightly larger.  PMSA codes are formatted the same as MSAs. CMSA
codes are two-digit numeric groups of PMSAs assembled together into larger
integrated areas. NECMAs are separate metropolitan area codes for groups of New
England towns and cities. Since more detailed information is available for New
England at the town level, additional codes have been developed to allow
information to be tracked by more detailed metropolitan areas.  The NECMA codes
are formatted the same as MSA codes; they are not intended to replace MSA codes
but rather to provide more detailed areas for New England.

                                Applicability

      Several EPA programs track monitoring and enforcement information by
Metropolitan Statistical Area.

                                   Usage

      Several systems use MSAs as location units in addition to states, counties,
cities, etc.  Since there is no other established code analogous to the MSA, only the
standard MSA codes are currently being used. However, some systems are using
outdated MSA data (from FIPS PUB 8-4).  No systems have been identified as using the
new statistical areas (PMSA, CMSA, NECMA).  Currently, there is no EPA  standard
for metropolitan statistical areas.

                              Recommendation

      Since EPA programs have displayed a need for information related to
integrated metropolitan areas and actively use the MSAs in automated systems, EPA
should standardize its metropolitan areas. However, EPA should further research
which unit(s) to use as the standard.  Whichever statistical area(s) is selected, the
standard should follow the standards outlined in FIPS PUB 8-5. The NECMA is not
recommended as an EPA standard since it includes information about a limited
geographical area.  This does not mean that use of NECMAs is discouraged as an
additional  tool for  regional analysis.

      EPA must first determine whether tracking data by CMSA is useful.  CMSAs offer the
EPA further flexibility in tracking information, but no specific need for CMSA-
related data has been identified.  If the need exists, then  the FIPS standard should be
adopted.

-------
      Since the need for MSA-level tracking  has already been identified, the
remaining issue is whether to continue to use the MSAs or to adopt the PMSAs as
the standard unit. MSAs are currently used and offer little problem for
standardization within the Agency.  However, PMSAs are the building blocks of
CMSAs and are the logical choice for standardization if CMSAs are also used.

2.5   Congressional Districts - FIPS PUB 9

                                 Description

      Two-digit numeric codes identify the congressional districts of the U.S.  The
numbers are unique within each state and unique for each of the sessions of
Congress. "At large" is represented by the designation '00'.  The districts may be
rearranged occasionally, depending upon census results.

                                Applicability

      Congressional district information is  useful to EPA programs, specifically for
enforcement and monitoring purposes.

                                    Usage

      At least five Agency programs utilize this information for monitoring and
enforcement purposes.  There is no alternative coding scheme and no other data
element examined has any of the same key  attributes. Currently there  is  no EPA
standard for this data element.

                              Recommendation

      Essentially, this element has already been standardized. A formal notice
should be published to adopt this FIPS data standard as an Agency standard, and
procedures should be established for updating the district codes.

-------
2.6    Countries of the World - FIPS PUB 10-3 & 104-1

                                 Description

      The original NBS standard, FIPS PUB 10-3, consists of a two-character
alphabetic code, one for each of the geo-political entities in the world.  Although the
NBS standard was developed first, a coding scheme later published by the
International Organization for Standardization (ISO) has received wider publicity
and greater use. Recognizing this, the NBS has published FIPS PUB 104-1, which
allows any agency to adopt either format as its standard.

                                 Applicability

      The EPA currently has little need for country codes. Although no specific
need has been identified, new international issues, such as acid rain and global
warming, may increase the need for international data.

                                    Usage

      The EPA has little experience with country codes and very few Agency
systems currently use country codes.  There is no current EPA standard for country
codes.

                              Recommendation

      In order to provide direction and guidance for future systems, it is
recommended that the Agency  adopt the ISO standard  (FIPS PUB 104-1) for country
abbreviations.  By adopting the standard before use is widespread, the potential for
later misuse is reduced.

-------
2.7    Named Populated Places, Primary County Divisions, and Other Location
      Entities of the U.S. - FIPS PUB 55-2

                                 Description

      NBS has assigned a unique five-digit code  for each incorporated place, census
designated place, township, Indian reservation, and Alaskan native area in the U.S.
and its territories. The original sequencing provided for future expansion and
modification by establishing a minimum increment of eight points between
successive locations. As areas become incorporated or established, they will be
placed in this alphabetical listing and given their own code.

                                Applicability

      Agency systems frequently collect data by city or town or other geographical
breakdown.  However, there is no EPA standard for the appropriate geographical
breakdown of the U.S. In the absence of a current EPA standard, this coding scheme
is applicable to the EPA as a standard for integrating geographical regions across
Agency systems.

                                    Usage

      The Agency has not made substantial use of this standard thus far. In fact,
research has not revealed one coding system that uses this NBS standard. Most
programs used numbering schemes and identifiers which were specific to the
program's requirements and which had little significance for any other system.  The
problem is that most programs deal with unique geographical boundaries which are
defined by legislation.  One example is the AQCR code used in the air programs
which blocks off areas according to measurements of air quality. Such a code likely
has no pertinence to another program. There is no current EPA standard for this
data element.

-------
                              Recommendation

      This FIPS standard should be adopted to provide a common detailed
locational code within EPA systems. These codes will not replace the specialized
geographical boundary information necessary to each program, such as the AQCR
codes in the air program or the hydrologic codes used in the water program, but will
instead provide added information common to all programs.

      There are several benefits to be realized from adopting this FIPS standard.  As
EPA consolidates its programs and focuses its attention on multi-phase enforcement
activities such as combined air and water pollution inspections for large power
plants, it will require  more comprehensive informational resources.  This
consolidation effort will need information, and therefore, data elements,  in
compatible formats to facilitate the integration process. Facility identification and
location will be just two of those elements.  The FIPS standard is specific, yet flexible
enough to be used in this capacity.  Another positive aspect of this code structure is
that its maintenance will be minimal.  Dun and Bradstreet currently provides this
code along with the D&B facility number.

2.8   Local Time of Day - FIPS PUB 58-1

                                 Description

      This standard describes specific representations for both 12 and 24-hour
timekeeping systems.  The formats for civil and military clock time at point of
origin are presented.

                                 Applicability

      There is a limited need for time representations, mostly involving the time
of an update transaction to a data record.

-------
                                   Usage

      CERCLIS uses time elements to record the timing of updates to data records
and follows the FIPS standard.  Research did not reveal any other systems using
local time elements. Currently the only EPA standard regarding this element is a
standard format for electronically transmitting times. The standard is contained in
the Transmittal Order "Data Standards for the Electronic Transmission of Laboratory
Measurements Results" and specifies that time formats should be HH MM using a
24-hour clock.

                              Recommendation

      This standard, though used very infrequently, should be adopted to ensure
that future systems will be developed consistently.  Since the only system identified
currently conforms to the standard, no conversion is necessary. It is also
recommended that midnight be adopted as the beginning of the new day rather than
the end of the previous day. This means that midnight will be represented as 000000
for a 24-hour clock and 120000A for a 12-hour clock instead of 240000 or 120000P,
respectively.

      The standard contained in the "Electronic Transmission..." paper should be
updated to include two-digit fields for seconds. Adopting the 6-digit format allows
more flexibility for time representations, while maintaining consistency across  EPA
systems.
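
      The midnight convention recommended above can be illustrated with a short sketch that converts a 6-digit 24-hour time to the 12-hour form. The function name to_12_hour is hypothetical, and the treatment of noon as 120000P is an inference from the midnight rule rather than a statement taken from the standard.

```python
def to_12_hour(hhmmss: str) -> str:
    """Convert a 24-hour HHMMSS string to 12-hour HHMMSS plus 'A' or 'P'."""
    if len(hhmmss) != 6 or not hhmmss.isdigit():
        raise ValueError(f"expected 6 digits: {hhmmss!r}")
    hh, mm, ss = int(hhmmss[0:2]), int(hhmmss[2:4]), int(hhmmss[4:6])
    # Hour 24 is rejected: midnight begins the new day and is written 000000.
    if hh > 23 or mm > 59 or ss > 59:
        raise ValueError(f"time out of range: {hhmmss!r}")
    suffix = "A" if hh < 12 else "P"
    hour12 = hh % 12 or 12  # hours 0 and 12 both display as 12
    return f"{hour12:02d}{mm:02d}{ss:02d}{suffix}"


if __name__ == "__main__":
    print(to_12_hour("000000"))  # 120000A
    print(to_12_hour("133005"))  # 013005P
```

      Rejecting 240000 outright, rather than silently mapping it to 000000 of the next day, is a choice made for this example; a conversion program for existing data might prefer the mapping.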

2.9   Representations of Universal Time - FIPS PUB 59

                                 Description

      This standard identifies the various time zones of the world.  Procedures for
expressing universal time (Greenwich Mean Time)  and for presenting local time
differential factors and time zones are given.

-------
                                 Applicability

      This NBS standard has no application in the Agency at this time.

                                    Usage

      There are currently no uses for Universal Time within the Agency.  There is
no current EPA standard for this data element.

                               Recommendation

      The EPA in recent years has become more involved in international
environmental issues. It appears that this trend will continue and international
issues will grow in importance for the Agency.  Although currently there is no
application of universal  time codes within EPA, the need may arise in the future
and this standard should be adopted now, before any systems are developed that use
this type of information.  Adoption now will create little extra effort in the short-
term and may avoid significant problems in the future.

2.10   Standard Industrial Classification (SIC) Codes - FIPS PUB 66

                                 Description

      This standard provides classifications, short titles, and codes for representing
industries and groups of establishments with similar economic activities.  The codes
may be two to four digits in length and are left-justified. A two-digit number refers
to a general industrial classification, such as rubber, whereas a four-place number
identifies a subdivision in more specific detail. The  National Bureau of  Standards
allows any agency to add more levels of classification by appending digits to the  right
of the four-digit code. This flexibility is useful to those agencies and offices which
are interested in very specific activities.
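
      Because the codes are left-justified and each added digit narrows the
classification, the broader groups can be recovered from a detailed code by taking
prefixes. The Python sketch below illustrates this under the assumption that codes are
stored as digit strings; the function names are hypothetical, and the code values used
with it are placeholders rather than entries verified against FIPS PUB 66.

```python
def sic_levels(code: str) -> list[str]:
    """List the classification levels implied by a left-justified SIC code."""
    code = code.strip()
    if not code.isdigit() or len(code) < 2:
        raise ValueError(f"expected at least two digits, got {code!r}")
    # Each longer prefix is a more specific subdivision of the shorter one.
    return [code[:n] for n in range(2, len(code) + 1)]


def same_group(code_a: str, code_b: str, digits: int = 2) -> bool:
    """True when two codes share the same leading `digits` characters."""
    return code_a.strip()[:digits] == code_b.strip()[:digits]
```

For example, sic_levels("3011") returns ["30", "301", "3011"], and an agency-specific
five-digit extension simply adds one more level to the end of the list.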

                                 Applicability

      The EPA has recognized the need to categorize industry establishments
according to their business activity.  SIC codes provide the capabilities needed by
most programs, but some programs require classifications with an orientation
different from SIC's socio-economic structure.

                                    Usage

      Although many systems have adopted the SIC standard, there are many
others that use different coding schemes.  Some programs require more detail than
SIC provides, while others require less detail. The Agency has not agreed upon one
standard for industrial classes.

                              Recommendation

      Where SIC codes provide the necessary detail and orientation to support
program needs, the codes in FIPS PUB 66 should be adopted.  With so many
variations of industrial classification currently in use, thorough analysis of each
application must be conducted to determine its compatibility with the SIC
standard. In some cases, it may be useful to develop a cross-reference showing the
relationships among SIC codes and other industrial classification schemes.

2.11   Geographical Point Locations - FIPS PUB 70-1

                                 Description

      This standard specifies three formats for representing geographic point
locations: longitude/latitude, Universal Transverse Mercator (UTM), and the State
Plane Coordinate System.  Of these, the first is the most widely used at the Federal
level. It employs spherical coordinate representations (degrees, minutes, and
seconds) to identify points on the earth's surface. The reference lines for
longitude and latitude are the meridian through Greenwich, England, and the
equator, respectively.

      The UTM method is a rectangular coordinate system which uses linear
measurements to specify a location.  Two variations are used to display the
coordinates: (1) hemisphere, zone, easting value, and northing value; and (2)
hemisphere, zone, east or west value, and northing value.  The two systems rely on
different referencing points for establishing a vertical line of demarcation. The State
Plane Coordinate system was designed to define the location of points within a
geographical area.

      The standard also covers altitude data, represented as the vertical
distance between a point and the National Geodetic Vertical Datum (roughly sea
level).
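
      The degrees, minutes, and seconds representation described above can be
converted to a single signed decimal value for comparison across systems. The Python
sketch below shows one way to do so. The function name, the use of hemisphere letters,
and the convention that south and west are negative are assumptions made for
illustration and are not taken from FIPS PUB 70-1.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"hemisphere must be N, S, E, or W, got {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must each be in the range 0 to 59")
    value = degrees + minutes / 60 + seconds / 3600
    # Assumed sign convention: south and west are negative.
    return -value if hemisphere in ("S", "W") else value
```

The precision of the stored fields matters: a latitude recorded only to the nearest
minute can be off by roughly a kilometer, while one recorded to the nearest second is
good to within tens of meters.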

                                 Applicability

      Many EPA programs depend on geographical point location data to locate
facilities, event locations (such as the site of a spill), natural terrain, and other points.
The OIRM Directives Manual states that "geographical information systems
developed by the Agency must conform to an established set of appropriate data
standards which permit the use of the system by all relevant programs and State
agencies." [1]  A standard for geographical point locations is imperative for future
EPA-wide integration of geographical information.

                                    Usage

      The most widely used geographical point location system at EPA is the
latitude/longitude system, though at least one system does use the UTM method
and at least one State Plane Coordinate system is in use in each of the states.
Although many systems have standardized on the latitude/longitude method, there
are significant differences in the accuracy with which latitude and longitude
measurements are collected and stored. Inconsistent accuracy may hinder cross-system
integration of data.
      [1]   EPA Office of Information Resource Management Directives Manual,
            Draft, October 30, 1986.

                              Recommendation

      The EPA should adopt the longitude/latitude method as a standard for
describing geographic point locations.  This is the dominant form in use, and its
application and understanding are wide-spread.  Since all three methods are
interchangeable, adoption of the latitude/longitude system will not cause significant
conversion difficulties.

2.12   Standard Occupational Classification (SOC) Codes - FIPS PUB 92

                                 Description

      This standard specifies two, three and four digit numeric codes classifying
work categories. These categories are based on the actual work performed and not
other factors such as skill level, place of work, licensing required, etc. The two digit
numerics are high level categories and the three and four digit numerics are further
levels of subclassifications.

                                 Applicability

      No specific uses have been identified for SOC codes.

                                    Usage

      No usage of standard occupational codes has been identified, and no EPA
standards currently exist.

                              Recommendation

      Although currently there is no application of occupational codes within EPA,
the need may arise in the future and this standard should be adopted now, before
any systems are developed that use this type of information. This code should be
the standard, and any future non-standard coding systems should be approved
before their implementation. Adoption now will create little extra effort in the
short-term  and may avoid significant problems in the future.

2.13  Federal and Federally-Assisted Organizations - FIPS PUB 95

                                  Description

      Four digit numeric codes are defined for each organization that is funded by
the United  States Government. The code consists of the two-digit Treasury Agency
Symbol (TAS) followed by a two-digit subdivision code. The Department of
Treasury maintains this number. The types of organizations covered include the
Legislative, Judicial, and Executive Branches, other Independent Federal and Quasi-
Federal Organizations, Independent Federal-State and Interstate Organizations, and
International Organizations.

                                 Applicability

      No specific applications for this standard have been identified within EPA
programs.

                                    Usage

      No usage of Federal and Federally-Assisted Organization codes has been
identified and there are no related standards within EPA.

                              Recommendation

      Since there is no current need for these codes, this standard should not be
adopted. If the need arises in the future, this standard should be reevaluated.

2.14   Hydrologic Units in the U.S. and Caribbean - FIPS PUB 103 - from the U.S.
      Geological Survey

                                  Description

      The U.S. Geological Survey developed two to eight digit numeric codes
identifying each  major hydrological area in the U.S. and the Caribbean. The codes
are hierarchical and begin with 2-digit codes at the top identifying the 21 major
hydrological regions. 4-digit codes subdivide the regions into 222 subregions; 6-digit
codes subdivide  subregions into 352 accounting units; 8-digit codes further
subdivide the accounting units into approximately 2,150 cataloging units. In
addition to numeric codes, the standard provides names for each unit which  are
unique within a  branch of the hierarchy.

                                 Applicability

      EPA programs have demonstrated a need for collecting hydrologic
information to monitor water conditions, track enforcement and compliance
activities, and track hazardous waste spillage and other events.

                                    Usage

      Many EPA systems currently use this standard to identify hydrological units.
Different systems use different levels of detail within the hierarchy.  For example,
CERCLIS uses the full detail (all 8 digits) while RCRIS uses only the top three levels
(first 6 digits of the codes). There is no current EPA standard regarding hydrological
units.
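
      Because the codes are hierarchical, records kept at different levels of detail
can still be compared by truncating the longer code to the shorter one's length. The
Python sketch below illustrates the idea. The function names and the level labels used
as dictionary keys are hypothetical, and the code values used with it are made-up
digit strings rather than actual U.S.G.S. units.

```python
# Hierarchy levels and cumulative code widths, as described in the text above.
LEVELS = (
    ("region", 2),
    ("subregion", 4),
    ("accounting_unit", 6),
    ("cataloging_unit", 8),
)


def split_hydrologic_code(code: str) -> dict[str, str]:
    """Break a 2-, 4-, 6-, or 8-digit code into its hierarchy prefixes."""
    if not code.isdigit() or len(code) not in (2, 4, 6, 8):
        raise ValueError(f"expected 2, 4, 6, or 8 digits, got {code!r}")
    return {name: code[:width] for name, width in LEVELS if width <= len(code)}


def same_unit(code_a: str, code_b: str) -> bool:
    """Compare two codes at the coarser of their two levels of detail."""
    width = min(len(code_a), len(code_b))
    return code_a[:width] == code_b[:width]
```

With this approach, an 8-digit code from one system and a 6-digit code from another
match whenever they agree on the first six digits, that is, on the accounting unit.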

                              Recommendation

      Since this  coding scheme is widely applicable to  EPA programs and is already
widely used, the U.S.G.S. codes are recommended for adoption as an  EPA standard.
This standard will facilitate interchange of data and cross-program analysis  of
geological and hydrological regions.

2.15   Future FIPS PUB Data Element Standards

      The National Bureau of Standards is currently examining one new standard
data element. NBS has proposed adoption of a standard for data used in mapping
applications. The standard will adopt specifications developed by the Department of
Interior for digital cartographic data.  The standard will improve interchange of
information within and among agencies analyzing land use, demographics, and
other geographic information.  The new rule is currently in the Proposed Rule Stage
and a Notice of Proposed Rule Making is scheduled for release in September 1989.

3.0    Analysis of Non-NBS Coding Schemes

      This section defines several commonly used coding schemes that are not NBS
standards, but have some relevance to EPA's data needs. The first four elements are
described in FIPS PUB 19-1 as suggested schemes commonly used in business and
industry and the fifth element is one that was discovered during the interview
process. A brief summary of findings regarding each coding system is presented in
the following sections.

3.1    Hazardous  Material ID Number

      The hazardous material ID number is a United Nations guided code for
uniquely identifying various harmful substances. Since EPA programs often track
information  pertaining to hazardous materials, this coding system may provide a
common method of identification within the EPA and may increase consistency
with international conventions.  The applicability to EPA's information needs (and
its relationship to current EPA standards, especially CAS Number) and the potential
usefulness of this code warrant further study by EPA into its possible adoption as an
EPA data standard.

3.2   Social Security Number

      The Social Security Number is a nine digit numeric developed by the Social
Security Administration to uniquely identify all employees of the United States.
Many organizations across the United States and some systems within the EPA use
this code to identify personnel.  Although many systems at EPA do not keep records
of specific people, there is applicability to EPA for internal staffing-related records
where a systematic code would facilitate portability between systems.  Due to its wide-
spread use in all sectors of the U.S., the Social Security Number is the logical choice
for such a standard. EPA should consider adopting SSN as a standard personnel
identifier within EPA systems, while acting to ensure compliance with applicable
privacy restrictions. Conversion to the SSN should be performed only where doing
so causes no information loss and benefits outweigh costs.

3.3    Human Sexes

      The human sexes codes provide a consistent method for documenting the sex
of people. The scheme includes codes for male, female, and codes identifying that
sex is not specified or not known. The applicability of human sex codes to EPA is
essentially the same as for the Social Security Number. Personnel records within
EPA will benefit from increased portability with the adoption of a standard code.
Consequently, a consistent coding scheme for identifying human sexes should be
adopted within the Agency. The scheme outlined in FIPS PUB 19-1 is one that
provides flexibility with its additional standard codes for unknown data, and should
be adopted by the EPA if it fulfills all of the Agency's information needs.

3.4   ZIP Codes

      ZIP Codes are five and nine digit numeric codes identifying the Postal Service
areas in the United States. These codes are crucial to all mailing address
information. Currently, many EPA programs have a need for mailing address
information and many EPA systems store ZIP Codes for this purpose. However,
some Agency systems can only store the five digit ZIP Codes and cannot
accommodate the recently introduced extended nine digit ZIP Codes.  Since ZIP
Codes are useful throughout the Agency, they should be adopted as an EPA standard
and their collection made  mandatory for all mailing addresses. Also, EPA should
mandate the updating of all five digit ZIP Code data elements to the new standard
nine digits wherever practical.
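
      A data element that accepts both forms can store the five-digit base and the
four-digit extension separately, so that older five-digit values remain valid while
nine-digit values are accommodated. The Python sketch below illustrates this. The
function name and the decision to accept the extension with or without a hyphen are
assumptions made for illustration, not requirements drawn from any standard.

```python
import re
from typing import Optional, Tuple

# Five digits, optionally followed by a four-digit extension (hyphen optional).
_ZIP_PATTERN = re.compile(r"([0-9]{5})(?:-?([0-9]{4}))?")


def normalize_zip(raw: str) -> Tuple[str, Optional[str]]:
    """Split a ZIP Code into its five-digit base and optional four-digit extension."""
    match = _ZIP_PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not a five or nine digit ZIP Code: {raw!r}")
    # group(2) is None when no extension was supplied.
    return match.group(1), match.group(2)
```

Here normalize_zip("12345") returns ("12345", None), and both "12345-6789" and
"123456789" return ("12345", "6789").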

3.5    Animal and Plant Taxonomic Schemes

      Through interviews, key Agency personnel have identified animal and plant
taxonomic coding schemes as an area where development and adoption of a
standard would be beneficial. Currently, there are many different coding schemes in
existence throughout the government and scientific community with no clear
standard emerging. BIOS and ODES both use a common coding scheme that was
developed in conjunction with the National Oceanographic Data Center (NODC), but usage
beyond EPA and NODC is limited. The NODC scheme is a 17-digit code identifying
animal species. However, this scheme may not contain the detail necessary for
some EPA programs and some effort is being spent within EPA to examine the
possibility of expanding the current codes to as many as 30 digits.

      Since the Agency has demonstrated a need for animal and plant taxonomic
codes, it is recommended that EPA look further into a possible scheme that fulfills
all program needs. The level of effort exerted to study these codes must not exceed
the benefits to be gained from adoption of an EPA standard.

4.0    Current EPA Data Standards Program Development

      This section reviews EPA's current and proposed data standards policies and
data element standards. Section 4.1 describes EPA's data standards policy efforts and
ongoing work. Section 4.2 discusses the Agency's current standard data elements
and elements currently being considered for adoption as standards.

4.1    EPA Data Standards Program Policies

      The EPA has established a Data Standards Program within its broader Data
Administration Program.  The policies and goals of EPA's Data Standards Program
have been developed and published in both the OIRM Directives Manual - Chapter
5 and in the EPA Information Resource Management Policy Manual.   Coordination
of the program is the responsibility of the Office of Information Resource
Management.  Each Assistant Administrator, Regional Administrator, Office
Director, and Senior Information Resource Management Officer is responsible for
implementation, compliance, and enforcement within his/her organization.

      Procedures for developing, implementing, and enforcing data standards have
not been formally adopted. A separate EPA Data Standards Catalog has also not been
developed to centrally store all EPA data standards. Currently OIRM is developing a
proposal on data standards procedures and has issued a draft titled "Procedures for
the EPA Data Standards Program".  When finalized and approved, this document
will establish official EPA procedures regarding data standards.

      Within the last two years the EPA has adopted two data standards. These
standards were published as EPA Transmittal Orders to be added to the EPA
Directives System. The first standard  establishes the Chemical Abstract Service
Registry Number as an EPA standard  data element to be collected with all
information regarding chemicals. The second standard establishes record and media
formats, record sequences, etc. for transmitting laboratory data within the EPA.

      The EPA is also working on several new and ongoing data standards tasks:
      •     A proposal was issued in December 1988 to establish an EPA data
            element standard for facility identifiers;
      •     Work is ongoing in establishing an EPA standard for geographical point
            locations; and
      •     An Electronic Data Interchange Committee has been set up to analyze
            the problem of exchanging data through electronic media.

4.2    EPA Standard Data Elements

      The EPA has adopted two standard data elements, the Chemical Abstract
Service (CAS) Registry Number and an Electronic Transmission of Laboratory
Measurement Results standard. Both of these are briefly discussed below.  In
addition, EPA currently is working on adopting standards for geographic point
locations and facility IDs and has formed a committee to study Electronic Data
Interchange (EDI committee).

4.2.1  Chemical Abstract Service (CAS) Number

      The CAS number is a unique identifier assigned by the Chemical Abstract
Service to each distinct chemical substance recorded in the CAS Chemical Registry
System. It is represented as a ten digit code with the first nine characters uniquely
identifying the chemical and the tenth character acting as a check digit.
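
      This report does not describe how the check digit is computed. As an
illustration only, the Python sketch below uses the commonly documented CAS rule:
each identifying digit is weighted by its position counted from the right (1, 2, 3,
and so on), the weighted digits are summed, and the check digit is the sum modulo 10.
The function names are hypothetical, and the sketch accepts the customary hyphenated
form (for example, 7732-18-5 for water) as well as an unhyphenated digit string.

```python
def cas_check_digit(body: str) -> int:
    """Compute the check digit for the identifying digits of a CAS number."""
    digits = [int(ch) for ch in body if ch.isdigit()]
    if not digits:
        raise ValueError("no digits supplied")
    # The rightmost identifying digit has weight 1, the next has weight 2, etc.
    total = sum(weight * d for weight, d in enumerate(reversed(digits), start=1))
    return total % 10


def is_valid_cas(cas: str) -> bool:
    """Check that the final digit matches the check digit of the preceding digits."""
    compact = cas.replace("-", "")
    if not compact.isdigit() or len(compact) < 2:
        return False
    return cas_check_digit(compact[:-1]) == int(compact[-1])
```

A check of this kind lets a system reject most single-digit keying errors at data
entry, before an incorrect identifier reaches a shared database.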

      EPA programs have demonstrated a need for collecting information about
chemical substances and their properties.  Currently, the use of the CAS number is
wide-spread within Agency systems.

      In June 1987, EPA officially adopted the CAS Number as an EPA standard data
element.

4.2.2  Electronic Transmission of Laboratory Measurement Results

      The Agency has published an EPA order to standardize the formats and
electronic representations of data commonly used in transmitting laboratory results.
The order covers EPA laboratory transmission standards regarding: media formats
describing diskette and tape specifications and standard record lengths; record
formats describing each record layout; definition of production runs; record
sequences describing the order of records within a file; file and record integrity; date
and time formats; and other transmission information. The paper also presents
record layouts and field definitions for many commonly used record types.

      The date and time formats presented in the "Electronic Transmission..." order
conflict with current FIPS standards.  We recommend that both FIPS standards be
adopted and that the "Electronic Transmission..." order be updated to be consistent
with this change.  See Sections 2.1 and 2.8 for further discussion of the FIPS
standards.

      The standard  was adopted as an EPA standard in December 1988 and
published as the EPA Transmittal Order "Data Standard for the Electronic
Transmission of Laboratory Measurement Results".

4.2.3  Facility Identifier

      In December 1988, the EPA drafted a proposed standard for facility
identification codes.  The proposal recommended adoption of a unique EPA facility
identification scheme. This coding scheme is not based on any previous EPA or
external standard, such as the DUNS ID, but rather is a unique EPA identifier
comprised of the FIPS standard 2-digit state code for the facility, followed by a unique
random 10-digit identification number. These codes will be assigned and tracked by
the Facility Index Data System (FINDS) and its management. The facility ID
standard element, when approved, must be collected as a key data element in all
EPA systems containing facility information.  However, adoption of this standard
does not preclude using DUNS IDs and other EPA program-specific identifiers for
additional  information.

      Currently, this standard has not been adopted as an EPA standard data
element.  Green Border review was scheduled for February 1989.
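
      The proposed identifier layout (a 2-digit state code followed by a 10-digit
number) can be sketched as a simple parse-and-validate step. The Python example below
assumes the identifier is stored as a single 12-digit string with a numeric state
code; the proposal as summarized here does not specify a storage format, and the
names FacilityId and parse_facility_id are hypothetical.

```python
import re
from typing import NamedTuple


class FacilityId(NamedTuple):
    state_code: str  # 2-digit state code
    serial: str      # 10-digit identification number


def parse_facility_id(value: str) -> FacilityId:
    """Split a 12-digit facility identifier into state code and serial number."""
    if re.fullmatch(r"[0-9]{12}", value) is None:
        raise ValueError(f"expected exactly 12 digits, got {value!r}")
    return FacilityId(state_code=value[:2], serial=value[2:])
```

This sketch checks only the layout. Confirming that the state code is a valid FIPS
code or that the serial number was actually assigned would require a lookup against
the system that issues the identifiers.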

4.2.4  Geographical Point Locations

      The OIRM Directives  Manual and the EPA  Information Resource
Management Policy Manual state that a priority of the Data Standards Program is to
establish an EPA standard for the representation of geographical point location
information. Currently, the EPA is looking into adopting a standard.  This paper
recommends the adoption of the Latitude/Longitude method presented in FIPS
Publication 70-1.

4.2.5  The Electronic Data Interchange (EDI) Work Group

      The EPA is currently studying possible standards for electronic interchange of
data.  The Electronic Data Interchange (EDI) work group is looking into methods
viable for the EPA. The work group is targeting a proposal for EPA standards in late
1989.

      Several FIPS PUBs have been established regarding data interchange formats.
We recommend that the EDI work group analyze the applicable FIPS in conducting
its study.

5.0    Summary of Findings and Recommendations

5.1    Review of FIPS Standards and Usage

      After reviewing the Federal Information Processing Standards and EPA's
usage of them, we recommend that eleven out of the fourteen FIPS be adopted as
internal EPA standard data elements.  Each recommended standard should be
implemented wherever conversion is practical and beneficial.  The standard
regarding Standard Industrial Classification codes should be reviewed for
applicability to each system and implemented where no information is lost in
conversion. Where the current coding system is significantly different or more
detailed, a mapping should be developed between  the current codes and the SIC
codes.

      None of the three FIPS not recommended for adoption is currently
applicable to EPA's program needs.  If a future need arises, these FIPS should be
reviewed again.

      The current usage of the recommended FIPS standards across EPA systems
ranges from nearly full compliance in  the case of MSAs and Congressional Districts
to wide-spread non-conformity as with Populated Places and Calendar Dates.  There
are several reasons for the non-compliance with the FIPS:

      •     system is older than the standard;
      •     system uses a coding scheme from a previous system;
      •     standard does not fulfill  information  needs; and
      •     standards information is  not widely known.

                      System is Older Than the Standard

      Several EPA systems date back far enough that they predate some or all of the
relevant standards. As such, these systems had little guidance on choosing
appropriate standards at their inception. Some systems incorporated the FIPS
during later revisions, while others have not.  Currently, there are several systems
that still do not conform to the FIPS for this reason, including, for example, NEDS.

             System Uses a Coding Scheme From a Previous System

      Several systems do not comply with the FIPS because they have adopted non-
standard coding schemes or formats from predecessor EPA systems.  In the air
program, the legacies of SAROAD's and NEDS's coding structures are found in many
other air program systems. The copying of coding schemes from system to system
has led to some consistency within program offices but has also prolonged non-
standard habits.

                  Standard Does Not Fulfill Information  Need

      Some EPA systems have developed special coding schemes because the codes
presented in the FIPS PUBs do not adequately satisfy a program need.  Some programs
have devised their own coding schemes that represent data in a significantly
different manner than a FIPS standard does.  These schemes would lose
information in a conversion to a FIPS standard. Therefore, these schemes should
not be forced to comply with an inadequate FIPS standard, but rather should be
treated as separate data elements. Mappings between the specially developed codes
and the FIPS are encouraged where possible.

      Other coding schemes are merely expanded  or consolidated versions of FIPS.
These schemes either increase  the level of detail by adding subdivision codes  or
decrease the level of detail by creating appropriate groupings of codes.  The practice
of modifying FIPS standards is encouraged as a way to meet program needs without
creating entirely new coding systems. This provides the information needed  for the
program while sacrificing as little consistency and portability as possible.  Mappings
between the modified schemes and the FIPS are encouraged.

                 Standards Information Is Not Widely Known

      One of the major reasons cited for non-conformance is that information
about both EPA standards and the Federal Information Processing Standards is not
widely known among EPA personnel. Many personnel interviewed did not know
where to find information about data standards or that EPA had any formal policies
regarding the use of standard data elements. Most of those interviewed agreed that
formal EPA data standard policies and promulgations are distributed poorly to the
personnel who need them, and that in general EPA  training of systems personnel
regarding policies is insufficient.  Of the personnel who did know about EPA's data
standards effort, most identified EPA experience and informal lines of
communication as the sole source of their knowledge, rather than  formal
notification or training.

      In general, data standards information is not consistently getting to the people
who need it most. It is recommended that EPA continue to develop its data
standards program by analyzing the current procedures for information
dissemination and by adopting new methods where necessary.

5.2    Review of Other Non-NBS Standard Coding Schemes

      In addition to the FIPS standards, several non-NBS standards referenced in
FIPS 19-1 have been recommended for EPA adoption or further study. The
recommended elements are ZIP Codes, Social Security Numbers and standard codes
for human sexes. The hazardous material ID from the United Nations is
recommended  for further review.

5.3    Review of Data Standards Program

      Overall,  the EPA's Data Standards Program is still in its development stages.
EPA has developed general policies for the goals of the program, but still has not
fully translated these goals into procedures and results. In the last two years, the
Agency has begun to adopt standard data elements and publish them in EPA
Transmittal Orders, but has not developed a central Data Standards Catalog or an
effective distribution plan that disseminates the information to all key personnel in
the Agency. The EPA is currently working towards adopting formal data standards
procedures, and should continue to do so to attain the program's goals.
