FIPS PUB Usage Report

Prepared for the
Information Management Services Division
Office of Information Resource Management
U.S. Environmental Protection Agency

June 1989

Contract GS-OOK85-1AFD2777
Task Number N4B688019

AMERICAN MANAGEMENT SYSTEMS, INC.

1.0 Introduction
    1.1 Background
    1.2 Purpose and Scope
    1.3 Procedures Used in Analysis of Data Standards
    1.4 Layout of Paper
2.0 Analysis of FIPS PUBs Standards
    2.1 Calendar Date - FIPS PUB 4-1
    2.2 States and Outlying Areas of the U.S. - FIPS PUB 5-2
    2.3 Counties and County Equivalents - FIPS PUB 6-3
    2.4 Metropolitan Statistical Areas (MSAs) - FIPS PUB 8-5
    2.5 Congressional Districts - FIPS PUB 9
    2.6 Countries of the World - FIPS PUB 10-3 & 104-1
    2.7 Named Populated Places, Primary County Divisions, and Other Location Entities of the U.S. - FIPS PUB 55-2
    2.8 Local Time of Day - FIPS PUB 58-1
    2.9 Representations of Universal Time - FIPS PUB 59
    2.10 Standard Industrial Classification (SIC) Codes - FIPS PUB 66
    2.11 Geographical Point Locations - FIPS PUB 70-1
    2.12 Standard Occupational Classification (SOC) Codes - FIPS PUB 92
    2.13 Federal and Federally-Assisted Organizations - FIPS PUB 95
    2.14 Hydrologic Units in the U.S. and Caribbean - FIPS PUB 103 - from the U.S. Geological Survey
    2.15 Future FIPS PUB Data Element Standards
3.0 Analysis of Non-NBS Coding Schemes
    3.1 Hazardous Material ID Number
    3.2 Social Security Number
    3.3 Human Sexes
    3.4 ZIP Codes
    3.5 Animal and Plant Taxonomic Schemes
4.0 Current EPA Data Standards Program Development
    4.1 EPA Data Standards Program Policies
    4.2 EPA Standard Data Elements
        4.2.1 Chemical Abstract Service (CAS) Number
        4.2.2 Electronic Transmission of Laboratory Measurement Results
        4.2.3 Facility Identifier
        4.2.4 Geographical Point Locations
        4.2.5 The Electronic Data Interchange (EDI) Work Group
5.0 Summary of Findings and Recommendations
    5.1 Review of FIPS Standards and Usage
    5.2 Review of Other Non-NBS Standard Coding Schemes
    5.3 Review of Data Standards Program

1.0 Introduction

1.1 Background

EPA has numerous ADP systems to support its management directives and regulatory and enforcement functions. These systems were developed over the past years as the Agency's information needs expanded. Because many of these systems were designed by individual EPA program offices, this pattern of ADP system development has resulted in many non-uniform coding practices and incompatible data structures, leading to inefficient data communication across media boundaries.

To improve EPA's data management, the Agency has undertaken a Data Standards program. As part of this program EPA-wide data standards are studied, approved and promulgated to increase consistency and portability of data within the Agency and between the Agency and outside sources (such as the States or industry).

A previous report, Analysis for the EPA Data Standards Program (December, 1982), outlined the Data Standards Program goals and procedures and analyzed the National Bureau of Standards (NBS) library of standard data elements. Since the first publication of the Analysis...
, the inventory of EPA information needs, and hence information systems, has continued to grow. Simultaneously, the library of NBS standards has also continued to grow and change. Therefore a need has arisen to reevaluate the NBS standards and their applicability to EPA's information needs.

1.2 Purpose and Scope

This document is intended to support the EPA's ongoing Data Standards program. In particular, it analyzes the current NBS standards, the current EPA standards and the usage of both within Agency systems. Each NBS standard is analyzed for:

  - Applicability to EPA's information needs
  - Presence of an existing or proposed EPA standard
  - Usage within Agency systems

In addition, non-NBS standard data element coding systems listed in FIPS Publication 19-1 are studied as possible candidates for EPA standardization.

1.3 Procedures Used in Analysis of Data Standards

The analysis for this document followed a three-step procedure. First, the current NBS and EPA standard data elements and EPA data standards policies and procedures were identified. This step was accomplished through review of the Federal Information Processing Standards Publications (FIPS PUBs), the EPA Information Resource Management Policy Manual and other Agency materials.

Second, the applicability and current usage of standards was analyzed. This second step was accomplished through interviews with key personnel and review of documentation for a sample of EPA's major systems. Major systems are defined as those rated as level 1 in the Information Systems Inventory (ISI). An interview was also conducted with a key person from the Facility Index System (FINDS) due to the system's role in developing a common coding scheme for EPA-regulated facilities.

The third step involved analyzing the findings from the steps above, determining recommendations and preparing this report.

1.4 Layout of Paper

The remainder of this document is divided into four sections.
  - Section 2 "Analysis of FIPS PUBs Standards" - defines each NBS standard for data elements, describes the findings of its applicability and usage within EPA systems, and recommends whether or not to adopt it as an EPA standard;
  - Section 3 "Analysis of Non-NBS Coding Schemes" - defines those non-NBS standards listed in FIPS PUB 19-1 that may be applicable to the EPA and briefly summarizes findings regarding each coding scheme;
  - Section 4 "Current EPA Data Standards Program Development" - discusses EPA's current and proposed data standards policies and standard data elements; and
  - Section 5 "Summary of Findings and Recommendations" - contains a brief summary of findings regarding EPA's Data Standards Program.

2.0 Analysis of FIPS PUBs Standards

The National Bureau of Standards develops and maintains a library of standardized data elements and representations for use in Federal automated data systems. These standards are published in the Federal Information Processing Standards Publications (FIPS PUBs) and are issued to all Federal agencies. Currently there are fourteen areas covered by data standards. This chapter examines each of these standards for possible adoption by EPA and recommends actions to be taken by the Agency.

Each of the fourteen data standards areas is discussed separately below. Discussion of each data standard is broken into four sections:

  - Description of the standard
  - Applicability of the standard to EPA's information needs
  - Current usage of the data standard within EPA systems - includes existence of current or proposed EPA standard
  - Recommendations for EPA data standards program

Exhibit 1 summarizes the descriptions of each FIPS standard data element reviewed. Exhibit 2 summarizes the recommendations for each FIPS.

2.1 Calendar Date - FIPS PUB 4-1

Description

This standard specifies that dates must be recorded using the Gregorian calendar in a YYYYMMDD format. These dates should be stored in numeric fields.
This standard updates the old FIPS standard of YYMMDD calendar dates.

Exhibit 1
Standard NBS Data Elements

FIPS      Subject                                               Definition
4-1       Calendar Date                                         8-digit numeric, Gregorian calendar, YYYYMMDD
5-2       States and Outlying Areas                             2-digit numeric and 2-character alphabetic abbreviation (Postal codes)
6-3       Counties and County Equivalents                       3-digit numeric, unique within each state
8-5       Metropolitan Statistical Areas (MSAs, CMSAs, PMSAs)   4-digit numeric for metropolitan statistical areas; 3-digit for consolidated metropolitan areas
9         Congressional Districts                               2-digit numeric, unique within each state
10-3/104  Countries and Principal Administrative Divisions      2-character country code and 4-character primary divisions
55-2      Populated Places - Cities, Towns, County Divisions    5-digit numeric, unique to each U.S. location
58-1      Local Time of Day                                     Representations of 12- and 24-hour clock times
59        Universal Time, Time Differentials & Time Zones       Representations of 12- and 24-hour universal time, 3-character time zones, and time differentials
66        Standard Industrial Class Codes (SIC)                 2-4 digit numeric identifying major business classes and sub-classes
70-1      Geographical Point Location                           3 representations of point locations: Lat/Long, UTM, and State Plane Coordinates
92        Standard Occupational Class (SOC) Codes               2-4 digit numeric identifying major occupation classes and sub-classes
95        Federal & Federally Assisted Organizations            4-digit numeric: 2-digit Treasury Agency Symbol + 2-digit division code
103       Hydrologic Units                                      2-, 4-, 6-, or 8-digit numeric for U.S. bodies of water

Exhibit 2
Recommendations for NBS Standard Data Elements

Subject                                               Recommendation  Comments
Calendar Date                                         ADOPT           Should be standard for all existing and future systems; existing non-standard data codes may be exempted if conversion is too costly.
States and Outlying Areas                             ADOPT           Should be standard for all existing and future systems; existing non-standard data codes may be exempted if conversion is too costly.
Counties and County Equivalents                       ADOPT           Should be standard for all existing and future systems, where applicable; existing non-standard data codes may be exempted if conversion is too costly or information would be lost by conversion.
Metropolitan Statistical Areas (MSAs, CMSAs, PMSAs)   ADOPT           Should be standard for all existing and future systems; standard essentially in place already.
Congressional Districts                               ADOPT           Should be standard for all existing and future systems; standard essentially in place already.
Countries and Principal Administrative Divisions      ADOPT           EPA should adopt the ISO standard instead of the alternate NBS standard in FIPS 10-3; the ISO standard is more widely accepted.
Populated Places - Cities, Towns, County Divisions    ADOPT           Should be standard for all future systems; conversion of existing systems is recommended only where benefits outweigh costs.
Local Time of Day                                     ADOPT           Should be standard for all existing and future systems; existing non-standard data codes may be exempted if conversion is too costly.
Universal Time, Time Differentials & Time Zones       ADOPT           Not currently applicable to EPA, but should be adopted to avoid future problems.
Standard Industrial Class Codes (SIC)                 ADOPT           Should be standard for systems where SIC codes provide the necessary detail; conversion should be determined on an individual basis.
Geographical Point Location                           ADOPT           Adopt latitude/longitude representation in all future systems and convert existing systems where benefits exceed costs.
Standard Occupational Class (SOC) Codes               ADOPT           Not applicable to EPA currently, but should be adopted to avoid future problems.
Federal & Federally Assisted Organizations            DO NOT ADOPT    Not applicable to EPA.
Hydrologic Units                                      ADOPT           Should be standard for all existing and future systems; existing non-standard data codes may be exempted if conversion is too costly.
Applicability

This element is used in record structures to identify when the information was last updated. Administrative and enforcement divisions use this code to schedule events and place them in chronological order. EPA programs use date information extensively to perform their missions.

Usage

The 6-digit YYMMDD format of the Gregorian calendar is the most common representation within Agency systems. Few systems use the 8-digit version. Variations of order as well as structure exist. Some systems rely on a five-digit Julian calendar code which identifies the year and the day of the year (1 through 365), while others use the Gregorian calendar but store dates in MMDDYY format.

Currently the only EPA standard regarding this element is a standard format for electronically transmitting dates. The standard is contained in the Transmittal Order "Data Standards for the Electronic Transmission of Laboratory Measurements Results" and specifies that dates should be in YYMMDD format.

Recommendation

Since so much of the Agency's information is date-dependent, we recommend the adoption of this FIPS standard and the update of the "Data Standards for the Electronic Transmission of Laboratory Measurements Results" directive. Although the "Electronic Transmission..." paper has already established a standard, the omission of a century identifier will require a larger conversion effort for electronically transmitted data in the future, as the year 2000 approaches.

A common internal coding scheme will facilitate the exchange of information across program boundaries. Adopting this standard with a four-digit code for the calendar year will prepare the EPA for the change of century, eleven years from now.

Existing file structures which do not conform to the proposed standard will be granted exemptions if the effort required to convert the codes is greater than any benefit which would result from standardization.
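The conversions discussed above can be sketched in a few lines. This is a minimal illustration, not an Agency procedure: the function names are hypothetical, and because the legacy formats carry no century, the sketch assumes every two-digit year belongs to the 1900s unless the caller says otherwise.

```python
from datetime import date, timedelta

# ASSUMPTION: two-digit years are taken to be 19YY. The legacy formats
# carry no century identifier, so the caller must supply the right value.
DEFAULT_CENTURY = 1900


def _check(field: str, length: int) -> None:
    if len(field) != length or not (field.isascii() and field.isdigit()):
        raise ValueError(f"expected {length} digits, got {field!r}")


def from_yymmdd(value: str, century: int = DEFAULT_CENTURY) -> str:
    """Convert a 6-digit YYMMDD date to 8-digit YYYYMMDD."""
    _check(value, 6)
    yy, mm, dd = int(value[0:2]), int(value[2:4]), int(value[4:6])
    # date() rejects impossible dates such as month 13 or February 30.
    return date(century + yy, mm, dd).strftime("%Y%m%d")


def from_mmddyy(value: str, century: int = DEFAULT_CENTURY) -> str:
    """Convert a 6-digit MMDDYY date to 8-digit YYYYMMDD."""
    _check(value, 6)
    mm, dd, yy = int(value[0:2]), int(value[2:4]), int(value[4:6])
    return date(century + yy, mm, dd).strftime("%Y%m%d")


def from_yyddd(value: str, century: int = DEFAULT_CENTURY) -> str:
    """Convert a 5-digit year-plus-day-of-year code to YYYYMMDD."""
    _check(value, 5)
    yy, day_of_year = int(value[0:2]), int(value[2:5])
    if day_of_year < 1:
        raise ValueError("day of year starts at 1")
    result = date(century + yy, 1, 1) + timedelta(days=day_of_year - 1)
    if result.year != century + yy:
        raise ValueError("day of year runs past the end of the year")
    return result.strftime("%Y%m%d")
```

Since the standard calls for numeric storage, the resulting string can be stored as an integer (for example `int(from_yymmdd("890630"))`); an 8-digit date with a four-digit year never has a leading zero to lose.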
Since most date formats are readily convertible and many already conform to the old 6-digit YYMMDD FIPS standard, no significant difficulties are foreseen in implementing this standard.

2.2 States and Outlying Areas of the U.S. - FIPS PUB 5-2

Description

Two-character alphabetic abbreviations and two-digit numeric codes are defined for each of the states and U.S. territories. The abbreviations are identical to the Postal Service codes and therefore are useful for identifying establishment locations and for producing mailing addresses.

Applicability

State codes are used by Agency programs for location, jurisdiction, and mailing address identification. The numeric format is applicable where machine and numeric sorting are required, since a sort on the numeric code arranges the States in alphabetical sequence.

Usage

Presently, the Agency uses several different alphabetic and numeric coding schemes for states. The FIPS standard is used most widely, especially for mailing addresses, since the standard uses the Postal Service codes. However, some systems use a hybrid of the Postal codes, adding program-specific codes for internal use. Some air program systems use an alternative numeric code based on the code found in SAROAD.

Currently there is no EPA standard for this data element.

Recommendation

This standard should be adopted by the EPA for internal use. The Agency will benefit from adopting a common coding scheme for identifying the 50 states and other territories. It will improve communications between the program offices and between EPA and external organizations, and it will support the concept of establishing unique location identifiers.

This standard is particularly critical because much of EPA's funding and enforcement activity is state-specific. Also, states are assuming many of the data gathering responsibilities which were previously performed by the EPA.
A well-defined state data element will aid in the transfer of information between the state and Federal levels.

Since the fifty states are well defined, there should be little difficulty in conversion between alternative coding systems. However, different systems have different needs for identifying the outlying areas and trust territories of the United States and other locations or entities outside of the states. One example is an extra code for each region, used to identify locations specific to a region but not to a state. Conversion of these other territories may lead to more significant difficulties. As such, EPA should adopt the Postal codes as the standard to be used for the 50 states, adopt the Postal codes for the outlying areas to be used where possible, and allow the codes to be expanded to include additional codes where necessary.

2.3 Counties and County Equivalents - FIPS PUB 6-3

Description

The FIPS representation of this data element consists of three-digit numeric codes for each county or county equivalent in the U.S. The counties are arranged in alphabetical order and the corresponding codes are unique within each state. For example, the first counties in Alaska and Wyoming will each be designated '001'. This sequence, when combined with the FIPS numeric state code discussed earlier, will unambiguously identify each county in the United States.

Applicability

Many EPA systems use county codes as a means of identifying important areas or locations. Counties are used to track problem pollution areas, resource allocation and funding levels.

Usage

Currently, the Agency uses at least two different coding schemes for county identifiers. The FIPS standard is used in many EPA systems, while the SAROAD county identifier is used in several air program systems.

Currently there are no EPA standards for county codes.

Recommendation

The Agency should adopt standard county codes to increase consistency and portability of data.
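As a brief illustration of the Description above, the state and county codes can be joined into a single nationwide identifier. The sketch below is only one possible arrangement: the function names and the choice to store the combined value as a five-character string are assumptions, not part of FIPS PUB 6-3.

```python
def county_id(state_code, county_code) -> str:
    """Join a 2-digit state code and a 3-digit county code.

    ASSUMPTION: the combined identifier is kept as a 5-character string
    so that leading zeros (state 02, county 001) are not lost.
    """
    state = int(state_code)
    county = int(county_code)
    if not 1 <= state <= 99:
        raise ValueError(f"state code out of range: {state_code!r}")
    if not 1 <= county <= 999:
        raise ValueError(f"county code out of range: {county_code!r}")
    return f"{state:02d}{county:03d}"


def split_county_id(identifier: str) -> tuple:
    """Split a combined identifier back into (state, county) codes."""
    if len(identifier) != 5 or not identifier.isdigit():
        raise ValueError(f"expected 5 digits, got {identifier!r}")
    return identifier[:2], identifier[2:]
```

County '001' alone is ambiguous, but `county_id("02", "001")` and `county_id("56", "001")` give the distinct values `"02001"` and `"56001"` for the first counties of Alaska (state code 02) and Wyoming (state code 56).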
Since there is no current EPA standard, either formal or informal, the standard published in FIPS PUB 6-3 should be adopted to conform with Federal guidelines.

2.4 Metropolitan Statistical Areas (MSAs) - FIPS PUB 8-5

Description

MSA (formerly SMSA) codes are elements which identify integrated social and economic units. Each contains a unique four-digit numeric identifier plus a one-digit alphabetic population level code. Nine criteria, including population, are used to identify MSAs, which may cross state boundaries. MSAs are used to define large residential centers and other areas of dense population which are of particular concern to the Agency.

In October 1984, FIPS PUB 8-5 revised the definitions of MSAs. The updated standard revised the criteria used to identify MSAs and also established three new statistical areas: the Primary Metropolitan Statistical Area (PMSA), the Consolidated Metropolitan Statistical Area (CMSA), and the New England County Metropolitan Area (NECMA). The PMSA is an area analogous to the MSA but is slightly larger. PMSA codes are formatted the same as MSA codes. CMSAs are groups of PMSAs assembled into larger integrated areas and are identified by two-digit numeric codes. NECMAs are separate metropolitan area codes for groups of New England towns and cities. Since more detailed information is available for New England at the town level, additional codes have been developed to allow information to be tracked by more detailed metropolitan areas. The NECMA codes are formatted the same as MSA codes; they are not intended to replace MSA codes, but rather to identify more detailed areas for New England.

Applicability

Several EPA programs track monitoring and enforcement information by Metropolitan Statistical Area.

Usage

Several systems use MSAs as location units in addition to states, counties, cities, etc. Since there is no other established code analogous to the MSA, only the standard MSA codes are currently being used.
However, some systems are using outdated MSA data (from FIPS PUB 8-4). No systems have been identified as using the new statistical areas (PMSA, CMSA, NECMA).

Currently, there is no EPA standard for metropolitan statistical areas.

Recommendation

Since EPA programs have displayed a need for information related to integrated metropolitan areas and actively use the MSAs in automated systems, EPA should standardize its metropolitan areas. However, EPA should further research which unit(s) to use as the standard. Whichever statistical area(s) is selected, the standard should follow the standards outlined in FIPS PUB 8-5.

The NECMA is not recommended as an EPA standard since it includes information about a limited geographical area. This does not mean that use of NECMAs is discouraged as an additional tool for regional analysis.

EPA must first determine whether tracking data by CMSA is useful. CMSAs offer the EPA further flexibility in tracking information, but no specific need for CMSA-related data has been identified. If the need exists, then the FIPS standard should be adopted.

Since the need for MSA-level tracking has already been identified, the remaining issue is whether to continue to use the MSAs or to adopt the PMSAs as the standard unit. MSAs are currently used and offer little problem for standardization within the Agency. However, PMSAs are the building blocks of CMSAs and are the logical choice for standardization if CMSAs are also used.

2.5 Congressional Districts - FIPS PUB 9

Description

Two-digit numeric codes identify the congressional districts of the U.S. The numbers are unique within each state and unique for each of the sessions of Congress. "At large" is represented by the designation '00'. The districts may be rearranged occasionally, depending upon census results.

Applicability

Congressional district information is useful to EPA programs, specifically for enforcement and monitoring purposes.
Usage

At least five Agency programs utilize this information for monitoring and enforcement purposes. There is no alternative coding scheme and no other data element examined has any of the same key attributes.

Currently there is no EPA standard for this data element.

Recommendation

Essentially, this element has already been standardized. A formal notice should be published to adopt this FIPS data standard as an Agency standard, and procedures should be established for updating the district codes.

2.6 Countries of the World - FIPS PUB 10-3 & 104-1

Description

The original NBS standard, FIPS PUB 10-3, consists of a two-character alphabetic code, one for each of the geo-political entities in the world. Although the NBS standard was developed first, a coding scheme later published by the International Organization for Standardization (ISO) has received wider publicity and greater use. Recognizing this, the NBS has published FIPS PUB 104-1, which allows any agency to adopt either format as its standard.

Applicability

The EPA currently has little need for country codes. Although no specific need has been identified, new international issues, such as acid rain and global warming, may increase the need for international data.

Usage

The EPA has little experience with country codes and very few Agency systems currently use country codes.

There is no current EPA standard for country codes.

Recommendation

In order to provide direction and guidance for future systems, it is recommended that the Agency adopt the ISO standard (FIPS PUB 104-1) for country abbreviations. By adopting the standard before use is widespread, the potential for later misuse is eliminated.

2.7 Named Populated Places, Primary County Divisions, and Other Location Entities of the U.S. - FIPS PUB 55-2

Description

NBS has assigned a unique five-digit code for each incorporated place, census designated place, township, Indian reservation, and Alaskan native area in the U.S. and its territories.
The original sequencing provided for future expansion and modification by establishing a minimum increment of eight points between successive locations. As areas become incorporated or established, they will be placed in this alphabetical listing and given their own code.

Applicability

Agency systems frequently collect data by city or town or other geographical breakdown. However, there is no EPA standard for the appropriate geographical breakdown of the U.S. In the absence of a current EPA standard, this coding scheme is applicable to the EPA as a standard for integrating geographical regions across Agency systems.

Usage

The Agency has not made substantial use of this standard thus far. In fact, research has not revealed one coding system that uses this NBS standard. Most programs used numbering schemes and identifiers which were specific to the program's requirements and which had little significance for any other system. The problem is that most programs deal with unique geographical boundaries which are defined by legislation. One example is the AQCR code used in the air programs, which blocks off areas according to measurements of air quality. Such a code likely has no pertinence to another program.

There is no current EPA standard for this data element.

Recommendation

This FIPS standard should be adopted to provide a common detailed locational code within EPA systems. These codes will not replace the specialized geographical boundary information necessary to each program, such as the AQCR codes in the air program or the hydrologic codes used in the water program, but will instead provide added information common to all programs.

There are several benefits to be realized from adopting this FIPS standard. As EPA consolidates its programs and focuses its attention on multi-phase enforcement activities such as combined air and water pollution inspections for large power plants, it will require more comprehensive informational resources.
This consolidation effort will need information, and therefore data elements, in compatible formats to facilitate the integration process. Facility identification and location will be just two of those elements. The FIPS standard is specific, yet flexible enough to be used in this capacity.

Another positive aspect of this code structure is that its maintenance will be minimal. Dun and Bradstreet currently provides this code along with the D&B facility number.

2.8 Local Time of Day - FIPS PUB 58-1

Description

This standard describes specific representations for both 12 and 24-hour timekeeping systems. The formats for civil and military clock time at point of origin are presented.

Applicability

There is a limited need for time representations, mostly involving the time of an update transaction to a data record.

Usage

CERCLIS uses time elements to record the timing of updates to data records and follows the FIPS standard. Research did not reveal any other systems using local time elements.

Currently the only EPA standard regarding this element is a standard format for electronically transmitting times. The standard is contained in the Transmittal Order "Data Standards for the Electronic Transmission of Laboratory Measurements Results" and specifies that time formats should be HH MM using a 24-hour clock.

Recommendation

This standard, though used very infrequently, should be adopted to ensure that future systems will be developed consistently. Since the only system identified currently conforms to the standard, no conversion is necessary.

It is also recommended that midnight be adopted as the beginning of the new day rather than the end of the previous day. This means that midnight will be represented as 000000 for a 24-hour clock and 120000A for a 12-hour clock instead of 240000 or 120000P, respectively.

The standard contained in the "Electronic Transmission..." paper should be updated to include two-digit fields for seconds.
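The two recommendations above (midnight as the start of the day, and a six-digit HHMMSS field) can be sketched as follows. The function names are hypothetical, and the sketch assumes the 12-hour form is six digits followed by an A or P suffix, as in the 120000A example above.

```python
def _digits(value: str, length: int) -> str:
    if len(value) != length or not (value.isascii() and value.isdigit()):
        raise ValueError(f"expected {length} digits, got {value!r}")
    return value


def normalize_24(hhmmss: str) -> str:
    """Validate a 24-hour HHMMSS value, writing midnight as 000000.

    Note: 240000 names the end of a day, so a caller that rewrites it
    as 000000 must also advance the accompanying date by one day.
    """
    _digits(hhmmss, 6)
    if hhmmss == "240000":
        return "000000"
    hh, mm, ss = int(hhmmss[0:2]), int(hhmmss[2:4]), int(hhmmss[4:6])
    if hh > 23 or mm > 59 or ss > 59:
        raise ValueError(f"not a valid time: {hhmmss!r}")
    return hhmmss


def widen_hhmm(hhmm: str) -> str:
    """Widen a 24-hour 'HHMM' or 'HH MM' value to HHMMSS."""
    digits = _digits(hhmm.replace(" ", ""), 4)
    return normalize_24(digits + "00")


def from_12_hour(value: str) -> str:
    """Convert 'HHMMSSA' / 'HHMMSSP' to 24-hour HHMMSS.

    ASSUMPTION: 12 A is midnight and 12 P is noon, following the
    recommendation that midnight begins the new day.
    """
    digits, suffix = _digits(value[:6], 6), value[6:].upper()
    if suffix not in ("A", "P"):
        raise ValueError(f"missing A/P suffix: {value!r}")
    hour = int(digits[0:2])
    if not 1 <= hour <= 12:
        raise ValueError(f"12-hour clock hour out of range: {value!r}")
    hour_24 = hour % 12 + (12 if suffix == "P" else 0)
    return normalize_24(f"{hour_24:02d}{digits[2:]}")
```

Existing four-digit values widen without loss of information, since the seconds field is simply filled with zeros.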
Adopting the 6-digit format allows more flexibility for time representations, while maintaining consistency across EPA systems.

2.9 Representations of Universal Time - FIPS PUB 59

Description

This standard identifies the various time zones of the world. Procedures for expressing universal time (Greenwich Mean Time) and for presenting local time differential factors and time zones are given.

Applicability

This NBS standard has no application in the Agency at this time.

Usage

There are currently no uses for Universal Time within the Agency.

There is no current EPA standard for this data element.

Recommendation

The EPA in recent years has become more involved in international environmental issues. It appears that this trend will continue and international issues will grow in importance for the Agency. Although currently there is no application of universal time codes within EPA, the need may arise in the future and this standard should be adopted now, before any systems are developed that use this type of information. Adoption now will create little extra effort in the short term and may avoid significant problems in the future.

2.10 Standard Industrial Classification (SIC) Codes - FIPS PUB 66

Description

This standard provides classifications, short titles, and codes for representing industries and groups of establishments with similar economic activities. The codes may be two to four digits in length and are left-justified. A two-digit number refers to a general industrial classification, such as rubber, whereas a four-place number identifies a subdivision in more specific detail. The National Bureau of Standards allows any agency to add more levels of classification by appending digits to the right of the four-digit code. This flexibility is useful to those agencies and offices which are interested in very specific activities.
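Because the codes are left-justified and hierarchical, the level of detail can be read directly from the leading digits. The following sketch illustrates this under stated assumptions: the function names and the level labels used as dictionary keys are chosen for illustration, and the sample code values in the usage note are placeholders rather than entries looked up in FIPS PUB 66.

```python
def sic_levels(code: str) -> dict:
    """Split a left-justified SIC code into its levels of detail.

    ASSUMPTION: the code arrives as text, possibly padded on the right
    with blanks (a left-justified fixed-width field). Digits beyond the
    fourth are treated as an agency-defined extension.
    """
    code = code.strip()
    if len(code) < 2 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected at least 2 digits, got {code!r}")
    levels = {"major_group": code[:2]}       # general classification
    if len(code) >= 3:
        levels["industry_group"] = code[:3]  # first subdivision
    if len(code) >= 4:
        levels["industry"] = code[:4]        # most specific standard level
    if len(code) > 4:
        levels["agency_extension"] = code[4:]
    return levels


def same_class(code_a: str, code_b: str, digits: int = 2) -> bool:
    """Compare two codes at a chosen level of detail (2, 3, or 4 digits)."""
    a, b = code_a.strip(), code_b.strip()
    if len(a) < digits or len(b) < digits:
        raise ValueError("a code is shorter than the requested level")
    return a[:digits] == b[:digits]
```

For example, `sic_levels("30  ")` yields only a two-digit classification, while `sic_levels("301105")` also reports the trailing `"05"` as an agency extension. Comparing on the leading digits lets a system that stores two-digit codes exchange data with one that stores four.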
Applicability

The EPA has recognized the need to categorize industry establishments according to their business activity. SIC codes provide the capabilities needed by most programs, but some programs require classifications with an orientation different from SIC's socio-economic structure.

Usage

Although many systems have adopted the SIC standard, there are many others that use different coding schemes. Some programs require more detail than SIC provides, while others require less detail. The Agency has not agreed upon one standard for industrial classes.

Recommendation

Where SIC codes provide the necessary detail and orientation to support program needs, the codes in FIPS PUB 66 should be adopted. With so many variations of industrial classification currently in use, thorough analysis of each application must be conducted to determine its compatibility with the SIC standard. In some cases, it may be useful to develop a cross-reference showing the relationships among SIC codes and other industrial classification schemes.

2.11 Geographical Point Locations - FIPS PUB 70-1

Description

This standard specifies three formats for representing geographic point locations: longitude/latitude, Universal Transverse Mercator (UTM), and the State Plane Coordinate System. Of these, the first is the most widely used at the Federal level. It employs spherical coordinate representations (degrees, minutes, and seconds) to identify points on the earth's surface. The reference lines for longitude and latitude are the prime meridian through Greenwich, England, and the equator, respectively.

The UTM method is a rectangular coordinate system which uses linear measurements to specify a location. Two variations are used to display the coordinates: (1) hemisphere, zone, easting value, and northing value; and (2) hemisphere, zone, east or west value, and northing value. The two systems rely on different referencing points for establishing a vertical line of demarcation.
The State Plane Coordinate system was designed to define the location of points within a geographical area.

A standard for altitude data is also included that is represented by the vertical distance between a point and the National Geodetic Vertical Datum - roughly sea level.

Applicability

Many EPA programs depend on geographical point location data to locate facilities, event locations such as the site of a spill, natural terrain and other points. The OIRM Directives Manual states that "geographical information systems developed by the Agency must conform to an established set of appropriate data standards which permit the use of the system by all relevant programs and State agencies."(1) A standard for geographical point locations is imperative for future EPA-wide integration of geographical information.

Usage

The most widely used geographical point location system at EPA is the latitude/longitude system, though at least one system does use the UTM method and at least one State Plane Coordinate system is in use in each of the states. Although many systems have standardized on the latitude/longitude method, there are significant differences in the accuracy with which latitude and longitude measurements are collected and stored. Inconsistent accuracy may hinder cross-system integration of data.

(1) EPA Office of Information Resource Management Directives Manual Draft, October 30, 1986.

Recommendation

The EPA should adopt the longitude/latitude method as a standard for describing geographic point locations. This is the dominant form in use, and its application and understanding are widespread. Since all three methods are interchangeable, adoption of the latitude/longitude system will not cause significant conversion difficulties.

2.12 Standard Occupational Classification (SOC) Codes - FIPS PUB 92

Description

This standard specifies two, three and four digit numeric codes classifying work categories.
These categories are based on the actual work performed and not on other factors such as skill level, place of work, or licensing required. The two-digit codes are high-level categories, and the three- and four-digit codes are further levels of subclassification.

Applicability

No specific uses have been identified for SOC codes. No usage of standard occupational codes has been identified, and no EPA standards currently exist.

Recommendation

Although there is currently no application of occupational codes within EPA, the need may arise in the future, and this standard should be adopted now, before any systems are developed that use this type of information. This code should be the standard, and any future non-standard coding systems should be approved before their implementation. Adoption now will create little extra effort in the short term and may avoid significant problems in the future.

2.13 Federal and Federally-Assisted Organizations - FIPS PUB 95

Description

Four-digit numeric codes are defined for each organization that is funded by the United States Government. The code consists of the two-digit Treasury Agency Symbol (TAS) followed by a two-digit subdivision code. The Department of Treasury maintains this number. The types of organizations covered include the Legislative, Judicial, and Executive Branches, other Independent Federal and Quasi-Federal Organizations, Independent Federal-State and Interstate Organizations, and International Organizations.

Applicability

No specific applications for this standard have been identified within EPA programs.

Usage

No usage of Federal and Federally-Assisted Organization codes has been identified, and there are no related standards within EPA.

Recommendation

Since there is no current need for these codes, this standard should not be adopted. If the need arises in the future, this standard should be reevaluated.

2.14 Hydrologic Units in the U.S. and Caribbean - FIPS PUB 103 - from the U.S.
Geological Survey

Description

The U.S. Geological Survey developed two- to eight-digit numeric codes identifying each major hydrological area in the U.S. and the Caribbean. The codes are hierarchical and begin with 2-digit codes at the top identifying the 21 major hydrological regions. 4-digit codes subdivide the regions into 222 subregions; 6-digit codes subdivide subregions into 352 accounting units; 8-digit codes further subdivide the accounting units into approximately 2,150 cataloging units. In addition to numeric codes, the standard provides names for each unit which are unique within a branch of the hierarchy.

Applicability

EPA programs have demonstrated a need for collecting hydrologic information to monitor water conditions, track enforcement and compliance activities, and track hazardous waste spillage and other events.

Usage

Many EPA systems currently use this standard to identify hydrological units. Different systems use different levels of detail within the hierarchy. For example, CERCLIS uses the full detail (all 8 digits) while RCRIS uses only the top three levels (the first 6 digits of the codes). There is no current EPA standard regarding hydrological units.

Recommendation

Since this coding scheme is widely applicable to EPA programs and is already widely used, the U.S.G.S. codes are recommended for adoption as an EPA standard. This standard will facilitate interchange of data and cross-program analysis of geological and hydrological regions.

2.15 Future FIPS PUB Data Element Standards

The National Bureau of Standards is currently examining one new standard data element. NBS has proposed adoption of a standard for data used in mapping applications. The standard will adopt specifications developed by the Department of Interior for digital cartographic data. The standard will improve interchange of information within and among agencies analyzing land use, demographics, and other geographic information.
The new rule is currently in the Proposed Rule Stage, and a Notice of Proposed Rule Making is scheduled for release in September 1989.

3.0 Analysis of Non-NBS Coding Schemes

This section describes several commonly used coding schemes that are not NBS standards but have some relevance to EPA's data needs. The first four elements are described in FIPS PUB 19-1 as suggested schemes commonly used in business and industry, and the fifth element is one that was identified during the interview process. A brief summary of findings regarding each coding system is presented in the following sections.

3.1 Hazardous Material ID Number

The hazardous material ID number is a United Nations-guided code for uniquely identifying various harmful substances. Since EPA programs often track information pertaining to hazardous materials, this coding system may provide a common method of identification within the EPA and may increase consistency with international conventions. The applicability of this code to EPA's information needs (and its relationship to current EPA standards, especially the CAS Number) and its potential usefulness warrant further study by EPA into its possible adoption as an EPA data standard.

3.2 Social Security Number

The Social Security Number is a nine-digit numeric code developed by the Social Security Administration to uniquely identify all employees in the United States. Many organizations across the United States and some systems within the EPA use this code to identify personnel. Although many systems at EPA do not keep records of specific people, the code is applicable to EPA's internal staffing-related records, where a systematic code would facilitate portability between systems. Due to its widespread use in all sectors of the U.S., the Social Security Number is the logical choice for such a standard.
EPA should consider adopting the SSN as a standard personnel identifier within EPA systems, while acting to ensure compliance with applicable privacy restrictions. Conversion to the SSN should be performed only where doing so causes no information loss and the benefits outweigh the costs.

3.3 Human Sexes

The human sexes codes provide a consistent method for documenting the sex of people. The scheme includes codes for male and female, and codes identifying that sex is not specified or not known. The applicability of human sex codes to EPA is essentially the same as for the Social Security Number. Personnel records within EPA will benefit from increased portability with the adoption of a standard code. Consequently, a consistent coding scheme for identifying human sexes should be adopted within the Agency. The scheme outlined in FIPS PUB 19-1 provides flexibility with its additional standard codes for unknown data, and should be adopted by the EPA if it fulfills all of the Agency's information needs.

3.4 ZIP Codes

ZIP Codes are five- and nine-digit numeric codes identifying the Postal Service areas in the United States. These codes are crucial to all mailing address information. Currently, many EPA programs have a need for mailing address information and many EPA systems store ZIP Codes for this purpose. However, some Agency systems can store only the five-digit ZIP Codes and cannot accommodate the recently introduced extended nine-digit ZIP Codes. Since ZIP Codes are useful throughout the Agency, they should be adopted as an EPA standard and their collection made mandatory for all mailing addresses. Also, EPA should mandate the updating of all five-digit ZIP Code data elements to the new standard nine digits wherever practical.

3.5 Animal and Plant Taxonomic Schemes

Through interviews, key Agency personnel have identified animal and plant taxonomic coding schemes as an area where development and adoption of a standard would be beneficial.
Currently, there are many different coding schemes in existence throughout the government and scientific community, with no clear standard emerging. BIOS and ODES both use a common coding scheme that was developed in conjunction with the National Oceanographic Data Center (NODC), but usage beyond EPA and NODC is limited. The NODC scheme is a 17-digit code identifying animal species. However, this scheme may not contain the detail necessary for some EPA programs, and some effort is being spent within EPA to examine the possibility of expanding the current codes to as many as 30 digits. Since the Agency has demonstrated a need for animal and plant taxonomic codes, it is recommended that EPA look further into a possible scheme that fulfills all program needs. The level of effort exerted to study these codes must not exceed the benefits to be gained from adoption of an EPA standard.

4.0 Current EPA Data Standards Program Development

This section reviews EPA's current and proposed data standards policies and data element standards. Section 4.1 describes EPA's data standards policy efforts and ongoing work. Section 4.2 discusses the Agency's current standard data elements and elements currently being considered for adoption as standards.

4.1 EPA Data Standards Program Policies

The EPA has established a Data Standards Program within its broader Data Administration Program. The policies and goals of EPA's Data Standards Program have been developed and published in both the OIRM Directives Manual - Chapter 5 and the EPA Information Resource Management Policy Manual. Coordination of the program is the responsibility of the Office of Information Resource Management. Each Assistant Administrator, Regional Administrator, Office Director, and Senior Information Resource Management Officer is responsible for implementation, compliance, and enforcement within his/her organization.

Procedures for developing, implementing, and enforcing data standards have not been formally adopted.
A separate EPA Data Standards Catalog has also not been developed to centrally store all EPA data standards. Currently, OIRM is developing a proposal on data standards procedures and has issued a draft titled "Procedures for the EPA Data Standards Program". When finalized and approved, this document will establish official EPA procedures regarding data standards.

Within the last two years, the EPA has adopted two data standards. These standards were published as EPA Transmittal Orders to be added to the EPA Directives System. The first standard establishes the Chemical Abstract Service Registry Number as an EPA standard data element to be collected with all information regarding chemicals. The second standard establishes record and media formats, record sequences, etc. for transmitting laboratory data within the EPA.

The EPA is also working on several new and ongoing data standards tasks: a proposal was issued in December 1988 to establish an EPA data element standard for facility identifiers; work is ongoing in establishing an EPA standard for geographical point locations; and an Electronic Data Interchange Committee has been set up to analyze the problem of exchanging data through electronic media.

4.2 EPA Standard Data Elements

The EPA has adopted two standard data elements, the Chemical Abstract Service (CAS) Registry Number and an Electronic Transmission of Laboratory Measurement Results standard. Both of these are briefly discussed below. In addition, EPA is currently working on adopting standards for geographic point locations and facility IDs and has formed a committee to study Electronic Data Interchange (the EDI committee).

4.2.1 Chemical Abstract Service (CAS) Number

The CAS number is a unique identifier assigned by the Chemical Abstract Service to each distinct chemical substance recorded in the CAS Chemical Registry System.
It is represented as a ten-digit code, with the first nine characters uniquely identifying the chemical and the tenth character acting as a check digit. EPA programs have demonstrated a need for collecting information about chemical substances and their properties. Currently, the use of the CAS number is widespread within Agency systems. In June 1987, EPA officially adopted the CAS Number as an EPA standard data element.

4.2.2 Electronic Transmission of Laboratory Measurement Results

The Agency has published an EPA order to standardize the formats and electronic representations of data commonly used in transmitting laboratory results. The order covers EPA laboratory transmission standards regarding: media formats describing diskette and tape specifications and standard record lengths; record formats describing each record layout; definition of production runs; record sequences describing the order of records within a file; file and record integrity; date and time formats; and other transmission information. The order also presents record layouts and field definitions for many commonly used record types.

The date and time formats presented in the "Electronic Transmission..." order conflict with current FIPS standards. We recommend that both FIPS standards be adopted and that the "Electronic Transmission..." order be updated to be consistent with this change. See Sections 2.1 and 2.8 for further discussion of the FIPS standards.

The standard was adopted as an EPA standard in December 1988 and published as the EPA Transmittal Order "Data Standard for the Electronic Transmission of Laboratory Measurement Results".

4.2.3 Facility Identifier

In December 1988, the EPA drafted a proposed standard for facility identification codes. The proposal recommended adoption of a unique EPA facility identification scheme.
This coding scheme is not based on any previous EPA or external standard, such as the DUNS ID, but rather is a unique EPA identifier composed of the FIPS standard 2-digit state code for the facility, followed by a unique random 10-digit identification number. These codes will be assigned and tracked by the Facility Index Data System (FINDS) and its management. The facility ID standard element, when approved, must be collected as a key data element in all EPA systems containing facility information. However, adoption of this standard does not preclude using DUNS IDs and other EPA program-specific identifiers for additional information.

This standard has not yet been adopted as an EPA standard data element. Green Border review was scheduled for February 1989.

4.2.4 Geographical Point Locations

The OIRM Directives Manual and the EPA Information Resource Management Policy Manual state that a priority of the Data Standards Program is to establish an EPA standard for the representation of geographical point location information. Currently, the EPA is looking into adopting a standard. This paper recommends the adoption of the latitude/longitude method presented in FIPS PUB 70-1.

4.2.5 The Electronic Data Interchange (EDI) Work Group

The EPA is currently studying possible standards for electronic interchange of data. The Electronic Data Interchange (EDI) work group is looking into methods viable for the EPA. The work group is targeting a proposal for EPA standards in late 1989. Several FIPS PUBs have been established regarding data interchange formats. We recommend that the EDI work group analyze the applicable FIPS in conducting its study.

5.0 Summary of Findings and Recommendations

5.1 Review of FIPS Standards and Usage

After reviewing the Federal Information Processing Standards and EPA's usage of them, we recommend that eleven out of the fourteen FIPS be adopted as internal EPA standard data elements.
Each recommended standard should be implemented wherever conversion is practical and beneficial. The standard regarding Standard Industrial Classification codes should be reviewed for applicability to each system and implemented where no information is lost in conversion. Where the current coding system is significantly different or more detailed, a mapping should be developed between the current codes and the SIC codes. None of the three FIPS not recommended for adoption is currently applicable to EPA's program needs. If a future need arises, these FIPS should be reviewed again.

The current usage of the recommended FIPS standards across EPA systems ranges from nearly full compliance, in the case of MSAs and Congressional Districts, to widespread non-conformity, as with Populated Places and Calendar Dates. There are several reasons for the non-compliance with the FIPS: the system is older than the standard; the system uses a coding scheme from a previous system; the standard does not fulfill information needs; and standards information is not widely known.

System is Older Than the Standard

Several EPA systems predate some or all of the relevant standards. As such, these systems had little guidance on choosing appropriate standards at their inception. Some systems incorporated the FIPS during later revisions, while others have not. Currently, there are several systems that still do not conform to the FIPS for this reason, including, for example, NEDS.

System Uses a Coding Scheme From a Previous System

Several systems do not comply with the FIPS because they have adopted non-standard coding schemes or formats from predecessor EPA systems. In the air program, the legacies of SAROAD's and NEDS's coding structures are found in many other air program systems. The copying of coding schemes from system to system has led to some consistency within program offices but has also prolonged non-standard habits.
Standard Does Not Fulfill Information Need

Some EPA systems have developed special coding schemes because the codes presented in the FIPS PUBs do not adequately satisfy a program need. Some programs have devised their own coding schemes that represent data in a significantly different manner than a FIPS standard does. These schemes would lose information in a conversion to a FIPS standard. Therefore, these schemes should not be forced to comply with an inadequate FIPS standard, but rather should be treated as separate data elements. Mappings between the specially developed codes and the FIPS are encouraged where possible.

Other coding schemes are merely expanded or consolidated versions of FIPS. These schemes either increase the level of detail by adding subdivision codes or decrease the level of detail by creating appropriate groupings of codes. The practice of modifying FIPS standards is encouraged as a way to meet program needs without creating entirely new coding systems. This provides the information needed for the program while sacrificing as little consistency and portability as possible. Mappings between the modified schemes and the FIPS are encouraged.

Standards Information Is Not Widely Known

One of the major reasons cited for non-conformance is that information about both EPA standards and the Federal Information Processing Standards is not widely known among EPA personnel. Many personnel interviewed did not know where to find information about data standards or that EPA had any formal policies regarding the use of standard data elements. Most of those interviewed agreed that formal EPA data standard policies and promulgations are distributed poorly to the personnel who need them, and that in general EPA training of systems personnel regarding policies is insufficient.
Of the personnel who did know about EPA's data standards effort, most identified EPA experience and informal lines of communication as the sole source of their knowledge, rather than formal notification or training. In general, data standards information is not consistently getting to the people who need it most. It is recommended that EPA continue to develop its data standards program by analyzing the current procedures for information dissemination and by adopting new methods where necessary.

5.2 Review of Other Non-NBS Standard Coding Schemes

In addition to the FIPS standards, several non-NBS standards referenced in FIPS PUB 19-1 have been recommended for EPA adoption or further study. The elements recommended for adoption are ZIP Codes, Social Security Numbers, and standard codes for human sexes. The hazardous material ID number from the United Nations is recommended for further review.

5.3 Review of Data Standards Program

Overall, the EPA's Data Standards Program is still in its development stages. EPA has developed general policies for the goals of the program, but still has not fully translated these goals into procedures and results. In the last two years, the Agency has begun to adopt standard data elements and publish them in EPA Transmittal Orders, but has not developed a central Data Standards Catalog or an effective distribution plan that disseminates the information to all key personnel in the Agency. The EPA is currently working towards adopting formal data standards procedures, and should continue to do so to attain the program's goals.