>EPA
            United States
            Environmental Protection
            Agency
             Administration And
             Resources Management
             (PM-211D)
220 B-92-006
March 1992
Locational Data Policy
Implementation Guidance

Guide To Selecting Latitude/
Longitude Collection Methods
                                 Printed on recycled paper

-------

-------
                              Contents

PREFACE	i-1

EXECUTIVE SUMMARY	ii-1

Chapter 1      BACKGROUND	1-1

1.1    Geocoding Study Methodology	1-1

Chapter 2      AUDIENCE AND PURPOSE	2-1

Chapter 3      BASIC TERMS, CONCEPTS AND DEFINITIONS	3-1
Chapter 4      IMPORTANT THEMES: PROGRAMMATIC	4-1
              DIVERSITY AND TWO ORGANIZATIONAL
              APPROACHES TO GEOCODING

Chapter 5      THE GEOCODING LIFECYCLE	5-1

5.1    Incremental Field-Based Geocoding	5-1
5.2    Centralized Geocoding	5-2
5.3    The Geocoding Lifecycle	5-3

Chapter 6      GEOCODING METHODS, CAPABILITIES, REALISTIC	6-1
              COSTS AND ACCURACIES

Chapter 7      CROSS CUTTING IMPLEMENTATION ISSUES	7-1

7.1    The Need For A Geographic Reference Standard	7-2
7.2    Urban vs.  Rural Geocoding: Productivity, Accuracy,	7-2
      And Cost  Differences
7.3    Performing In-House Geocoding Or Contracting For	7-3
      Specialized Geocoding Services
7.4    EPA's Regulated Universe; Who Carries The Geocoding Burden?	7-3
7.5    Accuracy Checking of Locational Data	7-5
7.6    Secondary  Data Users, Multimedia Data Integration, Public 	7-8
      Data  Access, and  Enforcement  Data Requirements
3/92

-------
                    Contents  (continued)

Chapter 8      A FRAMEWORK FOR ESTIMATING GEOCODING	8-1
              LIFECYCLE COSTS

8.1   Characterize Existing Records	8-1
8.2   Define Geocoding Requirements	8-2
8.3   Select/Define Geocoding Methodology	8-2
8.4   Estimate Geocoding Lifecycle Fixed and Variable Costs	8-2

Chapter 9      SUMMARY	9-1

Appendix A    GEOCODING METHODS FACT SHEETS	A-1

Appendix B    REFERENCES	B-1

-------
                  List of Figures and Tables
Figure 1    The Geocoding Lifecycle	5-3
Figure 2    Comparative Geocoding Costs And Accuracies	6-1
Figure 3    Bounding Boxes	7-7
Figure 4    Generic Geocoding Cost Estimation Model	8-3
Figure 5    Spreadsheet Version of Geocoding Cost Estimation Model	8-6
Table 1     EPA Data System's Locational Data Inventory Summary	7-5

-------
                               Preface
This version of the Guide  to Selecting Latitude/Longitude Collection Methods (the
"Geocoding Study")  reflects the many comments and suggestions received in
response to a prior draft version dated January 29, 1992.  The major change to this
version is the incorporation of review comments from EPA's Office of Information
Resources Management (OIRM) and Office of Pollution Prevention, Pesticides, and
Toxic Substances (OPPPTS).  Jeff Booth was the EPA  project manager for this
document and is part of the Program Systems Division of OIRM.

Support for the development of this document was provided by American
Management Systems, Inc., under EPA contract #68-W9-0039, Delivery Order #025,
and Booz, Allen & Hamilton, under EPA contract #68-W9-0037, Delivery Order
#094.

-------
                        Executive   Summary

In May 1990, EPA's Office of Administration and Resources Management (OARM)
added a new chapter to the agency's Information Resources Management (IRM)
Policy Manual. This chapter established the principles and methods underlying the
collection and documentation of locational data.  Following the addition of, and in
response to, that chapter, EPA initiated a geocoding study to analyze the realistic
capabilities of various geocoding technologies and methods.  "Geocoding" is the
general term applied to different procedures, techniques, and technologies that are
used to identify, quantify, and document locational coordinates.

The geocoding study was based upon literature  reviews and vendor solicitations
that were  performed to establish  a complete  array of  applicable geocoding
methodologies and their associated advantages  and disadvantages.   Five major
geocoding  methodologies  were chosen for review: Photo Interpretation, Map
Interpolation, Global Positioning Systems (GPS), Address Matching, and ZIP Code
Centroids.  The agency specifically encourages the use of GPS technology because its
cost is expected to decrease over time while users' skills and the accuracy achieved
are expected to improve.

This guide provides a description of each of these methods along with its accuracy, cost,
benefits, and limitations.  It also provides technical information about geocoding to
organizations responsible for implementing EPA's Locational Data Policy (LDP).
Both  managers and  technical  personnel  will  find this  information useful  in
selecting cost-effective geocoding methods.

In order to understand specific geocoding  technologies, managers and technical
specialists  must understand  that environmental  programs are responsible for a
diverse set of regulated entities and that the locational data collected for these
entities has different organizational uses in  response to regulatory and legislative
requirements.  Therefore, environmental organizations and government agencies
have different locational data accuracy and quality needs which may influence their
choice of geocoding technology(ies) or approach to implementing that technology.

The objectives of this guide are to:

      •     Inform EPA program offices, States, and other parties affected by the
            Locational Data Policy about the geocoding life cycle (i.e., the steps
            required to produce locational data independent of specific geocoding
            technologies or methods).

      •     Describe the practically achievable capabilities, realistic costs, and
            accuracies of different geocoding methods/technologies.

      •     Identify cross-cutting geocoding issues.

      •     Provide a framework for designing and estimating the cost of a
            geocoding project or program consistent with EPA IRM policy.

      •     Review some locational data accuracy checking methods.

In addition, the guide  provides basic terms, concepts and definitions regarding
geocoding.

Geocoding has its own well-defined life cycle, a generic process independent of
specific technologies or methods, that provides a realistic framework for estimating
the true cost of implementing one or more of the available technologies.  It also
provides a fair basis for comparing geocoding methods.  The way in which the life
cycle will be employed, however, depends on the type of approach a manager or
technical specialist is using. There are two fundamental organizational approaches
to geocoding: 1) Incremental Field-based Geocoding, and 2) Centralized Geocoding.
These approaches affect the way data are collected as well as the cost and time required
to collect locational data.

In comparing  different geocoding methods, managers and technical personnel
should  be aware  of  the  differences between urban  and rural geocoding, and
understand issues involving the  ability to  perform in-house geocoding  versus
contracting specialized geocoding services.

It is important not only to understand the specifics of different geocoding methods,
but also to recognize how little locational data existing databases covering
regulated entities contain.  An OIRM-sponsored regulatory review of EPA spatial data
requirements concluded that States provide  the majority of locational data, followed
by  the  regulated  community, EPA headquarters,  and Regional  offices.    The
Locational Data Policy encourages the integration of data based on location, thereby
promoting use of  EPA's extensive data  resources for  cross-media environmental
analyses and  management decisions.   The  cross-media benefit  that the  policy
encourages will allow EPA program offices to share data collection responsibilities.

Because different geocoding methods can vary widely in applicability, accuracy, and
cost, managers and technical specialists may  need to characterize existing records,
define geocoding requirements, and select/define geocoding methodology in order
to estimate geocoding  life  cycle costs.  Ultimately, estimating these costs depends
upon defining and estimating fixed and variable costs.

-------

   Chapter 1
BACKGROUND


-------
                       1.    BACKGROUND

On May 17, 1990, after formal agency-wide review, the Locational Data Policy (LDP)
became effective as an official directive under the 2100 series and Chapter 13 of the
EPA Information Resources Management (IRM) Policy Manual.  Its stated purpose is
to establish  "... the principles  for collecting and documenting latitude/longitude
coordinates for facilities, sites and monitoring and observation points regulated or
tracked under Federal  environmental programs within the jurisdiction of the
Environmental Protection Agency (EPA)."

Pursuant to this policy  initiative, a Locational  Accuracy Task Force (LATF) was
formed in June of 1990 to develop a minimum locational data accuracy goal for EPA
and  to make recommendations to the EPA  IRM  Steering Committee  about
techniques for collecting locational data (i.e., geocoding).  "Geocoding" is the general
term applied to different procedures, techniques and technologies that  are used to
identify, quantify, and document locational coordinates.   The findings of the LATF
are incorporated into EPA's Locational Data Policy Implementation Guidance.

In the  same  time period, EPA  initiated  a  study to analyze the realistic capabilities
and  costs of alternative geocoding technologies and methods.   This document
reports on the findings of  that geocoding  study by providing objective information
and  data about currently available  geocoding technologies.   This document is
designed to help organizations better understand technical, cost, and organizational
issues that will directly impact their ability  to comply with the LDP.

Two "themes" should be kept in mind while reading this document:

      •     Programs within the agency are responsible for regulating a diverse set
            of entities and, therefore,  have a diverse set of locational accuracy
            needs.

      •     There are two fundamental approaches or combinations of approaches
            for  obtaining locational  coordinates: incremental and centralized.
            Programs must understand  their own organizational  requirements
            before deciding which approach to take.

These themes are presented in detail in Chapter 4.

1.1  Geocoding  Study  Methodology

The geocoding study is the basis for most  of the recommendations in this report. It
was conducted in six basic steps:

      •     Literature review.
      •     Study site selection.
      •     Compilation of existing EPA locational data.
      •     Geocoding database creation.
      •     Geocoding test data analysis.
      •     Error/accuracy checking methods assessment.

These steps are described in detail below.

Literature  reviews and vendor solicitations  were performed  to  establish the
universe of applicable geocoding methodologies and their inherent advantages and
disadvantages.  Five major  geocoding  methodologies were chosen for review:
Photo Interpretation, Map Interpolation, Global  Positioning Systems (GPS), Address
Matching and ZIP Code Centroids.  In addition, a description of photogrammetry
has been added to this report.  The reviews and  solicitations were integrated and are
contained in the Geocoding Methods,  Capabilities,  Realistic  Costs and  Accuracies
section of this report. Loran-C was excluded from this study because the system is:

      •     Not accurate enough to reliably provide locational data within the
            agency's 25 meter goal.
      •     Not widely used within the agency.
      •     Being supplanted by other technologies.

To  identify study sites with previously collected and well-documented accurate
locational data, EPA Geographic Information System (GIS) Teams were solicited for
candidate study sites.  As a result of this solicitation, three study sites were chosen:
Chattanooga, TN, Nashua River Basin, CT, and San Gabriel,  CA.  The Chattanooga
study area was selected because over 1000 business entities in the study area already
had airphoto-interpreted locational data coordinates. Nashua and San Gabriel were
selected  because  both were slated for Global  Positioning System (GPS) surveys
within the time-frame of this project.

Data from various EPA systems were collected and integrated. Appropriate subsets
were  made from EPA data systems, including AIRS, Biennial  Report, CERCLIS,
FINDS, IFD, PCS,  RCRIS, TRIS,  and, in  the case of Chattanooga, an existing
integrated Chattanooga database created in 1985.  These data were chosen because
they are most characteristic of regulated entities  in EPA's data holdings.  Since many
of the data systems contain different facility identifiers, matching facility records
across the data bases  was a semi-automated process.  Matches were automatically
made when possible, based on ID numbers. Once all possible automated matches
were processed, further manual matching was accomplished based on the name and
address of each remaining record.
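
The semi-automated matching described above can be sketched as follows. This is a minimal illustration only, not the procedure used in the study: the record fields, the normalization rule, and all function names are hypothetical. Records are paired automatically when their ID numbers agree, and the remainder are grouped by normalized name and address so that a person can confirm each candidate pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacilityRecord:
    """Hypothetical record layout; real EPA data systems differ."""
    system: str
    facility_id: str
    name: str
    address: str


def match_by_id(records_a, records_b):
    """Automated step: pair records whose ID numbers agree.

    Returns (matches, unmatched_a, unmatched_b).
    """
    by_id = {r.facility_id: r for r in records_b if r.facility_id}
    matches, unmatched_a, matched_ids = [], [], set()
    for rec in records_a:
        other = by_id.get(rec.facility_id)
        if other is not None:
            matches.append((rec, other))
            matched_ids.add(rec.facility_id)
        else:
            unmatched_a.append(rec)
    unmatched_b = [r for r in records_b if r.facility_id not in matched_ids]
    return matches, unmatched_a, unmatched_b


def normalize(text):
    """Assumed normalization: upper-case, drop periods/commas, squeeze spaces."""
    return " ".join(text.upper().replace(".", "").replace(",", "").split())


def manual_review_candidates(unmatched_a, unmatched_b):
    """Manual step support: propose name/address pairs for a person to confirm."""
    index = {}
    for rec in unmatched_b:
        key = (normalize(rec.name), normalize(rec.address))
        index.setdefault(key, []).append(rec)
    return [
        (a, b)
        for a in unmatched_a
        for b in index.get((normalize(a.name), normalize(a.address)), [])
    ]
```

The candidate pairs are suggestions only; in the study, the remaining records were matched manually.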

To  supplement the existing locational data  in  each study site, prioritized lists of
regulated entities in each area were  prepared and sent to EPA GPS  survey teams.
These  teams collected GPS data for Chattanooga, Nashua and San Gabriel.  In
addition to GPS  data, address-matched  locational data were solicited  from  three
different vendors for all facilities with accurate address records in the study sites.

-------
The accuracy of each methodology was assessed by assuming one set of coordinates
was  "truth" and then comparing all other coordinate pairs  to  that set.   From
literature reviews and professional expertise, GPS was assumed to be the most
accurate geocoding methodology studied because it is the most accurate
technology available. The second most accurate method was assumed to be
Photo Interpretation/Map Interpolation, based upon the scale and resolution of
images used in the study.  Locational discrepancies under 50 meters could not be
resolved because of differences in the place measured.  Based on these assumptions, GPS
coordinates were used as "truth" if they were available for an entity. Otherwise, the
photo-interpreted coordinates were used as "truth."
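
The comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the study's actual computation: the method keys ("gps", "photo", and so on) are hypothetical labels, and distance is computed with the haversine formula on a spherical earth, which is a simplification. GPS coordinates serve as "truth" when present; otherwise the photo-interpreted coordinates do.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean earth radius; spherical approximation

# Assumed order of preference for the "truth" coordinates.
TRUTH_PRIORITY = ("gps", "photo")


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def errors_against_truth(coords):
    """coords maps a method label to a (lat, lon) pair for one entity.

    Returns (truth_method, {other_method: distance_from_truth_m}).
    If neither truth source is present, returns (None, {}).
    """
    truth_method = next((m for m in TRUTH_PRIORITY if m in coords), None)
    if truth_method is None:
        return None, {}
    truth = coords[truth_method]
    errors = {
        method: haversine_m(*truth, *pair)
        for method, pair in coords.items()
        if method != truth_method
    }
    return truth_method, errors
```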

A cost estimation model  was developed to compare the fixed (i.e., equipment) and
variable (i.e., labor) costs associated with each  geocoding method. For a centralized
geocoding process, a screening process was described that classified entities by the
expected difficulty of geocoding them.  Based on the classification, the most
cost-effective geocoding method could be applied to each class of entities.
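
A rough sketch of this idea follows. It is not the study's cost estimation model: the class names, field names, and selection rule are assumptions made for illustration, and any cost figures supplied to it are placeholders rather than study results. Each method has a fixed cost and a per-entity variable cost that depends on the difficulty class. For simplicity, the sketch chooses the method with the lowest variable cost for each class and then adds the fixed cost of every method used; it does not search for the globally cheapest combination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """Hypothetical description of one geocoding method."""
    name: str
    fixed_cost: float        # e.g., equipment
    cost_per_entity: dict    # difficulty class -> variable (labor) cost


def assign_methods(methods, class_counts):
    """Choose a method per difficulty class and total the cost.

    class_counts maps a difficulty class to its number of entities.
    Returns ({class: Method}, total_cost).
    Raises ValueError if no method can handle some class.
    """
    assignment = {}
    for cls in class_counts:
        candidates = [m for m in methods if cls in m.cost_per_entity]
        if not candidates:
            raise ValueError(f"no method available for class {cls!r}")
        assignment[cls] = min(candidates, key=lambda m: m.cost_per_entity[cls])

    # Fixed costs are paid once for each distinct method actually used.
    used = {m.name: m for m in assignment.values()}
    fixed = sum(m.fixed_cost for m in used.values())
    variable = sum(
        assignment[cls].cost_per_entity[cls] * count
        for cls, count in class_counts.items()
    )
    return assignment, fixed + variable
```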

Accuracy assessment methods  were reviewed  and tested using  existing EPA
locational data. Methods investigated included distance from  ZIP Code centroid,
point-in-polygon tests, and bounding box tests. All these tests essentially predict the
"reasonability" of locational data coordinates, although the accuracy of these
measures is  highly variable.
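
Two of these reasonability checks can be sketched as follows. This is a minimal illustration, not the tests as implemented in the study: the function names and the coordinate conventions (decimal degrees, boxes given as south, west, north, east) are assumptions, and the bounding box test does not handle areas that cross the 180-degree meridian. A failed test only flags a coordinate pair for review; a passed test does not establish that the pair is accurate.

```python
def in_bounding_box(lat, lon, box):
    """True if the point lies within box = (south, west, north, east)."""
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


def in_polygon(lat, lon, polygon):
    """Even-odd ray casting test.

    polygon is a list of (lat, lon) vertices in order; the last vertex is
    joined back to the first. Points exactly on an edge may go either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the point's latitude?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that latitude.
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < lon_cross:
                inside = not inside
    return inside
```

A distance-from-ZIP-Code-centroid check would follow the same pattern: compute the distance from the reported point to the centroid and flag the record when it exceeds a chosen threshold.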

A questionnaire was distributed to EPA data system managers to collect information
about the status of locational data in EPA data systems.  Responses from  23 data
systems, representing 2.9 million data records, were summarized (Table 1, page 7-6)
to characterize the nature and sources of locational data in EPA data systems.

-------

                       Chapter 2
               AUDIENCE AND PURPOSE
-------
              2.    AUDIENCE  AND  PURPOSE

This Guide to Selecting Latitude/Longitude Collection Methods is written for those
organizations which are responsible for implementing EPA's Locational Data Policy,
including:

      •     EPA National System Administrators.
      •     EPA Program Management Branch Chiefs.
      •     EPA Regions and States.
      •     EPA contractors.
      •     The regulated community.

The purpose of this Guide is to provide the reader with technical  information about
geocoding  technology.   This  information will be presented in terms that are
understandable both to management and technical personnel  in  order to assist in
selecting cost-effective geocoding methods. There are five specific  objectives:

      •     To inform EPA program offices, States, and other parties affected by the
            LDP, about the geocoding life cycle (i.e., the process required to produce
            locational data  independent of specific geocoding technologies  or
            methods).

      •     To describe the practically achievable capabilities, realistic costs, and
            accuracies of different geocoding methods/technologies.

      •     To identify cross-cutting geocoding issues.

      •     To provide  a framework for designing  and estimating the cost of a
            geocoding project or program consistent with EPA IRM policy.

      •     To review selected locational data accuracy checking methods.

-------


                  Chapter 3
 BASIC TERMS, CONCEPTS AND DEFINITIONS


-------
          3.   BASIC TERMS,  CONCEPTS  AND
                              DEFINITIONS

Geocoding,  similar to other  "high technologies," has its own unique  language.
There  are many  ill-defined  terms  and  phrases which can be confusing or
misleading.   Below, definitions of key  terms and concepts commonly used
throughout  this document  are provided to limit some  of the potential  confusion
resulting from indiscriminate use of terminology:

      •     Geocoding -- The application of procedures,  techniques,  and
            technologies  for the  purpose  of identifying,  quantifying,  and
            documenting geographic location or boundaries of a physical entity
            (e.g., facility, outfall pipe, Superfund site).

      •     Latitude/Longitude -- Latitude and longitude refer to the global
            reference system used to locate objects on the surface of the earth.
            Positions are referenced by the number of degrees north or south of the
            equator (latitude) and east or west of the prime meridian (longitude).
            Other commonly used reference systems include Universal Transverse
            Mercator (UTM) and State Plane.  Software exists to aid conversion of
            existing data to latitude/longitude coordinates.

      •     Global Positioning Systems (GPS) — Global Positioning Systems are
            systems which derive location based on ground  position relative to
            earth-orbiting satellites.

      •     Selective Availability --  Selective availability  is  the intentional
            degradation of  the performance capabilities of satellite systems, such as
            GPS, for civilian users by the U.S. military, accomplished by  artificially
            creating a significant clock error in the satellites.

      •     Address Matching — Address matching is  a semi-automated process for
            deriving latitude/longitude coordinates from street addresses.

      •     Photogrammetry -- Photogrammetry is a process for deriving reliable
            measurements  by locating and measuring the position of an object on
            aerial photographs.

      •     Map Interpolation/Photo Interpretation -- Photo Interpretation/Map
            Interpolation is an integrated technique by which the user transfers
            information from an aerial photograph onto a map base and then
            extracts coordinate information via standard manual or digital map
            interpolation techniques.

      •     Land Surveying -- Land Surveying is a field-based process for
            determining location from direct measurements from a known
            baseline (typically geodetic survey monuments).

      •     Geocoding Accuracy -- Geocoding accuracy is a measure of the degree to
            which a geocoding process yields the true location of an object.  If a
            process has a 25 meter accuracy, then the true location of the object
            must be within 25 meters of the reported location.  For EPA locational
            data, accuracy is stated with a 95% level of confidence.

      •     Geocoding Precision -- Geocoding precision is a measure of the degree
            to which repeated measurements of an object's location yield the same
            result.

      •     Secondary Use -- The re-use of existing data in an application other
            than that for which it was primarily collected.  Examples of typical
            secondary use applications include comparative risk studies and
            enforcement support.
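
The distinction between accuracy and precision defined above can be illustrated with a short sketch. The interpretation used here is an assumption, not a definition taken from the Locational Data Policy: the accuracy goal is treated as met when at least 95% of the measured errors fall within 25 meters, and precision is summarized as the root-mean-square distance of repeated measurements from their own mean position. The function names and the use of planar east/north offsets in meters are likewise hypothetical.

```python
import math
import statistics


def meets_accuracy_goal(errors_m, goal_m=25.0, confidence=0.95):
    """True if the required share of errors (distance from the true
    location, in meters) falls within the goal. Assumed interpretation."""
    within = sum(1 for e in errors_m if e <= goal_m)
    return within / len(errors_m) >= confidence


def precision_m(offsets_m):
    """Spread of repeated measurements of one object.

    offsets_m is a list of (east, north) positions in meters. Returns the
    root-mean-square distance from the mean measured position, which says
    nothing about how far that mean is from the true location.
    """
    mean_e = statistics.fmean(e for e, _ in offsets_m)
    mean_n = statistics.fmean(n for _, n in offsets_m)
    return math.sqrt(
        statistics.fmean(
            (e - mean_e) ** 2 + (n - mean_n) ** 2 for e, n in offsets_m
        )
    )
```

A set of repeated measurements can be precise without being accurate: readings clustered within a meter of one another have good precision even if the cluster lies 100 meters from the true location.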

-------
                   Chapter 4
IMPORTANT THEMES: PROGRAMMATIC DIVERSITY AND
       TWO ORGANIZATIONAL APPROACHES TO
                   GEOCODING

-------
     4.  IMPORTANT  THEMES:    PROGRAMMATIC
         DIVERSITY  AND  TWO  ORGANIZATIONAL
                   APPROACHES  TO GEOCODING

Two important themes need to be understood before considering specific geocoding
technologies. First, environmental  programs are responsible for a diverse set of
regulated entities  (e.g., facilities,  air stacks,  outfall pipes,  monitoring wells,
underground storage tanks, landfills, etc.).   Organizational uses for locational data
associated with these  entities  differ in  response to legislative  and regulatory
requirements,  as well as overall mission (i.e., Federal, state, regulated community,
etc.).  As  a  result,  environmental  organizations  and government agencies have
different locational data accuracy  and quality needs which may  influence their
choice of geocoding technology or approach to implementing that technology. It
also should be  noted  that the  LDP guidance recommends that  environmental
organizations  consider secondary uses for "their" locational data when selecting a
geocoding method.

The following comparison is provided to illustrate differences in programmatic
requirements for locational data.   The Toxics Release Inventory System (TRIS) in
EPA's Office of Pollution Prevention, Pesticides,  and  Toxic Substances (OPPPTS)
currently  requires a single pair of latitude/longitude coordinates to geographically
define the location of a reporting  facility, regardless of its size.  In comparison,
Superfund  site managers  and environmental  engineers  require numerous
locational coordinates, including depth and elevation data,  to characterize a wide
variety of natural and man-made phenomena within or around the boundaries of
each site.

TRIS and Superfund program managers must adhere  to the LDP.  TRIS data are
used primarily to ensure a complete national inventory and to support a variety of
national, regional, and state-level planning analyses. Superfund data are used for
site-specific  engineering  studies to determine the  location  and  extent  of
environmental contamination  in order to provide  a scientific basis for removal or
remedial actions.

Although both programs require high-quality locational data, specifications for
amount, type (i.e., points, linear, polygonal, 3-dimensional elevation, or depth data),
and accuracies/precision of data differ dramatically. For example, many Superfund
site analyses require more accurate locational data than  EPA's  25  meter goal.  In
contrast, the primary uses of TRIS locational data in OPPPTS may not
justify 25 meter accuracy.  Other offices and systems have a mixed
orientation (facility and non-facility level).  For example, locational data in the
Permit Compliance System (PCS), supporting EPA/Office of Water's National
Pollutant Discharge Elimination System (NPDES) Program, cover a mix of
facilities and outfall pipes.  The extent to which secondary use requirements and
data integration efforts substantiate the need for 25 meter accuracy in
latitude/longitude data (when the primary use does not) remains to be determined.
Twenty-five meter accuracy is an agency-wide goal, however, and it should be met
wherever possible.  Procedures for addressing these issues, planning for compliance,
and/or obtaining waivers (if necessary) are presented more fully in the Locational
Data Policy Implementation Guidance -- Guide to the Policy.1

The second theme that needs to be understood is that there are two fundamental
organizational approaches to geocoding: incremental field-based geocoding,  and
centralized geocoding.   Incremental field-based  geocoding is defined as the
collection of locational data by EPA officials or their representatives during regularly
scheduled  field  work (e.g.,  site inspections,  compliance  monitoring,  site
characterization, remedial or  removal  actions,  etc.).   In contrast, centralized
geocoding is defined as a dedicated locational data collection effort used to populate
a data base in one, unified effort.  This Guide focuses primarily on centralized
geocoding.  During  its deliberations, the  LATF stated  its preference for the
incremental approach, although both have strengths and weaknesses. Chapter 5
presents more information about these approaches.  This document highlights the
impact of these geocoding approaches on various technical and management issues.
1  Please note that the study which underlies much of the information provided in this document focused almost
   exclusively upon "point" data, or at least the notion of representing regulated entities, regardless of size or geographic
   extent, with one latitude/longitude coordinate pair. Therefore, those organizations seeking guidance on developing
   geographic base files containing polygonal data, also subject to the LDP, may have requirements that fall somewhat
   outside the scope of this Guide.



-------
        Chapter 5
THE GEOCODING LIFECYCLE


-------
           5.   THE  GEOCODING  LIFE  CYCLE

Geocoding has its own well-defined life cycle, a generic process independent of
specific technologies or methods.   The geocoding life cycle provides a  realistic
approach for estimating the true cost of implementing any geocoding technology. It
also provides a fair basis for comparing geocoding methods. Before describing the six
phases of the geocoding life cycle, however, it is important to understand the two
organizational approaches to geocoding: incremental field-based and centralized
geocoding.  This document focuses primarily on centralized approaches,
although an incremental approach has its own benefits and limitations.  A
description of each is given below.

5.1  Incremental   Field-Based   Geocoding

Incremental field-based geocoding is a process of collecting locational data during
planned  field work.    EPA personnel, state  partners,  or  other  delegated
representatives already perform field work for various purposes such as facility
inspections, compliance monitoring,  and enforcement actions. Incremental field-
based geocoding becomes an additional function or responsibility of field personnel
and takes  advantage  of the opportunity presented by  their already-established
presence in the field.  This approach to geocoding is incremental in the sense  that
field trips to  regulated facilities and operating units occur intermittently on a
selective basis.

There  are two  primary  benefits  of the incremental field-based  approach to
geocoding:

      •     The overhead costs associated with geocoding are reduced by limiting
            the significant administrative and  travel costs associated  with a more
            centralized approach (see below).

      •     Reliance on existing field personnel to perform geocoding produces
            certain efficiencies, particularly with their enhanced knowledge of the
            targeted facilities or regulated entities. One field crew can potentially
            geocode all regulated entities in a given location during  a single  site
            visit.

The two primary limitations of incremental field-based geocoding are:

      •     The  time  required for completion of a geocoding effort  may be
            extensive  because of the sample-based approach to  site inspections
            typically used by EPA program offices.

      •     Most  EPA field inspectors are  not trained surveyors or geocoding
            technologists, a fact that may impact their productivity or the quality of
            geocoded data.

-------
5.2  Centralized  Geocoding

In contrast to incremental geocoding, the sole objective of centralized geocoding is to
collect locational data for  a  specific geographic region and/or class of entities.
Because it generally is not performed  in conjunction with any other activities,
centralized geocoding  requires an independent organizational infrastructure and
management process apart from  existing programs.  The purpose of centralized
geocoding is to collect all of the targeted locational data in a concentrated effort.

The  ongoing effort of  the  National  Pollutant  Discharge Elimination  System
(NPDES) Program in EPA's Office of Water  to geocode the location of all  NPDES
major industrial  facility dischargers and  their outfall pipes is a prime example of a
centralized geocoding approach.  This effort involved the mass  mailing of several
thousand U.S.  Geological Survey  (USGS) topographic  maps  (with detailed
instructions)  to NPDES facilities, the marking of facility and outfall pipe locations
on these maps by the facility managers, and the digitizing of these data by EPA. The
project has continued for more than a year with the geocoding of more than 7,000
facilities and outfall pipes.

The primary benefits of a centralized approach to geocoding are:

      •     Geocoding projects can be accomplished relatively quickly (as compared
            to an incremental field-based approach) with limited impact on other
            programmatic  functions.

      •     Only a limited number of personnel need to be dedicated to a geocoding
            effort and given the necessary technical training.  Their
            productivity and efficiency will improve steadily as they gain
            experience throughout the geocoding life cycle.

      •     A higher degree of management oversight and quality control can be
            exercised   with  a  centralized  geocoding  project  than  with  an
            incremental field-based approach.

      •     On  larger  geocoding projects, the price-per-point drops over time until
            a "steady state" is reached.

The primary limitations of a centralized approach to geocoding are:

      •     Centralization usually removes the component of  on-site familiarity
            and thus may adversely impact data quality.

      •     Centralized geocoding projects may require a disproportionate  amount
            of resources for management overhead because they are not coupled
            with other activities  that are  being performed.   This management
            overhead includes activities such as creating an organizational
            work flow and handling production logistics such as monitoring data
            and maps, quality assurance, and spread sheet design.

5.3  The  Geocoding  Life Cycle

As indicated in Figure 1, the geocoding life cycle has six major components.  Each
component "feeds" the subsequent one until the existing inventory is updated with
new (or  improved) locational data.  This process continues  until all  data  are
collected or more rigorous data quality objectives (DQOs) are established. Higher
expectations for data quality may be a function of improved technology that makes it
cost effective to improve the quality of existing locational data. Organizations which
initially comply with the 25 meter accuracy goal of the LDP may only be required to
follow the geocoding life cycle once. After  the locational data are incorporated into
the data base,  the initial  work is complete.   Updates and additions should be
expected on a routine basis.  The six components of the geocoding life  cycle are
defined below:














Figure 1
The Geocoding Life Cycle
(Cycle diagram: Data Quality Objectives, Take Inventory, QA Inventory, Reconnoiter
Entities, Geocode Entities, and Incorporate Inventory, arranged in a loop around the
label "The Geocoding Lifecycle.")

-------
      •     Establish DQOs— Determine the level of locational accuracy necessary
            for the intended primary and secondary use(s) of the data. The LDP has
            an accuracy goal of 25 meters. Each organization should assess its own
            programmatic requirements, consulting the full set of LDP Guidance
            materials, to determine if the agency's 25 meter accuracy goal helps or
            hinders their efforts to fulfill their mission.  It also is crucial for them
            to give additional consideration to secondary users and data integration
            efforts.   These issues  should be  addressed in each program's  LDP
            Implementation Plan.   A waiver from the  LDP for  organizations
            seeking a less rigorous accuracy requirement for their locational data
            may be possible.

      •     Take Inventory — Determine the entities that need to be geocoded.

      •     Take a QA Inventory — Determine whether existing address records are
            current and accurate.  Physical site addresses, rather than mailing
            addresses, are needed to perform address matching.

      •     Reconnoiter Entities — Determine, in a general way,  where the target
            entities are located (e.g., on which maps, driving directions for  field
            survey teams, etc.).

      •     Geocode Entities — Assign latitude/longitude values to targeted entities
            by  applying  a  specific  geocoding  technology (e.g.,   GPS,  map
            interpolation, address matching, etc.).

      •     Incorporate  Inventory — Perform data QA/QC, check  accuracy of
            geocoded data, and incorporate data into national or programmatic data
            systems.

The geocoding life cycle  applies both  to incremental field-based geocoding and
centralized geocoding.  The way in which the life cycle is implemented may vary,
however.  Some of  the  differences have  been discussed  above  (e.g., time  or
personnel factors). For either approach, it is incumbent upon planners of geocoding
projects to explicitly address how they will implement the geocoding life cycle  cost-
effectively to achieve their established DQOs.

-------
   6.    GEOCODING  METHODS,  CAPABILITIES,
           REALISTIC COSTS AND  ACCURACIES

Geocoding technology is changing rapidly.  In recent years powerful, new techniques
have become commercially available. Global Positioning Systems (GPS) are the most
highly visible of these techniques.  The LATF recommended GPS  as the preferred
geocoding technology for the agency, although they agreed that alternative methods,
such as  map interpolation or address matching, may be appropriate for  certain
activities (e.g., geocoding  FINDS).  A number of geocoding methods have been
reviewed to estimate their cost and accuracy including: GPS, Photogrammetry, Map
Interpolation, Photo Interpretation/Map Interpolation, Address Matching and ZIP
Code Centroid.   Appendix A presents  descriptions of each geocoding method,
summarizes  their  reasonably achievable  accuracies, provides cost estimates, and
highlights benefits and limitations.


Figure 2
Comparative Geocoding Costs and Accuracies
(Chart comparing cost with accuracy in meters for Address Matching, ZIP Centroids,
Map Interpolation, Photo Interpretation, and GPS; the Locational Data Policy
25 meter goal is marked.)

-------
Figure 2 succinctly portrays the two primary  evaluation
parameters for selecting a geocoding technology: cost and accuracy.  As the graph
clearly illustrates, only Photo Interpretation and GPS consistently comply with or
exceed EPA's 25 meter accuracy goal.  Of some interest are their equally high costs per
point relative to alternative, less accurate geocoding methods.

-------
      7.   CROSS  CUTTING  IMPLEMENTATION
                                   ISSUES

A number of issues have been identified that apply to all geocoding methods. Each
of these issues has a  dramatic  impact on programmatic geocoding strategies,
procedures, and the selection/implementation of specific methods.

7.1  The  Need  for a  Geographic  Reference
      Standard

One of  the  fundamental  problems  encountered  in  documenting locational
coordinates during the geocoding study was  the lack of a geographic reference
standard for EPA facilities or other entities to be geocoded.  A geographic reference is
the actual position where  geocoding occurs.  For example, the data analyzed in the
geocoding study were derived from either a (theoretical) centroid, street entrance, or
approximated block face of a given facility/entity. These differences are attributable
to several different factors:

      •    Some geocoding technologies  have  de facto geographic references;
           address matching,  for example, at best approximates the location of an
           entity somewhere along a city block face.

      •    Entities vary  greatly  in size and dimension; an outfall  pipe may be six
           inches in diameter while a large industrial plant may  cover hundreds
           or thousands  of acres.

      •    Access to regulated entities may be difficult and, therefore, may affect
           the location of a geographic reference; high fences, challenging terrain,
           or limits on access to a  property may require geocoding personnel to
           compromise the ideal geocoding position.

An inherent problem posed by the lack of a geographic reference standard is the
difficulty of  assessing positional accuracy.  It also handicaps the ability to
subsequently compare locational data that are based on different  geographic
references.  As a practical  matter, however, a single geographic reference standard is
inadequate.  A  workable geographic reference standard must recognize the real
world limitations (as described above) and provide a hierarchy for implementing
and documenting the reference standards employed.  In lieu of a national standard,
a consistent geographic reference, with well defined options, must be employed for
each geocoding project.  These references should be described explicitly in the LDP
implementation plans prepared  by each program.

-------
7.2  Urban  vs.  Rural  Geocoding:     Productivity,
      Accuracy,  and  Cost  Differences

The productivity, cost, and resulting accuracy of geocoding projects depend upon a
number of factors  including the technology employed, type and training of key
personnel, nature of the regulated entities, and the geographic setting (urban versus
rural).  Experience has demonstrated that geocoding in rural and urban settings
presents different problems, impacting productivity, accuracy, and cost, sometimes
independent of the technology employed.

Address matching, GPS, and map interpolation are "sensitive" to urban versus
rural geography for several reasons:

      •     Urban environments are very dense and street address data bases are
            available that support geocoding accuracies to the nearest city block face
            or finer; rural environments often lack numerical addressing schemes
            (e.g., RD#4 vs. 123 Main Street) and street intersections can be "few and
            far between," thus diminishing positional accuracy even where address
            geocoding is technically feasible (which, for many rural areas, it is not).

      •     Urban "canyons," much like real world  canyons, can interfere with
            radio transmissions and severely limit, or even preclude,  GPS-based
            geocoding.

      •     Reconnoitering can be difficult in any setting; entities in remote rural
            settings can be particularly difficult to locate, thereby reducing overall
            productivity and raising costs.

      •     National "coverage" maps such as the USGS 1:24,000-scale topographic
            maps tend to be more current in urbanized areas.  The presence of
            greater numbers of recognizable landmarks represented on these maps
            contributes to higher-order accuracies in map interpolation.

These issues are only some examples of the differences encountered in urban versus
rural geocoding.  Lessons learned from urban and rural geocoding projects should be
incorporated into future geocoding project plans.  In some cases, different geocoding
methods may be preferable in rural areas than in urban areas.  For example,
where address matching may satisfy a geocoding requirement in an urban setting,
GPS or photogrammetry may be preferable in  rural settings.  Using  alternative
geocoding  technologies  optimized for unique geographic  settings may be more
productive  and may yield more accurate results.

-------
7.3  Performing  In-House  Geocoding or
      Contracting  for   Specialized  Geocoding
      Services

In recent years, an industry has developed  to support geocoding requirements.
Manufacturers of geocoding technology (e.g., GPS, analytical stereoplotters, GIS-
based address matching) are selling state-of-the-art tools that enable performance of
a full range of geocoding functions.  In addition, consultants and specialized
geocoding service bureaus have  emerged  so organizations can perform their own
geocoding or contract for  geocoding services.  Two prerequisites to making a
decision to perform geocoding in-house or through a professional contractor are
described below.

      •    Definition of  DQOs  — A DQO statement will  help determine the
           technologies of choice and, indirectly, decide whether service "bureau"
           or consultant options are preferable.

      •    Choice of an Incremental Field-based or a Centralized Geocoding
           Approach — A  decision to pursue an incremental approach probably
           implies that the responsible organization (or its agents, such as states)
           will conduct the actual geocoding. A centralized methodology opens
           up the possibility of employing specialized geocoding service providers.

Other issues to consider are the amount of in-house  expertise available to the
organization, the impact on other programmatic activities of assigning personnel to
geocoding tasks, and  the productivity/cost-benefits  of contracting for the service.
Geocoding technology is quite sophisticated  and evolving rapidly. Much of the
productivity derives  from allowing highly  trained and experienced personnel
to employ the right tools in the most effective ways.

An in-house geocoding plan (as part of an LDP  Implementation Plan) should be
designed, and full life cycle  cost estimates prepared. For comparison  purposes, a
Request for Information  (RFI) can be sent to geocoding consultants  and  service
bureaus to gather comparative cost data for the same work. Based on the results of
an internal assessment and the responses to the RFI,  a decision can be made on the
most cost-effective  way to proceed with a geocoding project (i.e., in-house or via
contractor).

7.4  EPA's Regulated  Universe:  Who  Carries  the
      Geocoding   Burden?

A survey of EPA data systems was performed to  determine the status of locational
data in those systems.  Responses  were received for the following 23 systems:
      •     305(b) Waterbody System (WBS).
      •     Aerometric Information Retrieval System (AIRS).
      •     Chemical Update System (CUS).
      •     Chemicals in Commerce Information System (CICIS).
      •     Comprehensive Assessment Information Rule Data System (CAIR).
      •     Comprehensive Environmental Response, Compensation, and
            Liability Information System (CERCLIS).
      •     Consolidated Docket System.
      •     Construction Grants GICS (CGGICS).
      •     Facility Index System (FINDS).
      •     Federal Reporting Data System (FRDS).
      •     Hazardous Waste Data Management System (HWDMS).
      •     Management Information Tracking System  (MITS).
      •     National Asbestos Registry System (NARS).
      •     Needs Survey (NEEDS).
      •     Permit Compliance System (PCS).
      •     Preliminary Assessment Information Rule (PAIR).
      •     Resource Conservation and Recovery Information System (RCRIS).
      •     Storage and Retrieval of Water Quality Information (STORET).
      •     Strategic Planning and Management System (SPMS).
      •     Superfund Enforcement Tracking System (SETS).
      •     Toxics Release Inventory System (TRIS).
      •     Underground Injection Control Tracking System (UIC).
      •     Water Quality Analysis System (WQAS).

Many data systems contained little or no locational data.  The available data were
most often generated from ZIP code centroids and map interpolation. Few systems
have latitude/longitude accuracy standards or perform QA on locational data.

The  findings  of  an OIRM-sponsored regulatory review  for EPA spatial  data
requirements may be summarized as follows:

         "Regulatory  requirements for spatial data  in  EPA  programs are
         limited.   The  requirements  range from  merely  requiring  that the
         location   be  provided  in an  unspecified  manner to specifically
         requiring  latitude and longitude  to  the  nearest  second [UIC
         inventory  requirements  in  40  CFR  144.26(b) (2)].   Seventeen
         regulations  were  identified   that  contained  requirements  for
         locational  information.   Of  these seventeen,  only seven  specified
         latitude  or  longitude as a requirement.  In  most  programs where
         latitude  and longitude  are required, no  accuracy  requirement  was
         specified;  therefore, the accuracy of the data available from  these
         programs is unknown.   Only two of the seven regulations requiring
         latitude and longitude specified  the  level  of  accuracy (NPDES
         permits in 40 CFR 122.21  and UIC inventory  requirements noted
         above).2"

Table 1  shows that, for the systems with locational data that responded, 71% of the
locational data  is supplied to EPA by  the States.  The  next largest provider of
locational data is the regulated community (12.7%).  EPA Headquarters and regional
offices provide only 8.5% of the locational data.  FINDS survey results were omitted
from the following  table because it is a "secondary" source system (inclusion would
have caused double-counting).

Table 1
EPA Data System's Locational Data Inventory Summary

                          Data       Provide       Perform       Perform  Total number
                         Entry      Lat/Long            QA        Update   of records*

EPA Central             38,690        14,404       186,683       298,166       537,943
                          3.7%          3.6%         25.3%         37.2%

EPA Region             341,938        19,390       224,315       280,532       866,175
                         32.7%          4.9%         30.4%         35.0%

State                  657,734       282,696       306,958       118,625     1,366,013
                         62.9%         71.4%         41.6%         14.8%

Submitter                    0        50,255             0             0        50,255
                          0.0%         12.7%          0.0%          0.0%

Other                    7,320        28,966        19,923       104,198       160,407
(Contractor)              0.7%          7.3%          2.7%         13.0%

Total number         1,045,682       395,711       737,879       801,521     2,980,793
of records*

* The number of data records is not equal to the number of regulated entities. For example, if TRIS,
PCS, and AIRS contain data records for a facility, then multiple records are listed for that facility.

The  impact of  these  numbers on  EPA and  individual  program directions for
geocoding national data bases should be carefully considered because much of the
burden for implementing EPA's Locational Data Policy may fall to the States via
grants and performance agreements.

2  Battelle Contract No. 68-03-3534, Work Assignment No. H2-51, Task 1.a, pg. 10.

-------
7.5  Accuracy Checking  of  Locational  Data

There are several methods that can be employed to assess the accuracy of geocoded
locational data.   All of  these methods assume that information in addition to
lat/long (e.g., correct  address) is known about the location in question.  The
additional information  is used to generate a second estimate  of  the site location.
Geocoding accuracy is  then checked by comparing lat/longs  from the  geocoding
effort with the estimated site location.

It should be noted that the accuracy checking methods described below are generally
inappropriate for  higher  accuracy geocoding technologies, such as GPS  or
photogrammetry.  The difficulty in checking the accuracy of GPS, for example, is the
general lack  of availability  of higher accuracy "benchmark"  data with which to
compare the GPS readings.  This issue will need further analysis and guidance as
the agency programs begin to conduct GPS surveys.

The simplest accuracy assessment method is to compare the location in question to a
known location such as a ZIP Code centroid or a place-name location from the
USGS Geographic Names Information System (GNIS).  The distance between the two
points is compared to a threshold distance. If the distance is less than the threshold,
then the location is accepted; otherwise, the location is rejected. The difficulty then
becomes setting that threshold distance. Given the wide range in  ZIP code sizes, a
single threshold is difficult to set. For example, in a nationwide analysis of address-
matched TRIS facility locations and ZIP centroids, 98% of the locations were accepted
using a 10 km threshold.
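
The threshold comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a procedure taken from the geocoding study: the haversine formula, the Earth radius constant, the function names, and the sample coordinates are all assumptions chosen for the example. Only the 10 km threshold comes from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value for this sketch)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def passes_distance_check(location, reference, threshold_km=10.0):
    """Accept the location if it lies less than threshold_km from the reference."""
    return great_circle_km(*location, *reference) < threshold_km


facility = (38.90, -77.04)      # hypothetical geocoded lat/long
zip_centroid = (38.91, -77.03)  # hypothetical ZIP code centroid
print(passes_distance_check(facility, zip_centroid))
```

Because ZIP code areas vary so widely in size, a single default such as `threshold_km=10.0` would probably need to be tuned, or replaced by a per-ZIP value, in practice.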

A  better method of checking locational data  accuracy is to  perform a point-in-
polygon analysis.  This  method compares the location in question (a point) to a
small area polygon such as a ZIP code  or county boundary. If the point falls inside
the correct polygon, then the location is accepted.  Otherwise, the point is rejected.
However, this method  is computationally intensive.  Calculating whether or not a
point falls within a complex polygon is a computationally difficult task (although
straightforward through the use of GIS  technology).
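
For readers without access to a GIS, the point-in-polygon test can be approximated with the classic ray-casting (even-odd) rule. The sketch below is one common way to do it, offered as an assumption rather than the method used in the study. It treats latitude and longitude as flat coordinates, which is reasonable only for small polygons away from the poles and the 180-degree meridian, and the L-shaped polygon is invented for illustration.

```python
def point_in_polygon(lat, lon, polygon):
    """Even-odd ray casting. polygon is a list of (lat, lon) vertices;
    the last vertex is joined back to the first automatically."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the point's latitude.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside  # each crossing flips the state
    return inside


# Hypothetical L-shaped "ZIP code" polygon in (lat, lon) order.
l_shape = [(0, 0), (0, 4), (2, 4), (2, 2), (4, 2), (4, 0)]
print(point_in_polygon(1, 3, l_shape))
print(point_in_polygon(3, 3, l_shape))
```

Points lying exactly on a boundary may be classified either way by this rule, which is one more reason the buffer discussed next can be useful.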

The accuracy of the point-in-polygon assessment method is dependent on the size of
the polygon  and the accuracy of the polygon boundary delineation.  ZIP  code
polygon boundaries tend  to vary  among vendors.   An  approach to mitigate
boundary uncertainty is to apply a buffer around the polygon and then to accept the
additional points that fall outside a polygon, but inside the buffer.  Similar to setting
a  distance threshold  in the previous accuracy checking method,  setting  an
appropriate buffer size  is difficult.  In  an accuracy assessment  exercise of the  TRIS
database, the reviewers found that a 1-2 km buffer around rural ZIP code polygons
generally worked well.
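
The buffer idea can be sketched as a second check applied to points that fail the point-in-polygon test: accept the point if it lies within the buffer distance of the polygon boundary. The code below is an illustrative assumption, not the TRIS reviewers' implementation. It converts degrees to kilometers with a simple local flat-earth approximation, and the function names, the 111.32 km-per-degree constant, and the 1.5 km default (chosen from the 1-2 km range mentioned above) are all hypothetical.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude (assumed)


def _to_local_km(lat, lon, origin_lat, origin_lon):
    """Project a lat/long to x, y kilometers on a flat plane centered on the origin."""
    x = (lon - origin_lon) * KM_PER_DEG_LAT * math.cos(math.radians(origin_lat))
    y = (lat - origin_lat) * KM_PER_DEG_LAT
    return x, y


def _distance_to_segment_km(px, py, ax, ay, bx, by):
    """Shortest distance from point P to the line segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp to the segment's endpoints
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_boundary_km(lat, lon, polygon):
    """Distance in km from the point to the nearest polygon edge."""
    pts = [_to_local_km(v_lat, v_lon, lat, lon) for v_lat, v_lon in polygon]
    n = len(pts)
    return min(_distance_to_segment_km(0.0, 0.0, *pts[i], *pts[(i + 1) % n])
               for i in range(n))


def accept_with_buffer(lat, lon, polygon, inside, buffer_km=1.5):
    """inside is the result of a separate point-in-polygon test."""
    return inside or distance_to_boundary_km(lat, lon, polygon) <= buffer_km


# Hypothetical rural ZIP polygon and a point just east of its boundary.
square = [(40.00, -100.10), (40.00, -100.00), (40.10, -100.00), (40.10, -100.10)]
print(accept_with_buffer(40.05, -99.99, square, inside=False))
```

Passing the point-in-polygon result in as `inside` keeps the two tests separate, so the buffer distance is only decisive for points that would otherwise be rejected.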

A  simplified approach that approximates the point-in-polygon  accuracy assessment
method, and is nearly as easily computed as the ZIP code centroid method, is the
bounding box test. To execute this test, one must assume that if a point falls inside a
rectangle that bounds the small area polygon, then the point also falls inside the
polygon.

Determining whether a point is inside a generic polygon is a complex geometric
problem that is computationally intensive. In comparison, determining whether a
point is contained inside a rectangle is computationally simple if the rectangle is
oriented along the north-south axis.  The point's latitude and longitude are simply
compared to the latitude and longitude ranges defined by the rectangle.  If both the
latitude and longitude  are within the  range, the point lies inside the rectangle.
While this  test is not as accurate  as a true point-in-polygon test, it can  be
implemented easily in almost any programming language.  The bounding box test is
susceptible to commission error but the likelihood of error reduces as the  shape of
the polygon becomes closer to the shape of the rectangle (Figure 3).
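
The comparison of latitude and longitude ranges described above is short enough to show in full. The following Python sketch is illustrative only: the function names and the L-shaped polygon are assumptions, and the boxes generated for ZIP codes and counties in practice may be stored in a different form.

```python
def bounding_box(polygon):
    """Return (min_lat, max_lat, min_lon, max_lon) for a list of (lat, lon) vertices."""
    lats = [lat for lat, _ in polygon]
    lons = [lon for _, lon in polygon]
    return min(lats), max(lats), min(lons), max(lons)


def in_bounding_box(lat, lon, box):
    """True if both coordinates fall within the box's latitude and longitude ranges."""
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


# Hypothetical L-shaped polygon in (lat, lon) order.
l_shape = [(0, 0), (0, 4), (2, 4), (2, 2), (4, 2), (4, 0)]
box = bounding_box(l_shape)

print(in_bounding_box(1, 3, box))  # point inside the polygon itself
print(in_bounding_box(3, 3, box))  # point in the notch: outside the polygon
print(in_bounding_box(5, 1, box))  # point outside the box entirely
```

The second point illustrates a commission error: it lies in the notch of the L, outside the polygon, yet it passes the bounding box test because the box covers the notch.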

In analyzing the locational  data from  TRIS, submitted coordinates were usually
either very accurate or very inaccurate.  Because of this pattern, commission errors
were less likely.  The test could be improved by classifying the
bounding boxes based on the area of the commission error zones.  Based on this
classification, statistics could be generated about the test's accuracy.  OPPTS has
generated bounding boxes for ZIP codes and counties.  They could be used by any
data base management system to identify inaccurate lat/long coordinates.
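
One way to classify bounding boxes by the size of their commission error zones is to compare the polygon's area with the area of its box. The sketch below is an assumption about how such a classification might be done, not a description of an existing EPA procedure. It uses the shoelace formula on raw latitude/longitude values, which is adequate for a ratio over a small area, and the class names and cutoffs (0.10 and 0.30) are hypothetical.

```python
def shoelace_area(polygon):
    """Planar area of a polygon given as (lat, lon) vertices, in squared degrees."""
    n = len(polygon)
    twice_area = 0.0
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        twice_area += lon1 * lat2 - lon2 * lat1
    return abs(twice_area) / 2.0


def commission_zone_fraction(polygon):
    """Share of the bounding box that lies outside the polygon (0 = perfect fit)."""
    lats = [lat for lat, _ in polygon]
    lons = [lon for _, lon in polygon]
    box_area = (max(lats) - min(lats)) * (max(lons) - min(lons))
    if box_area == 0:
        raise ValueError("polygon has zero-area bounding box")
    return 1.0 - shoelace_area(polygon) / box_area


def commission_risk_class(polygon, low=0.10, high=0.30):
    """Label a bounding box by the relative size of its commission error zone."""
    fraction = commission_zone_fraction(polygon)
    if fraction <= low:
        return "low"
    if fraction <= high:
        return "medium"
    return "high"


l_shape = [(0, 0), (0, 4), (2, 4), (2, 2), (4, 2), (4, 0)]  # hypothetical polygon
print(commission_zone_fraction(l_shape), commission_risk_class(l_shape))
```

Boxes labeled "high" could then be routed to a full point-in-polygon test, while "low" boxes could be trusted on the bounding box result alone.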
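
Figure 3
Bounding Boxes
(Diagram of two polygons inside their bounding boxes, with the commission error zone
lying between each polygon and its box.  One example is labeled "High probability of
commission error" and the other "Low probability of commission error.")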

-------
7.6 Secondary Data  Users,  Multimedia Data
     Integration,  Public  Data Access,  and
     Enforcement  Data  Requirements

The intent of the LDP is to "...extend environmental analyses and allow data to be
integrated based upon location, thereby promoting the enhanced use of EPA's
extensive data resources for cross-media environmental analyses and management
decisions."3

The cross-media benefit  that the policy encourages will allow sharing  of data
collection responsibilities.  Instead of each program collecting its own locational data
for an entity, program offices can "pool their resources" and collect all required data
compliant with the 25 meter accuracy goal. This ability should make  it more feasible
for program  offices to comply with the accuracy goal.
3  U.S. EPA Locational Data Policy

-------
        8.    A FRAMEWORK  FOR  ESTIMATING
                GEOCODING  LIFE  CYCLE  COSTS

What is geocoding going to cost?  This important question has no simple answer.
One simplistic approach to estimating geocoding costs would be to multiply current
industry cost-per-point estimates (for each technology) by the total number of
regulated entities. Application of a more robust cost estimation methodology, like
the one described below, will provide more reliable results.

8.1  Characterize  Existing   Records

All geocoding efforts should commence with a thorough  review of  existing
databases of entities to be geocoded. A profile of these records should characterize
their number, type, address quality, and urban/rural geographic distribution.  Some
of the reasons for performing this review are described below:

      •     The number of records will  give an  indication of the  level of effort
            required.

      •     Knowledge of types of entities (e.g., outfall pipe,  wellhead, industrial
            facility, underground storage tank, monitoring well, etc.) is important
            for selection of appropriate geocoding methods.  It also will indicate
            potential  geocoding  problems including  accessibility,  standard
            geographic referencing, and field logistics.

      •     Address quality is particularly important. First, accurate addresses are
            required for efficient field logistics (i.e., an entity must be located on the
            ground before it can be geocoded).  Addresses also must represent the
            actual location  of the entity, not a mailing address. Addresses that do
            not  commence  with numerics  (e.g.,  RD #3) are difficult or impossible
            to address-match.   This factor can impede  field logistics and may
            preclude the use of address matching  technology as a geocoding
            solution.

      •     Knowledge of the geographic distribution  of  regulated entities,  in
            particular the percentage that are in "urban" or "rural" areas, is very
            useful.  Depending upon the definition of urban/rural, organizations
            can predict geocoding productivity (in the sense of being able to assign a
            lat/long coordinate of acceptable accuracy to  a  given entity).  For
            example,  an urban area analysis of a data  base can reveal the
            applicability of using address matching for the portion of the database
            that resides within known coverages of  geocoded street  addresses
            (available from the U.S.  Census Bureau or from commercial data
            purveyors).
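
A first-pass profile of address quality can be produced with a short script. The sketch below is illustrative only: the category names, the regular expressions, and the sample records are assumptions, and a real profile would also need to cover record counts by entity type and urban/rural distribution.

```python
import re
from collections import Counter

# Patterns are assumptions for this sketch; real address data will need more cases.
_STARTS_WITH_NUMBER = re.compile(r"^\s*\d")
_ROUTE_OR_BOX = re.compile(r"^\s*(P\.?\s*O\.?\s*BOX|RD\s*#?\s*\d|RR\s*#?\s*\d)",
                           re.IGNORECASE)


def address_category(address):
    """Place one address string into a rough quality category."""
    if not address or not address.strip():
        return "missing"
    if _ROUTE_OR_BOX.match(address):
        return "rural route or box"       # hard or impossible to address-match
    if _STARTS_WITH_NUMBER.match(address):
        return "numbered street address"  # candidate for address matching
    return "other"


def profile_addresses(addresses):
    """Count how many addresses fall into each category."""
    return Counter(address_category(a) for a in addresses)


sample = ["123 Main Street", "RD #3", "P.O. Box 12", "", "Main Street Plant"]
for category, count in profile_addresses(sample).items():
    print(category, count)
```

The share of records in the "numbered street address" category gives a rough upper bound on how much of the inventory address matching could handle.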

-------
8.2  Define  Geocoding  Requirements

After  a profile  of  the  existing records inventory is  completed, geocoding
requirements  and constraints  should  be identified.  Various  methodologies  are
available to guide organizations through the process of requirements analysis.  One
recommended approach is the DQOs methodology documented by EPA's Office of
Research and Development (ORD).

A requirements analysis should answer the following fundamental questions:

      •    What is the intended primary use of these data?

      •    What are the possible secondary uses of these data?

      •    What  are  the  programmatically defined  minimum  accuracy
           requirements?

      •    Can  the 25 meter accuracy goal be achieved? If not, how can a waiver
           be substantiated?  (A waiver process will be overseen by EPA's IRM
           Steering Committee.)

      •    What are the economic and time constraints?

8.3  Select/Define  Geocoding  Methodology

Estimating the cost of geocoding  depends upon  a number of factors, including the
methodology or technology employed.  In some cases, multiple  methodologies may
be useful based on entity significance or location.  Characterization of an existing
records inventory, when placed in the context of  an organization's geocoding
requirements,  will help to shed  light on a preferred methodology (assuming there is
broad familiarity with the available technologies). Once a technology is selected and
a preliminary geocoding approach  is defined, the geocoding life  cycle fixed and
variable costs may be estimated. It is critical that the analysis go beyond selecting a
specific technology (e.g., GPS) and that a detailed geocoding implementation plan be
prepared.   This  plan  should  include a decision about the  overall geocoding
approach: incremental field-based or centralized geocoding.   A well-defined
geocoding implementation plan not only will provide a sound basis for a
reasonably accurate cost estimate, but also is required by the LDP.

8.4  Estimate  Geocoding  Life Cycle Fixed  and
      Variable  Costs

Estimating geocoding life cycle costs ultimately depends upon defining and
estimating fixed and variable costs.  Whereas each geocoding technology has some
specific fixed costs (e.g., GPS survey stations), these unique costs can be incorporated
into a generic cost model. Figure 4 portrays a generic Geocoding  Cost Estimation
Model that can be used as a framework for any program's geocoding requirements,
independent of the technology employed4.  Actual estimates of geocoding costs
should account for existing capital equipment (e.g., GPS receivers) versus needed
purchases.

Figure 4
Generic Geocoding Cost Estimation Model

Process steps (columns): Take Inventory, Locate Address, Locate Facility,
Geocode Feature, Update Inventory.

FIXED COSTS

Set-up & Management Costs

      Take Inventory:    The setup costs to get a proper list of "candidate
                         entities". Cost of project design, initial staffing,
                         client communications.
      Locate Address:    The setup costs to get a valid mailing "Address"/phone.
      Locate Facility:   The setup costs to get an accurate street "Location".
      Geocode Feature:   The setup costs for a precise "Position".
      Update Inventory:  The setup costs for updating the original source files.

Other Fixed Costs (Equipment, Software, Other Infras.)

      Take Inventory:    Computer hardware for database mgmt.; PCs, IBM 3090;
                         Computer Systems; Develop Method; Timeline; Program
                         DBMS; Download Database; DBQC Procedures; Storage
                         Space for equipment, maps, digitizers, etc.
      Locate Address:    Business directories; Map catalogues; Develop Method;
                         Map ID System; Set Up Database; Design Address
                         Cross-check and Duplication Systems.
      Locate Facility:   Input/Output Devices; Business directories; Map
                         catalogue and storage facilities; Design/Adapt System
                         for Address matching, Duplicate matching, Map
                         Coordinates; Secure or confirm Location.
      Geocode Feature:   GPS(s); Inertial Systems; Laser theodolite; Document
                         operational steps; Write interface software; Set
                         accuracy determination method; Design data and
                         exception-reporting forms.
      Update Inventory:  Communications system to source database; Define
                         Requirements; Design new data fields/definitions;
                         Verification of update; Calculation of Changes and
                         Costs.

Figure 4 (continued)
Generic Geocoding Cost Estimation Model

Cost categories (rows): Variable Costs, Professional Costs, Training, Ramp-up
Costs, Pilot Tests, Retooling Costs, Staff Functions, Quality Mgmt, Space &
Facilities.

Training

      Take Inventory:    DB Administrator/Records Clerk.
      Locate Address:    Address Editing.
      Locate Facility:   Valid Location criteria.
      Geocode Feature:   Operator Training.
      Update Inventory:  Operator Training.

Pilot Tests, Retooling Costs, Staff Functions, Quality Mgmt, Space & Facilities
(entries listed in that order within each process step)

      Take Inventory:    Test database of 5% of entities in a pilot. Determine
                         unusable record rates; Design Simultaneous extract
                         process; Data extractor; DB Administrator; Compare
                         Source and database.
      Locate Address:    Verify existence and phone (timeliness issue for
                         pilot); Categorize/cost Address confirmation steps;
                         Secure New Directories/Maps; Mail or Phone Address
                         Confirmer; Edit Final List; Review rejects; Workrooms;
                         Project Storage; Mail Facilities.
      Locate Facility:   Secure confirmed street location; Confirm Location;
                         Identify Map (Quad); Identify Street Seg; Rewrite
                         Locational Procedures/Systems; Create master Location
                         coding file; Confirm Location; Identify Map (Quad);
                         Identify Street Seg; Cross-check with ZIP, quad or
                         county; Hard copy storage.
      Geocode Feature:   Identify and geocode specific features. Determine
                         production rates and accuracies; New Hardware; Rework
                         field procedures; Equipment operator(s); Edit
                         Positional Accuracy/Precision.
      Update Inventory:  Upload update fields. Acceptance test update software;
                         Debug and rework programs; Programmer; Compare Source
                         and Update.
      Geocode Feature or Update Inventory (column not recoverable):
                         Cross-check; Electronic and hard copy storage.

Not all aspects of the model are applicable in every case.  Applicability depends in
large measure on the technology to be employed and the underlying approach (i.e.,
incremental field-based or centralized geocoding).  The model is flexible enough,
however, to accommodate most geocoding methodologies.

The proposed Geocoding Cost Estimation Model is generic inasmuch as it is not
dependent upon any one specific geocoding method or technology.  It defines the
overall processes required for any geocoding methodology.  Although the details
often differ from one method to another, the parameters in the Geocoding Cost
Estimation Model capture the full cost of any method employed.

Figure 5, a spreadsheet version of the proposed Geocoding Cost Estimation Model, is
structured as a two-dimensional matrix containing a Cost Axis (y) and a Process Axis
(x).  The Cost Axis contains two major cost categories: Fixed and Variable.  Fixed
Costs are divided into three subcategories: Set-up Costs, Other Fixed Costs, and
Ramp-up Costs.  The Process Axis contains five major categories: Take Inventory,
Locate Address, Locate Facility, Locate Feature, and Update Inventory.
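
As a minimal sketch of this matrix (an illustration, not part of the model itself),
the Python fragment below stores one dollar amount per cost row and process step and
totals the matrix along both axes.  The row and step names follow the text above; the
dollar amounts and the helper names new_matrix, step_totals, and category_totals are
hypothetical.

```python
# Sketch of the Geocoding Cost Estimation Model as a two-dimensional matrix.
# All dollar amounts below are made-up placeholders, not figures from the Guide.

PROCESS_STEPS = ["Take Inventory", "Locate Address", "Locate Facility",
                 "Locate Feature", "Update Inventory"]

# Cost Axis: each cost row belongs to either the Fixed or the Variable category.
COST_ROWS = {
    "Set-up Costs": "Fixed",
    "Other Fixed Costs": "Fixed",
    "Ramp-up Costs": "Fixed",
    "Professional Costs": "Variable",
    "Other Variable Costs": "Variable",
}


def new_matrix():
    """Return an empty matrix: one dollar amount per (cost row, process step)."""
    return {row: {step: 0.0 for step in PROCESS_STEPS} for row in COST_ROWS}


def step_totals(matrix):
    """Total dollars per process step (sums down the Cost Axis)."""
    return {step: sum(matrix[row][step] for row in matrix)
            for step in PROCESS_STEPS}


def category_totals(matrix):
    """Total dollars per major cost category (Fixed, Variable)."""
    totals = {"Fixed": 0.0, "Variable": 0.0}
    for row, cells in matrix.items():
        totals[COST_ROWS[row]] += sum(cells.values())
    return totals


matrix = new_matrix()
matrix["Set-up Costs"]["Take Inventory"] = 4000.0            # hypothetical
matrix["Ramp-up Costs"]["Locate Feature"] = 2500.0           # hypothetical
matrix["Professional Costs"]["Locate Feature"] = 12000.0     # hypothetical
matrix["Other Variable Costs"]["Locate Facility"] = 1500.0   # hypothetical

print(category_totals(matrix))
print(step_totals(matrix))
```

Keeping every cell addressable by both axes makes it straightforward to see which
process step, and which kind of cost, dominates a given estimate.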

      8.4.1    Fixed Costs

      Investments in a geocoding infrastructure (without  regard to the particular
      geocoding method) are referred to as "fixed costs." Geocoding large numbers
      of geographically distributed facilities and entities is logistically complex and
      information intensive.   In order to achieve high  orders of productivity,
      thereby reducing  unit costs (i.e., the cost of geocoding  a single entity),  a
      geocoding infrastructure, or "system," is absolutely essential.  This "system,"
      while including a specific geocoding technology (e.g., GPS), encompasses  a
      much wider range of activities and overall capabilities (discussed below).

      As noted above, fixed costs fall into three major categories:  set-up costs, other
      fixed costs, and ramp-up costs.

      Set-up and other fixed costs  are  one-time investments  in  the necessary
      components of any geocoding process.  Ramp-up costs include the  costs of
      necessary trial and error (i.e., pilot testing) associated with implementing the
      geocoding "system" established during the set-up phase.  Set-up costs include:

            •     Project planning and logistics.
[4] The text in each matrix cell gives examples of the types of geocoding costs incurred; it is not intended to be a
    comprehensive list.



-------

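                              Take       Locate     Locate     Geocode    Update
                              Inventory  Address    Facility   Feature    Inventory

FIXED COSTS

Set-up Costs (hours)          ______     ______     ______     ______     ______
Set-up Costs ($)              ______     ______     ______     ______     ______

Training (hours)              ______     ______     ______     ______     ______
Training ($)                  ______     ______     ______     ______     ______

Other Fixed Costs ($)         ______     ______     ______     ______     ______
  Equipment ($)               ______     ______     ______     ______     ______
  Software ($)                ______     ______     ______     ______     ______
  Other Infrastructure ($)    ______     ______     ______     ______     ______

Ramp-up Costs (hours)         ______     ______     ______     ______     ______
  Pilot Tests (hours)         ______     ______     ______     ______     ______
  Retooling (hours)           ______     ______     ______     ______     ______
Ramp-up Costs ($)             ______     ______     ______     ______     ______

Subtotal (hours)              ______     ______     ______     ______     ______
Subtotal ($)                  ______     ______     ______     ______     ______

VARIABLE COSTS

Professional Costs (hours)    ______     ______     ______     ______     ______
  Staff Functions (hours)     ______     ______     ______     ______     ______
  Quality Management (hrs)    ______     ______     ______     ______     ______
Professional Costs ($)        ______     ______     ______     ______     ______

Other Variable Costs ($)      ______     ______     ______     ______     ______
  Transportation ($)          ______     ______     ______     ______     ______
  Space & Facilities ($)      ______     ______     ______     ______     ______
  Materials ($)               ______     ______     ______     ______     ______

Subtotal (hours)              ______     ______     ______     ______     ______
Subtotal ($)                  ______     ______     ______     ______     ______

TOTAL (HOURS)                 ______     ______     ______     ______     ______
TOTAL ($)                     ______     ______     ______     ______     ______

Figure 5
Spreadsheet Version of Geocoding Cost Estimation Model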

-------
             •     Staff supervision.

             •     Communications with participating organizations (e.g., contractors,
                  regional offices, regulated facilities).

             •     Implementation of financial management and control systems.

      Other fixed costs include:

             •     Equipment (e.g., computer hardware, map files, magnetic tapes
                  or diskettes, furniture).

             •     Software (e.g., data management, spreadsheet, statistical analysis).

             •     Development of rules and procedures for geocoding personnel
                  or regulated  community.

             •     Training.

      8.4.2    Variable Costs

      Variable costs  are the costs of fully implementing  the geocoding "system"
      over  time. Major variable cost parameters include  personnel/functions,
      materials, quality  management,   transportation,  space,  and  facilities.
      Although variable costs are expected to become consistent as the number of
      facilities or entities increases, there are many factors that can greatly affect the
      per-entity cost (e.g., communications, logistics, timing, public support).

      Together, fixed and variable costs provide a well-rounded measure of the
      "true" costs of geocoding.  The geocoding cost model encompasses systemic
      costs associated with geocoding that are much broader than the costs associated
      merely with applying a given method (e.g., digitizing a point on a paper map).
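
One way to see how the two categories combine is the per-entity (unit) cost.  The
sketch below assumes, for illustration only, that fixed costs are spread evenly over
all entities geocoded and that the variable cost per entity is constant; the function
name unit_cost and all dollar amounts are hypothetical.

```python
def unit_cost(fixed_costs, variable_cost_per_entity, n_entities):
    """Per-entity cost when fixed costs are amortized over n_entities.

    Assumes a constant variable cost per entity, which is a simplification.
    """
    if n_entities <= 0:
        raise ValueError("n_entities must be positive")
    return fixed_costs / n_entities + variable_cost_per_entity


# Hypothetical program: $50,000 of fixed costs, $40 variable cost per entity.
for n in (100, 1000, 10000):
    print(n, unit_cost(50_000, 40, n))
```

With these placeholder numbers the unit cost falls from $540 at 100 entities to $45
at 10,000 entities, because the same fixed investment is shared by more records.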

      The process of  identifying the location of a facility and, more importantly, an
      entity (e.g., smoke stack, outfall pipe), is more cumbersome and complex than
      it may appear upon first consideration.  This fact is independent of the specific
      geocoding technology employed.  The geocoding cost model accounts for five
      sequential and interdependent process steps, discussed below.

      8.4.3    Process Parameters

      To geocode accurately, one must first find the object to be geocoded
      before one can record its coordinates (to whatever degree of precision is
      required or possible).  To illustrate the point, take the example of sending a
      team from EMSL-LV into the field in the San Gabriel Basin to conduct a GPS
      survey of regulated entities.  First, a list of features to be geocoded was
      compiled ("take inventory").  Second, addresses for the candidate features were
      identified ("locate address").  Third, candidate features were mapped to support
      field logistics ("locate facility").  Fourth, the GPS survey team located the facility
      and, subsequently, located the candidate features on the ground ("locate
      feature").  The final step, "update inventory," or successfully loading
      locational data into the host system, has yet to be completed.

      As the Geocoding Cost Estimation Model indicates, the five-step geocoding
      process incurs fixed and variable costs.  By comparing the cost parameters
      with the process parameters, a fairly comprehensive understanding of cost
      emerges.  In addition, the cost model  provides a framework for  comparing
      the estimated (and actual) costs of alternative geocoding technologies.

-------
 Chapter 9
SUMMARY

-------
                          9.   SUMMARY

This Guide To Selecting Latitude/Longitude Collection Methods informs EPA program
offices, states, and other parties affected by the LDP about the geocoding lifecycle.
To understand what geocoding is and which methods are available, a manager or
technical specialist must understand the capabilities, realistic costs, and
accuracies of the different geocoding methods and technologies.  To implement
geocoding effectively, cross-cutting geocoding issues must be identified, a
framework for designing and estimating the cost of a geocoding project or program
consistent with EPA IRM policy must be provided, and locational data accuracy
checking methods must be reviewed.

Different  organizations operate differently and have  varying requirements.
Therefore, the extent to which any one geocoding method will be employed may
vary.  In  conjunction with the LDP, however, this Guide will provide  Federal
agencies, states,  and other parties with guidelines and techniques to  implement
useful  latitude/longitude collection methods.

-------

                   Appendix A
 FACT SHEETS ON GEOCODING METHODS
    Global Positioning Systems (GPS)
    Photogrammetry
    Map Interpolation
    Photo Interpretation/Map Interpolation
    Address Matching
    ZIP Code Centroids

-------
              Global  Positioning Systems (GPS)


Description
GPS is an earth-surface positioning system that utilizes a constellation of earth-
orbiting satellites deployed and maintained by the Department of Defense (DOD).
GPS produces latitude/longitude coordinates by relying on established trigonometric
principles, timing, range measurements, and several statistical  models.  Analogous
to traditional land surveying, GPS requires a minimum of  four  simultaneous
satellite observations to precisely  position a point on  the earth.   Typical  GPS
surveying steps include:

      •     Plan detailed field logistics.

      •     Assemble  GPS survey team.

      •     Travel to the site of entities.

      •     Establish GPS base station that collects data  simultaneously with, and
            from the same satellite constellation as the GPS units in the field.

      •     Take GPS "reading" and move to the next entity until all readings are
            completed.

      •     Perform post-processing/QA on raw GPS data in an office setting.

      •     Enter locational data into inventory or programmatic data system.

Reasonably Achievable Accuracy
Under normal conditions, using differential GPS surveying techniques (i.e., using
two receivers simultaneously, one at a known position and one at the entity
being geocoded), 5 to 20 meter accuracies can be achieved.  Accuracies ranging from 5
to 100 meters  are possible depending upon a  variety of technical and geographical
factors (discussed below).  GPS is one  of  the few geocoding  techniques that can
consistently produce locational data that complies with EPA's 25 meter accuracy goal
if properly executed.
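
The idea behind differential GPS can be sketched in a simplified, position-domain
form: the base station's known coordinates reveal the error in its own reading, and
the same offset is subtracted from the field reading taken at the same time.  This
is an illustrative simplification (post-processing software typically corrects the
satellite range measurements rather than the final coordinates), and the function
name and all coordinates are hypothetical.

```python
def differential_correct(base_known, base_measured, rover_measured):
    """Apply a position-domain differential correction.

    Each argument is a (latitude, longitude) pair in decimal degrees.
    Assumes base and rover observed the same satellites at the same time,
    so both readings share approximately the same error.
    """
    err_lat = base_measured[0] - base_known[0]
    err_lon = base_measured[1] - base_known[1]
    return (rover_measured[0] - err_lat, rover_measured[1] - err_lon)


# Hypothetical readings (decimal degrees).
base_known = (34.10000, -117.90000)
base_measured = (34.10030, -117.90020)    # base reading is off by a small offset
rover_measured = (34.15030, -117.95020)   # field reading shares the same offset

lat, lon = differential_correct(base_known, base_measured, rover_measured)
print(round(lat, 5), round(lon, 5))
```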

Cost
The cost of GPS-produced locational data  is influenced  primarily by  equipment,
labor, and transportation.  GPS receivers  cost as little as $2,500 and as much  as
$100,000, although prices are dropping quickly due to commercial competition and
technological breakthroughs.  Base stations, computers, and other field equipment
increase costs  significantly. Labor costs are incurred for survey planning, training,
field work, and post processing. Finally, transportation and related travel costs can
be relatively high for extended field surveys.  For centrally planned, dedicated GPS
surveys, the average cost per point ranges from $75 to $125 at a 2 to 5 meter accuracy.
The cost per point of incremental GPS surveys (i.e., performed by field staff already
"on-site" conducting other duties) is expected to be lower ($35 to $70 at
a 2 to 5 meter accuracy) than that of centrally planned and executed GPS surveys.

Benefits

      •     GPS is the LATF's recommended technology because of its
            demonstrated ability to geocode data that comply with, or are more
            accurate than, the Agency's 25 meter accuracy goal.

      •     GPS is rapidly becoming a standard geocoding technology in the
            surveying community; many of the early problems associated with this
            emerging technology are being resolved through extensive field work.

      •     GPS material costs are dropping rapidly due to high product demand
            and  commercial competitiveness.

      •     Many states are beginning to implement GPS base station networks.

Limitations

      •     Field access to some EPA regulated entities by GPS survey crews may be
            impossible or difficult at best; this limitation is true for all field-based
            geocoding methods.

      •     DOD has authority to invoke Selective Availability (SA) of its satellites
            and, therefore, degrade  the  achievable accuracy of GPS surveys to
            approximately 100 meters.

      •     Reception problems in urban areas and areas of high physical relief can
            limit GPS  accuracies to 200 meters  and,  in some cases,  preclude
            readings entirely.

      •     Post processing of GPS survey readings is usually necessary to achieve
            appropriate orders of accuracy, and  is usually  time-consuming  and
            complex.

      •     Data exchange between different brands of receivers may be difficult,
            time consuming, or, in some cases, impossible.

      •     Standard procedures for collecting reliable GPS data are yet to be
            developed.

-------
                         Photogrammetry


Description
Photogrammetry is defined  as  the "...art  and science of  obtaining  reliable
measurements from photographs.*"  Photogrammetric sciences are a fundamental
part of modern  map making and most  small and medium scale  maps are made
from  aerial  photographs. The aerial photographic holdings  in  EPA and other
agencies of the Federal government are a wealth of spatial and temporal data about
environmental conditions and  processes.  The Environmental Monitoring  Systems
Laboratory  in  Las Vegas (EMSL-LV)  currently provides  information  that is
interpreted from aerial photographs to characterize hazardous waste sites, analyze
wetlands, identify  ecological  resources  and meet  a number of environmental
monitoring needs.  EMSL-LV  has now  acquired  the capability to supply highly
accurate metric, or measurement, information  for similar  applications.

Photogrammetric data  is produced on very  precise photo measurement devices
called "analytical stereoplotters." These devices are  typically calibrated to the micron
level and enable  the scientist to create complex mathematical models that correct for
known distortions  in the  photographs.  From  these three dimensional  photo
models, highly  accurate measurements and  positional  data can be derived for
mapping and analytical purposes.  This data can be produced in digital format
directly for input in a Geographic Information System (GIS).

Cartographic information can be produced from aerial photographs to the locational
specifications of the U.S.  National Map Accuracy Standards.  These can be traditional
map features such as roads or hydrology, or  special map layers such as historical
hazardous waste site activity or fractures in the bedrock.  Any information  that can
be derived from  an aerial photo can be accurately mapped in a digital format. Once
the photo model is established, thematic information represented  by points, lines,
and polygons can be input directly  into digital format without transfer to a hard-
copy map and digitizing from the map base.  This saves time and reduces spatial
error propagation.

Exact measurements can be accomplished on an analytical  stereoplotter to help
characterize  activity of environmental interest.  For example, in studying hazardous
waste sites,  volumes of waste accumulations and changes  in such volumes are
needed to evaluate remedial options.  Also, precise distance and area measurements
can be utilized for risk assessment and other site characterization activities.

Cartographic information that depicts the elevation of the land surface, such as the
contour map or  the digital elevation model (DEM) can routinely  be produced by
photogrammetric techniques.  The resolution of this data can be tailored to the
specific needs of  the project.
* (ASPRS, 1991)



-------
Any feature that is observable on an aerial photograph can be accurately referenced
to a coordinate system.  Photogrammetry can be extremely useful for collecting and
recording the coordinate data that is required by the LDP. Conversely, information
that is not readily visible on photographs, such as property boundaries or pipeline
locations, can be digitally superimposed onto the  photo model for special mapping
or interpretive purposes.

Reasonably Achievable Accuracy
Photogrammetric accuracies are  dependent on film scale, the quality of ground
control data, and a number of other factors, but sub-meter accuracies are routinely
achievable from standard photo products.

Cost
The cost  of photogrammetrically-derived locations can vary considerably depending
on a number of factors, such as the scale of the photos and the quality of the ground
control.  However, prices ranging from $25 to $100 per point are reasonable
expectations.

Benefits

      •     Photos represent permanent records of environmental conditions.

      •     Photogrammetry can produce extremely high accuracies.

      •     Photogrammetry is time-tested and legally defensible in the most
            rigorous courtroom setting.

      •     Any feature on a photograph can be precisely geocoded with ease.  The
            coordinate definition for linear and polygonal features can be
            determined with the same ease as point features.

      •     Full photogrammetric capabilities now exist within the EPA.

      •     QA/QC parameters are inherent in the mathematical models that are
            used in the photogrammetric process.

      •     Photogrammetry does not require a field visit to the actual entities
            being geocoded and can be utilized when site access is impeded.

Limitations

      •     Photogrammetry requires capital equipment and trained technicians.

      •     The process cannot be easily performed outside a laboratory.

-------
                         Map   Interpolation
Description
Map interpolation involves direct measurement from existing paper maps.  Typical
map interpolation steps include:

      •     Develop a candidate list of regulated entities to be geocoded, including
            addresses and facility IDs.

      •     Sort the candidate list by map upon which they are expected to be
            located.

      •     Acquire the appropriate maps (e.g., USGS 1:24,000-scale topographic
            maps).

      •     Identify map symbols that represent the candidate entities or that
            provide sufficient contextual information to geocode required entities.

      •     Measure the location of candidate entities using various methods (e.g.,
            bar scale, engineer's scale or graduated ruler, electronic digitizer).

      •     Enter locational data into inventory or programmatic data system.
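
The measurement step can be sketched as a linear interpolation between the known
corner coordinates of the map sheet.  The example assumes a north-up sheet whose
edges are treated as straight lines of constant latitude and longitude, which is only
an approximation for real quadrangles; the function names and ruler measurements are
hypothetical.  It also converts a ruler error to ground distance: at 1:24,000 scale,
1 millimeter on the map represents 24 meters on the ground.

```python
def ground_distance_m(map_mm, scale_denominator=24_000):
    """Convert a distance measured on the map (mm) to ground distance (m)."""
    return map_mm * scale_denominator / 1000.0


def interpolate_lat_lon(x_mm, y_mm, width_mm, height_mm, sw_corner, ne_corner):
    """Linearly interpolate latitude/longitude from ruler measurements.

    x_mm, y_mm: distance of the point from the map's west and south edges.
    width_mm, height_mm: measured width and height of the mapped area.
    sw_corner, ne_corner: (lat, lon) of the southwest and northeast corners.
    """
    lat = sw_corner[0] + (y_mm / height_mm) * (ne_corner[0] - sw_corner[0])
    lon = sw_corner[1] + (x_mm / width_mm) * (ne_corner[1] - sw_corner[1])
    return lat, lon


# Hypothetical 7.5-minute quadrangle and ruler measurements.
sw = (34.000, -118.000)
ne = (34.125, -117.875)
lat, lon = interpolate_lat_lon(x_mm=240.0, y_mm=290.0,
                               width_mm=480.0, height_mm=580.0,
                               sw_corner=sw, ne_corner=ne)
print(lat, lon)                 # point at the center of the sheet
print(ground_distance_m(0.5))   # ground distance for a 0.5 mm ruler error
```

Under these assumptions a half-millimeter measurement error already corresponds to
12 meters on the ground, before any error in the map itself is considered.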

Reasonably Achievable Accuracy
Achievable accuracies on USGS 1:24,000-scale quadrangles (available nationally)
range from 12 to 50 meters for  easily identifiable features.  Larger scale (i.e., higher
resolution) maps, such as those produced  for hazardous waste site investigations,
support achievable accuracies in the 5 to 25 meter range.

Cost
Dedicated map interpolation managed and performed in an office setting costs
between $40 and $60 per point.  The cost of manual geocoding using maps directly in
the field (by field "inspectors") should be considerably less, approximately $28 to $40
per point.

Benefits

      •     Maps at a consistent scale (1:24,000) are available nationally from the
            U.S. Geological Survey.

      •     Maps are inexpensive when compared to other geocoding technologies.

      •     Map interpolation can occur in the field, in the office, or on the
            telephone talking to a site operator; costs and accuracies differ
            significantly.

      •     Non-rectified air photographs can provide useful ancillary information
            to existing paper maps during manual map interpolation.

Limitations

      •     Map interpolation does not always yield coordinate data that achieve
            EPA's 25 meter accuracy goal for locational data.

      •     Not all entities are identifiable on maps; studies have shown that as
            many as 40% or more are not identifiable on a single map source.

      •     Accuracy of map interpolation depends upon map source currency and
            scale, the object being geocoded, identifiable landmarks, existence of
            street names and address ranges, and skill of the interpolator.  These
            factors are highly variable and, therefore, produce inconsistent
            geocoding results.

      •     Misidentification of an entity on the map can result in inaccuracies of
            greater than 100 meters.

-------
        Photo   Interpretation/Map   Interpolation


Description
Photo Interpretation/Map Interpolation (PI/MI) is actually an integrated technique
by which the user transfers information from an aerial photograph on to a map base
and  then  extracts coordinate information via  standard manual or digital map
interpolation techniques.  Current and  historical aerial  photographs  contain a
wealth of environmental data that are  not  routinely placed  on standard map
products.  However, a  'raw'  aerial photo contains  no  inherent coordinate
information.  By 'interpreting' the information on an aerial photo, such as the
location of a hazardous waste site, and then transferring that information to a
standard map base, coordinate data can be extracted.

Photographic Interpretation is performed by  viewing aerial  photographs through
microscopes or  stereoscopes.  Stereoscopic  viewing creates a perceived  three-
dimensional effect which, when combined with  viewing at various magnifications,
enables the  analyst to identify signatures  associated with different features  and
environmental conditions. The term "signature" refers to a combination of visible
characteristics (such as  color, tone,  shadow, texture, size, shape,  pattern,  and
association)  which permit a specific object or condition to be recognized on aerial
photography.

By correlating observable features on the photograph, such  as  the road network,
with the same set of features on a standard map,  the analyst can transfer the location
of other photographic elements on  to the map base for geo-referencing. The typical
PI/MI process would involve the following steps:

      •     Obtain aerial photograph(s) and maps of the area of interest.

      •     Interpret photographs for specific 'signatures' of environmentally
            significant activity or location.

      •     Transfer data to the map.

      •     Interpolate coordinate location(s) from map.

      •     Enter locational data into inventory or programmatic data system.

Reasonably Achievable Accuracy
Accuracies of PI/MI depend on the spatial accuracy of the basic map product and
the skill of the interpreter in transferring the information. However, from standard
maps such as the USGS 1:24,000 series, accuracies of 10-15 meters are routine.

Cost
Costs for this method generally will be  from $40 to $100 per point.



-------
Benefits

      •     Field visits are not required.

      •     Aerial photos and maps are common resources in EPA projects.

      •     PI/MI is amenable to an office environment.

      •     Relatively current aerial photography is routinely available.

      •     Photos provide a permanent record of activities and serve other
            programmatic purposes.

Limitations

      •     Moderate to high level  of manual effort is required.

      •     Final accuracy is dependent on photo and map scale.

      •     Formal  training in photographic  interpretation  techniques may be
            required, depending on the specific information being extracted.

-------
                        Address  Matching


Description
Address  matching is  a   set  of semi-automated  operations for  deriving
latitude/longitude coordinates from street addresses. Address matching is achieved
by "comparing" a tabular file of street addresses with a digital cartographic street
network file (e.g., Bureau of Census's GBF/DIME and TIGER) that contains  street
names, address ranges, and latitude/longitude  coordinates of the street network.
Street address geocodes (lat/long) are generated by matching  the equivalent street
name in the network  file  and interpolating along  its associated address range.
Numerous commercial  Geographic Information System (GIS) software packages
(e.g., ARC/INFO) provide  address matching capabilities, and several commercial
firms perform address matching  services.   Typical  steps required for address
matching  include:

      •     Compile address file for target entities.
      •     Check for correct and consistently formatted addresses.
      •     Submit the "clean" address file to a service bureau.
      •     Enter locational data into inventory  or programmatic data  system.

Alternatively:

      •     Compile address file for target entities.

      •     Check for correct and consistently formatted addresses.

      •     Load appropriately formatted digital cartographic data base into a GIS.

      •     Run  addresses  through address  matching  software  in batch  or
            interactive mode;  recheck  and correct  addresses  which  cannot  be
            processed.

      •     Enter locational data into inventory  or programmatic data  system.
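
The interpolation along an address range described above can be sketched as follows.
The street segment record, its field names, and the function name are hypothetical
and do not reproduce the GBF/DIME or TIGER file formats; the sketch also ignores
odd/even side-of-street handling and assumes a straight segment.

```python
def interpolate_address(house_number, segment):
    """Estimate (lat, lon) for a house number along one street segment.

    segment: dict with the address range (from_addr, to_addr) and the
    coordinates of the segment's two end points (from_pt, to_pt).
    Returns None when the house number is outside the segment's range.
    """
    lo, hi = segment["from_addr"], segment["to_addr"]
    if not min(lo, hi) <= house_number <= max(lo, hi):
        return None
    # Fraction of the way along the segment (0.0 at from_pt, 1.0 at to_pt).
    fraction = 0.0 if hi == lo else (house_number - lo) / (hi - lo)
    (lat1, lon1), (lat2, lon2) = segment["from_pt"], segment["to_pt"]
    return (lat1 + fraction * (lat2 - lat1), lon1 + fraction * (lon2 - lon1))


# Hypothetical segment: the 100-200 block of "MAIN ST".
segment = {
    "street": "MAIN ST",
    "from_addr": 100, "to_addr": 200,
    "from_pt": (38.9000, -77.0400),
    "to_pt": (38.9000, -77.0380),
}
print(interpolate_address(150, segment))  # roughly the midpoint of the segment
print(interpolate_address(250, segment))  # outside the address range
```

Because the estimate depends only on where the house number falls within the range,
the same address range stretched over a longer segment yields a coarser position.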

Reasonably Achievable  Accuracy
The accuracy of address-matching depends upon the accuracy of the street network
file and the length of the street segment upon  which interpolation is performed.
Accuracies are higher  in  urban areas, and significantly  lower in  rural areas.
Reasonably achievable accuracy for address  matching ranges from 50 to 500 meters.

Cost
The cost of address matching differs if it is performed "in-house" versus through a
service bureau.  Average commercial address matching costs have been documented
in the $1.25 to $4.00 per point range.  Preprocessing costs for compiling and
normalizing address files add an incremental cost of anywhere from $2.00 to $6.00 a
point, resulting in an estimated per point cost of $3.25 to $10.00.  The cost of
in-house address matching is considerably higher per point if up-front investments
are considered.  For example, the cost of building the properly formatted street
network file in a GIS is significant.

Benefits

      •     Address matching is a relatively low cost, batch-oriented method of
            producing latitude/longitude data from address data bases.

      •     Address matching can be performed in the office or through a number
            of qualified address matching service providers.

Limitations

      •     Address matching may not yield measurements that comply with
            EPA's locational accuracy goal of 25 meters.

      •     At best, address matching produces locational coordinates that
            approximate the property parcel centerline of a given facility; pipes,
            stacks, underground storage tanks, and wellheads do not have
            addresses per se, which limits the utility of address matching for these
            types of entities.

      •     Address matching in rural areas remains a somewhat unreliable
            method, because of the distance between street intersections and the
            extensive use of ZIP code centroids as default locational data.
            Furthermore, rural addresses may not exist in some jurisdictions.

-------
                        ZIP  Code  Centroids
Description
ZIP code centroid geocoding is an automated operation for deriving
latitude/longitude coordinates from ZIP codes contained in street addresses.  This
geocoding method assigns the lat/long of the ZIP code centroid location to all
entities within the same ZIP code.  The term "ZIP code centroid" is often
misleading.  ZIP code centroid geocodes are procured from a vendor and not from
the U.S. Postal Service (USPS).  Each vendor's geocodes are based on the vendor's
interpretation of where ZIP code boundaries lie.  The USPS defines ZIP codes in
terms of optimal carrier routes, as opposed to explicitly drawing boundaries on a
map.  The lack of fixed ZIP code boundaries creates some discrepancies between
different vendors' centroids.  Typical steps required for ZIP centroid geocoding
include:

      •     Acquire a ZIP code centroid file from a vendor.
      •     Check for correct ZIP codes in street address records.
      •     Run ZIP codes through matching software in batch mode.
      •     Alternatively, submit a "clean" address file to a service bureau.
      •     Enter locational data into inventory or programmatic data system.
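
The matching step amounts to a table lookup.  In the sketch below, the centroid
table, its coordinates, and the record layout are hypothetical stand-ins for a
vendor file; the function name geocode_by_zip is likewise an assumption.

```python
# Hypothetical vendor centroid table: ZIP code -> (latitude, longitude).
ZIP_CENTROIDS = {
    "20460": (38.8940, -77.0290),
    "89119": (36.0850, -115.1460),
}


def geocode_by_zip(records, centroids):
    """Attach the ZIP code centroid to every record; unmatched ZIPs get None.

    Every entity sharing a ZIP code receives identical coordinates.
    """
    results = []
    for rec in records:
        zip5 = rec["zip"].strip()[:5]   # use only the 5-digit ZIP
        results.append({**rec, "lat_lon": centroids.get(zip5)})
    return results


records = [
    {"id": "A", "zip": "20460"},
    {"id": "B", "zip": "20460-0001"},
    {"id": "C", "zip": "99999"},        # not in the centroid table
]
geocoded = geocode_by_zip(records, ZIP_CENTROIDS)
for row in geocoded:
    print(row["id"], row["lat_lon"])
```

Records A and B receive the same coordinates even though they may be far apart
within the ZIP code, and record C is left for correction or another method.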

Reasonably Achievable Accuracy
The accuracy of ZIP code centroids depends upon the size of the area served by the
ZIP code.  ZIP codes are extremely variable in size, being defined by mail volume
and population or mail drop density.  Therefore, the areal extent of urban ZIP codes
is rather small (in some cases a 5-digit ZIP code is merely a city block), while rural
ZIP codes are either extremely large (150,000 square miles in one Alaska ZIP code) or
are served by Post Office boxes.  As a result, ZIP code accuracies are higher in urban
areas and significantly lower in rural areas.  Reasonably achievable accuracy for ZIP
code centroids ranges from 50 to 500 meters in urban areas to many kilometers in
rural areas.

Cost
The cost of ZIP centroid geocoding differs if it is performed "in-house" versus
through a service bureau.  Average per point ZIP centroid matching costs have been
documented in the $0.01 to $0.60 range.  Preprocessing costs for compiling and
normalizing address files add an incremental cost of anywhere from $0.01 to $0.10 a
point, resulting in an estimated per point cost of $0.02 to $0.70.  The major costs
associated with in-house generation of lat/longs as ZIP code centroids are the cost of
procuring a ZIP code centroid file from a vendor and the cost of
correcting/normalizing street address files.  EPA's Office of Information Resources
Management (OIRM) currently maintains a license to a commercial ZIP code
centroid file.

-------
Benefits

      •     ZIP code geocoding is a relatively low-cost, efficient centralized method
            of producing locational data from existing address files.

      •     ZIP code geocoding has been widely employed at EPA for many years
            and is a well-understood technique.

Limitations

      •     ZIP code geocoding fails to meet EPA's 25 meter accuracy goal for most
            locational data.

      •     At best, ZIP code centroids produce locational data coordinates that
            approximate the centroid of the block where the facility is located;
            pipes, stacks, underground storage tanks, and wellheads do not have
            addresses, which greatly limits the utility of ZIP centroids for these types
            of regulated entities.

      •     ZIP code geocoding in rural areas produces higher-order inaccuracies
            than in urbanized areas because of the typically larger spatial extent of
            rural ZIP codes.

      •     ZIP code boundaries cannot be reliably reproduced from vendor to
            vendor because of the way ZIP code boundaries are defined by the U.S.
            Postal Service.

-------
 Appendix B
REFERENCES

-------
                             References

      Battelle, 9/1990. Final Letter Report on Regulatory Review for EPA Spatial
Data Requirements to U.S. Environmental Protection Agency, Office of Information
Resources Management. Contract No. 68-03-3534, Work  Assignment No. H2-51,
Task 1.a.

      Bissex, D. A., C. J. Franks, and A. Heitkamp, 1990. Quality Assurance for
Geographic Information Systems.   Urban  and Regional Information  Systems
Proceedings, Edmonton, Alberta, Vol. 2, pp.106-118.

      Bolstad, P. V., P. Gessler, and T. M. Lillesand, 1990. Potential Uncertainty in
Manually Digitized Map Data.   International Journal  of  Geographic Information
Systems, Vol. 4(4), pp.399-412.

      Croswell, P. L., 1987.  Map Accuracy:   What is  it, Who Needs it and How
Much  is  Enough.    Urban and Regional  Information Systems  Proceedings,
Edmonton, Alberta, Vol. 2, pp.48-62.

      Fitzsimmons, C. K. 1988.   Evaluation of Selected Methods for Determining
Geographic Coordinates.  Unpublished Report for Exposure Assessment Division,
Environmental Monitoring Systems  Laboratory Las Vegas, NV. Prepared under
EPA Contract CR 812189-02.

      Hunter, G.  J.,  and  I.  P.  Williamson,  1990.   The  Need  for a  Better
Understanding of the Accuracy of Spatial Databases.  Urban and Regional
Information Systems Proceedings, Edmonton, Alberta, Vol. 4, pp.120-128.

      Hurt, J., 1989.  GPS: A Guide to the Next Utility.  Trimble Navigation Ltd.,
Sunnyvale, CA.

      Kruczynski, L. R., 1990.  An Introduction to the Global Positioning System
and its use in Urban GIS Applications.  Urban and Regional Information Systems
Proceedings, Edmonton, Alberta, Vol. 3, pp.87-91.

      Palmer, D., 1989.  Designing a  GIS/LIS:   Some  Accuracy  and  Cost
Considerations. Urban and Regional Information Systems Proceedings, Edmonton,
Alberta, Vol. 2, pp.52-56.

      Slonecker, E. T., and J. A. Carter, 1990. GIS Applications of Global Positioning
System Technology.  GPS World, Vol. 1(3), pp.50-55.

      Slonecker, E. T., and  M.  J. Hewitt III, 1991.  Evaluating Locational Point
Accuracy in a GIS Environment.  Geo Info Systems, Vol. 1(6), pp.36-44.

      Slonecker, E.T., and N.  Tosta, 1992.  National Map Accuracy Standards: Out of
Sync, Out of Time. Geo Info Systems, Jan. 1992, pp. 20-26.
                                    B-l

-------
      US EPA. 1991.  TRI Location Data Quality Assurance for CIS Final Report
(unpublished report for US EPA Office of Toxic Substances, Washington, D.C.)

      From Photogrammetric Engineering and Remote Sensing:

      Acceptance Tests

      Testing Land-Use Map Accuracy: Another Look, Michael C. Ginevan, 10: Oct
79:1372

      Accuracy

      Map Accuracy, EJ. Schlatter, 10: 206

      map, Funk, 24: 392

      map evaluation at Army Map Service, Coulthart, 23: 855

      testing, USGS, 20: 181

      map, Weg, 27: 148

      Effects of Interpretation Techniques on Land-Use Mapping Accuracy, Floyd M.
Henderson, 3: Mar 80: 359

      Accuracy Specifications

      Map Accuracy Specifications, Adopted by ASP in 1940 50th Anniversary
Highlights, 2: Feb 84: 237

      Accuracy Test

      The  Minimum Accuracy Value as an Index  of Classification Accuracy, Stan
Aronoff, 1: Jan 85: 99

      Analysis of Variance

      Analysis  of Variance of Thematic Mapping Data, George  H. Rosenfield,  12:
Dec 81:1685

      Cartography

      Spatial Accuracy Specifications for  Large  Scale Topographic Maps, Dean C.
Merchant, 7: Jul 87: 958-961

      Category Variances

      Spatial Correlation   Effects Upon Accuracy  of Supervised  Classification  of
Land Cover, James B. Campbell, 3: Mar 81: 355


                                      B-2

-------
      Classification Accuracy

      Accuracy  Assessment: A  User's  Perspective, Michael  Story and Russell G.
Congalton, 3: Mar 86: 397

      Interpolation

      least squares, Kraus, 38: 487,1016

      methods, Schut, 40: 1447

      Interpolation of a Function of Many Variables, II, Arthur, 39: 261, 31: 348

      Performance  Evaluation  of Two  Bivariate Processes for  DEM Data  Using
Transfer Functions, Mohamed  Shawki  Elghazali and  Mohsen Mostafa Hassan, 8:
Aug 86:1213

      Interpolation, Height

      Experience with Height  Interpolation  by Finite  Elements,  H.  Ebner and F.
Reiss, 2: Feb 84:177

      Interpolation, Raster

      A Comparative Analysis  of Polygon to Raster Interpolation  Methods, Keith C.
Clark, 5: May 85:575

      Map Accuracy

      The Map Accuracy Report: A User's View, Stan Aronoff, 8: Aug 82: 1309

      Accuracy  Specifications  for  Large-Scale Maps, The  Committee  for
Specification and Standards, American Society of Photogrammetry, 2: Feb 85: 195

      Aerial Verification of Polygonal Resource Maps: A Low-Cost Approach  to
Accuracy Assessment, Thomas H. George, 6: Jun 86: 839

      A Comparison  of  Sampling Schemes  Used in Generating Error Matrices for
Assessing the  Accuracy of Maps Generated from  Remotely  Sensed Data, Russell G.
Congalton, 5: May 88: 593-600

      Using Spatial Autocorrelation Analysis  to  Explore  the  Errors  in   Maps
Generated from Remotely Sensed Data, Russell G. Congalton, 5: May 88: 587-592

      ASPRS Interim Accuracy Standards for Large-Scale Maps, The  American
Society  for Photogrammetry & Remote Sensing, 7: Jul 89: 1038-1040
                                      B-3

-------
      Accuracy  Assessment of a  Landsat-Assisted Vegetation  Map of  the  Coastal
Plain of the Arctic National Wildlife Refuge, Nancy A. Felix and Daryl L. Binney, 4:
Apr 89: 475-478
                                         B-4
                    *U.S. Government Printing Office: 1992-617-003/67042

-------